Parquet / PARQUET-647

Null Pointer Exception in Hive upon reading Parquet

Details

    Description

      When I write Parquet files from a Spark job and try to read them in Hive as an external table, I get a NullPointerException. After further analysis, I found that my transformation (which used the Dataset and DataFrame APIs) produced some NULL values before saving to Parquet. The two fields that contain NULLs are of float type. When I removed these two columns from the Parquet datasets, I was able to read them in Hive. In contrast, when my job writes ORC format instead, I can read the data in Hive even with the all-NULL columns.
      When a column of any datatype other than String is completely empty (all NULL) and is written to Parquet, Hive cannot read it and throws a NullPointerException.
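
The analysis step described above (finding which columns are NULL in every row, so they can be dropped before writing) might look roughly like the following. This is a minimal sketch in plain Python rather than Spark, and it is not the code from the original job. The helper name `all_null_columns` and the sample field names `id`, `name`, `score`, and `ratio` are hypothetical.

```python
from typing import Any, Dict, List


def all_null_columns(rows: List[Dict[str, Any]]) -> List[str]:
    """Return the names of columns whose value is None in every row."""
    if not rows:
        return []
    # Take the column names (and their order) from the first record.
    columns = list(rows[0].keys())
    return [c for c in columns if all(r.get(c) is None for r in rows)]


# Hypothetical sample records: the two float fields are NULL in every row,
# while "name" is NULL in only one row and so is not reported.
rows = [
    {"id": 1, "name": "a", "score": None, "ratio": None},
    {"id": 2, "name": None, "score": None, "ratio": None},
]

null_cols = all_null_columns(rows)
print(null_cols)
```

Columns reported by a check like this are the candidates to remove before writing to Parquet, which is the workaround described in the report.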

      Attachments

        1. Screen Shot 2016-06-24 at 11.03.56 AM.png
          136 kB
          Mahadevan Sudarsanan
        2. Screen Shot 2016-06-24 at 11.02.50 AM.png
          71 kB
          Mahadevan Sudarsanan
        3. Screen Shot 2016-06-24 at 11.01.55 AM.png
          22 kB
          Mahadevan Sudarsanan
        4. Screen Shot 2016-06-24 at 11.01.46 AM.png
          903 kB
          Mahadevan Sudarsanan

          People

            Assignee: Unassigned
            Reporter: Mahadevan Sudarsanan (msudarsanan)
            Votes: 0
            Watchers: 3
