Spark / SPARK-11657

Bad DataFrame data read from parquet


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.5.1, 1.5.2
    • Fix Version/s: 1.5.3, 1.6.0
    • Component/s: Spark Core, SQL
    • Labels: None
    • Environment: EMR (yarn)

    Description

      I get strange behaviour when reading parquet data:

      scala> val data = sqlContext.read.parquet("hdfs:///sample")
      data: org.apache.spark.sql.DataFrame = [clusterSize: int, clusterName: string, clusterData: array<string>, dpid: int]
      scala> data.take(1)    /// this returns garbage
      res0: Array[org.apache.spark.sql.Row] = Array([1,56169A947F000101????????,WrappedArray(164594606101815510825479776971????????),813]) 
      scala> data.collect()    /// this works
      res1: Array[org.apache.spark.sql.Row] = Array([1,6A01CACD56169A947F000101,WrappedArray(77512098164594606101815510825479776971),813])
      

      I've attached the "hdfs:///sample" directory to this bug report.
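
      The symptom (take() returning mangled bytes while collect() is clean) is the classic signature of a reader recycling a mutable row buffer while downstream code still holds references into it. Whether that is the actual mechanism here is an assumption, but the pitfall itself can be sketched in plain Scala; the object and its toy rows function below are hypothetical illustrations, not Spark internals:

```scala
// Sketch of the buffer-reuse pitfall (hypothetical, not Spark code):
// a reader hands out the SAME mutable array for every "row", so callers
// that keep references see only the last row's bytes, while callers that
// copy each row before advancing see correct data.
object BufferReusePitfall {
  // Produces n rows ("row0", "row1", ...), each written into one shared
  // char buffer, the way a columnar reader might recycle row storage.
  private def rows(n: Int): Iterator[Array[Char]] = {
    val buf = new Array[Char](4) // holds "rowN" for single-digit N
    Iterator.tabulate(n) { i =>
      val s = s"row$i"
      s.getChars(0, s.length, buf, 0)
      buf // same array every time; later rows overwrite earlier ones
    }
  }

  // Unsafe: keep references to the shared buffer, decode afterwards.
  def aliased(n: Int): List[String] =
    rows(n).toArray.map(new String(_)).toList

  // Safe: copy each row out before the next one overwrites the buffer,
  // which is what a correct materialization (collect-style) must do.
  def copied(n: Int): List[String] =
    rows(n).map(b => new String(b.clone())).toList

  def main(args: Array[String]): Unit = {
    println(aliased(3).mkString(",")) // corrupted view: last row repeated
    println(copied(3).mkString(","))  // correct: row0,row1,row2
  }
}
```

      With n = 3, aliased returns the last row three times while copied returns the three distinct rows; if take() shares such a buffer down a path that collect() copies, it would produce exactly the kind of divergence shown in the transcript above.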

      Attachments

        1. sample.tgz
          1.0 kB
          Virgil Palanciuc


            People

              Assignee: Davies Liu (davies)
              Reporter: Virgil Palanciuc (virgilp)
              Votes: 0
              Watchers: 4
