Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.5.1, 1.5.2
Fix Version/s: None
Environment: EMR (yarn)
Description
I get strange behaviour when reading parquet data:
scala> val data = sqlContext.read.parquet("hdfs:///sample")
data: org.apache.spark.sql.DataFrame = [clusterSize: int, clusterName: string, clusterData: array<string>, dpid: int]

scala> data.take(1) /// this returns garbage
res0: Array[org.apache.spark.sql.Row] = Array([1,56169A947F000101????????,WrappedArray(164594606101815510825479776971????????),813])

scala> data.collect() /// this works
res1: Array[org.apache.spark.sql.Row] = Array([1,6A01CACD56169A947F000101,WrappedArray(77512098164594606101815510825479776971),813])
I've attached the "hdfs:///sample" directory to this bug report.
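Both duplicates linked below point at Kryo's handling of StringType values, so as a minimal sketch of a possible workaround (an assumption on my part, not verified against this dataset), relaunching the shell with Spark's Java serializer instead of Kryo may avoid the corruption, assuming the cluster's defaults enable Kryo; spark.serializer is a standard Spark configuration key:

spark-shell --conf spark.serializer=org.apache.spark.serializer.JavaSerializer

scala> val data = sqlContext.read.parquet("hdfs:///sample")
scala> data.take(1) /// with the Java serializer this should match data.collect()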
Attachments
Issue Links
is duplicated by
SPARK-11737 String may not be serialized correctly with Kyro (Resolved)
SPARK-11331 Kryo serializer broken with StringTypes (Resolved)