Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Duplicate
- Affects Version/s: 2.4.0, 3.0.0
- None
- None
Description
The from_avro function produces wrong output for a struct field: every returned row carries the values of the last input row. See the output at the bottom of the description.
import org.apache.spark.sql.types._
import org.apache.spark.sql.avro._
import org.apache.spark.sql.functions._

spark.version

val df = Seq((1, "John Doe", 30), (2, "Mary Jane", 25), (3, "Josh Duke", 50)).toDF("id", "name", "age")
val dfStruct = df.withColumn("value", struct("name", "age"))
dfStruct.show
dfStruct.printSchema

val dfKV = dfStruct.select(to_avro('id).as("key"), to_avro('value).as("value"))

val expectedSchema = StructType(Seq(StructField("name", StringType, true), StructField("age", IntegerType, false)))
val avroTypeStruct = SchemaConverters.toAvroType(expectedSchema).toString
val avroTypeStr = s"""
  |{
  |  "type": "int",
  |  "name": "key"
  |}
""".stripMargin

dfKV.select(from_avro('key, avroTypeStr)).show
dfKV.select(from_avro('value, avroTypeStruct)).show

// Output of the last statement, which is not correct:
+---------------------------------------------+
|from_avro(value, struct<name:string,age:int>)|
+---------------------------------------------+
|                              [Josh Duke, 50]|
|                              [Josh Duke, 50]|
|                              [Josh Duke, 50]|
+---------------------------------------------+
Issue Links
- duplicates SPARK-27798 "ConvertToLocalRelation should tolerate expression reusing output object" (Resolved)
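The title of the linked issue points at an expression that reuses its output object. The following is a minimal sketch in plain Scala, without Spark, of how that kind of reuse can make every collected row look like the last one. MutableRow, ReusingDecoder and ReusedOutputObjectSketch are hypothetical names made up for this illustration; they are not Spark classes, and this is not the actual Spark code path or its fix.

```scala
// Hypothetical stand-in for a mutable row; NOT a Spark class.
final class MutableRow(var name: String, var age: Int) {
  def copy(): MutableRow = new MutableRow(name, age)
  override def toString: String = s"[$name, $age]"
}

object ReusedOutputObjectSketch {
  // Hypothetical decoder that writes every result into one shared buffer
  // and returns that same instance on each call.
  final class ReusingDecoder {
    private val buffer = new MutableRow("", 0)
    def decode(input: (String, Int)): MutableRow = {
      buffer.name = input._1
      buffer.age = input._2
      buffer
    }
  }

  val input: Seq[(String, Int)] =
    Seq(("John Doe", 30), ("Mary Jane", 25), ("Josh Duke", 50))

  // Collecting the returned references eagerly keeps three pointers to the
  // same object, so all of them show the values written last.
  def collectWithoutCopy(): Seq[String] = {
    val decoder = new ReusingDecoder
    val rows = input.map(r => decoder.decode(r)).toList
    rows.map(_.toString)
  }

  // Copying each result before storing it keeps the rows independent.
  def collectWithCopy(): Seq[String] = {
    val decoder = new ReusingDecoder
    val rows = input.map(r => decoder.decode(r).copy()).toList
    rows.map(_.toString)
  }

  def main(args: Array[String]): Unit = {
    // prints: [Josh Duke, 50], [Josh Duke, 50], [Josh Duke, 50]
    println(collectWithoutCopy().mkString(", "))
    // prints: [John Doe, 30], [Mary Jane, 25], [Josh Duke, 50]
    println(collectWithCopy().mkString(", "))
  }
}
```

The first result has the same shape as the incorrect output reported above. The second shows that a defensive copy per row is one way to tolerate a producer that reuses its output object.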