Details

- Type: Bug
- Status: Open
- Priority: Critical
- Resolution: Unresolved
Description
Reading a Hudi table written with TimestampBasedKeyGenerator and the date format 'yyyy-MM-dd' fails with the following exception.
Exception
```
Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.unsafe.types.UTF8String
at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getUTF8String(rows.scala:46)
at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getUTF8String$(rows.scala:46)
at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getUTF8String(rows.scala:195)
at org.apache.spark.sql.execution.vectorized.ColumnVectorUtils.populate(ColumnVectorUtils.java:72)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initBatch(VectorizedParquetRecordReader.java:245)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initBatch(VectorizedParquetRecordReader.java:264)
at org.apache.spark.sql.execution.datasources.parquet.Spark32LegacyHoodieParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(Spark32LegacyHoodieParquetFileFormat.scala:314)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
```
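For context (my reading of the trace, not stated in the report): Spark represents a `DateType` value internally as an integer count of days since the Unix epoch, while the table schema reported back for the partition column is a string, so the vectorized reader's `getUTF8String` call receives an `Integer`. A minimal illustration of that integer representation:

```python
from datetime import date

# Spark's internal DateType representation is days since the Unix epoch.
# The partition value 2012-01-01 used in the repro below is therefore an
# int under the hood, not a string.
days_since_epoch = (date(2012, 1, 1) - date(1970, 1, 1)).days
print(days_since_epoch)  # 15340
```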
Code to reproduce:
```
# Assumes an active SparkSession with the Hudi bundle, plus tableName and
# basePath defined elsewhere.
from pyspark.sql.functions import expr

columns = ["ts", "uuid", "rider", "driver", "fare", "dt"]
data = [(1695159649087, "334e26e9-8355-45cc-97c6-c31daf0df330", "rider-A", "driver-K", 19.10, "2012-01-01"),
        (1695091554788, "e96c4396-3fad-413a-a942-4cb36106d721", "rider-B", "driver-L", 27.70, "2012-01-01"),
        (1695046462179, "9909a8b1-2d15-4d3d-8ec9-efc48c536a00", "rider-C", "driver-M", 33.90, "2012-01-01"),
        (1695516137016, "e3cf430c-889d-4015-bc98-59bdce1e530c", "rider-C", "driver-N", 34.15, "2012-01-01")]
inserts = spark.createDataFrame(data).toDF(*columns)

hudi_options = {
    'hoodie.table.name': tableName,
    'hoodie.datasource.write.recordkey.field': 'uuid',
    'hoodie.datasource.write.precombine.field': 'ts',
    'hoodie.datasource.write.partitionpath.field': 'dt',
    'hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled': 'true',
    'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.TimestampBasedKeyGenerator',
    'hoodie.keygen.timebased.timestamp.type': 'SCALAR',
    'hoodie.keygen.timebased.timestamp.scalar.time.unit': 'DAYS',
    'hoodie.keygen.timebased.input.dateformat': 'yyyy-MM-dd',
    'hoodie.keygen.timebased.output.dateformat': 'yyyy-MM-dd',
    'hoodie.keygen.timebased.timezone': 'GMT+8:00',
    'hoodie.datasource.write.hive_style_partitioning': 'true',
}

# Insert data (the partition column is cast to DATE before writing)
inserts.withColumn("dt", expr("CAST(dt as date)")).write.format("hudi") \
    .options(**hudi_options).mode("overwrite").save(basePath)

# Reading the table back raises the ClassCastException
deleteDF = spark.read.format("hudi").load(basePath)
deleteDF.show()
```
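One thing that may be worth trying (an assumption on my side, not a confirmed fix): the failure occurs inside Spark's vectorized Parquet reader, so disabling vectorization might route the read through the row-based path and sidestep the cast:

```
# Untested workaround sketch: avoid the vectorized Parquet read path.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
spark.read.format("hudi").load(basePath).show()
```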