Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
3.0.0
Description
Dynamic partition pruning (DPP) returns incorrect results when the pruning key is a non-atomic type. For example:
spark.range(1000)
  .select(col("id"), col("id").as("k"))
  .write
  .partitionBy("k")
  .format("parquet")
  .mode("overwrite")
  .saveAsTable("df1")

spark.range(100)
  .select(col("id"), col("id").as("k"))
  .write
  .partitionBy("k")
  .format("parquet")
  .mode("overwrite")
  .saveAsTable("df2")

spark.sql("set spark.sql.optimizer.dynamicPartitionPruning.fallbackFilterRatio=2")
spark.sql("set spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly=false")
spark.sql("SELECT df1.id, df2.k FROM df1 JOIN df2 ON struct(df1.k) = struct(df2.k) AND df2.id < 2").show
The query should return two records, but it returns an empty result.
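To make the expected result concrete, here is a minimal sketch that models the same join with plain Scala collections instead of Spark. The names `Row`, `ExpectedResult`, and `expected` are hypothetical and only for illustration, and `Tuple1` stands in for the single-field `struct(...)`. It is not how Spark evaluates the query.

```scala
// Plain Scala model of the repro; no Spark session is needed.
// Row, ExpectedResult and expected are illustrative names, not Spark APIs.
final case class Row(id: Long, k: Long)

object ExpectedResult {
  // df1 holds ids 0..999 and df2 holds ids 0..99, each with k equal to id.
  val df1: IndexedSeq[Row] = (0L until 1000L).map(i => Row(i, i))
  val df2: IndexedSeq[Row] = (0L until 100L).map(i => Row(i, i))

  // Same predicate as the SQL query:
  //   struct(df1.k) = struct(df2.k) AND df2.id < 2
  // Tuple1 plays the role of the single-field struct.
  val expected: IndexedSeq[(Long, Long)] = for {
    a <- df1
    b <- df2
    if Tuple1(a.k) == Tuple1(b.k) && b.id < 2
  } yield (a.id, b.k)
}
```

Only the df2 rows with id 0 and 1 pass the filter, and each matches exactly one df1 row, so `ExpectedResult.expected` contains the two pairs (0, 0) and (1, 1).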