Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
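The repro below references $tableName but does not show how the table was created. A minimal sketch of a table that would exercise the consistent hashing bucket index follows; the table name, location, bucket count, and schema are assumptions inferred from the inserted rows, and the exact option names may vary by Hudi version.

// Hypothetical table setup for the repro below (schema inferred from the
// inserted rows: id, name, price, ts, dt). The consistent hashing bucket
// index is used with a MOR table here.
val tableName = "hudi_consistent_bucket_tbl"   // assumed name
spark.sql(
  s"""
     |create table $tableName (
     |  id int,
     |  name string,
     |  price double,
     |  ts bigint,
     |  dt string
     |) using hudi
     |tblproperties (
     |  type = 'mor',
     |  primaryKey = 'id',
     |  preCombineField = 'ts',
     |  hoodie.index.type = 'BUCKET',
     |  hoodie.index.bucket.engine = 'CONSISTENT_HASHING',
     |  hoodie.bucket.index.num.buckets = '4'
     |)
     |partitioned by (dt)
     |location '/tmp/hudi_consistent_bucket_tbl'
     |""".stripMargin)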
spark.sql( s"""insert into $tableName values |(5, 'a', 35, 1000, '2021-01-05'), |(1, 'a', 31, 1000, '2021-01-05'), |(3, 'a', 33, 1000, '2021-01-05'), |(4, 'b', 16, 1000, '2021-01-05'), |(2, 'b', 18, 1000, '2021-01-05'), |(6, 'b', 17, 1000, '2021-01-05'), |(8, 'a', 21, 1000, '2021-01-05'), |(9, 'a', 22, 1000, '2021-01-05'), |(7, 'a', 23, 1000, '2021-01-05') |""".stripMargin) // Insert overwrite static partition spark.sql( s""" | insert overwrite table $tableName partition(dt = '2021-01-05') | select * from (select 13 , 'a2', 12, 1000) limit 10 """.stripMargin) spark.sql( s""" | insert into $tableName values | (5, 'a3', 35, 1000, '2021-01-05'), | (3, 'a3', 33, 1000, '2021-01-05') """.stripMargin)
After running the above case, we expect the snapshot result to be (13, "a2", 12.0, 1000, "2021-01-05"), (5, "a3", 35, 1000, "2021-01-05"), (3, "a3", 33, 1000, "2021-01-05").
But the actual result is only (13, a2, 12.0, 1000, 2021-01-05); the two rows inserted after the overwrite are missing.
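For reference, the snapshot can be checked with a plain query right after the three writes (column names are those assumed in the sketch above):

// Snapshot read after the three writes; with the index metadata consistent
// with storage, all three rows (ids 13, 5, 3) should come back.
spark.sql(s"select id, name, price, ts, dt from $tableName order by id").show(false)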
The root cause is that after running insert overwrite on a table with a consistent hashing bucket index, the file groups recorded in consistent_hashing_metadata no longer match the file groups on storage.
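One way to observe the mismatch is to compare the file-group ids that actually exist on storage for the partition with the contents of the consistent hashing metadata folder. The sketch below only uses the Hadoop FileSystem API; the base path, the partition folder name, and the ".hoodie/.bucket_index/consistent_hashing_metadata" layout are assumptions and may differ across Hudi versions and configurations.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Assumed table location and partition folder (hive-style partitioning assumed).
val basePath = "/tmp/hudi_consistent_bucket_tbl"
val partition = "dt=2021-01-05"
val fs = FileSystem.get(new Configuration())

// A base file is named <fileGroupId>_<writeToken>_<instant>.parquet, so the
// file-group id is the part of the file name before the first underscore.
val fileGroupsOnStorage = fs.listStatus(new Path(s"$basePath/$partition"))
  .map(_.getPath.getName)
  .filter(_.endsWith(".parquet"))
  .map(_.takeWhile(_ != '_'))
  .toSet

// Hashing metadata written for the partition; per the root cause above, after
// the insert overwrite it still describes the pre-overwrite file groups.
val hashingMetaDir = new Path(
  s"$basePath/.hoodie/.bucket_index/consistent_hashing_metadata/$partition")
val hashingMetaFiles = fs.listStatus(hashingMetaDir).map(_.getPath.getName)

println(s"file groups on storage: $fileGroupsOnStorage")
println(s"hashing metadata files: ${hashingMetaFiles.mkString(", ")}")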