Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 2.2.3, 2.3.4, 2.4.5
Description
When we drop a partition of an external table and then overwrite it with CONVERT_METASTORE_PARQUET=true (the default value), the partition is overwritten as expected.
But with CONVERT_METASTORE_PARQUET=false, the subsequent query returns duplicate results: the row from the dropped partition comes back alongside the newly inserted one.
Here is a reproducing test case (it can be added to SQLQuerySuite in the hive module):
test("spark gives duplicate result when dropping a partition of an external partitioned table" + " firstly and they overwrite it") { withTable("test") { withTempDir { f => sql("create external table test(id int) partitioned by (name string) stored as " + s"parquet location '${f.getAbsolutePath}'") withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> false.toString) { sql("insert overwrite table test partition(name='n1') select 1") sql("ALTER TABLE test DROP PARTITION(name='n1')") sql("insert overwrite table test partition(name='n1') select 2") checkAnswer( sql("select id from test where name = 'n1' order by id"), Array(Row(1), Row(2))) } withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> true.toString) { sql("insert overwrite table test partition(name='n1') select 1") sql("ALTER TABLE test DROP PARTITION(name='n1')") sql("insert overwrite table test partition(name='n1') select 2") checkAnswer( sql("select id from test where name = 'n1' order by id"), Array(Row(2))) } } } }
The same behavior can be reproduced with plain SQL:

create external table test(id int) partitioned by (name string) stored as parquet location '/tmp/p';
set spark.sql.hive.convertMetastoreParquet=false;
insert overwrite table test partition(name='n1') select 1;
ALTER TABLE test DROP PARTITION(name='n1');
insert overwrite table test partition(name='n1') select 2;
select id from test where name = 'n1' order by id;
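Until a fixed version is available, one possible workaround is a minimal sketch assuming the root cause is that DROP PARTITION on an external table leaves the old data files in place, so a later overwrite through the Hive path writes new files next to them: delete the partition directory explicitly after dropping the partition. The path /tmp/p/name=n1 mirrors the SQL repro above; the object name and session setup below are illustrative only, not part of the original report:

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

object DropPartitionWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("drop-partition-workaround")
      .enableHiveSupport()
      .getOrCreate()

    // Drop the partition from the metastore; on an external table this does
    // not remove the data files under the partition directory.
    spark.sql("ALTER TABLE test DROP PARTITION(name='n1')")

    // Manually delete the leftover files so the next overwrite starts clean.
    // The location is assumed to match the external table's root from the repro.
    val partitionDir = new Path("/tmp/p/name=n1")
    val fs = partitionDir.getFileSystem(spark.sparkContext.hadoopConfiguration)
    if (fs.exists(partitionDir)) {
      fs.delete(partitionDir, true) // recursive delete
    }

    // The overwrite now writes into an empty directory, so no duplicates remain.
    spark.sql("insert overwrite table test partition(name='n1') select 2")
    spark.sql("select id from test where name = 'n1'").show()
  }
}

This only mitigates the symptom for a single partition; the duplicate rows reappear for any partition that is dropped and overwritten without the manual cleanup.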
Issue Links
- is related to: HIVE-18702 "INSERT OVERWRITE TABLE doesn't clean the table directory before overwriting" (Closed)
- relates to: SPARK-25271 "Creating parquet table with all the column null throws exception" (Resolved)