Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
Description
Overview:
We found that hoodie.properties keeps an empty preCombineKey if the table does not define a preCombineKey. The empty preCombineKey then causes the following exception when inserting data:
Caused by: org.apache.hudi.exception.HoodieException: (Part -) field not found in record. Acceptable fields were :[id, name, price]
    at org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(HoodieAvroUtils.java:557)
    at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$createHoodieRecordRdd$1$$anonfun$apply$5.apply(HoodieSparkSqlWriter.scala:1134)
    at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$createHoodieRecordRdd$1$$anonfun$apply$5.apply(HoodieSparkSqlWriter.scala:1127)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:193)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Steps to Reproduce:
-- 1. create a table without preCombineKey
CREATE TABLE default.test_hudi_default_cm (
  uuid int,
  name string,
  price double
) USING hudi
options (
  primaryKey='uuid');

-- 2. configure the write operation as insert
set hoodie.datasource.write.operation=insert;
set hoodie.merge.allow.duplicate.on.inserts=true;

-- 3. insert data
insert into default.test_hudi_default_cm select 1, 'name1', 1.1;

-- 4. insert overwrite
insert overwrite table default.test_hudi_default_cm select 2, 'name3', 1.1;

-- 5. this insert throws the exception
insert into default.test_hudi_default_cm select 1, 'name3', 1.1;
Root Cause:
When INSERT OVERWRITE TABLE is executed in SQL but the configured write operation is not an insert overwrite (here it is set to insert), Hudi re-constructs the table. While doing so, it stores the default empty preCombineKey in hoodie.properties.
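To check whether a table is in the state described above, one option is to inspect hoodie.properties for a preCombine entry that is present but empty. The following is a minimal sketch, not Hudi's own code and not the fix applied for this issue. It assumes the property is stored under the key hoodie.table.precombine.field; the helper names and the sample file contents are hypothetical and only illustrate the idea.

```python
# Assumed key name for the preCombine field in hoodie.properties.
PRECOMBINE_KEY = "hoodie.table.precombine.field"


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def has_empty_precombine(props):
    """True when the preCombine key is present but has an empty value."""
    return props.get(PRECOMBINE_KEY) == ""


def drop_empty_precombine(props):
    """Return a copy of props without an empty preCombine entry."""
    return {
        k: v
        for k, v in props.items()
        if not (k == PRECOMBINE_KEY and v == "")
    }


# Illustrative file contents only, not taken from a real table.
sample = """\
# hypothetical hoodie.properties excerpt
hoodie.table.name=test_hudi_default_cm
hoodie.table.precombine.field=
"""

props = parse_properties(sample)
cleaned = drop_empty_precombine(props)
```

With the sample above, has_empty_precombine(props) reports the problematic state, and cleaned no longer contains the empty entry. A table that never had a preCombineKey and a table with a non-empty one are both left unchanged.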