Description
Hudi currently supports concurrency control only between a writer and compaction. Some scenarios also need concurrency control between multiple writers, for example two Spark jobs with different data sources that need to write to the same Hudi table.
I have two proposals:
1. First step: support write concurrency control across different partitions.
Today, when two clients write data to different partitions, they hit the following errors:
a. Rolling back commits fails
b. The instant version already exists
[2020-05-25 21:20:34,732] INFO Checking for file exists ?/tmp/HudiDLATestPartition/.hoodie/20200525212031.clean.inflight (org.apache.hudi.common.table.timeline.HoodieActiveTimeline)
Exception in thread "main" org.apache.hudi.exception.HoodieIOException: Failed to create file /tmp/HudiDLATestPartition/.hoodie/20200525212031.clean
    at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createImmutableFileInPath(HoodieActiveTimeline.java:437)
    at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:327)
    at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionCleanInflightToComplete(HoodieActiveTimeline.java:290)
    at org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:183)
    at org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:142)
    at org.apache.hudi.client.HoodieCleanClient.lambda$clean$0(HoodieCleanClient.java:88)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
c. The two clients' archiving operations conflict
d. The read client fails with "Unable to infer schema for Parquet. It must be specified manually.;"
2. Second step: support concurrency control for insert, upsert, and compaction at different isolation levels, such as Serializable and WriteSerializable.
Hudi could add a mechanism that checks for conflicts in AbstractHoodieWriteClient.commit().
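To make the proposal concrete, here is a minimal sketch of an optimistic commit-time conflict check at partition granularity. It is not Hudi code: `CommitConflictSketch`, `Timeline`, `begin`, and `tryCommit` are hypothetical names, and the in-memory list only stands in for the commit timeline. The assumption is that each writer remembers how many commits were complete when it started, and that at commit time it is rejected if any commit completed since then touched one of the same partitions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a commit-time conflict check; none of these types exist in Hudi. */
class CommitConflictSketch {

    /** A completed commit: its instant time and the partitions it wrote to. */
    public static final class CompletedCommit {
        final String instantTime;
        final Set<String> partitions;

        CompletedCommit(String instantTime, Set<String> partitions) {
            this.instantTime = instantTime;
            this.partitions = partitions;
        }
    }

    /** In-memory stand-in for the table's commit timeline. */
    public static final class Timeline {
        private final List<CompletedCommit> commits = new ArrayList<>();

        /** Called when a writer starts: returns the number of commits it can already see. */
        public synchronized int begin() {
            return commits.size();
        }

        /**
         * Tries to commit. Only commits that completed after the writer started
         * (index >= startVersion) are checked. Returns false on a partition overlap,
         * in which case nothing is recorded and the caller would roll back or retry.
         */
        public synchronized boolean tryCommit(int startVersion, String instantTime,
                                              Set<String> partitions) {
            for (int i = startVersion; i < commits.size(); i++) {
                if (!Collections.disjoint(commits.get(i).partitions, partitions)) {
                    return false; // a concurrent writer already committed to this partition
                }
            }
            commits.add(new CompletedCommit(instantTime, new HashSet<>(partitions)));
            return true;
        }

        /** Number of commits recorded so far. */
        public synchronized int size() {
            return commits.size();
        }
    }
}
```

With three writers that all call `begin()` on an empty timeline, the first commit to partition `2020/05/25` succeeds, a second commit to `2020/05/25` is rejected, and a commit to `2020/05/26` succeeds. This covers only the first step (disjoint partitions); the second step would need a finer-grained check, for example on file groups, and a policy per isolation level.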
Issue Links
- is a child of: HUDI-1456 [UMBRELLA] Concurrency Control for Hudi writers and table services (Open)
- is depended upon by: HUDI-648 Implement error log/table for Datasource/DeltaStreamer/WriteClient/Compaction writes (Closed)
- is related to: HUDI-839 Implement rollbacks using marker files instead of relying on commit metadata (Closed)