Apache Hudi / HUDI-944

Support more complete concurrency control when writing data

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • 0.9.0
    • 0.8.0, 0.9.0
    • None
    • None

    Description

      Hudi currently supports concurrency control only between writes and compaction. Some scenarios also need concurrency control between writers, for example two Spark jobs with different data sources that need to write to the same Hudi table.

      I have two proposals:

      1. First step: support write concurrency control across different partitions.
      Currently, when two clients write data to different partitions, they hit the following errors:

      a. Rolling back commits failed

      b. Instant version already exists

       [2020-05-25 21:20:34,732] INFO Checking for file exists ?/tmp/HudiDLATestPartition/.hoodie/20200525212031.clean.inflight (org.apache.hudi.common.table.timeline.HoodieActiveTimeline)
       Exception in thread "main" org.apache.hudi.exception.HoodieIOException: Failed to create file /tmp/HudiDLATestPartition/.hoodie/20200525212031.clean
       at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createImmutableFileInPath(HoodieActiveTimeline.java:437)
       at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:327)
       at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionCleanInflightToComplete(HoodieActiveTimeline.java:290)
       at org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:183)
       at org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:142)
       at org.apache.hudi.client.HoodieCleanClient.lambda$clean$0(HoodieCleanClient.java:88)
       at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
       

      c. The two clients' archiving operations conflict

      d. The read client fails with "Unable to infer schema for Parquet. It must be specified manually.;"
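
One plausible reading of the trace in (b) is that both clients try to create the same timeline file (here 20200525212031.clean) and the second attempt fails because the file already exists. The sketch below reproduces only that general pattern on a local temporary directory. The class InstantFileCollisionSketch and the method tryCreateInstantFile are hypothetical names for illustration and are not part of Hudi.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical illustration only; this is not Hudi code. */
class InstantFileCollisionSketch {

    /**
     * Tries to create the marker file "<instantTime>.<action>" in timelineDir.
     * Returns true if this caller created it, false if it already existed.
     */
    static boolean tryCreateInstantFile(Path timelineDir, String instantTime, String action)
            throws IOException {
        Path file = timelineDir.resolve(instantTime + "." + action);
        try {
            // Files.createFile checks for existence and creates the file atomically.
            Files.createFile(file);
            return true;
        } catch (FileAlreadyExistsException e) {
            // Another client already wrote a file for the same instant and action.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("hoodie-sketch");
        // Two clients that end up with the same instant time and action.
        boolean first = tryCreateInstantFile(dir, "20200525212031", "clean");
        boolean second = tryCreateInstantFile(dir, "20200525212031", "clean");
        System.out.println(first + " " + second); // prints "true false"
    }
}
```

In this sketch the losing client gets a boolean back instead of an exception, so it could retry or skip the action rather than abort the whole job.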

      2. Second step: support concurrency control for insert, upsert, and compaction at different isolation levels, such as Serializable and WriteSerializable.

      Hudi could implement a mechanism that checks for conflicts in AbstractHoodieWriteClient.commit().
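
As a rough sketch of what such a commit-time check could look like, the example below uses an optimistic approach: a writer is allowed to commit only if no other commit that completed after the writer started touched any of the same files. All names here (CommitConflictSketch, CommitRecord, tryCommit) are hypothetical and do not exist in Hudi. The synchronized method stands in for whatever table-level lock would guard the check, and the sketch covers write-write overlap only, not the individual isolation levels.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a commit-time conflict check; not Hudi code. */
class CommitConflictSketch {

    /** A completed commit: when it completed and which file IDs it touched. */
    static final class CommitRecord {
        final String completionTime;
        final Set<String> touchedFileIds;

        CommitRecord(String completionTime, Set<String> touchedFileIds) {
            this.completionTime = completionTime;
            this.touchedFileIds = Collections.unmodifiableSet(new HashSet<>(touchedFileIds));
        }
    }

    private final List<CommitRecord> completed = new ArrayList<>();

    /**
     * Commits unless a commit that completed after startTime touched the same files.
     * Times are fixed-width timestamp strings, so string comparison orders them.
     */
    synchronized boolean tryCommit(String startTime, String completionTime, Set<String> touchedFileIds) {
        for (CommitRecord other : completed) {
            boolean completedAfterStart = other.completionTime.compareTo(startTime) > 0;
            boolean overlaps = !Collections.disjoint(other.touchedFileIds, touchedFileIds);
            if (completedAfterStart && overlaps) {
                return false; // conflict: the caller should roll back or retry
            }
        }
        completed.add(new CommitRecord(completionTime, touchedFileIds));
        return true;
    }

    public static void main(String[] args) {
        CommitConflictSketch table = new CommitConflictSketch();
        String start = "20200525212000"; // all three writers start at the same time
        boolean a = table.tryCommit(start, "20200525212031", new HashSet<>(Arrays.asList("partition1/file-1")));
        boolean b = table.tryCommit(start, "20200525212035", new HashSet<>(Arrays.asList("partition2/file-7")));
        boolean c = table.tryCommit(start, "20200525212040", new HashSet<>(Arrays.asList("partition1/file-1")));
        System.out.println(a + " " + b + " " + c); // prints "true true false"
    }
}
```

Writers on different partitions touch disjoint files and both succeed, which matches the first step of the proposal, while the third writer is rejected because it overlaps with a commit that completed after it started.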

       

            People

              309637554 liwei