Apache Hudi / HUDI-2400

Allow the timeline server to sync correctly when the timeline is written concurrently


Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • None
    • None
    • compaction

    Description

      First, assume HUDI-1847 is available, so that an ingestion Spark job and a compaction job can run at the same time.
      Also assume each HoodieTimeline object carries a timestamp recording when it was generated from HDFS.
      Consider the following case:
      1. The ingestion job schedules compaction inline. The timeline is now 1.deltaCommit.Completed, 2.Compaction.Requested (TimeStamp: 1L).
      2. The ingestion job keeps moving. Its timeline is now 1.deltaCommit.Completed, 2.Compaction.Requested, 3.deltaCommit.Inflight (TimeStamp: 2L).
      3. An independent Spark job runs compaction 2. The timeline is now 1.deltaCommit.Completed, 2.Compaction.Inflight, 3.deltaCommit.Inflight (TimeStamp: 3L).
      4. Executors in the ingestion job send requests to the timeline server while still holding the timeline with TimeStamp 2L. The timeline server, however, holds the timeline with TimeStamp 3L, which is newer than the client's.
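The four steps can be modeled with a minimal Java sketch. `TimelineSnapshot` and its `generatedAt` field are hypothetical stand-ins for a HoodieTimeline carrying the timestamp assumed above; they are not Hudi APIs. The sketch only shows that, at step 4, the server's snapshot both differs from and is newer than the one the executors hold.

```java
import java.util.List;

public class ConcurrentTimelineScenario {

    // Hypothetical model of a timeline snapshot: the instants it contains plus the
    // logical time (1L, 2L, 3L in the scenario) at which it was read from storage.
    record TimelineSnapshot(List<String> instants, long generatedAt) {}

    // Step 1: ingestion schedules compaction inline.
    static final TimelineSnapshot STEP1 = new TimelineSnapshot(
            List.of("1.deltaCommit.Completed", "2.Compaction.Requested"), 1L);

    // Step 2: ingestion starts delta commit 3; this is the view the ingestion executors hold.
    static final TimelineSnapshot CLIENT_VIEW = new TimelineSnapshot(
            List.of("1.deltaCommit.Completed", "2.Compaction.Requested", "3.deltaCommit.Inflight"), 2L);

    // Step 3: a separate Spark job starts compaction 2; the timeline server has synced to this.
    static final TimelineSnapshot SERVER_VIEW = new TimelineSnapshot(
            List.of("1.deltaCommit.Completed", "2.Compaction.Inflight", "3.deltaCommit.Inflight"), 3L);

    public static void main(String[] args) {
        // Step 4: the executors send CLIENT_VIEW to a server that holds SERVER_VIEW.
        boolean viewsDiffer = !CLIENT_VIEW.instants().equals(SERVER_VIEW.instants());
        boolean serverIsNewer = SERVER_VIEW.generatedAt() > CLIENT_VIEW.generatedAt();
        System.out.println("views differ: " + viewsDiffer);                  // views differ: true
        System.out.println("server is newer than client: " + serverIsNewer); // server is newer than client: true
    }
}
```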

      According to the logic in https://github.com/apache/hudi/blob/master/hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/RequestHandler.java#L137,
      the timeline server treats its local view of the table's timeline as behind the client's view whenever the timeline hashes differ. That assumption fails in the case above:
      the hashes differ because the client's view is behind the local view, not the other way around.
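A simplified sketch of the hash-only comparison described above follows. It is not the code in RequestHandler.java: `timelineHash` and `isLocalViewBehind` here are hypothetical helpers, and SHA-256 over the instant names stands in for whatever hash Hudi actually computes. Fed the step 2 and step 3 timelines, the check reports the server as behind even though it is ahead.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.List;

public class HashBasedSyncCheck {

    // Hypothetical stand-in for a timeline hash: SHA-256 over the comma-joined instant names.
    static String timelineHash(List<String> instants) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(String.join(",", instants).getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256, so this should not happen.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    // Simplified form of the rule under discussion: any hash mismatch is read as
    // "the server's local view is behind the client", which triggers a re-sync.
    static boolean isLocalViewBehind(String clientHash, String localHash) {
        return !clientHash.equals(localHash);
    }

    public static void main(String[] args) {
        List<String> client = List.of("1.deltaCommit.Completed", "2.Compaction.Requested", "3.deltaCommit.Inflight");
        List<String> local = List.of("1.deltaCommit.Completed", "2.Compaction.Inflight", "3.deltaCommit.Inflight");
        // The server has already seen compaction 2 go inflight, yet the check still
        // reports it as behind, because the hashes merely differ.
        System.out.println(isLocalViewBehind(timelineHash(client), timelineHash(local))); // true
    }
}
```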

      A simple solution is to add the timestamp described above as an attribute of the timeline.
      The timeline server could then decide whether to sync its fileSystemView by comparing the client's timestamp with its local one, rather than by checking whether the timeline hashes differ.
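The proposed timestamp comparison might look like the following sketch. `TimelineSnapshot`, `generatedAt`, and `shouldSync` are hypothetical names chosen for illustration; the issue does not define an API for this.

```java
import java.util.List;

public class TimestampBasedSyncCheck {

    // Hypothetical timeline snapshot carrying the generation timestamp proposed above.
    record TimelineSnapshot(List<String> instants, long generatedAt) {}

    // Proposed rule (sketch): sync only when the client's timeline was generated
    // later than the server's local one.
    static boolean shouldSync(TimelineSnapshot client, TimelineSnapshot local) {
        return client.generatedAt() > local.generatedAt();
    }

    public static void main(String[] args) {
        TimelineSnapshot client = new TimelineSnapshot(
                List.of("1.deltaCommit.Completed", "2.Compaction.Requested", "3.deltaCommit.Inflight"), 2L);
        TimelineSnapshot local = new TimelineSnapshot(
                List.of("1.deltaCommit.Completed", "2.Compaction.Inflight", "3.deltaCommit.Inflight"), 3L);

        System.out.println(shouldSync(client, local)); // false: the server is already newer
        System.out.println(shouldSync(local, client)); // true: a newer client would trigger a sync
    }
}
```

Comparing timestamps this way assumes the client and server agree on the clock that produces them; the issue does not say how the timestamp would be generated, so clock skew between processes is left open in this sketch.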


People

    • Assignee: Unassigned
    • Reporter: ZiyueGuan (guanziyue)
    • Votes: 0
    • Watchers: 1


Time Tracking

    • Original Estimate: 0.5h
    • Remaining Estimate: 0.5h
    • Time Spent: Not Specified