  1. Hive
  2. HIVE-18772

Make Acid Cleaner use MIN_HISTORY_LEVEL


Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Duplicate
    • 3.0.0
    • 4.0.0-alpha-2
    • Transactions
    • None
    • n/a

    Description

      The Acid Cleaner should use MIN_HISTORY_LEVEL instead of Lock Manager state, which is what it currently relies on.
      This will eliminate possible race conditions.

      See this comment

      Suppose A is the set of all ValidTxnList instances across all active readers. Each ValidTxnList has a minOpenTxnId.
      MIN_HISTORY_LEVEL allows us to determine X = min(minOpenTxnId) across all currently active readers.

      This means that no active transaction in the system sees any txn with txnid < X as open.
      This means that if we construct a ValidTxnIdList with HWM=X-1 and use it in getAcidState(), any files that this call determines to be 'obsolete' will be seen as obsolete by every existing and future reader, i.e. they can be physically deleted.
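
      The computation of X and the resulting high watermark can be sketched as follows. This is a minimal illustrative model, not Hive code: ReaderSnapshot and cleaner_high_watermark are hypothetical names, and returning None when there are no active readers is an assumption (the caller would have to choose another bound in that case).

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ReaderSnapshot:
    """Hypothetical stand-in for one active reader's ValidTxnList."""
    reader_txn_id: int
    min_open_txn_id: int  # lowest txn id this reader sees as open


def cleaner_high_watermark(snapshots: Iterable[ReaderSnapshot]) -> Optional[int]:
    """Return X - 1, where X = min(minOpenTxnId) over all active readers.

    Returns None when there are no active readers (assumption: the caller
    then picks some other bound, which this sketch does not model).
    """
    mins = [s.min_open_txn_id for s in snapshots]
    if not mins:
        return None
    return min(mins) - 1


if __name__ == "__main__":
    readers = [ReaderSnapshot(17, 13), ReaderSnapshot(20, 15)]
    # X = min(13, 15) = 13, so the Cleaner would use HWM = 12.
    print(cleaner_high_watermark(readers))
```

      In this model no reader sees any txn at or below the returned value as open, which is the property the paragraph above relies on.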

      This is also necessary for multi-statement transactions, where relying on the state of the Lock Manager is not sufficient. For example:

      Suppose txn 17 starts at t1 and sees txnid 13 with writeID 13 as open.
      13 commits (via its parent txn) at t2 > t1 (17 is still running).
      Compaction runs at t3 > t2 to produce base_14 (or delta_10_14, for example) on Table1/Part1 (17 is still running).
      Now delta_13 may be cleaned, since it can be seen as obsolete and there may be no locks on it, i.e. no one is reading it.
      Now at t4 > t3, 17 (a multi-statement txn) may need to read Table1/Part1. It cannot use base_14, since that may have absorbed delete events from delete_delta_14.
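
      The example above can be replayed against a simplified model of the obsolescence check. This is a sketch under stated assumptions, not the logic of getAcidState(): Dir and obsolete_dirs are hypothetical, write IDs are taken to equal txn IDs as in the example, and a directory counts as obsolete only if a compaction output at or below the high watermark fully covers its ID range.

```python
from typing import List, NamedTuple


class Dir(NamedTuple):
    """Hypothetical description of one directory under Table1/Part1."""
    name: str
    min_id: int
    max_id: int
    compacted: bool  # True for base_N or a compacted delta_a_b


def obsolete_dirs(dirs: List[Dir], hwm: int) -> List[str]:
    """Names of directories fully covered by a compaction output <= hwm."""
    # Compaction outputs above the high watermark are ignored entirely.
    visible = [d for d in dirs if d.compacted and d.max_id <= hwm]
    result = []
    for d in dirs:
        for c in visible:
            if c is not d and c.min_id <= d.min_id and d.max_id <= c.max_id:
                result.append(d.name)
                break
    return result


if __name__ == "__main__":
    part1 = [Dir("delta_13", 13, 13, False), Dir("base_14", 0, 14, True)]
    # While txn 17 is running it sees 13 as open, so X = 13 and HWM = 12:
    # base_14 is above the watermark and delta_13 is kept.
    print(obsolete_dirs(part1, hwm=12))
    # Once no reader sees anything below 18 as open, HWM = 17 and
    # delta_13 becomes safe to delete.
    print(obsolete_dirs(part1, hwm=17))
```

      With HWM=12 the model keeps delta_13 even though no lock is held on it, so txn 17 can still read it at t4.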

      Another Use Case
      There are delta_1_1 and delta_2_2 on disk, both created by committed txns.
      T5 starts reading these. At the same time, the Compactor creates delta_1_2.
      Now the Cleaner sees delta_1_1 and delta_2_2 as obsolete and may remove them while the read is still in progress. This is because the Compactor itself is not running in a txn, so the files that it produces are visible immediately. If it ran in a txn, the new files would only be visible once this txn is visible to others (including the Cleaner).

      Using MIN_HISTORY_LEVEL solves this.

      See description of HIVE-18747 for more details on MIN_HISTORY_LEVEL

      Attachments

        1. HIVE-18772.04.patch
          60 kB
          Eugene Koifman
        2. HIVE-18772.03.patch
          67 kB
          Eugene Koifman
        3. HIVE-18772.02.patch
          27 kB
          Eugene Koifman
        4. HIVE-18772.02.patch
          37 kB
          Eugene Koifman
        5. HIVE-18772.01.patch
          20 kB
          Eugene Koifman

            People

              ekoifman Eugene Koifman
              ekoifman Eugene Koifman