HBASE-20322: CME in StoreScanner causes region server crash


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.2
    • Fix Version/s: 1.3.3, 1.4.4
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      A region server (RS) crashed with a ConcurrentModificationException (CME) on our 1.3 cluster; the stack trace is below. toffer and I investigated and found a race condition between flush and scanner close. While StoreScanner.updateReaders() is updating the scanners after a file is newly flushed (in the trace below, the flush comes from a region close during a split), the client's scanner can be closing at the same time, which causes the CME.

      It's rare, but since it crashes the region server, it needs to be fixed.
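
      One common way to close this kind of race is to make the flush path and the close path take the same lock before touching the shared scanner list. The sketch below only illustrates that idea and is not the attached patch: GuardedScanner, its fields, and raceOnce() are hypothetical names, and the child scanners are modeled as plain Runnables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a scanner that owns a list of child scanners.
// This is NOT HBase's StoreScanner; all names here are for illustration only.
class GuardedScanner {
    private final ReentrantLock lock = new ReentrantLock();
    private final List<Runnable> children = new ArrayList<>();
    private boolean closed = false;
    final AtomicInteger closedChildren = new AtomicInteger();

    GuardedScanner(int initialChildren) {
        for (int i = 0; i < initialChildren; i++) {
            children.add(closedChildren::incrementAndGet);
        }
    }

    // Flush path: close the old children and install new ones.
    void updateReaders(int newChildren) {
        lock.lock();
        try {
            if (closed) {
                return; // the client already closed this scanner
            }
            clearAndClose();
            for (int i = 0; i < newChildren; i++) {
                children.add(closedChildren::incrementAndGet);
            }
        } finally {
            lock.unlock();
        }
    }

    // Client path: close the scanner and everything it owns.
    void close() {
        lock.lock();
        try {
            if (closed) {
                return;
            }
            closed = true;
            clearAndClose();
        } finally {
            lock.unlock();
        }
    }

    // Must be called with the lock held, so the iteration never
    // overlaps with a modification from the other path.
    private void clearAndClose() {
        for (Runnable child : children) {
            child.run();
        }
        children.clear();
    }

    // Runs the flush path and the close path concurrently once and
    // returns how many children were closed in total.
    static int raceOnce() {
        GuardedScanner scanner = new GuardedScanner(1000);
        Thread flusher = new Thread(() -> scanner.updateReaders(1000));
        Thread client = new Thread(scanner::close);
        flusher.start();
        client.start();
        try {
            flusher.join();
            client.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return scanner.closedChildren.get();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            raceOnce();
        }
        System.out.println("finished without ConcurrentModificationException");
    }
}
```

      Whichever thread wins the lock, the list is only iterated while no other thread can modify it. The closed flag lets a late updateReaders() call return without reopening anything.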

      FATAL regionserver.HRegionServer [regionserver/<rs>] : ABORTING region server <rs>: Replay of WAL required. Forcing server shutdown
      org.apache.hadoop.hbase.DroppedSnapshotException: region: <regionname>
      at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
      at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
      at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
      at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
      at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
      at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
      at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
      at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
      at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
      at org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
      at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.util.ConcurrentModificationException
      at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
      at java.util.ArrayList$Itr.next(ArrayList.java:851)
      at org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
      at org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
      at org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)

      PS: Ignore the line numbers in the stack trace above; the method calls should be enough to understand what's happening.
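
      The innermost frames of the trace (ArrayList$Itr.next calling checkForComodification) can be reproduced deterministically in a single thread, which may help when reading it. The sketch below is a simplified illustration, not HBase code: CmeDemo and its method name are hypothetical, and the in-loop add() stands in for the other thread changing the scanner list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical demo class; it only mimics the failing iteration pattern.
class CmeDemo {
    // Returns true if iterating the list fails after a structural change.
    static boolean iterationFailsAfterModification() {
        List<String> scanners = new ArrayList<>(Arrays.asList("a", "b", "c"));
        try {
            for (String scanner : scanners) {
                // Stand-in for a second thread modifying the list while
                // this loop (like clearAndClose) is still iterating it.
                scanners.add("new");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            // Thrown by the iterator's next() on the following step,
            // because the list changed since the iterator was created.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(iterationFailsAfterModification());
    }
}
```

      ArrayList iterators are fail-fast on a best-effort basis, so in the real two-thread race the exception appears only when the timing lines up, which matches how rarely the crash was seen.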

      Attachments

        1. HBASE-20322.branch-1.3.001.patch
          3 kB
          Thiruvel Thirumoolan
        2. HBASE-20322.branch-1.3.002.patch
          3 kB
          Thiruvel Thirumoolan
        3. HBASE-20322.branch-1.3.002-addendum.patch
          0.8 kB
          Thiruvel Thirumoolan
        4. HBASE-20322.branch-1.4.001.patch
          3 kB
          Thiruvel Thirumoolan

            People

              Assignee: Thiruvel Thirumoolan (thiruvel)
              Reporter: Thiruvel Thirumoolan (thiruvel)
              Votes: 0
              Watchers: 8
