Apache Ozone / HDDS-11012

[Hbase-Ozone] HMaster down with NO_REPLICA_FOUND causing "CorruptHFileException: Problem reading HFile Trailer"

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SCM
    • Labels: None

    Description

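      Both HMasters went down abruptly with IllegalArgumentException: NO_REPLICA_FOUND,
      which surfaced as "CorruptHFileException: Problem reading HFile Trailer from file". The exception
      was thrown by XceiverClientManager.acquireClient while the Ozone client was reading the HFile
      trailer of a master store file, so the active master failed to initialize and aborted.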

      Stack Trace:

      2024-06-13 02:57:51,744 ERROR org.apache.hadoop.hbase.master.HMaster: Failed to become active master
      java.io.IOException: java.io.IOException: org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file ofs://ozone1717496222/volhbase-new07062024/buckethbase-1717572506/hbase/MasterData/data/master/store/1595e783b53d99cd5eef43b6debb2682/proc/91207977e6d74ba2ba6a564570832563
              at org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1144)
              at org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1087)
              at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:990)
              at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:940)
              at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7904)
              at org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7861)
              at org.apache.hadoop.hbase.master.region.MasterRegion.open(MasterRegion.java:307)
              at org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:424)
              at org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:122)
              at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:848)
              at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2216)
              at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:528)
              at java.lang.Thread.run(Thread.java:748)
      Caused by: java.io.IOException: org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file ofs://ozone1717496222/volhbase-new07062024/buckethbase-1717572506/hbase/MasterData/data/master/store/1595e783b53d99cd5eef43b6debb2682/proc/91207977e6d74ba2ba6a564570832563
              at org.apache.hadoop.hbase.regionserver.StoreEngine.openStoreFiles(StoreEngine.java:284)
              at org.apache.hadoop.hbase.regionserver.StoreEngine.initialize(StoreEngine.java:334)
              at org.apache.hadoop.hbase.regionserver.HStore.<init>(HStore.java:306)
              at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:6365)
              at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1110)
              at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1107)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              ... 1 more
      Caused by: org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file ofs://ozone1717496222/volhbase-new07062024/buckethbase-1717572506/hbase/MasterData/data/master/store/1595e783b53d99cd5eef43b6debb2682/proc/91207977e6d74ba2ba6a564570832563
              at org.apache.hadoop.hbase.io.hfile.HFileInfo.initTrailerAndContext(HFileInfo.java:349)
              at org.apache.hadoop.hbase.io.hfile.HFileInfo.<init>(HFileInfo.java:123)
              at org.apache.hadoop.hbase.regionserver.StoreFileInfo.initHFileInfo(StoreFileInfo.java:706)
              at org.apache.hadoop.hbase.regionserver.HStoreFile.open(HStoreFile.java:364)
              at org.apache.hadoop.hbase.regionserver.HStoreFile.initReader(HStoreFile.java:485)
              at org.apache.hadoop.hbase.regionserver.StoreEngine.createStoreFileAndReader(StoreEngine.java:224)
              at org.apache.hadoop.hbase.regionserver.StoreEngine.lambda$openStoreFiles$0(StoreEngine.java:262)
              ... 6 more
      Caused by: java.lang.IllegalArgumentException: NO_REPLICA_FOUND
              at org.apache.hadoop.ozone.shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)
              at org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClient(XceiverClientManager.java:180)
              at org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClientForReadData(XceiverClientManager.java:161)
              at org.apache.hadoop.hdds.scm.storage.BlockInputStream.acquireClient(BlockInputStream.java:342)
              at org.apache.hadoop.hdds.scm.storage.BlockInputStream.getBlockData(BlockInputStream.java:258)
              at org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:164)
              at org.apache.hadoop.hdds.scm.storage.BlockInputStream.readWithStrategy(BlockInputStream.java:370)
              at org.apache.hadoop.hdds.scm.storage.ExtendedInputStream.read(ExtendedInputStream.java:56)
              at org.apache.hadoop.hdds.scm.storage.ByteArrayReader.readFromBlock(ByteArrayReader.java:54)
              at org.apache.hadoop.hdds.scm.storage.MultipartInputStream.readWithStrategy(MultipartInputStream.java:96)
              at org.apache.hadoop.hdds.scm.storage.ExtendedInputStream.read(ExtendedInputStream.java:56)
              at org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:81)
              at java.io.DataInputStream.readFully(DataInputStream.java:195)
              at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:394)
              at org.apache.hadoop.hbase.io.hfile.HFileInfo.initTrailerAndContext(HFileInfo.java:339)
              ... 12 more
      2024-06-13 02:57:51,745 ERROR org.apache.hadoop.hbase.master.HMaster: ***** ABORTING master vc0121.xyz.com,22001,1718272586518: Unhandled exception. Starting shutdown. ***** 

      cc: ashishk Sammi weichiu 

            People

              Assignee: Unassigned
              Reporter: Pratyush Bhatt (pratyush.bhatt)
              Votes: 0
              Watchers: 2