Apache Ozone / HDDS-10689

[HBase Ozone] All HBase HMasters/RS down with "OMException: Unable to allocate a container to the block"


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Component: SCM

    Description

      Both HMasters and all RegionServers failed with the same "OMException: Unable to allocate a container to the block" error at approximately the same time.

      Logs from HMaster:

      2024-04-10 18:24:23,197 ERROR org.apache.hadoop.hbase.master.HMaster: ***** ABORTING master Master-1,22001,1712638569318: IOE in log roller *****
      INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Unable to allocate a container to the block of size: 268435456, replicationConfig: RATIS/THREE. Waiting for one of pipelines to be OPEN failed. Pipeline f1362ba6-ee67-48a9-bdb7-ac80e8d55435,3c2d89bc-935b-424f-8c90-6dcc74933640,040a71f0-fa7d-43ff-baed-37ae3ee87c63,31caf9ea-c145-4d37-91ef-456088158b99,37b5a056-55e0-485a-ad35-53ef27069e39 is not ready in 60000 ms
              at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:756)
              at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleSubmitRequestAndSCMSafeModeRetry(OzoneManagerProtocolClientSideTranslatorPB.java:2293)
              at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.createFile(OzoneManagerProtocolClientSideTranslatorPB.java:2281)
              at org.apache.hadoop.ozone.client.rpc.RpcClient.createFile(RpcClient.java:2115)
              at org.apache.hadoop.ozone.client.OzoneBucket.createFile(OzoneBucket.java:855)
              at org.apache.hadoop.fs.ozone.BasicRootedOzoneClientAdapterImpl.createFile(BasicRootedOzoneClientAdapterImpl.java:400)
              at org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.createOutputStream(BasicRootedOzoneFileSystem.java:304)
              at org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.createNonRecursive(BasicRootedOzoneFileSystem.java:280)
              at org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1382)
              at org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1360)
              at org.apache.hadoop.hbase.io.asyncfs.AsyncFSOutputHelper.createOutput(AsyncFSOutputHelper.java:63)
              at org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.initOutput(AsyncProtobufLogWriter.java:190)
              at org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:160)
              at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(AsyncFSWALProvider.java:116)
              at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:726)
              at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:129)
              at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:886)
              at org.apache.hadoop.hbase.wal.AbstractWALRoller$RollController.rollWal(AbstractWALRoller.java:304)
              at org.apache.hadoop.hbase.wal.AbstractWALRoller.run(AbstractWALRoller.java:211)
      2024-04-10 18:24:23,200 INFO org.apache.ranger.plugin.util.PolicyRefresher: PolicyRefresher(serviceName=cm_hbase).run(): interrupted! Exiting thread
      java.lang.InterruptedException
              at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
              at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
              at org.apache.ranger.plugin.util.PolicyRefresher.run(PolicyRefresher.java:208) 
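
      The exception message says SCM waited 60000 ms for one of the five candidate RATIS/THREE pipelines to become OPEN before giving up. A minimal sketch of that wait-and-timeout behaviour is below; this is illustrative only, not the real Ozone client or SCM code, and the names (`PipelineState`, `waitForOpenPipeline`) are made up for the sketch:

```java
import java.util.List;

// Hypothetical sketch (not the real Ozone code) of the behaviour behind the
// HMaster error above: before allocating a block, SCM waits for one of the
// candidate RATIS/THREE pipelines to reach OPEN, giving up after the timeout,
// which surfaces to HBase as the OMException shown in the log.
public class PipelineWaitSketch {

  enum PipelineState { ALLOCATED, OPEN, CLOSED }

  /** Polls the candidate pipelines until one is OPEN or timeoutMs elapses. */
  static boolean waitForOpenPipeline(List<PipelineState> candidates,
                                     long timeoutMs, long pollMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    do {
      if (candidates.contains(PipelineState.OPEN)) {
        return true;           // a writable pipeline is available
      }
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    } while (System.currentTimeMillis() < deadline);
    return false;              // maps to "is not ready in 60000 ms"
  }

  public static void main(String[] args) {
    // All five candidate pipelines stuck in ALLOCATED: the wait times out,
    // as it did for the HMaster (short timeout here for the demo).
    List<PipelineState> stuck = List.of(
        PipelineState.ALLOCATED, PipelineState.ALLOCATED,
        PipelineState.ALLOCATED, PipelineState.ALLOCATED,
        PipelineState.ALLOCATED);
    System.out.println("pipeline ready: "
        + waitForOpenPipeline(stuck, 100, 10));
  }
}
```

      In the incident, no pipeline ever left ALLOCATED (see the SCM logs below), so every 60-second wait failed and the log roller aborted the master.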

      The SCM leader logs show WARN entries like the one below:

      2024-04-10 18:23:22,106 WARN [IPC Server handler 3 on 9863]-org.apache.hadoop.hdds.scm.pipeline.WritableRatisContainerProvider: Pipeline creation failed for repConfig RATIS/THREE Datanodes may be used up. Try to see if any pipeline is in ALLOCATED state, and then will wait for it to be OPEN
      org.apache.hadoop.hdds.scm.exceptions.SCMException: Pipeline creation failed due to no sufficient healthy datanodes. Required 3. Found 1. Excluded 7.
              at org.apache.hadoop.hdds.scm.pipeline.PipelinePlacementPolicy.filterViableNodes(PipelinePlacementPolicy.java:167)
              at org.apache.hadoop.hdds.scm.pipeline.PipelinePlacementPolicy.chooseDatanodesInternal(PipelinePlacementPolicy.java:256)
              at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:209)
              at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:140)
              at org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.create(RatisPipelineProvider.java:176)
              at org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.create(RatisPipelineProvider.java:56)
              at org.apache.hadoop.hdds.scm.pipeline.PipelineFactory.create(PipelineFactory.java:89)
              at org.apache.hadoop.hdds.scm.pipeline.PipelineManagerImpl.createPipeline(PipelineManagerImpl.java:255)
              at org.apache.hadoop.hdds.scm.pipeline.PipelineManagerImpl.createPipeline(PipelineManagerImpl.java:241)
              at org.apache.hadoop.hdds.scm.pipeline.WritableRatisContainerProvider.getContainer(WritableRatisContainerProvider.java:100)
              at org.apache.hadoop.hdds.scm.pipeline.WritableContainerFactory.getContainer(WritableContainerFactory.java:74)
              at org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:163)
              at org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:206)
              at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:198)
              at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.processMessage(ScmBlockLocationProtocolServerSideTranslatorPB.java:144)
              at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:89)
              at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:115)
              at org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:15752)
              at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
              at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
              at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
              at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:422)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
              at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899) 
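
      The arithmetic behind "Required 3. Found 1. Excluded 7." is simple: after excluding unusable datanodes, too few candidates remain for a RATIS/THREE pipeline. The sketch below illustrates this check; it is not the actual `PipelinePlacementPolicy` code, and the healthy-node count of 8 is an assumption for illustration only (the log does not state it):

```java
// Hypothetical sketch (not the actual PipelinePlacementPolicy code) of the
// arithmetic behind "Required 3. Found 1. Excluded 7." in the SCM log above.
public class PlacementCheckSketch {

  /** Candidate nodes left after excluding unusable ones (never negative). */
  static int viableNodes(int healthy, int excluded) {
    return Math.max(healthy - excluded, 0);
  }

  /**
   * Mirrors the failure condition: a RATIS/THREE pipeline needs 3 viable
   * datanodes, so fewer than 3 means block allocation cannot proceed.
   */
  static String check(int healthy, int excluded, int required) {
    int found = viableNodes(healthy, excluded);
    if (found < required) {
      return String.format(
          "Pipeline creation failed due to no sufficient healthy datanodes."
              + " Required %d. Found %d. Excluded %d.",
          required, found, excluded);
    }
    return "OK";
  }

  public static void main(String[] args) {
    // 8 healthy datanodes (assumed) with 7 excluded leaves only 1 candidate,
    // reproducing the shape of the SCMException message.
    System.out.println(check(8, 7, 3));
  }
}
```

      With 7 of the cluster's datanodes excluded from placement, only 1 candidate remained, so every pipeline-creation attempt threw the SCMException above and the OM could never satisfy the block allocation.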

            People

              Assignee: Sammi Chen
              Reporter: Pratyush Bhatt