Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
Description
Both HMasters and all RegionServers failed at approximately the same time with the same "OMException: Unable to allocate a container to the block" error.
Logs from HMaster:
2024-04-10 18:24:23,197 ERROR org.apache.hadoop.hbase.master.HMaster: ***** ABORTING master Master-1,22001,1712638569318: IOE in log roller *****
INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Unable to allocate a container to the block of size: 268435456, replicationConfig: RATIS/THREE. Waiting for one of pipelines to be OPEN failed. Pipeline f1362ba6-ee67-48a9-bdb7-ac80e8d55435,3c2d89bc-935b-424f-8c90-6dcc74933640,040a71f0-fa7d-43ff-baed-37ae3ee87c63,31caf9ea-c145-4d37-91ef-456088158b99,37b5a056-55e0-485a-ad35-53ef27069e39 is not ready in 60000 ms
at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:756)
at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleSubmitRequestAndSCMSafeModeRetry(OzoneManagerProtocolClientSideTranslatorPB.java:2293)
at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.createFile(OzoneManagerProtocolClientSideTranslatorPB.java:2281)
at org.apache.hadoop.ozone.client.rpc.RpcClient.createFile(RpcClient.java:2115)
at org.apache.hadoop.ozone.client.OzoneBucket.createFile(OzoneBucket.java:855)
at org.apache.hadoop.fs.ozone.BasicRootedOzoneClientAdapterImpl.createFile(BasicRootedOzoneClientAdapterImpl.java:400)
at org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.createOutputStream(BasicRootedOzoneFileSystem.java:304)
at org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.createNonRecursive(BasicRootedOzoneFileSystem.java:280)
at org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1382)
at org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1360)
at org.apache.hadoop.hbase.io.asyncfs.AsyncFSOutputHelper.createOutput(AsyncFSOutputHelper.java:63)
at org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.initOutput(AsyncProtobufLogWriter.java:190)
at org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:160)
at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(AsyncFSWALProvider.java:116)
at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:726)
at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:129)
at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:886)
at org.apache.hadoop.hbase.wal.AbstractWALRoller$RollController.rollWal(AbstractWALRoller.java:304)
at org.apache.hadoop.hbase.wal.AbstractWALRoller.run(AbstractWALRoller.java:211)
2024-04-10 18:24:23,200 INFO org.apache.ranger.plugin.util.PolicyRefresher: PolicyRefresher(serviceName=cm_hbase).run(): interrupted! Exiting thread
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at org.apache.ranger.plugin.util.PolicyRefresher.run(PolicyRefresher.java:208)
Checked the SCM leader logs, which show WARN entries like the one below:
2024-04-10 18:23:22,106 WARN [IPC Server handler 3 on 9863]-org.apache.hadoop.hdds.scm.pipeline.WritableRatisContainerProvider: Pipeline creation failed for repConfig RATIS/THREE Datanodes may be used up. Try to see if any pipeline is in ALLOCATED state, and then will wait for it to be OPEN
org.apache.hadoop.hdds.scm.exceptions.SCMException: Pipeline creation failed due to no sufficient healthy datanodes. Required 3. Found 1. Excluded 7.
at org.apache.hadoop.hdds.scm.pipeline.PipelinePlacementPolicy.filterViableNodes(PipelinePlacementPolicy.java:167)
at org.apache.hadoop.hdds.scm.pipeline.PipelinePlacementPolicy.chooseDatanodesInternal(PipelinePlacementPolicy.java:256)
at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:209)
at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:140)
at org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.create(RatisPipelineProvider.java:176)
at org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.create(RatisPipelineProvider.java:56)
at org.apache.hadoop.hdds.scm.pipeline.PipelineFactory.create(PipelineFactory.java:89)
at org.apache.hadoop.hdds.scm.pipeline.PipelineManagerImpl.createPipeline(PipelineManagerImpl.java:255)
at org.apache.hadoop.hdds.scm.pipeline.PipelineManagerImpl.createPipeline(PipelineManagerImpl.java:241)
at org.apache.hadoop.hdds.scm.pipeline.WritableRatisContainerProvider.getContainer(WritableRatisContainerProvider.java:100)
at org.apache.hadoop.hdds.scm.pipeline.WritableContainerFactory.getContainer(WritableContainerFactory.java:74)
at org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:163)
at org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:206)
at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:198)
at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.processMessage(ScmBlockLocationProtocolServerSideTranslatorPB.java:144)
at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:89)
at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:115)
at org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:15752)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899)
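The WARN above reduces to a simple capacity check: a RATIS/THREE pipeline needs three healthy datanodes, but after seven were excluded only one remained, so pipeline creation fails and the OM's block allocation times out. A minimal sketch of that check, with hypothetical names and exception type (not the actual PipelinePlacementPolicy source):

```java
// Hypothetical sketch of the viable-node check behind the SCM WARN above.
// Names and exception type are illustrative only.
public class PlacementCheckSketch {

    /** Fails the same way the log shows when healthy nodes < replication factor. */
    static void filterViableNodes(int healthyFound, int excluded, int required) {
        if (healthyFound < required) {
            throw new IllegalStateException(
                "Pipeline creation failed due to no sufficient healthy datanodes."
                    + " Required " + required + "."
                    + " Found " + healthyFound + "."
                    + " Excluded " + excluded + ".");
        }
    }

    public static void main(String[] args) {
        // Numbers taken from the SCM log: RATIS/THREE needs 3 datanodes,
        // but only 1 healthy node remained after 7 were excluded.
        try {
            filterViableNodes(1, 7, 3);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Once enough datanodes report healthy again (or the exclusion list drains), the same check passes and SCM can open a new pipeline, which is why all writers failed together while the datanodes were down.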
Issue Links
- duplicates: HDDS-10749 Shutdown datanode when RatisServer is down (Resolved)
- is fixed by: HDDS-10749 Shutdown datanode when RatisServer is down (Resolved)