Details
- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
None
Description
The Replication Manager sends a delete container command to a non-empty container because of HDDS-7775. The container is not deleted, but subsequent decommissioning of any of the DNs does not complete because the container is in both the under-replicated and the unhealthy state.
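To make the failure mode concrete, here is a minimal sketch of the kind of SCM-side guard that would avoid the rejected command: only replicas that report no keys are considered for a non-force delete. The `Replica` class and `deletableReplicas` method are hypothetical stand-ins for illustration, not the actual Ozone classes or the fix that was applied; only the `keyCount` and `replicaIndex` field names are taken from the replica output in the SCM log.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; these are not the real Ozone types. */
public class EmptyReplicaDeleteSketch {

    /** Hypothetical, simplified view of a container replica reported by a datanode. */
    static final class Replica {
        final String datanode;
        final int replicaIndex;
        final long keyCount;

        Replica(String datanode, int replicaIndex, long keyCount) {
            this.datanode = datanode;
            this.replicaIndex = replicaIndex;
            this.keyCount = keyCount;
        }
    }

    /** Returns the replicas that a non-force delete could target, i.e. the empty ones. */
    static List<Replica> deletableReplicas(List<Replica> replicas) {
        List<Replica> deletable = new ArrayList<>();
        for (Replica replica : replicas) {
            if (replica.keyCount == 0) {
                deletable.add(replica);
            }
        }
        return deletable;
    }

    public static void main(String[] args) {
        // Shortened datanode IDs; every replica in the report has keyCount=1.
        List<Replica> replicas = List.of(
            new Replica("1ca038f8", 1, 1L),
            new Replica("c5c3948e", 3, 1L),
            new Replica("1ac8e090", 5, 1L));
        System.out.println("Replicas eligible for delete: "
            + deletableReplicas(replicas).size());
    }
}
```

With the replica states from this report, no replica passes the guard, so no delete command would be sent.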
SCM.log:
2023-03-14 21:53:26,413 INFO org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: Sending command [deleteContainerCommand: containerID: 15019, replicaIndex: 1, force: false] for container ContainerInfo{id=#15019, state=DELETING, pipelineID=PipelineID=e3fb8629-89ee-472a-9c43-3962629bd7a9, stateEnterTime=2023-03-14T19:17:07.315Z, owner=om2} to 1ca038f8-c505-47ca-b701-d542b85bb75b
2023-03-14 21:53:26,413 INFO org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: Sending command [deleteContainerCommand: containerID: 15019, replicaIndex: 5, force: false] for container ContainerInfo{id=#15019, state=DELETING, pipelineID=PipelineID=e3fb8629-89ee-472a-9c43-3962629bd7a9, stateEnterTime=2023-03-14T19:17:07.315Z, owner=om2} to 1ac8e090-7eb7-4dab-93b7-97e4845f7b49
2023-03-14 23:19:12,206 INFO org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: Sending command [deleteContainerCommand: containerID: 15019, replicaIndex: 3, force: false] for container ContainerInfo{id=#15019, state=DELETING, pipelineID=PipelineID=e3fb8629-89ee-472a-9c43-3962629bd7a9, stateEnterTime=2023-03-14T19:17:07.315Z, owner=om2} to c5c3948e-1296-4313-8c4e-9e6e50424280
2023-03-14 23:19:53,296 INFO org.apache.hadoop.hdds.scm.node.NodeDecommissionManager: Starting Decommission for node c5c3948e-1296-4313-8c4e-9e6e50424280
2023-03-14 23:22:38,512 INFO org.apache.hadoop.hdds.scm.node.DatanodeAdminMonitorImpl: Under Replicated Container #15019 org.apache.hadoop.hdds.scm.container.replication.ECContainerReplicaCount@2bd10f2f; Replicas{ ContainerReplica{containerID=#15019, state=CLOSED, datanodeDetails=ba62c66a-a342-4147-8344-3ce91726c2dc, placeOfBirth=ba62c66a-a342-4147-8344-3ce91726c2dc, sequenceId=0, keyCount=1, bytesUsed=102400,replicaIndex=5}, ContainerReplica{containerID=#15019, state=CLOSED, datanodeDetails=15af7526-8376-45c4-97a5-7a74b7abc678, placeOfBirth=15af7526-8376-45c4-97a5-7a74b7abc678, sequenceId=0, keyCount=1, bytesUsed=102400,replicaIndex=4}, ContainerReplica{containerID=#15019, state=CLOSED, datanodeDetails=1ca038f8-c505-47ca-b701-d542b85bb75b, placeOfBirth=1ca038f8-c505-47ca-b701-d542b85bb75b, sequenceId=0, keyCount=1, bytesUsed=102400,replicaIndex=1}, ContainerReplica{containerID=#15019, state=CLOSED, datanodeDetails=c5c3948e-1296-4313-8c4e-9e6e50424280, placeOfBirth=c5c3948e-1296-4313-8c4e-9e6e50424280, sequenceId=0, keyCount=1, bytesUsed=102400,replicaIndex=3}, ContainerReplica{containerID=#15019, state=CLOSED, datanodeDetails=f689fc55-e0e3-4785-9f2a-f799e18f0578, placeOfBirth=f689fc55-e0e3-4785-9f2a-f799e18f0578, sequenceId=0, keyCount=1, bytesUsed=102400,replicaIndex=1}, ContainerReplica{containerID=#15019, state=CLOSED, datanodeDetails=1ac8e090-7eb7-4dab-93b7-97e4845f7b49, placeOfBirth=1ac8e090-7eb7-4dab-93b7-97e4845f7b49, sequenceId=0, keyCount=1, bytesUsed=102400,replicaIndex=5}}
2023-03-14 23:22:38,512 INFO org.apache.hadoop.hdds.scm.node.DatanodeAdminMonitorImpl: Unhealthy Container #15019 org.apache.hadoop.hdds.scm.container.replication.ECContainerReplicaCount@2bd10f2f; Replicas{ ContainerReplica{containerID=#15019, state=CLOSED, datanodeDetails=ba62c66a-a342-4147-8344-3ce91726c2dc, placeOfBirth=ba62c66a-a342-4147-8344-3ce91726c2dc, sequenceId=0, keyCount=1, bytesUsed=102400,replicaIndex=5}, ContainerReplica{containerID=#15019, state=CLOSED, datanodeDetails=15af7526-8376-45c4-97a5-7a74b7abc678, placeOfBirth=15af7526-8376-45c4-97a5-7a74b7abc678, sequenceId=0, keyCount=1, bytesUsed=102400,replicaIndex=4}, ContainerReplica{containerID=#15019, state=CLOSED, datanodeDetails=1ca038f8-c505-47ca-b701-d542b85bb75b, placeOfBirth=1ca038f8-c505-47ca-b701-d542b85bb75b, sequenceId=0, keyCount=1, bytesUsed=102400,replicaIndex=1}, ContainerReplica{containerID=#15019, state=CLOSED, datanodeDetails=c5c3948e-1296-4313-8c4e-9e6e50424280, placeOfBirth=c5c3948e-1296-4313-8c4e-9e6e50424280, sequenceId=0, keyCount=1, bytesUsed=102400,replicaIndex=3}, ContainerReplica{containerID=#15019, state=CLOSED, datanodeDetails=f689fc55-e0e3-4785-9f2a-f799e18f0578, placeOfBirth=f689fc55-e0e3-4785-9f2a-f799e18f0578, sequenceId=0, keyCount=1, bytesUsed=102400,replicaIndex=1}, ContainerReplica{containerID=#15019, state=CLOSED, datanodeDetails=1ac8e090-7eb7-4dab-93b7-97e4845f7b49, placeOfBirth=1ac8e090-7eb7-4dab-93b7-97e4845f7b49, sequenceId=0, keyCount=1, bytesUsed=102400,replicaIndex=5}}
2023-03-14 23:22:38,512 INFO org.apache.hadoop.hdds.scm.node.DatanodeAdminMonitorImpl: c5c3948e-1296-4313-8c4e-9e6e50424280 has 60 sufficientlyReplicated, 1 underReplicated and 1 unhealthy containers
DN.log:
2023-03-14 21:53:32,032 ERROR org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler: Received container deletion command for container 15019 but the container is not empty with blockCount 1
2023-03-14 21:53:32,035 ERROR org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteContainerCommandHandler: Exception occurred while deleting the container.
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Non-force deletion of non-empty container is not allowed.
    at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.deleteInternal(KeyValueHandler.java:1303)
    at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.deleteContainer(KeyValueHandler.java:1160)
    at org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.deleteContainer(ContainerController.java:182)
    at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteContainerCommandHandler.handleInternal(DeleteContainerCommandHandler.java:108)
    at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteContainerCommandHandler.lambda$handle$0(DeleteContainerCommandHandler.java:78)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
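The datanode-side behavior in the DN log can be summarized as a simple rule: a delete with `force: false` is rejected when the container still holds blocks. The sketch below illustrates that rule under stated assumptions. `NonForceDeleteSketch`, `checkDeleteAllowed`, and `ContainerNotEmptyException` are hypothetical names, and this is not the code of `KeyValueHandler.deleteInternal`; only the error message text is taken from the log.

```java
/** Illustrative sketch only; this is not the real KeyValueHandler logic. */
public class NonForceDeleteSketch {

    /** Hypothetical stand-in for the StorageContainerException seen in the log. */
    static class ContainerNotEmptyException extends RuntimeException {
        ContainerNotEmptyException(String message) {
            super(message);
        }
    }

    /**
     * Returns normally when the delete may proceed.
     * Throws when a non-force delete targets a container that still has blocks.
     */
    static void checkDeleteAllowed(long blockCount, boolean force) {
        if (!force && blockCount > 0) {
            throw new ContainerNotEmptyException(
                "Non-force deletion of non-empty container is not allowed.");
        }
    }

    public static void main(String[] args) {
        try {
            // Values from the report: blockCount 1, force: false.
            checkDeleteAllowed(1L, false);
            System.out.println("delete allowed");
        } catch (ContainerNotEmptyException e) {
            System.out.println("delete rejected: " + e.getMessage());
        }
    }
}
```

Because the command in this report carries `force: false` and the replica has `blockCount 1`, the check fails every time the command is retried, which matches the repeated errors on the datanode.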
Issue Links
- relates to: HDDS-7775 EC: Exception encountered while deleting UNHEALTHY replica in Datanode (Resolved)
- links to