Apache Ozone / HDDS-7759 Improve Ozone Replication Manager / HDDS-8179

Datanode decommissioning blocked due to non-empty replica of deleting container


Details

    Description

      Because of HDDS-7775, the Replication Manager sends a delete container command to a non-empty container. The container is not deleted, but subsequent decommissioning of any of the DNs does not complete, because the container is in both an under-replicated and an unhealthy state.

      SCM.log:

      2023-03-14 21:53:26,413 INFO org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: Sending command [deleteContainerCommand: containerID: 15019, replicaIndex: 1, force: false] for container ContainerInfo{id=#15019, state=DELETING, pipelineID=PipelineID=e3fb8629-89ee-472a-9c43-3962629bd7a9, stateEnterTime=2023-03-14T19:17:07.315Z, owner=om2} to 1ca038f8-c505-47ca-b701-d542b85bb75b
      2023-03-14 21:53:26,413 INFO org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: Sending command [deleteContainerCommand: containerID: 15019, replicaIndex: 5, force: false] for container ContainerInfo{id=#15019, state=DELETING, pipelineID=PipelineID=e3fb8629-89ee-472a-9c43-3962629bd7a9, stateEnterTime=2023-03-14T19:17:07.315Z, owner=om2} to 1ac8e090-7eb7-4dab-93b7-97e4845f7b49
      
      2023-03-14 23:19:12,206 INFO org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: Sending command [deleteContainerCommand: containerID: 15019, replicaIndex: 3, force: false] for container ContainerInfo{id=#15019, state=DELETING, pipelineID=PipelineID=e3fb8629-89ee-472a-9c43-3962629bd7a9, stateEnterTime=2023-03-14T19:17:07.315Z, owner=om2} to c5c3948e-1296-4313-8c4e-9e6e50424280
      
      2023-03-14 23:19:53,296 INFO org.apache.hadoop.hdds.scm.node.NodeDecommissionManager: Starting Decommission for node c5c3948e-1296-4313-8c4e-9e6e50424280
      
      2023-03-14 23:22:38,512 INFO org.apache.hadoop.hdds.scm.node.DatanodeAdminMonitorImpl: Under Replicated Container #15019 org.apache.hadoop.hdds.scm.container.replication.ECContainerReplicaCount@2bd10f2f; Replicas{
      
      ContainerReplica{containerID=#15019, state=CLOSED, datanodeDetails=ba62c66a-a342-4147-8344-3ce91726c2dc, placeOfBirth=ba62c66a-a342-4147-8344-3ce91726c2dc, sequenceId=0, keyCount=1, bytesUsed=102400,replicaIndex=5},
      
      ContainerReplica{containerID=#15019, state=CLOSED, datanodeDetails=15af7526-8376-45c4-97a5-7a74b7abc678, placeOfBirth=15af7526-8376-45c4-97a5-7a74b7abc678, sequenceId=0, keyCount=1, bytesUsed=102400,replicaIndex=4},
      
      ContainerReplica{containerID=#15019, state=CLOSED, datanodeDetails=1ca038f8-c505-47ca-b701-d542b85bb75b, placeOfBirth=1ca038f8-c505-47ca-b701-d542b85bb75b, sequenceId=0, keyCount=1, bytesUsed=102400,replicaIndex=1},
      
      ContainerReplica{containerID=#15019, state=CLOSED, datanodeDetails=c5c3948e-1296-4313-8c4e-9e6e50424280, placeOfBirth=c5c3948e-1296-4313-8c4e-9e6e50424280, sequenceId=0, keyCount=1, bytesUsed=102400,replicaIndex=3},
      
      ContainerReplica{containerID=#15019, state=CLOSED, datanodeDetails=f689fc55-e0e3-4785-9f2a-f799e18f0578, placeOfBirth=f689fc55-e0e3-4785-9f2a-f799e18f0578, sequenceId=0, keyCount=1, bytesUsed=102400,replicaIndex=1},
      
      ContainerReplica{containerID=#15019, state=CLOSED, datanodeDetails=1ac8e090-7eb7-4dab-93b7-97e4845f7b49, placeOfBirth=1ac8e090-7eb7-4dab-93b7-97e4845f7b49, sequenceId=0, keyCount=1, bytesUsed=102400,replicaIndex=5}}
      
      2023-03-14 23:22:38,512 INFO org.apache.hadoop.hdds.scm.node.DatanodeAdminMonitorImpl: Unhealthy Container #15019 org.apache.hadoop.hdds.scm.container.replication.ECContainerReplicaCount@2bd10f2f; Replicas{
      
      ContainerReplica{containerID=#15019, state=CLOSED, datanodeDetails=ba62c66a-a342-4147-8344-3ce91726c2dc, placeOfBirth=ba62c66a-a342-4147-8344-3ce91726c2dc, sequenceId=0, keyCount=1, bytesUsed=102400,replicaIndex=5},
      
      ContainerReplica{containerID=#15019, state=CLOSED, datanodeDetails=15af7526-8376-45c4-97a5-7a74b7abc678, placeOfBirth=15af7526-8376-45c4-97a5-7a74b7abc678, sequenceId=0, keyCount=1, bytesUsed=102400,replicaIndex=4},
      
      ContainerReplica{containerID=#15019, state=CLOSED, datanodeDetails=1ca038f8-c505-47ca-b701-d542b85bb75b, placeOfBirth=1ca038f8-c505-47ca-b701-d542b85bb75b, sequenceId=0, keyCount=1, bytesUsed=102400,replicaIndex=1},
      
      ContainerReplica{containerID=#15019, state=CLOSED, datanodeDetails=c5c3948e-1296-4313-8c4e-9e6e50424280, placeOfBirth=c5c3948e-1296-4313-8c4e-9e6e50424280, sequenceId=0, keyCount=1, bytesUsed=102400,replicaIndex=3},
      
      ContainerReplica{containerID=#15019, state=CLOSED, datanodeDetails=f689fc55-e0e3-4785-9f2a-f799e18f0578, placeOfBirth=f689fc55-e0e3-4785-9f2a-f799e18f0578, sequenceId=0, keyCount=1, bytesUsed=102400,replicaIndex=1},
      
      ContainerReplica{containerID=#15019, state=CLOSED, datanodeDetails=1ac8e090-7eb7-4dab-93b7-97e4845f7b49, placeOfBirth=1ac8e090-7eb7-4dab-93b7-97e4845f7b49, sequenceId=0, keyCount=1, bytesUsed=102400,replicaIndex=5}}
      
      2023-03-14 23:22:38,512 INFO org.apache.hadoop.hdds.scm.node.DatanodeAdminMonitorImpl: c5c3948e-1296-4313-8c4e-9e6e50424280 has 60 sufficientlyReplicated, 1 underReplicated and 1 unhealthy containers
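
As a reading aid for the SCM log above, the following minimal Python sketch tallies the `replicaIndex` values in one `Replicas{...}` dump and parses the final summary line. It is a log-parsing helper, not Ozone code: the function names and regular expressions are hypothetical and assume the log format quoted in this report, and the rule that decommissioning waits until both the under-replicated and unhealthy counts are zero is an assumption inferred from the description.

```python
import re
from collections import Counter

# Hypothetical patterns, written against the log lines quoted in this report.
REPLICA_RE = re.compile(r"datanodeDetails=([0-9a-f-]+),.*?replicaIndex=(\d+)")
SUMMARY_RE = re.compile(
    r"has (\d+) sufficientlyReplicated, (\d+) underReplicated and (\d+) unhealthy"
)


def tally_replica_indexes(dump: str) -> Counter:
    """Count how many replicas report each replicaIndex in a Replicas{...} dump."""
    return Counter(int(idx) for _dn, idx in REPLICA_RE.findall(dump))


def blocks_decommission(summary_line: str) -> bool:
    """Assumed rule: any under-replicated or unhealthy container blocks completion."""
    match = SUMMARY_RE.search(summary_line)
    if match is None:
        raise ValueError("not a DatanodeAdminMonitorImpl summary line")
    _sufficient, under, unhealthy = map(int, match.groups())
    return under > 0 or unhealthy > 0


# The six replica entries for container #15019, abbreviated to the fields used here.
dump = """
ContainerReplica{containerID=#15019, datanodeDetails=ba62c66a-a342-4147-8344-3ce91726c2dc, replicaIndex=5},
ContainerReplica{containerID=#15019, datanodeDetails=15af7526-8376-45c4-97a5-7a74b7abc678, replicaIndex=4},
ContainerReplica{containerID=#15019, datanodeDetails=1ca038f8-c505-47ca-b701-d542b85bb75b, replicaIndex=1},
ContainerReplica{containerID=#15019, datanodeDetails=c5c3948e-1296-4313-8c4e-9e6e50424280, replicaIndex=3},
ContainerReplica{containerID=#15019, datanodeDetails=f689fc55-e0e3-4785-9f2a-f799e18f0578, replicaIndex=1},
ContainerReplica{containerID=#15019, datanodeDetails=1ac8e090-7eb7-4dab-93b7-97e4845f7b49, replicaIndex=5}}
"""
summary = (
    "c5c3948e-1296-4313-8c4e-9e6e50424280 has 60 sufficientlyReplicated, "
    "1 underReplicated and 1 unhealthy containers"
)

tally = tally_replica_indexes(dump)
duplicates = sorted(index for index, count in tally.items() if count > 1)
blocked = blocks_decommission(summary)

print("replicas per index:", dict(sorted(tally.items())))
print("indexes held by more than one datanode:", duplicates)
print("decommission blocked (assumed rule):", blocked)
```

For the dump in this report, the tally shows indexes 1 and 5 on two datanodes each, indexes 3 and 4 on one datanode each, and no replica with index 2.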

      DN.log:

      2023-03-14 21:53:32,032 ERROR org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler: Received container deletion command for container 15019 but the container is not empty with blockCount 1
      2023-03-14 21:53:32,035 ERROR org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteContainerCommandHandler: Exception occurred while deleting the container.
      org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Non-force deletion of non-empty container is not allowed.
          at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.deleteInternal(KeyValueHandler.java:1303)
          at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.deleteContainer(KeyValueHandler.java:1160)
          at org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.deleteContainer(ContainerController.java:182)
          at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteContainerCommandHandler.handleInternal(DeleteContainerCommandHandler.java:108)
          at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteContainerCommandHandler.lambda$handle$0(DeleteContainerCommandHandler.java:78)
          at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
          at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
          at java.base/java.lang.Thread.run(Thread.java:834)
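
The datanode log above shows the delete command being rejected because the container still holds a block. The toy Python model below illustrates that check only. It is not the `KeyValueHandler` implementation: the `Container` class, `delete_container` function, and `StorageContainerError` exception are hypothetical stand-ins, and the behaviour for `force=True` is an assumption read off the error message.

```python
from dataclasses import dataclass


class StorageContainerError(Exception):
    """Stand-in for the StorageContainerException seen in the stack trace."""


@dataclass
class Container:
    container_id: int
    block_count: int
    deleted: bool = False


def delete_container(container: Container, force: bool) -> None:
    """Toy check: refuse a non-force deletion while the container holds blocks."""
    if container.block_count > 0 and not force:
        raise StorageContainerError(
            "Non-force deletion of non-empty container is not allowed."
        )
    # Assumption: an empty container, or a forced delete, is removed.
    container.deleted = True


# Container 15019 as reported in DN.log: blockCount 1, command sent with force: false.
replica = Container(container_id=15019, block_count=1)
try:
    delete_container(replica, force=False)
    outcome = "deleted"
except StorageContainerError as err:
    outcome = f"rejected: {err}"

print(outcome)
```

In this model the replica stays in place after the rejected command, which matches the report: the container is never deleted, so its replicas keep appearing in the under-replicated and unhealthy lists.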

            People

              siddhant Siddhant Sangwan
              varsha.ravi Varsha Ravi