Apache Ozone / HDDS-7738

SCM terminates when adding container to a closed pipeline


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: SCM

    Description

      This is similar to HDDS-5843, but in a different scenario.

       

      An Ozone customer encountered this issue after a container (c1) was allocated on a newly created pipeline (p1). The chain of events is as follows:

      1. SCM processes the pipeline creation transaction for p1 => p1 is created.
      2. SCM receives a request from a datanode to close p1 (see the previous comment)
        => p1 is closed.
        => SCM also tries to find and close the containers on p1, but container c1 does not exist yet, so it cannot be closed.
      3. SCM processes the allocation transaction for container c1 => it fails because p1 is already closed.
        => SCM terminates, and neither transaction #1 nor #3 is committed (Ratis commits transactions in chunks).
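
      The sequence above can be reproduced in a toy model. This is a minimal sketch, not Ozone code: ToyPipelineMap, its methods, and the replay() helper are hypothetical names that only mimic the bookkeeping described in steps 1 to 3. It shows why closing p1 finds no containers and why the later allocation of c1 is rejected.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ClosedPipelineRace {
    enum PipelineState { OPEN, CLOSED }

    // Hypothetical stand-in for SCM pipeline bookkeeping; not the real Ozone classes.
    static final class ToyPipelineMap {
        private final Map<String, PipelineState> states = new HashMap<>();
        private final Map<String, List<String>> containers = new HashMap<>();

        void createPipeline(String id) {
            states.put(id, PipelineState.OPEN);
            containers.put(id, new ArrayList<>());
        }

        // Closes the pipeline and returns the containers known at that moment.
        List<String> closePipeline(String id) {
            states.put(id, PipelineState.CLOSED);
            return new ArrayList<>(containers.get(id));
        }

        // Mirrors the message in the stack trace: a closed pipeline rejects new containers.
        void addContainer(String pipelineId, String containerId) throws IOException {
            if (states.get(pipelineId) == PipelineState.CLOSED) {
                throw new IOException(
                    "Cannot add container to pipeline=" + pipelineId + " in closed state");
            }
            containers.get(pipelineId).add(containerId);
        }
    }

    // Replays steps 1 to 3 and reports what happened to the allocation of c1.
    static String replay() {
        ToyPipelineMap map = new ToyPipelineMap();
        map.createPipeline("p1");                               // step 1: p1 is created
        List<String> closedWithP1 = map.closePipeline("p1");    // step 2: c1 is not known yet
        try {
            map.addContainer("p1", "c1");                       // step 3: allocation of c1
            return "added";
        } catch (IOException e) {
            return "rejected; containers closed with p1: " + closedWithP1.size();
        }
    }

    public static void main(String[] args) {
        System.out.println(replay());
    }
}
```

      In the real SCM the rejected allocation is not just reported: the exception reaches the Ratis state machine, which terminates the process.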

      Because the transactions are not committed, every time SCM restarts it replays the same steps #1 and #3 and terminates again.

      Solution: SCM should not terminate when adding a container to a closed pipeline. The fix is similar to the one for HDDS-5843.
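
      One possible shape of such a fix is sketched below. It is an illustration under stated assumptions and not the actual patch: TolerantApplySketch, PipelineClosedException, and applyAddContainer are hypothetical names, and the choice to log and skip the container is an assumption about what "should not terminate" could mean in practice. The point is only that the closed-pipeline case is handled inside the apply step, so replay can move on to later transactions.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TolerantApplySketch {
    // Hypothetical exception type marking the "pipeline already closed" case.
    static final class PipelineClosedException extends IOException {
        PipelineClosedException(String pipelineId) {
            super("Cannot add container to pipeline=" + pipelineId + " in closed state");
        }
    }

    private final Set<String> closedPipelines = new HashSet<>();
    private final List<String> attached = new ArrayList<>();
    private final List<String> skipped = new ArrayList<>();

    public void closePipeline(String pipelineId) {
        closedPipelines.add(pipelineId);
    }

    private void addContainerToPipeline(String pipelineId, String containerId)
            throws PipelineClosedException {
        if (closedPipelines.contains(pipelineId)) {
            throw new PipelineClosedException(pipelineId);
        }
        attached.add(containerId);
    }

    // Applies one "add container" transaction. A closed pipeline is logged and
    // skipped instead of being propagated as a fatal error (assumed behavior).
    public boolean applyAddContainer(String pipelineId, String containerId) {
        try {
            addContainerToPipeline(pipelineId, containerId);
            return true;
        } catch (PipelineClosedException e) {
            System.err.println("WARN " + e.getMessage() + "; skipping container " + containerId);
            skipped.add(containerId);
            return false;
        }
    }

    public List<String> attachedContainers() { return attached; }

    public List<String> skippedContainers() { return skipped; }

    public static void main(String[] args) {
        TolerantApplySketch scm = new TolerantApplySketch();
        scm.closePipeline("p1");
        scm.applyAddContainer("p1", "c1"); // skipped, process keeps running
        scm.applyAddContainer("p2", "c2"); // a later transaction still applies
        System.out.println("attached=" + scm.attachedContainers()
            + " skipped=" + scm.skippedContainers());
    }
}
```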

       

      Stacktrace:

      2022-12-28 11:53:20,465  ERROR org.apache.ratis.statemachine.StateMachine: Terminating with exit status 1: Cannot add container to pipeline=PipelineID=7f97fc6a-c31a-4978-be3b-e38af7cd023f in closed state
      java.io.IOException: Cannot add container to pipeline=PipelineID=7f97fc6a-c31a-4978-be3b-e38af7cd023f in closed state
      	at org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.addContainerToPipeline(PipelineStateMap.java:110)
      	at org.apache.hadoop.hdds.scm.pipeline.PipelineStateManagerImpl.addContainerToPipeline(PipelineStateManagerImpl.java:114)
      	at jdk.internal.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
      	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
      	at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeLocal(SCMHAInvocationHandler.java:87)
      	at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invoke(SCMHAInvocationHandler.java:72)
      	at com.sun.proxy.$Proxy17.addContainerToPipeline(Unknown Source)
      	at org.apache.hadoop.hdds.scm.pipeline.PipelineManagerImpl.addContainerToPipeline(PipelineManagerImpl.java:327)
      	at org.apache.hadoop.hdds.scm.container.ContainerStateManagerImpl.lambda$addContainer$1(ContainerStateManagerImpl.java:309)
      	at org.apache.hadoop.hdds.scm.ha.ExecutionUtil.execute(ExecutionUtil.java:59)
      	at org.apache.hadoop.hdds.scm.container.ContainerStateManagerImpl.addContainer(ContainerStateManagerImpl.java:321)
      	at jdk.internal.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
      	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
      	at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.process(SCMStateMachine.java:168)
      	at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.applyTransaction(SCMStateMachine.java:139)
      	at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1588)
      	at org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:239)
      	at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:182)
      	at java.base/java.lang.Thread.run(Thread.java:829)


            People

              Assignee: duongnguyen Duong
              Reporter: duongnguyen Duong