Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Fixed
Description
This is similar to HDDS-5843, but in a different scenario.
An Ozone customer hit this issue after a container (c1) was allocated on a newly created pipeline (p1). The chain of events is as follows:
1. SCM processes the pipeline creation transaction for p1 => p1 is created.
2. SCM receives a request to close p1 from a datanode (see the previous comment)
   => p1 is closed.
   => SCM also tries to find and close the relevant containers; at this point container c1 doesn't exist yet, so it can't be closed.
3. SCM processes the container c1 allocation transaction => it fails because p1 is already closed.
   => SCM terminates, and transactions #1 and #3 are both left uncommitted (as Ratis commits transactions in chunks).
Because the transactions are not committed, whenever SCM restarts it goes through the same steps #1 and #3 and terminates again.
Solution: SCM should not terminate when adding a container to a closed pipeline. The fix is similar to HDDS-5843.
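The proposed behaviour can be illustrated with a small standalone sketch. This is not the actual patch and does not use Ozone classes: `ClosedPipelineSketch`, `addContainerStrict`, `addContainerTolerant` and `containersOf` are hypothetical names invented for illustration. The strict method only mirrors the error message seen in the stack trace, and the tolerant method shows one possible way to skip the add instead of failing the transaction.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model only; all names here are hypothetical, not Ozone APIs. */
class ClosedPipelineSketch {
  enum PipelineState { OPEN, CLOSED }

  private final Map<String, PipelineState> pipelines = new HashMap<>();
  private final Map<String, Set<String>> containersByPipeline = new HashMap<>();

  /** Step 1: the pipeline is created in OPEN state with no containers. */
  public void createPipeline(String pipelineId) {
    pipelines.put(pipelineId, PipelineState.OPEN);
    containersByPipeline.put(pipelineId, new HashSet<>());
  }

  /** Step 2: the pipeline is closed; containers added later are not seen here. */
  public void closePipeline(String pipelineId) {
    if (pipelines.containsKey(pipelineId)) {
      pipelines.put(pipelineId, PipelineState.CLOSED);
    }
  }

  /** Strict add: fails on a closed pipeline, as in the reported stack trace. */
  public void addContainerStrict(String pipelineId, String containerId)
      throws IOException {
    PipelineState state = pipelines.get(pipelineId);
    if (state == null) {
      throw new IOException("Pipeline " + pipelineId + " not found");
    }
    if (state == PipelineState.CLOSED) {
      throw new IOException("Cannot add container to pipeline=" + pipelineId
          + " in closed state");
    }
    containersByPipeline.get(pipelineId).add(containerId);
  }

  /**
   * Tolerant add (assumed shape of a fix): a closed pipeline is reported
   * and skipped, so the caller has no reason to terminate the process.
   * Returns true only if the container was actually added.
   */
  public boolean addContainerTolerant(String pipelineId, String containerId)
      throws IOException {
    if (pipelines.get(pipelineId) == PipelineState.CLOSED) {
      System.err.println("Skipping container " + containerId
          + ": pipeline " + pipelineId + " is already closed");
      return false;
    }
    addContainerStrict(pipelineId, containerId);
    return true;
  }

  /** Read-only view of the containers recorded for a pipeline. */
  public Set<String> containersOf(String pipelineId) {
    Set<String> ids = containersByPipeline.get(pipelineId);
    return ids == null
        ? Collections.<String>emptySet()
        : Collections.unmodifiableSet(ids);
  }

  public static void main(String[] args) throws IOException {
    ClosedPipelineSketch scm = new ClosedPipelineSketch();
    scm.createPipeline("p1");   // step 1
    scm.closePipeline("p1");    // step 2
    boolean added = scm.addContainerTolerant("p1", "c1");  // step 3
    System.out.println("c1 added to p1: " + added);
  }
}
```

In this model the three steps from the description no longer end in an exception when the tolerant method is used, so replaying them after a restart would not repeat the failure. Whether the real fix skips the container, records it against the closed pipeline, or handles the case elsewhere is not stated in this issue.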
Stacktrace:
2022-12-28 11:53:20,465 ERROR org.apache.ratis.statemachine.StateMachine: Terminating with exit status 1: Cannot add container to pipeline=PipelineID=7f97fc6a-c31a-4978-be3b-e38af7cd023f in closed state
java.io.IOException: Cannot add container to pipeline=PipelineID=7f97fc6a-c31a-4978-be3b-e38af7cd023f in closed state
    at org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.addContainerToPipeline(PipelineStateMap.java:110)
    at org.apache.hadoop.hdds.scm.pipeline.PipelineStateManagerImpl.addContainerToPipeline(PipelineStateManagerImpl.java:114)
    at jdk.internal.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeLocal(SCMHAInvocationHandler.java:87)
    at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invoke(SCMHAInvocationHandler.java:72)
    at com.sun.proxy.$Proxy17.addContainerToPipeline(Unknown Source)
    at org.apache.hadoop.hdds.scm.pipeline.PipelineManagerImpl.addContainerToPipeline(PipelineManagerImpl.java:327)
    at org.apache.hadoop.hdds.scm.container.ContainerStateManagerImpl.lambda$addContainer$1(ContainerStateManagerImpl.java:309)
    at org.apache.hadoop.hdds.scm.ha.ExecutionUtil.execute(ExecutionUtil.java:59)
    at org.apache.hadoop.hdds.scm.container.ContainerStateManagerImpl.addContainer(ContainerStateManagerImpl.java:321)
    at jdk.internal.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.process(SCMStateMachine.java:168)
    at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.applyTransaction(SCMStateMachine.java:139)
    at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1588)
    at org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:239)
    at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:182)
    at java.base/java.lang.Thread.run(Thread.java:829)
Issue Links
- relates to: HDDS-5843 SCM terminates when adding container to a pipeline during startup (Resolved)