Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
Description
This happens in test, but it maybe also happen in production.
For example, leader is s3 and follower is s4.
1. kill s4, and restart s4.
2020-06-19 07:03:18,095 [Thread-6194] INFO ratis.MiniRaftCluster (MiniRaftCluster.java:killServer(458)) - killServer s4 2020-06-19 07:03:18,095 [Thread-6194] INFO ratis.MiniRaftCluster (MiniRaftCluster.java:newRaftServer(330)) - newRaftServer: s4, group-5BD7E8A01610:[s3:0.0.0.0:43375, s4:0.0.0.0:33719, s0:0.0.0.0:34867, s1:0.0.0.0:33783, s2:0.0.0.0:40473], format? false
2. s4 start and set configuration from storage at setRaftConf(raftConf.getLogEntryIndex(), raftConf) and s4 will change to RUNNING at lifeCycle.transition(RUNNING)
2020-06-19 07:03:18,127 [pool-16-thread-1] INFO impl.RaftServerImpl (ServerState.java:setRaftConf(356)) - s4@group-5BD7E8A01610: set configuration 0: [s3:0.0.0.0:43375, s4:0.0.0.0:33719, s0:0.0.0.0:34867, s1:0.0.0.0:33783, s2:0.0.0.0:40473], old=null at 0 2020-06-19 07:03:18,153 [Thread-6194] INFO impl.RaftServerImpl (RaftServerImpl.java:start(185)) - s4@group-5BD7E8A01610: start as a follower, conf=0: [s3:0.0.0.0:43375, s4:0.0.0.0:33719, s0:0.0.0.0:34867, s1:0.0.0.0:33783, s2:0.0.0.0:40473], old=null 2020-06-19 07:03:18,153 [Thread-6194] INFO impl.RaftServerImpl (RaftServerImpl.java:setRole(174)) - s4@group-5BD7E8A01610: changes role from null to FOLLOWER at term 1 for startAsFollower
3. s3 send append entry request to s4, and s4 change to RUNNING at lifeCycle.compareAndTransition(STARTING, RUNNING)
2020-06-19 07:03:18,162 [nioEventLoopGroup-59-1] DEBUG impl.RaftServerImpl (RaftServerImpl.java:logAppendEntries(918)) - s4@group-5BD7E8A01610: receive appendEntries(s3, 1, (t:1, i:0), 0, false, commits[s3:c0, s4:c0, s0:c0, s1:c0, s2:c0], entries: (t:1, i:1), STATEMACHINELOGENTRY, client-9414EC4E73DA, cid=3000
4. If change to RUNNING in step3 happens before step2, then step2 will throw exception.
2020-06-19 07:03:18,169 [Thread-6194] INFO impl.RoleInfo (RoleInfo.java:updateAndGet(143)) - s4: start FollowerState 2020-06-19 07:03:18,174 [Thread-6194] ERROR netty.TestRaftWithNetty (ExitUtils.java:terminate(133)) - Terminating with exit status -1: Failed to kill/restart server: s4 2020-06-19T07:03:18.1918474Z java.lang.IllegalStateException: ILLEGAL TRANSITION: In s4, RUNNING -> RUNNING 2020-06-19T07:03:18.1918899Z at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:63) 2020-06-19T07:03:18.1919240Z at org.apache.ratis.util.LifeCycle$State.validate(LifeCycle.java:115) 2020-06-19T07:03:18.1919558Z at org.apache.ratis.util.LifeCycle.transition(LifeCycle.java:155) 2020-06-19T07:03:18.1919878Z at org.apache.ratis.server.impl.RaftServerImpl.startAsFollower(RaftServerImpl.java:214) 2020-06-19T07:03:18.1920206Z at org.apache.ratis.server.impl.RaftServerImpl.start(RaftServerImpl.java:186) 2020-06-19T07:03:18.1920520Z at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) 2020-06-19T07:03:18.1920839Z at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) 2020-06-19T07:03:18.1921330Z at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) 2020-06-19T07:03:18.1921639Z at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290) 2020-06-19T07:03:18.1921951Z at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731) 2020-06-19T07:03:18.1922261Z at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) 2020-06-19T07:03:18.1922575Z at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401) 2020-06-19T07:03:18.1922885Z at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734) 2020-06-19T07:03:18.1925464Z at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159) 2020-06-19T07:03:18.1940816Z at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173) 2020-06-19T07:03:18.1953283Z at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) 2020-06-19T07:03:18.1967610Z at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485) 2020-06-19T07:03:18.1980549Z at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:650) 2020-06-19T07:03:18.1991620Z at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:301) 2020-06-19T07:03:18.1991958Z at org.apache.ratis.MiniRaftCluster.restartServer(MiniRaftCluster.java:312) 2020-06-19T07:03:18.1992275Z at org.apache.ratis.MiniRaftCluster.restartServer(MiniRaftCluster.java:304) 2020-06-19T07:03:18.1992609Z at org.apache.ratis.RaftBasicTests.lambda$killAndRestartServer$2(RaftBasicTests.java:100) 2020-06-19T07:03:18.1992920Z at java.lang.Thread.run(Thread.java:748)
Attachments
Issue Links
- duplicates
-
RATIS-902 Failed UT: RaftSnapshotBaseTest.testBasicInstallSnapshot IllegalState
- Resolved
- links to