Details
- Bug
- Status: Resolved
- Major
- Resolution: Duplicate
- M5
- None
- None
Description
Seen on Jenkins in an ASAN build; I have never reproduced it on my own machine. The relevant section of the test log:
W0517 15:16:08.446135 24076 consensus_peers.cc:196] T 8f3a4dbe88634217b568b2cef0fcf356 P 3c1f0f8d1ec347f89cb10ab169721b4b: Couldn't send request to peer 186634713bf9465d92948c5a44675e73 for tablet 8f3a4dbe88634217b568b2cef0fcf356 Status: Illegal state: Tablet not RUNNING: FAILED: Corruption: Failed log replay. Reason: Debug Info: Error playing entry 1362 of segment 1 of tablet 8f3a4dbe88634217b568b2cef0fcf356. Segment path: /data1/test-tmp/raft_consensus-itest.RaftConsensusITest.MultiThreadedInsertWithFailovers.1431900679989151-32223/raft_consensus-itest-cluster/ts-0/wals/8f3a4dbe88634217b568b2cef0fcf356.recovery/wal-000000001. Entry: type: REPLICATE replicate { id { term: 5 index: 680 } timestamp: 680 op_type: WRITE_OP write_request { tablet_id: "8f3a4dbe88634217b568b2cef0fcf356" schema { columns { name: "key" type: INT32 is_key: true is_nullable: false encoding: AUTO_ENCODING compression: DEFAULT_COMPRESSION } columns { name: "int_val" type: INT32 is_key: false is_nullable: false encoding: AUTO_ENCODING compression: DEFAULT_COMPRESSION } columns { name: "string_val" type: STRING is_key: false is_nullable: true encoding: AUT...: Unexpected opid following opid term: 5 index: 685. Operation: 5,680 REPLICATE (Type: WRITE_OP). Retrying in the next heartbeat period. Already tried 137 times.
W0517 15:16:08.946322 24076 consensus_peers.cc:196] T 8f3a4dbe88634217b568b2cef0fcf356 P 3c1f0f8d1ec347f89cb10ab169721b4b: Couldn't send request to peer 186634713bf9465d92948c5a44675e73 for tablet 8f3a4dbe88634217b568b2cef0fcf356 Status: Illegal state: Tablet not RUNNING: FAILED: Corruption: Failed log replay. Reason: Debug Info: Error playing entry 1362 of segment 1 of tablet 8f3a4dbe88634217b568b2cef0fcf356. Segment path: /data1/test-tmp/raft_consensus-itest.RaftConsensusITest.MultiThreadedInsertWithFailovers.1431900679989151-32223/raft_consensus-itest-cluster/ts-0/wals/8f3a4dbe88634217b568b2cef0fcf356.recovery/wal-000000001. Entry: type: REPLICATE replicate { id { term: 5 index: 680 } timestamp: 680 op_type: WRITE_OP write_request { tablet_id: "8f3a4dbe88634217b568b2cef0fcf356" schema { columns { name: "key" type: INT32 is_key: true is_nullable: false encoding: AUTO_ENCODING compression: DEFAULT_COMPRESSION } columns { name: "int_val" type: INT32 is_key: false is_nullable: false encoding: AUTO_ENCODING compression: DEFAULT_COMPRESSION } columns { name: "string_val" type: STRING is_key: false is_nullable: true encoding: AUT...: Unexpected opid following opid term: 5 index: 685. Operation: 5,680 REPLICATE (Type: WRITE_OP). Retrying in the next heartbeat period. Already tried 138 times.
/data1/jenkins-workspace/kudu-test/BUILD_TYPE/ASAN/label/kudu-gerrit-slaves/src/kudu/integration-tests/cluster_verifier.cc:51: Failure Failed Bad status: Aborted: 1 errors were detected
This is specifically the error:
Unexpected opid following opid term: 5 index: 685. Operation: 5,680 REPLICATE (Type: WRITE_OP).
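To make the complaint concrete: the replayed entry carries op id 5,680 but comes after 5,685, so it goes backwards within the same term. The following is only an illustrative sketch of such an ordering check, not Kudu's actual bootstrap code; the function name `first_out_of_order` and the simplified rule (each op id must sort strictly after the previous one, comparing term first and then index) are assumptions made for this example.

```python
from typing import Iterable, Optional, Tuple

OpId = Tuple[int, int]  # (term, index); a simplified stand-in for a Raft op id


def first_out_of_order(op_ids: Iterable[OpId]) -> Optional[Tuple[OpId, OpId]]:
    """Return (previous, offending) for the first op id that does not sort
    strictly after its predecessor, or None if the sequence is ordered.

    Hypothetical helper: assumes op ids are compared by term, then index.
    """
    prev: Optional[OpId] = None
    for op_id in op_ids:
        # Tuples compare lexicographically: term first, then index.
        if prev is not None and op_id <= prev:
            return prev, op_id
        prev = op_id
    return None


# The pair from the log above: 5,680 replayed after 5,685.
print(first_out_of_order([(5, 684), (5, 685), (5, 680)]))
# → ((5, 685), (5, 680))
```

Under this simplified rule, a lower index is accepted only when it arrives with a higher term, which is a different situation from the one reported here.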
Link: http://a1228.halxg.cloudera.com:8080/diagnose?key=87d4d9ae-fce2-11e4-ab1d-28924ad1fba8
Issue Links
- duplicates KUDU-783 Follower fails to bootstrap after a tumultuous leader switch (Resolved)