KUDU-780: Occasional out-of-order WAL entry in RaftConsensusITest.MultiThreadedInsertWithFailovers

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: M5
    • Fix Version/s: None
    • Component/s: consensus
    • Labels: None

    Description

      Saw this on Jenkins in an ASAN build; it has never reproduced on my machine. This is the relevant section of the test log:

      W0517 15:16:08.446135 24076 consensus_peers.cc:196] T 8f3a4dbe88634217b568b2cef0fcf356 P 3c1f0f8d1ec347f89cb10ab169721b4b: Couldn't send request to peer 186634713bf9465d92948c5a44675e73 for tablet 8f3a4dbe88634217b568b2cef0fcf356 Status: Illegal state: Tablet not RUNNING: FAILED: Corruption: Failed log replay. Reason: Debug Info: Error playing entry 1362 of segment 1 of tablet 8f3a4dbe88634217b568b2cef0fcf356. Segment path: /data1/test-tmp/raft_consensus-itest.RaftConsensusITest.MultiThreadedInsertWithFailovers.1431900679989151-32223/raft_consensus-itest-cluster/ts-0/wals/8f3a4dbe88634217b568b2cef0fcf356.recovery/wal-000000001. Entry: type: REPLICATE replicate { id { term: 5 index: 680 } timestamp: 680 op_type: WRITE_OP write_request { tablet_id: "8f3a4dbe88634217b568b2cef0fcf356" schema { columns { name: "key" type: INT32 is_key: true is_nullable: false encoding: AUTO_ENCODING compression: DEFAULT_COMPRESSION } columns { name: "int_val" type: INT32 is_key: false is_nullable: false encoding: AUTO_ENCODING compression: DEFAULT_COMPRESSION } columns { name: "string_val" type: STRING is_key: false is_nullable: true encoding: AUT...: Unexpected opid following opid term: 5 index: 685. Operation: 5,680 REPLICATE (Type: WRITE_OP). Retrying in the next heartbeat period. Already tried 137 times.
      W0517 15:16:08.946322 24076 consensus_peers.cc:196] T 8f3a4dbe88634217b568b2cef0fcf356 P 3c1f0f8d1ec347f89cb10ab169721b4b: Couldn't send request to peer 186634713bf9465d92948c5a44675e73 for tablet 8f3a4dbe88634217b568b2cef0fcf356 Status: Illegal state: Tablet not RUNNING: FAILED: Corruption: Failed log replay. Reason: Debug Info: Error playing entry 1362 of segment 1 of tablet 8f3a4dbe88634217b568b2cef0fcf356. Segment path: /data1/test-tmp/raft_consensus-itest.RaftConsensusITest.MultiThreadedInsertWithFailovers.1431900679989151-32223/raft_consensus-itest-cluster/ts-0/wals/8f3a4dbe88634217b568b2cef0fcf356.recovery/wal-000000001. Entry: type: REPLICATE replicate { id { term: 5 index: 680 } timestamp: 680 op_type: WRITE_OP write_request { tablet_id: "8f3a4dbe88634217b568b2cef0fcf356" schema { columns { name: "key" type: INT32 is_key: true is_nullable: false encoding: AUTO_ENCODING compression: DEFAULT_COMPRESSION } columns { name: "int_val" type: INT32 is_key: false is_nullable: false encoding: AUTO_ENCODING compression: DEFAULT_COMPRESSION } columns { name: "string_val" type: STRING is_key: false is_nullable: true encoding: AUT...: Unexpected opid following opid term: 5 index: 685. Operation: 5,680 REPLICATE (Type: WRITE_OP). Retrying in the next heartbeat period. Already tried 138 times.
      /data1/jenkins-workspace/kudu-test/BUILD_TYPE/ASAN/label/kudu-gerrit-slaves/src/kudu/integration-tests/cluster_verifier.cc:51: Failure
      Failed
      Bad status: Aborted: 1 errors were detected
      

      This is specifically the error:

      Unexpected opid following opid term: 5 index: 685. Operation: 5,680 REPLICATE (Type: WRITE_OP).
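
      For triaging similar failures, the two OpIds can be pulled out of the message and checked against an ordering rule. The following is a minimal Python sketch, not Kudu's replay code: the names OpId, parse_unexpected_opid and first_out_of_order are hypothetical, and the rule that each (term, index) pair must sort strictly after its predecessor is a simplifying assumption made for illustration.

```python
import re
from typing import Iterable, NamedTuple, Optional, Tuple


class OpId(NamedTuple):
    """A (term, index) pair; tuples compare by term first, then index."""
    term: int
    index: int


# Matches the tail of the error message quoted in this report.
_ERROR_RE = re.compile(
    r"Unexpected opid following opid term: (\d+) index: (\d+)\. "
    r"Operation: (\d+),(\d+)"
)


def parse_unexpected_opid(line: str) -> Optional[Tuple[OpId, OpId]]:
    """Return (previous, offending) OpIds from a log line, or None if absent."""
    m = _ERROR_RE.search(line)
    if m is None:
        return None
    prev_term, prev_index, term, index = map(int, m.groups())
    return OpId(prev_term, prev_index), OpId(term, index)


def first_out_of_order(ops: Iterable[OpId]) -> Optional[Tuple[OpId, OpId]]:
    """Return the first adjacent pair whose second op does not sort strictly
    after the first (assumed rule, see the note above), or None."""
    prev = None
    for op in ops:
        if prev is not None and op <= prev:
            return prev, op
        prev = op
    return None
```

      Applied to the message above, parse_unexpected_opid yields OpId(5, 685) as the previous entry and OpId(5, 680) as the offending one: same term, lower index, which is why the entry is rejected during replay.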

      Link: http://a1228.halxg.cloudera.com:8080/diagnose?key=87d4d9ae-fce2-11e4-ab1d-28924ad1fba8

            People

              Assignee: Unassigned
              Reporter: Mike Percy (mpercy)