  1. HBase
  2. HBASE-19147 All branch-2 unit tests pass
  3. HBASE-20015

TestMergeTableRegionsProcedure and TestRegionMergeTransactionOnCluster flakey

Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 2.0.0-beta-2, 2.0.0
    • test
    • None

    Description

      MergeTableRegionsProcedure seems incomplete. The ProcedureExecutor framework can run in a test mode in which it kills the Procedure before it can persist state, and it does this repeatedly to shake out places where Procedures may not be preserving all needed state at each step. The kill causes the Procedure to 'fail', and the framework then runs the rollback. MergeTableRegionsProcedure is not able to roll back the last few steps of a merge; it throws an UnsupportedOperationException (the hope was that the missing rollback steps would be filled in, but they are hard to complete because they are themselves multi-step).
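
      To make the kill-before-persist test mode concrete, here is a toy model of it. It is only a sketch under stated assumptions: the class KillBeforePersistSketch and its run method are hypothetical and do not use the real ProcedureExecutor API. It shows why every step ends up executed twice under this mode, which is what exposes steps that do not preserve their state.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only (hypothetical names); it does not use the real HBase ProcedureExecutor API.
class KillBeforePersistSketch {

  /** Runs stepCount steps and returns the order in which step indices were executed. */
  static List<Integer> run(int stepCount, boolean killBeforePersist) {
    List<Integer> executed = new ArrayList<>();
    int persisted = 0;                 // next step to run, as recorded in the "store"
    boolean killNext = killBeforePersist;
    while (persisted < stepCount) {
      int step = persisted;            // a restart always resumes from the persisted state
      executed.add(step);              // run the step; its side effects happen here
      if (killNext) {
        killNext = false;              // simulated kill: progress is never persisted
        continue;                      // "restart" and re-run the same step
      }
      persisted = step + 1;            // normal path: persist progress
      killNext = killBeforePersist;    // re-arm the kill for the next step
    }
    return executed;
  }

  public static void main(String[] args) {
    System.out.println(run(3, true));  // every step appears twice
    System.out.println(run(3, false)); // every step appears once
  }
}
```

      In this model a step that is not safe to run twice, or that depends on state that was never persisted, misbehaves on its second execution.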

      So....

      Well, it turns out that Split has a mechanism where it will not fail the Procedure if it gets to a stage from which it cannot roll back. Instead, it just retries, and keeps retrying until it eventually succeeds. Merge has this facility only half-implemented, so the Merge tests are flakey. They do stuff like this:

      2018-02-17 04:04:02,999 WARN  [PEWorker-1] assignment.MergeTableRegionsProcedure(311): Failed rollback attempt step MERGE_TABLE_REGIONS_UPDATE_META for merging the regions [485dd0c2a5d14601d61fed791f793158, 8af34a614f064c162ab1d05eac7fca4c] in table testRollbackAndDoubleExecution
      java.lang.UnsupportedOperationException: pid=44, state=FAILED:MERGE_TABLE_REGIONS_UPDATE_META, exception=org.apache.hadoop.hbase.procedure2.ProcedureAbortedException via MergeTableRegionsProcedure:org.apache.hadoop.hbase.procedure2.ProcedureAbortedException: abort requested; MergeTableRegionsProcedure table=testRollbackAndDoubleExecution, regions=[485dd0c2a5d14601d61fed791f793158, 8af34a614f064c162ab1d05eac7fca4c], forcibly=false unhandled state=MERGE_TABLE_REGIONS_UPDATE_META
      	at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.rollbackState(MergeTableRegionsProcedure.java:291)
      	at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.rollbackState(MergeTableRegionsProcedure.java:78)
      	at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:199)
      	at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:859)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1356)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1312)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1181)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1734)
      2018-02-17 04:04:03,007 ERROR [PEWorker-1] helpers.MarkerIgnoringBase(159): CODE-BUG: Uncaught runtime exception for pid=44, state=FAILED:MERGE_TABLE_REGIONS_UPDATE_META, exception=org.apache.hadoop.hbase.procedure2.ProcedureAbortedException via MergeTableRegionsProcedure:org.apache.hadoop.hbase.procedure2.ProcedureAbortedException: abort requested; MergeTableRegionsProcedure table=testRollbackAndDoubleExecution, regions=[485dd0c2a5d14601d61fed791f793158, 8af34a614f064c162ab1d05eac7fca4c], forcibly=false
      java.lang.UnsupportedOperationException: pid=44, state=FAILED:MERGE_TABLE_REGIONS_UPDATE_META, exception=org.apache.hadoop.hbase.procedure2.ProcedureAbortedException via MergeTableRegionsProcedure:org.apache.hadoop.hbase.procedure2.ProcedureAbortedException: abort requested; MergeTableRegionsProcedure table=testRollbackAndDoubleExecution, regions=[485dd0c2a5d14601d61fed791f793158, 8af34a614f064c162ab1d05eac7fca4c], forcibly=false unhandled state=MERGE_TABLE_REGIONS_UPDATE_META
      	at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.rollbackState(MergeTableRegionsProcedure.java:291)
      	at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.rollbackState(MergeTableRegionsProcedure.java:78)
      	at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:199)
      	at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:859)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1356)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1312)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1181)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1734)
      

      i.e. they throw up their hands, which makes for a CODE-BUG: a condition the framework cannot process. The test fails.
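
      The retry-instead-of-rollback behaviour described above for Split can be sketched as follows. This is a minimal illustration, not the patch: the Step enum, the isRollbackSupported helper and the injectedFailures parameter are all hypothetical stand-ins, and the real procedure states and framework calls differ.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only (hypothetical names); not the real MergeTableRegionsProcedure.
class PonrRetrySketch {

  // Simplified merge steps; UPDATE_META is treated as the point of no return.
  enum Step { PREPARE, CLOSE_REGIONS, UPDATE_META, OPEN_MERGED_REGION }

  // Steps at or past the point of no return cannot be undone.
  static boolean isRollbackSupported(Step step) {
    return step.ordinal() < Step.UPDATE_META.ordinal();
  }

  /**
   * Runs all steps. injectedFailures[i] is how many times step i is aborted
   * before it is allowed to succeed. Returns a log of what happened.
   */
  static List<String> run(int[] injectedFailures) {
    List<String> log = new ArrayList<>();
    Step[] steps = Step.values();
    int[] remaining = injectedFailures.clone();
    int i = 0;
    while (i < steps.length) {
      Step step = steps[i];
      if (remaining[i] > 0) {
        remaining[i]--;
        if (isRollbackSupported(step)) {
          // Before the point of no return: undo this step and the earlier ones, in reverse.
          for (int j = i; j >= 0; j--) {
            log.add("rollback " + steps[j]);
          }
          log.add("ROLLED_BACK");
          return log;
        }
        // Past the point of no return: do not fail, just try the same step again.
        log.add("retry " + step);
        continue;
      }
      log.add("execute " + step);
      i++;
    }
    log.add("SUCCESS");
    return log;
  }

  public static void main(String[] args) {
    System.out.println(run(new int[] {0, 0, 2, 0})); // aborted twice at UPDATE_META, still succeeds
    System.out.println(run(new int[] {0, 1, 0, 0})); // aborted at CLOSE_REGIONS, rolls back
  }
}
```

      The point of the sketch is that an abort at UPDATE_META never reaches a rollback path that would throw; the step is simply retried until it goes through.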

      Attachments

        1. HBASE-20015.branch-2.001.patch
          3 kB
          Michael Stack

        Issue Links

          Activity

            hadoopqa Hadoop QA added a comment -
            -1 overall



            Vote Subsystem Runtime Comment
            0 reexec 0m 14s Docker mode activated.
                  Prechecks
            0 findbugs 0m 0s Findbugs executables are not available.
            +1 hbaseanti 0m 0s Patch does not have any anti-patterns.
            +1 @author 0m 0s The patch does not contain any @author tags.
            -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
                  branch-2 Compile Tests
            +1 mvninstall 3m 16s branch-2 passed
            +1 compile 0m 41s branch-2 passed
            +1 checkstyle 1m 5s branch-2 passed
            +1 shadedjars 5m 2s branch has no errors when building our shaded downstream artifacts.
            +1 javadoc 0m 27s branch-2 passed
                  Patch Compile Tests
            +1 mvninstall 3m 23s the patch passed
            +1 compile 0m 44s the patch passed
            +1 javac 0m 44s the patch passed
            -1 checkstyle 1m 9s hbase-server: The patch generated 1 new + 148 unchanged - 0 fixed = 149 total (was 148)
            +1 whitespace 0m 0s The patch has no whitespace issues.
            +1 shadedjars 4m 8s patch has no errors when building our shaded downstream artifacts.
            +1 hadoopcheck 14m 57s Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0.
            +1 javadoc 0m 29s the patch passed
                  Other Tests
            +1 unit 103m 29s hbase-server in the patch passed.
            +1 asflicense 0m 21s The patch does not generate ASF License warnings.
            134m 23s



            Subsystem Report/Notes
            Docker Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:9f2f2db
            JIRA Issue HBASE-20015
            JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12911002/HBASE-20015.branch-2.001.patch
            Optional Tests asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile
            uname Linux 7722d4c1beb3 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux
            Build tool maven
            Personality /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
            git revision branch-2 / 8be0696320
            maven version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z)
            Default Java 1.8.0_151
            checkstyle https://builds.apache.org/job/PreCommit-HBASE-Build/11557/artifact/patchprocess/diff-checkstyle-hbase-server.txt
            Test Results https://builds.apache.org/job/PreCommit-HBASE-Build/11557/testReport/
            Max. process+thread count 5423 (vs. ulimit of 10000)
            modules C: hbase-server U: hbase-server
            Console output https://builds.apache.org/job/PreCommit-HBASE-Build/11557/console
            Powered by Apache Yetus 0.7.0 http://yetus.apache.org

            This message was automatically generated.

            stack Michael Stack added a comment -

            Pushed to master and branch-2 after fixing checkstyle. Leaving open to see if this makes a difference in our test runs.

            hudson Hudson added a comment -

            FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4603 (See https://builds.apache.org/job/HBase-Trunk_matrix/4603/)
            HBASE-20015 TestMergeTableRegionsProcedure and (stack: rev f3ff55a2b4bb7a8b4980fdbb5b1f7a8d033631f3)

            • (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/SplitTableRegionProcedure.java
            • (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MergeTableRegionsProcedure.java
            stack Michael Stack added a comment -

            Hmm.. These have dropped off the top list in the flakies dashboard but are still in the bottom set, as though the patch were not in place... Looking.

            stack Michael Stack added a comment -

            Resolving. This fixed one category of TestMergeRegion failure... where the test abort would be thrown while mid-step. Another category remains, where abort is called outside of the Procedure. Let me address that in a new issue rather than have the one JIRA do two different fixes.


            People

              stack Michael Stack
              stack Michael Stack