Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 2.0.0-beta-2, 2.0.0
    • amv2, test
    • None

    Description

      Failed in the nightly in interesting way.

      https://builds.apache.org/job/HBase%20Nightly/job/branch-2/398/

      A subprocedure of enable table is assigning regions. The test is doing a kill of the procedure store to ensure we can handle all state transtions even in face of failure. In this case, the kill of PE framework came in the finish of the assign procedure AFTER we'd updated hbase:meta and the internal AM state moving Region from OPENING to OPEN. Rerunning this step on restoration of the AMv2 WAL Store results in....

      2018-02-27 16:28:02,290 WARN  [PEWorker-1] assignment.RegionTransitionProcedure(331): Retryable error trying to transition: pid=64, ppid=52, state=RUNNABLE:REGION_TRANSITION_FINISH; AssignProcedure table=testRecoveryAndDoubleExecution, region=cdc8d2202f4aa4daf381bf1bf2e2fe0c; rit=OPEN, location=104080720bc8,38818,1519748808967
      org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected [OFFLINE, CLOSED, SPLITTING, SPLIT, OPENING, FAILED_OPEN] so could move to OPEN but current state=OPEN
      

      Attachments

        1. HBASE-20100.branch-2.001.patch
          3 kB
          Michael Stack

        Issue Links

          Activity

            stack Michael Stack added a comment -

            For posterity, here is the interesting part of the log... Below is a filter on region name only. We start from a store restart loading up the content of the meta table...

            2018-02-27 16:27:55,193 DEBUG [Time-limited test] procedure2.ProcedureExecutor(389): Loading pid=64, ppid=52, state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=testRecoveryAndDoubleExecution, region=cdc8d2202f4aa4daf381bf1bf2e2fe0c
            2018-02-27 16:27:55,196 DEBUG [PEWorker-1] procedure2.ProcedureExecutor(1230): LOCK_EVENT_WAIT pid=64, ppid=52, state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=testRecoveryAndDoubleExecution, region=cdc8d2202f4aa4daf381bf1bf2e2fe0c
            2018-02-27 16:27:57,465 INFO  [Time-limited test] assignment.RegionStateStore(121): Load hbase:meta entry region={ENCODED => cdc8d2202f4aa4daf381bf1bf2e2fe0c, NAME => 'testRecoveryAndDoubleExecution,c,1519748855039.cdc8d2202f4aa4daf381bf1bf2e2fe0c.', STARTKEY => 'c', ENDKEY => ''} regionState=OPENING lastHost=104080720bc8,38818,1519748808967 regionLocation=104080720bc8,38818,1519748808967
            2018-02-27 16:27:57,643 INFO  [PEWorker-1] procedure.MasterProcedureScheduler(571): pid=64, ppid=52, state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=testRecoveryAndDoubleExecution, region=cdc8d2202f4aa4daf381bf1bf2e2fe0c, testRecoveryAndDoubleExecution,c,1519748855039.cdc8d2202f4aa4daf381bf1bf2e2fe0c.
            2018-02-27 16:27:57,643 INFO  [PEWorker-1] assignment.RegionStateStore(189): pid=64 updating hbase:meta row=testRecoveryAndDoubleExecution,c,1519748855039.cdc8d2202f4aa4daf381bf1bf2e2fe0c., regionState=OPENING
            2018-02-27 16:27:57,651 INFO  [PEWorker-1] assignment.RegionTransitionProcedure(213): Dispatch pid=64, ppid=52, state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=testRecoveryAndDoubleExecution, region=cdc8d2202f4aa4daf381bf1bf2e2fe0c; rit=OPENING, location=104080720bc8,38818,1519748808967
            2018-02-27 16:27:57,651 DEBUG [RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=33104] assignment.RegionTransitionProcedure(235): Received report OPENED seqId=0, pid=64, ppid=52, state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=testRecoveryAndDoubleExecution, region=cdc8d2202f4aa4daf381bf1bf2e2fe0c; rit=OPENING, location=104080720bc8,38818,1519748808967
            2018-02-27 16:27:57,767 INFO  [PEWorker-1] assignment.RegionStateStore(189): pid=64 updating hbase:meta row=testRecoveryAndDoubleExecution,c,1519748855039.cdc8d2202f4aa4daf381bf1bf2e2fe0c., regionState=OPEN, openSeqNum=0, regionLocation=104080720bc8,38818,1519748808967
            2018-02-27 16:27:57,770 DEBUG [PEWorker-1] procedure2.ProcedureExecutor(1517): TESTING: Kill before store update: pid=64, ppid=52, state=SUCCESS; AssignProcedure table=testRecoveryAndDoubleExecution, region=cdc8d2202f4aa4daf381bf1bf2e2fe0c
            2018-02-27 16:27:57,804 WARN  [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=38818] regionserver.RSRpcServices(1948): Received OPEN for the region:testRecoveryAndDoubleExecution,c,1519748855039.cdc8d2202f4aa4daf381bf1bf2e2fe0c., which is already online
            2018-02-27 16:27:58,464 DEBUG [Time-limited test] procedure2.ProcedureExecutor(389): Loading pid=64, ppid=52, state=RUNNABLE:REGION_TRANSITION_FINISH; AssignProcedure table=testRecoveryAndDoubleExecution, region=cdc8d2202f4aa4daf381bf1bf2e2fe0c
            2018-02-27 16:27:58,468 DEBUG [PEWorker-1] procedure2.ProcedureExecutor(1230): LOCK_EVENT_WAIT pid=64, ppid=52, state=RUNNABLE:REGION_TRANSITION_FINISH; AssignProcedure table=testRecoveryAndDoubleExecution, region=cdc8d2202f4aa4daf381bf1bf2e2fe0c
            2018-02-27 16:28:00,273 INFO  [Time-limited test] assignment.RegionStateStore(121): Load hbase:meta entry region={ENCODED => cdc8d2202f4aa4daf381bf1bf2e2fe0c, NAME => 'testRecoveryAndDoubleExecution,c,1519748855039.cdc8d2202f4aa4daf381bf1bf2e2fe0c.', STARTKEY => 'c', ENDKEY => ''} regionState=OPEN lastHost=104080720bc8,38818,1519748808967 regionLocation=104080720bc8,38818,1519748808967
            2018-02-27 16:28:00,274 INFO  [PEWorker-1] procedure.MasterProcedureScheduler(571): pid=64, ppid=52, state=RUNNABLE:REGION_TRANSITION_FINISH; AssignProcedure table=testRecoveryAndDoubleExecution, region=cdc8d2202f4aa4daf381bf1bf2e2fe0c, testRecoveryAndDoubleExecution,c,1519748855039.cdc8d2202f4aa4daf381bf1bf2e2fe0c.
            2018-02-27 16:28:00,275 WARN  [PEWorker-1] assignment.RegionTransitionProcedure(331): Retryable error trying to transition: pid=64, ppid=52, state=RUNNABLE:REGION_TRANSITION_FINISH; AssignProcedure table=testRecoveryAndDoubleExecution, region=cdc8d2202f4aa4daf381bf1bf2e2fe0c; rit=OPEN, location=104080720bc8,38818,1519748808967
            2018-02-27 16:28:00,395 WARN  [PEWorker-1] assignment.RegionTransitionProcedure(331): Retryable error trying to transition: pid=64, ppid=52, state=RUNNABLE:REGION_TRANSITION_FINISH; AssignProcedure table=testRecoveryAndDoubleExecution, region=cdc8d2202f4aa4daf381bf1bf2e2fe0c; rit=OPEN, location=104080720bc8,38818,1519748808967
            
            stack Michael Stack added a comment - For posterity, here is the interesting part of the log... Below is a filter on region name only. We start from a store restart loading up the content of the meta table... 2018-02-27 16:27:55,193 DEBUG [Time-limited test] procedure2.ProcedureExecutor(389): Loading pid=64, ppid=52, state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=testRecoveryAndDoubleExecution, region=cdc8d2202f4aa4daf381bf1bf2e2fe0c 2018-02-27 16:27:55,196 DEBUG [PEWorker-1] procedure2.ProcedureExecutor(1230): LOCK_EVENT_WAIT pid=64, ppid=52, state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=testRecoveryAndDoubleExecution, region=cdc8d2202f4aa4daf381bf1bf2e2fe0c 2018-02-27 16:27:57,465 INFO [Time-limited test] assignment.RegionStateStore(121): Load hbase:meta entry region={ENCODED => cdc8d2202f4aa4daf381bf1bf2e2fe0c, NAME => 'testRecoveryAndDoubleExecution,c,1519748855039.cdc8d2202f4aa4daf381bf1bf2e2fe0c.' , STARTKEY => 'c' , ENDKEY => ''} regionState=OPENING lastHost=104080720bc8,38818,1519748808967 regionLocation=104080720bc8,38818,1519748808967 2018-02-27 16:27:57,643 INFO [PEWorker-1] procedure.MasterProcedureScheduler(571): pid=64, ppid=52, state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=testRecoveryAndDoubleExecution, region=cdc8d2202f4aa4daf381bf1bf2e2fe0c, testRecoveryAndDoubleExecution,c,1519748855039.cdc8d2202f4aa4daf381bf1bf2e2fe0c. 2018-02-27 16:27:57,643 INFO [PEWorker-1] assignment.RegionStateStore(189): pid=64 updating hbase:meta row=testRecoveryAndDoubleExecution,c,1519748855039.cdc8d2202f4aa4daf381bf1bf2e2fe0c., regionState=OPENING 2018-02-27 16:27:57,651 INFO [PEWorker-1] assignment.RegionTransitionProcedure(213): Dispatch pid=64, ppid=52, state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=testRecoveryAndDoubleExecution, region=cdc8d2202f4aa4daf381bf1bf2e2fe0c; rit=OPENING, location=104080720bc8,38818,1519748808967 2018-02-27 16:27:57,651 DEBUG [RpcServer. default .FPBQ.Fifo.handler=3,queue=0,port=33104] assignment.RegionTransitionProcedure(235): Received report OPENED seqId=0, pid=64, ppid=52, state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=testRecoveryAndDoubleExecution, region=cdc8d2202f4aa4daf381bf1bf2e2fe0c; rit=OPENING, location=104080720bc8,38818,1519748808967 2018-02-27 16:27:57,767 INFO [PEWorker-1] assignment.RegionStateStore(189): pid=64 updating hbase:meta row=testRecoveryAndDoubleExecution,c,1519748855039.cdc8d2202f4aa4daf381bf1bf2e2fe0c., regionState=OPEN, openSeqNum=0, regionLocation=104080720bc8,38818,1519748808967 2018-02-27 16:27:57,770 DEBUG [PEWorker-1] procedure2.ProcedureExecutor(1517): TESTING: Kill before store update: pid=64, ppid=52, state=SUCCESS; AssignProcedure table=testRecoveryAndDoubleExecution, region=cdc8d2202f4aa4daf381bf1bf2e2fe0c 2018-02-27 16:27:57,804 WARN [RpcServer. default .FPBQ.Fifo.handler=4,queue=0,port=38818] regionserver.RSRpcServices(1948): Received OPEN for the region:testRecoveryAndDoubleExecution,c,1519748855039.cdc8d2202f4aa4daf381bf1bf2e2fe0c., which is already online 2018-02-27 16:27:58,464 DEBUG [Time-limited test] procedure2.ProcedureExecutor(389): Loading pid=64, ppid=52, state=RUNNABLE:REGION_TRANSITION_FINISH; AssignProcedure table=testRecoveryAndDoubleExecution, region=cdc8d2202f4aa4daf381bf1bf2e2fe0c 2018-02-27 16:27:58,468 DEBUG [PEWorker-1] procedure2.ProcedureExecutor(1230): LOCK_EVENT_WAIT pid=64, ppid=52, state=RUNNABLE:REGION_TRANSITION_FINISH; AssignProcedure table=testRecoveryAndDoubleExecution, region=cdc8d2202f4aa4daf381bf1bf2e2fe0c 2018-02-27 16:28:00,273 INFO [Time-limited test] assignment.RegionStateStore(121): Load hbase:meta entry region={ENCODED => cdc8d2202f4aa4daf381bf1bf2e2fe0c, NAME => 'testRecoveryAndDoubleExecution,c,1519748855039.cdc8d2202f4aa4daf381bf1bf2e2fe0c.' , STARTKEY => 'c' , ENDKEY => ''} regionState=OPEN lastHost=104080720bc8,38818,1519748808967 regionLocation=104080720bc8,38818,1519748808967 2018-02-27 16:28:00,274 INFO [PEWorker-1] procedure.MasterProcedureScheduler(571): pid=64, ppid=52, state=RUNNABLE:REGION_TRANSITION_FINISH; AssignProcedure table=testRecoveryAndDoubleExecution, region=cdc8d2202f4aa4daf381bf1bf2e2fe0c, testRecoveryAndDoubleExecution,c,1519748855039.cdc8d2202f4aa4daf381bf1bf2e2fe0c. 2018-02-27 16:28:00,275 WARN [PEWorker-1] assignment.RegionTransitionProcedure(331): Retryable error trying to transition: pid=64, ppid=52, state=RUNNABLE:REGION_TRANSITION_FINISH; AssignProcedure table=testRecoveryAndDoubleExecution, region=cdc8d2202f4aa4daf381bf1bf2e2fe0c; rit=OPEN, location=104080720bc8,38818,1519748808967 2018-02-27 16:28:00,395 WARN [PEWorker-1] assignment.RegionTransitionProcedure(331): Retryable error trying to transition: pid=64, ppid=52, state=RUNNABLE:REGION_TRANSITION_FINISH; AssignProcedure table=testRecoveryAndDoubleExecution, region=cdc8d2202f4aa4daf381bf1bf2e2fe0c; rit=OPEN, location=104080720bc8,38818,1519748808967
            stack Michael Stack added a comment -

            .001 Allow that region may already be in the OPEN state.

            Allow OPEN as a possible state when we update region transition state.
            Usually state is OPENING but if crash before finish step is completed,
            on replay, master may have read that the state is OPEN from meta table
            and so will think it open....

            Small fix.

            stack Michael Stack added a comment - .001 Allow that region may already be in the OPEN state. Allow OPEN as a possible state when we update region transition state. Usually state is OPENING but if crash before finish step is completed, on replay, master may have read that the state is OPEN from meta table and so will think it open.... Small fix.
            hadoopqa Hadoop QA added a comment -
            -1 overall



            Vote Subsystem Runtime Comment
            0 reexec 0m 20s Docker mode activated.
                  Prechecks
            +1 hbaseanti 0m 0s Patch does not have any anti-patterns.
            +1 @author 0m 0s The patch does not contain any @author tags.
            -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
                  branch-2 Compile Tests
            +1 mvninstall 3m 39s branch-2 passed
            +1 compile 0m 42s branch-2 passed
            +1 checkstyle 1m 3s branch-2 passed
            +1 shadedjars 5m 1s branch has no errors when building our shaded downstream artifacts.
            -1 findbugs 1m 59s hbase-server in branch-2 has 2 extant Findbugs warnings.
            +1 javadoc 0m 28s branch-2 passed
                  Patch Compile Tests
            +1 mvninstall 3m 22s the patch passed
            +1 compile 0m 43s the patch passed
            +1 javac 0m 43s the patch passed
            +1 checkstyle 1m 4s the patch passed
            +1 whitespace 0m 0s The patch has no whitespace issues.
            +1 shadedjars 3m 55s patch has no errors when building our shaded downstream artifacts.
            +1 hadoopcheck 14m 46s Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0.
            +1 findbugs 2m 20s the patch passed
            +1 javadoc 0m 33s the patch passed
                  Other Tests
            +1 unit 83m 53s hbase-server in the patch passed.
            +1 asflicense 0m 22s The patch does not generate ASF License warnings.
            119m 34s



            Subsystem Report/Notes
            Docker Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:9f2f2db
            JIRA Issue HBASE-20100
            JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12912321/HBASE-20100.branch-2.001.patch
            Optional Tests asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile
            uname Linux 36be7e308930 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux
            Build tool maven
            Personality /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
            git revision branch-2 / 06cf04e458
            maven version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z)
            Default Java 1.8.0_151
            findbugs v3.1.0-RC3
            findbugs https://builds.apache.org/job/PreCommit-HBASE-Build/11710/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html
            Test Results https://builds.apache.org/job/PreCommit-HBASE-Build/11710/testReport/
            Max. process+thread count 3063 (vs. ulimit of 10000)
            modules C: hbase-server U: hbase-server
            Console output https://builds.apache.org/job/PreCommit-HBASE-Build/11710/console
            Powered by Apache Yetus 0.7.0 http://yetus.apache.org

            This message was automatically generated.

            hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 20s Docker mode activated.       Prechecks +1 hbaseanti 0m 0s Patch does not have any anti-patterns. +1 @author 0m 0s The patch does not contain any @author tags. -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.       branch-2 Compile Tests +1 mvninstall 3m 39s branch-2 passed +1 compile 0m 42s branch-2 passed +1 checkstyle 1m 3s branch-2 passed +1 shadedjars 5m 1s branch has no errors when building our shaded downstream artifacts. -1 findbugs 1m 59s hbase-server in branch-2 has 2 extant Findbugs warnings. +1 javadoc 0m 28s branch-2 passed       Patch Compile Tests +1 mvninstall 3m 22s the patch passed +1 compile 0m 43s the patch passed +1 javac 0m 43s the patch passed +1 checkstyle 1m 4s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 shadedjars 3m 55s patch has no errors when building our shaded downstream artifacts. +1 hadoopcheck 14m 46s Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. +1 findbugs 2m 20s the patch passed +1 javadoc 0m 33s the patch passed       Other Tests +1 unit 83m 53s hbase-server in the patch passed. +1 asflicense 0m 22s The patch does not generate ASF License warnings. 119m 34s Subsystem Report/Notes Docker Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:9f2f2db JIRA Issue HBASE-20100 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12912321/HBASE-20100.branch-2.001.patch Optional Tests asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile uname Linux 36be7e308930 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux Build tool maven Personality /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh git revision branch-2 / 06cf04e458 maven version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) Default Java 1.8.0_151 findbugs v3.1.0-RC3 findbugs https://builds.apache.org/job/PreCommit-HBASE-Build/11710/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html Test Results https://builds.apache.org/job/PreCommit-HBASE-Build/11710/testReport/ Max. process+thread count 3063 (vs. ulimit of 10000) modules C: hbase-server U: hbase-server Console output https://builds.apache.org/job/PreCommit-HBASE-Build/11710/console Powered by Apache Yetus 0.7.0 http://yetus.apache.org This message was automatically generated.
            hudson Hudson added a comment -

            FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4662 (See https://builds.apache.org/job/HBase-Trunk_matrix/4662/)
            HBASE-20100 TestEnableTableProcedure flakey (stack: rev ba063abd2f4b0aaa0622a8665c4ba96ed31eafb8)

            • (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java
            • (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
            • (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignProcedure.java
            hudson Hudson added a comment - FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4662 (See https://builds.apache.org/job/HBase-Trunk_matrix/4662/ ) HBASE-20100 TestEnableTableProcedure flakey (stack: rev ba063abd2f4b0aaa0622a8665c4ba96ed31eafb8) (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignProcedure.java
            stack Michael Stack added a comment -

            Linked to a follow-on issue. The AssignProcedure is too coarsely grained (feedback from Apache9 that makes sense). HBASE-20103 is about breaking it up to do finer steps. Will do in follow-on.

            stack Michael Stack added a comment - Linked to a follow-on issue. The AssignProcedure is too coarsely grained (feedback from Apache9 that makes sense). HBASE-20103 is about breaking it up to do finer steps. Will do in follow-on.
            stack Michael Stack added a comment -

            This is not on the flakies list. A nightly did fail subsequent to this patch going in so we'll back here I think. Resolving this in the meantime.

            stack Michael Stack added a comment - This is not on the flakies list. A nightly did fail subsequent to this patch going in so we'll back here I think. Resolving this in the meantime.

            People

              stack Michael Stack
              stack Michael Stack
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: