Details
- Sub-task
- Status: Resolved
- Major
- Resolution: Fixed
- None
- None
Description
Failing about 15% of the time in testShutdownHandling. See https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests-branch2.0/lastSuccessfulBuild/artifact/dashboard.html
Adding some debug. It's hard to follow what is going on in this test.
Attachments
- HBASE-19840.master.001.patch
- 29 kB
- Michael Stack
- HBASE-19840.master.001.patch
- 29 kB
- Michael Stack
Issue Links
- is related to: HBASE-19907 TestMetaWithReplicas still flakey (Resolved)
- relates to: HBASE-19350 TestMetaWithReplicas is flaky (Resolved)
Activity
FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4450 (See https://builds.apache.org/job/HBase-Trunk_matrix/4450/)
HBASE-19840 Flakey TestMetaWithReplicas (stack: rev d49357f2652ef730a3dcbd40a8b0eb7e2174626a)
- (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestMetaWithReplicas.java
The meta replicas can all end up on one server, which the test then kills (there is no check for all metas being on one server), so there is no place left to get the region location from. This happens randomly. I checked the last few failures up here: https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests-branch2.0/lastSuccessfulBuild/artifact/dashboard.html
The balancer is disabled in the setup. It has some notion of replicas, but I think the way meta replicas are assigned frustrates the balancer's ability to distribute the replicas.
Tried enabling the balancer on startup and added an assert that fails fast if all metas are on the same server... only then I ran into this in shutdown:
Thread 753 (M:0;localhost:62884):
State: TIMED_WAITING
Blocked count: 115
Waited count: 1351
Stack:
java.lang.Thread.sleep(Native Method)
org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitFor(ProcedureSyncWait.java:181)
org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitFor(ProcedureSyncWait.java:168)
org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitForProcedureToComplete(ProcedureSyncWait.java:142)
org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitForProcedureToCompleteIOE(ProcedureSyncWait.java:130)
org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.submitAndWaitProcedure(ProcedureSyncWait.java:122)
org.apache.hadoop.hbase.master.assignment.AssignmentManager.assignMeta(AssignmentManager.java:470)
org.apache.hadoop.hbase.master.MasterMetaBootstrap.assignMeta(MasterMetaBootstrap.java:133)
org.apache.hadoop.hbase.master.MasterMetaBootstrap.assignMetaReplicas(MasterMetaBootstrap.java:82)
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:948)
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2026)
org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:557)
... same as HBASE-19791. Let me do the trick used over in
commit 86f4df5f74fc402b560f741e4dcd46ccaffab391
Author: zhangduo <zhangduo@apache.org>
Date: Mon Jan 22 15:03:24 2018 +0800
HBASE-19836 Fix TestZooKeeper.testLogSplittingAfterMasterRecoveryDueToZKExpiry
... until we address the above.
Hmm. Above trick does not work here. Test completes but hangs in shutdown because backup master is coming up. Let me dig...
I removed the patch I committed that added debug, 2.001.
.001 should fix this flakey test. It does make an interesting change though: on shutdown, we stop the procedure executor. Let's see if there are any repercussions.
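For anyone following along, the hang in the thread dump above is a poll-and-sleep wait that never sees the shutdown. The patch itself addresses this by stopping the procedure executor on shutdown; the sketch below only illustrates the general idea of letting such a wait observe a stop signal. It is not HBase code: `StoppableWait`, `waitFor`, and the parameter names are all hypothetical.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch, not HBase code: a poll-and-sleep wait that a stop flag can cut short. */
class StoppableWait {

  /**
   * Polls done every sleepMs milliseconds until it returns true, stopped returns true,
   * the timeout elapses, or the calling thread is interrupted.
   *
   * @return true only if done became true
   */
  static boolean waitFor(BooleanSupplier done, BooleanSupplier stopped,
      long timeoutMs, long sleepMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!done.getAsBoolean()) {
      // Give up as soon as shutdown is requested or the deadline has passed.
      if (stopped.getAsBoolean() || System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(sleepMs);
      } catch (InterruptedException e) {
        // Preserve the interrupt for the caller and stop waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // The condition never completes, but the stop flag is already set, so this returns at once.
    System.out.println(waitFor(() -> false, () -> true, 60_000L, 100L));
  }
}
```

Without the `stopped` check, a caller waiting on a condition that can no longer complete would sleep until the full timeout, which matches the TIMED_WAITING state seen in the dump.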
Fix two issues:
- Meta replicas can all be assigned to the same server. This
will cause the test to hang when we kill the server
hosting meta because there will be no replicas to read from
as the test intends. The check is to look for this condition on
startup and adjust if we come across it. Replicas cross-cut
assignment. They need work.
- The other issue was shutdown. The master started toward the
end of the test may not have come up fully by the time
shutdown is called. We could be stuck assigning the
meta replicas. Have shutdown shut down the procedure
executor engine.
There are other cleanups and notes below.
M HMaster
Remove the silly stops in startup now we have real
means of shutting down Master during init.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java
This replica stuff was doing stuff it shouldn't be doing
like setting core Master state flags. It may have made
sense once but now meta is assigned by a Pv2 Procedure
so the flag setting in here is meddlesome. Clear out
methods no longer needed.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
Remove unused methods.
Change local variable names so they align w/ our naming elsewhere in
the code base.
M hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestMetaWithReplicas.java
Check for all replicas on the one server.
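The "all replicas on the one server" condition described above can be sketched as follows. This is a minimal illustration under assumptions, not the code in the patch: `MetaReplicaPlacement` and `allOnOneServer` are hypothetical names, and servers are represented as plain strings rather than HBase types.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper, not part of HBase: detects replicas that all share one server. */
class MetaReplicaPlacement {

  /**
   * Returns true when there are at least two replicas and every one of them
   * is hosted by the same server.
   */
  static boolean allOnOneServer(Collection<String> hostingServers) {
    if (hostingServers.size() < 2) {
      return false; // with a single replica there is nothing to spread out
    }
    Set<String> distinct = new HashSet<>(hostingServers);
    return distinct.size() == 1;
  }

  public static void main(String[] args) {
    // Server names here are made up for illustration.
    List<String> colocated = Arrays.asList("rs1", "rs1", "rs1");
    List<String> spread = Arrays.asList("rs1", "rs2", "rs1");
    System.out.println(allOnOneServer(colocated));
    System.out.println(allOnOneServer(spread));
  }
}
```

A test could call such a check right after startup and either fail fast or move a replica before killing the server that hosts meta.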
-1 overall |
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
0 | reexec | 0m 8s | Docker mode activated. |
Prechecks | |||
0 | findbugs | 0m 0s | Findbugs executables are not available. |
+1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
+1 | @author | 0m 0s | The patch does not contain any @author tags. |
+1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
master Compile Tests | |||
0 | mvndep | 0m 22s | Maven dependency ordering for branch |
+1 | mvninstall | 4m 18s | master passed |
+1 | compile | 0m 59s | master passed |
+1 | checkstyle | 1m 30s | master passed |
+1 | shadedjars | 6m 9s | branch has no errors when building our shaded downstream artifacts. |
+1 | javadoc | 0m 44s | master passed |
Patch Compile Tests | |||
0 | mvndep | 0m 13s | Maven dependency ordering for patch |
+1 | mvninstall | 4m 21s | the patch passed |
+1 | compile | 1m 0s | the patch passed |
+1 | javac | 1m 0s | the patch passed |
-1 | checkstyle | 1m 9s | hbase-server: The patch generated 5 new + 244 unchanged - 12 fixed = 249 total (was 256) |
+1 | whitespace | 0m 0s | The patch has no whitespace issues. |
+1 | shadedjars | 4m 37s | patch has no errors when building our shaded downstream artifacts. |
+1 | hadoopcheck | 18m 7s | Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. |
+1 | javadoc | 0m 44s | the patch passed |
Other Tests | |||
+1 | unit | 2m 14s | hbase-common in the patch passed. |
-1 | unit | 107m 7s | hbase-server in the patch failed. |
+1 | asflicense | 0m 38s | The patch does not generate ASF License warnings. |
148m 58s |
Subsystem | Report/Notes |
---|---|
Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
JIRA Issue | |
JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12907631/HBASE-19840.master.001.patch |
Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
uname | Linux 373add6e51fc 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux |
Build tool | maven |
Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
git revision | master / ce50830a0a |
maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
Default Java | 1.8.0_151 |
checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/11196/artifact/patchprocess/diff-checkstyle-hbase-server.txt |
unit | https://builds.apache.org/job/PreCommit-HBASE-Build/11196/artifact/patchprocess/patch-unit-hbase-server.txt |
Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/11196/testReport/ |
modules | C: hbase-common hbase-server U: . |
Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/11196/console |
Powered by | Apache Yetus 0.6.0 http://yetus.apache.org |
This message was automatically generated.
-1 overall |
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
0 | reexec | 0m 14s | Docker mode activated. |
Prechecks | |||
0 | findbugs | 0m 0s | Findbugs executables are not available. |
+1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
+1 | @author | 0m 0s | The patch does not contain any @author tags. |
+1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
master Compile Tests | |||
0 | mvndep | 0m 50s | Maven dependency ordering for branch |
+1 | mvninstall | 5m 45s | master passed |
+1 | compile | 1m 14s | master passed |
+1 | checkstyle | 1m 47s | master passed |
+1 | shadedjars | 7m 3s | branch has no errors when building our shaded downstream artifacts. |
+1 | javadoc | 1m 0s | master passed |
Patch Compile Tests | |||
0 | mvndep | 0m 13s | Maven dependency ordering for patch |
+1 | mvninstall | 5m 13s | the patch passed |
+1 | compile | 1m 15s | the patch passed |
+1 | javac | 1m 15s | the patch passed |
-1 | checkstyle | 1m 25s | hbase-server: The patch generated 5 new + 244 unchanged - 12 fixed = 249 total (was 256) |
+1 | whitespace | 0m 0s | The patch has no whitespace issues. |
+1 | shadedjars | 5m 12s | patch has no errors when building our shaded downstream artifacts. |
+1 | hadoopcheck | 21m 36s | Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. |
+1 | javadoc | 0m 57s | the patch passed |
Other Tests | |||
+1 | unit | 2m 31s | hbase-common in the patch passed. |
+1 | unit | 104m 35s | hbase-server in the patch passed. |
+1 | asflicense | 0m 33s | The patch does not generate ASF License warnings. |
155m 12s |
Subsystem | Report/Notes |
---|---|
Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
JIRA Issue | |
JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12907815/HBASE-19840.master.001.patch |
Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
uname | Linux b6d4fc08ee91 3.13.0-133-generic #182-Ubuntu SMP Tue Sep 19 15:49:21 UTC 2017 x86_64 GNU/Linux |
Build tool | maven |
Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
git revision | master / aeffca497b |
maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
Default Java | 1.8.0_151 |
checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/11200/artifact/patchprocess/diff-checkstyle-hbase-server.txt |
Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/11200/testReport/ |
modules | C: hbase-common hbase-server U: . |
Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/11200/console |
Powered by | Apache Yetus 0.6.0 http://yetus.apache.org |
This message was automatically generated.
Pushed to branch-2 and master. Then added a subsequent addendum to address the checkstyle.
FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4492 (See https://builds.apache.org/job/HBase-Trunk_matrix/4492/)
HBASE-19840 Flakey TestMetaWithReplicas (stack: rev 4f547b3817e01a1f98c965a502775de481e6ca96)
- (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestMetaWithReplicas.java
- (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
- (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java
- (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
- (edit) hbase-common/src/main/java/org/apache/hadoop/hbase/util/HasThread.java
- (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java
- (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
HBASE-19840Flakey TestMetaWithReplicas; ADDENDUM to fix Checksyte (stack: rev 0b9a0dc9519d511908efd28caf2cf010e3a1ff79)
- (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java
I see some flakiness still. There is something weird going on. Two ServerNames seem to hash the same. That doesn't make sense (I made a test to try it). Reopening to figure it out. Pushing a bit more debug in the meantime.
Reopening.
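A check of the "two names hash the same" suspicion might look like the sketch below. It is not the test that was added to TestServerName and does not use the real ServerName class; `ServerId` is a hypothetical stand-in made of host, port, and start code, and `distinctCount` is an illustrative helper.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;

/** Hypothetical sketch, not HBase code: checks that distinct identities stay distinct in a hash set. */
class ServerIdHashCheck {

  /** Hypothetical stand-in for a server identity: host, port and start code. */
  static final class ServerId {
    final String host;
    final int port;
    final long startCode;

    ServerId(String host, int port, long startCode) {
      this.host = host;
      this.port = port;
      this.startCode = startCode;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof ServerId)) {
        return false;
      }
      ServerId other = (ServerId) o;
      return port == other.port && startCode == other.startCode && host.equals(other.host);
    }

    @Override
    public int hashCode() {
      return Objects.hash(host, port, startCode);
    }
  }

  /** Number of distinct identities once the inputs are placed in a hash set. */
  static int distinctCount(ServerId... ids) {
    return new HashSet<>(Arrays.asList(ids)).size();
  }

  public static void main(String[] args) {
    ServerId a = new ServerId("localhost", 62884, 1L);
    ServerId b = new ServerId("localhost", 62884, 2L); // same host and port, later start
    System.out.println(distinctCount(a, b));
    System.out.println(distinctCount(a, new ServerId("localhost", 62884, 1L)));
  }
}
```

Note that two unequal objects may legitimately share a hash code; what matters for set or map membership is that `equals` still tells them apart, which is what counting set entries verifies.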
FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4533 (See https://builds.apache.org/job/HBase-Trunk_matrix/4533/)
HBASE-19840 Flakey TestMetaWithReplicas; ADDENDUM Adding debug (stack: rev 9f2149f171e5bcd4e0160458f818fa192c62c082)
- (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/TestServerName.java
- (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestMetaWithReplicas.java
Hmm... This has fallen off the 'charts': https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests-branch2.0/lastSuccessfulBuild/artifact/dashboard.html Will give it another day...
Pushed the .001 debug patch to master and branch-2.