Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.1.1
None
None
Hardware:
Red Hat Enterprise Linux Server release 7.9 (Maipo)
HDD 12 * 50G
16 cores
Software:
HBase Version : 2.1.1
Hadoop Version : hadoop-2.7.2
Zookeeper Version : 3.5.7
Roles:
HBase is configured in master high-availability (HA) mode with two masters and three regionservers.
host             role
ysl102-qax.com   master regionserver
ysl103-qax.com   master regionserver
ysl104-qax.com   regionserver
Description
When I use RSGroupAdminEndpoint and restart both masters before restarting the regionservers, I encounter a syscall:getsockopt(..) connection error that prevents the hbase:meta region from coming online, resulting in a service exception.
2023-07-10 16:32:22,282 INFO [org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$ServerEventsListenerThread-ysl104-qax.com,16000,1688977910162] rsgroup.RSGroupInfoManagerImpl$ServerEventsListenerThread: Updating default servers.
2023-07-10 16:32:22,299 INFO [PEWorker-8] procedure.ServerCrashProcedure: Start pid=2, state=RUNNABLE:SERVER_CRASH_START, locked=true; ServerCrashProcedure server=ysl102-qax.com,16020,1688977249460, splitWal=true, meta=false
2023-07-10 16:32:22,400 INFO [PEWorker-10] master.SplitLogManager: hdfs://HACluster/home/hbase/WALs/ysl102-qax.com,16020,1688977249460-splitting dir is empty, no logs to split.
2023-07-10 16:32:22,411 INFO [PEWorker-10] master.SplitLogManager: Finished splitting (more than or equal to) 0 bytes in 0 log files in [hdfs://HACluster/home/hbase/WALs/ysl102-qax.com,16020,1688977249460-splitting] in 0ms
2023-07-10 16:32:22,521 INFO [PEWorker-10] procedure2.ProcedureExecutor: Finished pid=2, state=SUCCESS; ServerCrashProcedure server=ysl102-qax.com,16020,1688977249460, splitWal=true, meta=false in 326msec
2023-07-10 16:32:22,941 INFO [RegionServerTracker-0] master.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [ysl104-qax.com,16020,1688977251592]
2023-07-10 16:32:22,941 INFO [RegionServerTracker-0] master.ServerManager: Processing expiration of ysl104-qax.com,16020,1688977251592 on ysl104-qax.com,16000,1688977910162
2023-07-10 16:32:23,069 INFO [PEWorker-12] procedure.ServerCrashProcedure: Start pid=3, state=RUNNABLE:SERVER_CRASH_START, locked=true; ServerCrashProcedure server=ysl104-qax.com,16020,1688977251592, splitWal=true, meta=true
2023-07-10 16:32:23,126 INFO [PEWorker-12] master.SplitLogManager: hdfs://HACluster/home/hbase/WALs/ysl104-qax.com,16020,1688977251592-splitting dir is empty, no logs to split.
2023-07-10 16:32:23,135 INFO [PEWorker-12] master.SplitLogManager: Finished splitting (more than or equal to) 0 bytes in 0 log files in [hdfs://HACluster/home/hbase/WALs/ysl104-qax.com,16020,1688977251592-splitting] in 0ms
2023-07-10 16:32:23,174 INFO [PEWorker-12] procedure2.ProcedureExecutor: Initialized subprocedures=[{pid=4, ppid=3, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, region=1588230740}]
2023-07-10 16:32:23,206 INFO [PEWorker-15] procedure.MasterProcedureScheduler: Took xlock for pid=4, ppid=3, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, region=1588230740
2023-07-10 16:32:23,325 INFO [PEWorker-15] assignment.AssignProcedure: Starting pid=4, ppid=3, state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure table=hbase:meta, region=1588230740; rit=OFFLINE, location=ysl104-qax.com,16020,1688977251592; forceNewPlan=false, retain=true
2023-07-10 16:32:23,476 WARN [master/YSL104-QAX:16000] assignment.AssignmentManager: No servers available; cannot place 1 unassigned regions.
2023-07-10 16:32:24,477 WARN [master/YSL104-QAX:16000] assignment.AssignmentManager: No servers available; cannot place 1 unassigned regions.
2023-07-10 16:32:25,478 WARN [master/YSL104-QAX:16000] assignment.AssignmentManager: No servers available; cannot place 1 unassigned regions.
2023-07-10 16:32:26,479 WARN [master/YSL104-QAX:16000] assignment.AssignmentManager: No servers available; cannot place 1 unassigned regions.
2023-07-10 16:32:26,665 INFO [org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$ServerEventsListenerThread-ysl104-qax.com,16000,1688977910162] client.RpcRetryingCallerImpl: Call exception, tries=6, retries=46, started=4175 ms ago, cancelled=false, msg=Call to YSL104-QAX.COM/10.59.12.104:16020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: syscall:getsockopt(..) failed: Connection refused: YSL104-QAX.COM/xx.xx.xx.104:16020, details=row 'hbase:rsgroup' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ysl104-qax.com,16020,1688977251592, seqNum=-1
2023-07-10 16:32:27,480 WARN [master/YSL104-QAX:16000] assignment.AssignmentManager: No servers available; cannot place 1 unassigned regions.
2023-07-10 16:32:28,481 WARN [master/YSL104-QAX:16000] assignment.AssignmentManager: No servers available; cannot place 1 unassigned regions.
2023-07-10 16:32:29,482 WARN [master/YSL104-QAX:16000] assignment.AssignmentManager: No servers available; cannot place 1 unassigned regions.
2023-07-10 16:32:30,483 WARN [master/YSL104-QAX:16000] assignment.AssignmentManager: No servers available; cannot place 1 unassigned regions.
2023-07-10 16:32:30,899 INFO [org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$ServerEventsListenerThread-ysl104-qax.com,16000,1688977910162] client.RpcRetryingCallerImpl: Call exception, tries=7, retries=46, started=8409 ms ago, cancelled=false, msg=Connection closed, details=row 'hbase:rsgroup' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ysl104-qax.com,16020,1688977251592, seqNum=-1
2023-07-10 16:32:31,025 INFO [RpcServer.default.FPBQ.Fifo.handler=198,queue=18,port=16000] master.ServerManager: Registering regionserver=ysl103-qax.com,16020,1688977946684
2023-07-10 16:32:31,064 INFO [RegionServerTracker-0] master.RegionServerTracker: RegionServer ephemeral node created, adding [ysl103-qax.com,16020,1688977946684]
2023-07-10 16:32:31,439 INFO [RpcServer.default.FPBQ.Fifo.handler=198,queue=18,port=16000] master.ServerManager: Registering regionserver=ysl102-qax.com,16020,1688977947399
2023-07-10 16:32:31,467 INFO [RegionServerTracker-0] master.RegionServerTracker: RegionServer ephemeral node created, adding [ysl102-qax.com,16020,1688977947399]
2023-07-10 16:32:32,934 INFO [RpcServer.default.FPBQ.Fifo.handler=198,queue=18,port=16000] master.ServerManager: Registering regionserver=ysl104-qax.com,16020,1688977948804
2023-07-10 16:32:32,965 INFO [RegionServerTracker-0] master.RegionServerTracker: RegionServer ephemeral node created, adding [ysl104-qax.com,16020,1688977948804]
2023-07-10 16:32:41,041 INFO [org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$ServerEventsListenerThread-ysl104-qax.com,16000,1688977910162] client.RpcRetryingCallerImpl: ption: hbase:meta,,1 is not online on ysl104-qax.com,16020,1688977948804
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3316)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3293)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1431)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2449)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
, details=row 'hbase:rsgroup' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ysl104-qax.com,16020,1688977251592, seqNum=-1
2023-07-10 16:32:51,123 INFO [org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$ServerEventsListenerThread-ysl104-qax.com,16000,1688977910162] client.RpcRetryingCallerImpl: ption: hbase:meta,,1 is not online on ysl104-qax.com,16020,1688977948804
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3316)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3293)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1431)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2449)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
, details=row 'hbase:rsgroup' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ysl104-qax.com,16020,1688977251592, seqNum=-1
2023-07-10 16:33:01,212 INFO [org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$ServerEventsListenerThread-ysl104-qax.com,16000,1688977910162] client.RpcRetryingCallerImpl: ption: hbase:meta,,1 is not online on ysl104-qax.com,16020,1688977948804
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3316)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3293)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1431)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2449)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
, details=row 'hbase:rsgroup' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ysl104-qax.com,16020,1688977251592, seqNum=-1
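
One detail in the log above that may help triage: the retried calls keep naming hostname=ysl104-qax.com,16020,1688977251592, while the regionserver on that host re-registered as ysl104-qax.com,16020,1688977948804. The following Python sketch flags such mismatches in a master log. It is only a diagnostic aid, not a statement of the root cause; the function name find_stale_targets and the SAMPLE_LINES list are hypothetical, and it assumes server names appear in the host,port,startcode form seen above, with each log entry on a single line.

```python
import re

# Server names in the log look like: host,port,startcode
SERVER = r"(?P<host>[\w.-]+),(?P<port>\d+),(?P<start>\d+)"
REGISTERED = re.compile(r"Registering regionserver=" + SERVER)
RPC_TARGET = re.compile(r"hostname=" + SERVER)


def find_stale_targets(lines):
    """Return (host, port, targeted_start, live_start) tuples for client
    calls addressed to an older start code than the one most recently
    registered for the same host and port."""
    live = {}        # (host, port) -> newest registered start code
    targets = set()  # (host, port, start) named in client call details
    for line in lines:
        m = REGISTERED.search(line)
        if m:
            key = (m["host"], m["port"])
            live[key] = max(live.get(key, 0), int(m["start"]))
        for m in RPC_TARGET.finditer(line):
            targets.add((m["host"], m["port"], int(m["start"])))
    return sorted(
        (host, port, start, live[(host, port)])
        for host, port, start in targets
        if (host, port) in live and start < live[(host, port)]
    )


# Two entries abbreviated from the log in this report.
SAMPLE_LINES = [
    "2023-07-10 16:32:32,934 INFO [RpcServer.default.FPBQ.Fifo.handler=198,"
    "queue=18,port=16000] master.ServerManager: Registering "
    "regionserver=ysl104-qax.com,16020,1688977948804",
    ", details=row 'hbase:rsgroup' on table 'hbase:meta' at "
    "region=hbase:meta,,1.1588230740, "
    "hostname=ysl104-qax.com,16020,1688977251592, seqNum=-1",
]

if __name__ == "__main__":
    for host, port, old, new in find_stale_targets(SAMPLE_LINES):
        print(f"{host}:{port} targeted with start code {old}, live is {new}")
```

To run it against a real master log, pass the open file object (or a list of its lines) to find_stale_targets instead of SAMPLE_LINES.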