Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-28663

CanaryTool continues executing and scanning after timeout

    XMLWordPrintableJSON

Details

    • Reviewed

    Description

      If you run the CanaryTool in region mode until it reaches the configured timeout, the logs and sink results will show that it can continue executing and scanning for 10 seconds.

      This is because the RegionTasks have already been submitted to an ExecutorService which continues execution after timeout, and the Monitor continues execution on a separate thread.

      The 10 second delay in shutdown is seen, in hbase 2.x at least, because runMonitor will close the Connection and that process (code) will lead to ConnectionImplementation#close (code) and inside shutdownPools we will potentially wait the full 10 seconds of awaitTermination if client operations are in progress.

      The scenario can be improved by simply interrupting the monitor thread, as we will often be in an invokeAll call in a sniff method. The invokeAll method is blocking, and interrupting the monitor in this call will interrupt the client threads and generally shutdown properly and timely. However, we can be more robust by also watching for a shutdown signal in the various tasks such as RegionTask so any remaining tasks will drain quickly and without errors. This will remove a lot of errors from the canary logs during shutdown.

       

      2024-06-12 02:57:14 [Time-limited test] ERROR tool.Canary(1076): The monitor is running too long (1140098) after timeout limit:1140000 will be killed itself !!
      
      2024-06-12 02:57:14 [Time-limited test] INFO client.ConnectionImplementation(2039): Closing master protocol: MasterService
      
      2024-06-12 02:57:14 [pool-3-thread-4] ERROR tool.Canary(353): Read from REGION1. on serverName=REGIONSERVER-1, columnFamily=0 failed
      java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: Task org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@54f2a9a4 rejected from java.util.concurrent.ThreadPoolExecutor@2d3d204d[Shutting down, pool size = 7, active threads = 7, queued tasks = 0, completed tasks = 180094]
      	at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:199)
      	at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:271)
      	at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:440)
      	at org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:314)
      	at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:612)
      	at org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.readColumnFamily(CanaryTool.java:565)
      	at org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.read(CanaryTool.java:609)
      	at org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.call(CanaryTool.java:503)
      	at org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.call(CanaryTool.java:471)
      
      [... repeats for 10 seconds and tens of thousands of regions ... ]
      
      2024-06-12 02:57:16 [pool-3-thread-11] ERROR tool.Canary(353): Read from REGION10000. on serverName=REGIONSERVER-2, columnFamily=0 failed
      java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: Task org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@d08d21f rejected from java.util.concurrent.ThreadPoolExecutor@2d3d204d[Shutting down, pool size = 3, active threads = 3, queued tasks = 0, completed tasks = 180098]
      
      [...]
      
      2024-06-12 02:57:24 [pool-3-thread-11] ERROR tool.Canary(353): Read from REGION42000. on serverName=REGIONSERVER-3, columnFamily=0 failed
      java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: Task org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@38e7a5a1 rejected from java.util.concurrent.ThreadPoolExecutor@2d3d204d[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 180101]
      
      2024-06-12T02:57:24.202Z, java.io.InterruptedIOException
      
      	at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:294)
      	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:255)
      	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:53)
      	at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:191)
      	at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:271)
      	at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:440)
      	at org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:314)
      	at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:612)
      	at org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.readColumnFamily(CanaryTool.java:565)
      	at org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.read(CanaryTool.java:609)
      	at org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.call(CanaryTool.java:503)
      	at org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.call(CanaryTool.java:471) 
      
      2024-06-12 02:57:24 [pool-3-thread-8] WARN client.ConnectionImplementation(428): Checking master connection org.apache.hadoop.hbase.ipc.StoppedRpcClientException: Call to address=HMASTER-1 failed on local exception: org.apache.hadoop.hbase.ipc.StoppedRpcClientException at sun.reflect.GeneratedConstructorAccessor27.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:206) at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:391) at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:92) at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:425) at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:420) at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:114) at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:129) at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callMethod(AbstractRpcClient.java:446) at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:332) at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$200(AbstractRpcClient.java:92) at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:595) at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java) at org.apache.hadoop.hbase.client.ConnectionImplementation$MasterServiceState.isMasterRunning(ConnectionImplementation.java:1235) at org.apache.hadoop.hbase.client.ConnectionImplementation.lambda$masterConnectionStateSupplier$1(ConnectionImplementation.java:421) at org.apache.hadoop.hbase.client.ConnectionImplementation.isKeepAliveMasterConnectedAndRunning(ConnectionImplementation.java:2025) at org.apache.hadoop.hbase.client.ConnectionImplementation.getKeepAliveMasterService(ConnectionImplementation.java:1441) at org.apache.hadoop.hbase.client.ConnectionImplementation.getMaster(ConnectionImplementation.java:1433) at org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:57) at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:101) at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3020) at org.apache.hadoop.hbase.client.HBaseAdmin.getTableDescriptor(HBaseAdmin.java:579) at org.apache.hadoop.hbase.client.HTable.getDescriptor(HTable.java:242) at org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.read(CanaryTool.java:593) at org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.call(CanaryTool.java:503) at org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.call(CanaryTool.java:471)
      
      2024-06-12 02:57:24 [pool-3-thread-7] ERROR tool.Canary(345): Read from REGION43000. on serverName=REGIONSERVER-4 failed
      org.apache.hadoop.hbase.DoNotRetryIOException: Connection was closed while trying to get master
      	at org.apache.hadoop.hbase.client.ConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionImplementation.java:1381)
      	at org.apache.hadoop.hbase.client.ConnectionImplementation.getKeepAliveMasterService(ConnectionImplementation.java:1445)
      	at org.apache.hadoop.hbase.client.ConnectionImplementation.getMaster(ConnectionImplementation.java:1433)
      	at org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:57)
      	at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:101)
      	at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3020)
      	at org.apache.hadoop.hbase.client.HBaseAdmin.getTableDescriptor(HBaseAdmin.java:579)
      	at org.apache.hadoop.hbase.client.HTable.getDescriptor(HTable.java:242)
      	at org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.read(CanaryTool.java:593)
      	at org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.call(CanaryTool.java:503)
      	at org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.call(CanaryTool.java:471)

      Attachments

        Activity

          People

            dmanning David Manning
            dmanning David Manning
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - 24h
                24h
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 24h
                24h