Accumulo / ACCUMULO-2408

metadata table not assigned after root table is loaded


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.4.0, 1.4.1, 1.4.2, 1.4.3, 1.4.4, 1.5.0, 1.5.1
    • Fix Version/s: 1.4.5, 1.5.2, 1.6.0
    • Component/s: master

    Description

      During a nightly integration test run, BigRootTabletIT failed, timing out after 4 minutes:

      java.lang.Exception: test timed out after 240000 milliseconds
      	at sun.misc.Unsafe.park(Native Method)
      	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
      	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
      	at org.apache.accumulo.core.client.admin.TableOperationsImpl.addSplits(TableOperationsImpl.java:437)
      	at org.apache.accumulo.test.functional.BigRootTabletIT.test(BigRootTabletIT.java:50)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
      	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
      	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
      	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
      	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
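
      The test thread is parked in `CountDownLatch.await` inside `TableOperationsImpl.addSplits`. The sketch below is a minimal illustration, not Accumulo's implementation: it assumes the split call fans work out to tasks and waits on a latch for all of them, and the names `LatchedSplitsSketch`, `addSplits`, and `canLocate` are hypothetical. It shows why a task that can never locate its metadata tablet leaves the caller blocked until an outer timeout (here, JUnit's 240000 ms) gives up.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

public class LatchedSplitsSketch {

  /**
   * Hypothetical stand-in for a split operation: one task per split, and the caller
   * waits on a latch until every task has counted down. A task that cannot locate
   * its metadata tablet never counts down, so the wait can only end by timing out.
   */
  static boolean addSplits(List<String> splits, Predicate<String> canLocate,
      long timeout, TimeUnit unit) {
    CountDownLatch latch = new CountDownLatch(splits.size());
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      for (String split : splits) {
        pool.submit(() -> {
          // Count down only if the split's metadata tablet could be located.
          if (canLocate.test(split)) {
            latch.countDown();
          }
        });
      }
      // True if every split finished, false if the wait timed out.
      return latch.await(timeout, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    // Metadata tablet hosted: every split completes.
    System.out.println(addSplits(List.of("a", "b", "c"), s -> true, 1, TimeUnit.SECONDS));
    // Metadata tablet never assigned: the wait times out.
    System.out.println(addSplits(List.of("a", "b", "c"), s -> false, 200, TimeUnit.MILLISECONDS));
  }
}
```

      Running `main` prints `true` and then `false`.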
      

      Looking at the logs, the root tablet is assigned successfully:

      2014-02-26 05:17:09,414 [state.ZooTabletStateStore] DEBUG: Returning root tablet state: +r<<@(tserver1:9997[1446db2884a0002],null,null)
      2014-02-26 05:17:09,596 [master.EventCoordinator] INFO : tablet +r<< was loaded on tserver1:9997
      

      No other tablets are assigned for the next four minutes.

      The logs are full of "Failed to bin" messages:

      2014-02-26 05:19:09,613 [impl.ThriftTransportPool] TRACE: Using existing connection to tserver1:9997
      2014-02-26 05:19:09,615 [impl.ThriftTransportPool] TRACE: Returned connection tserver1:9997 (120000) ioCount : 562
      2014-02-26 05:19:09,615 [metadata.MetadataLocationObtainer] TRACE: tid=28 oid=3448  Got 2 results  from +r<< in 0.002 secs
      2014-02-26 05:19:09,615 [impl.TabletLocatorImpl] TRACE: tid=28 oid=3446  Binned 1 ranges for table !0 to 0 tservers in 0.003 secs
      2014-02-26 05:19:09,616 [impl.TabletServerBatchReaderIterator] TRACE: Failed to bin 1 ranges, tablet locations were null, retrying in 100ms
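
      These TRACE lines describe a bin-and-retry loop: the client looks up each range's tablet location (for table `!0`, via the root tablet `+r<<`), and when the location is null it waits 100 ms and tries again. A minimal sketch of that pattern follows, assuming a hypothetical in-memory location map in place of the real root-table scan; `BinRangesSketch`, `binRanges`, and `attemptsUntilBinned` are illustrative names, not Accumulo API. Judging from the logs, the real client keeps retrying with no cap; `maxAttempts` exists only so the sketch terminates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BinRangesSketch {

  /**
   * Groups ranges by the tablet server hosting them. A range whose tablet has no
   * location (null) is added to 'failures' instead of being binned.
   */
  static Map<String, List<String>> binRanges(List<String> ranges,
      Map<String, String> locations, List<String> failures) {
    Map<String, List<String>> binned = new HashMap<>();
    for (String range : ranges) {
      String server = locations.get(range);
      if (server == null) {
        failures.add(range);
      } else {
        binned.computeIfAbsent(server, k -> new ArrayList<>()).add(range);
      }
    }
    return binned;
  }

  /**
   * Retries binning every sleepMillis. Returns the attempt on which all ranges
   * binned, or -1 if they never did within maxAttempts.
   */
  static int attemptsUntilBinned(List<String> ranges, Map<String, String> locations,
      int maxAttempts, long sleepMillis) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      List<String> failures = new ArrayList<>();
      binRanges(ranges, locations, failures);
      if (failures.isEmpty()) {
        return attempt;
      }
      // Corresponds to "Failed to bin N ranges, tablet locations were null, retrying in 100ms".
      try {
        Thread.sleep(sleepMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return -1;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    // Hypothetical location map; the key stands in for the metadata tablet's extent.
    Map<String, String> locations = new ConcurrentHashMap<>();
    System.out.println(attemptsUntilBinned(List.of("!0<<"), locations, 3, 100)); // -1
    locations.put("!0<<", "tserver1:9997"); // the tablet is finally assigned
    System.out.println(attemptsUntilBinned(List.of("!0<<"), locations, 3, 100)); // 1
  }
}
```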
      

      There is also an IOException while trying to do a batch read:

      2014-02-26 05:19:09,687 [impl.TabletServerBatchReaderIterator] DEBUG: Server : tserver1:9997 msg : java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
      2014-02-26 05:19:09,689 [impl.TabletServerBatchReaderIterator] DEBUG: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
      java.io.IOException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
              at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:713)
              at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:372)
              at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
              at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
              at java.lang.Thread.run(Thread.java:744)
      Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
              at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
              at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
              at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
              at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
              at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
              at org.apache.accumulo.core.client.impl.ThriftTransportPool$CachedTTransport.readAll(ThriftTransportPool.java:270)
              at org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:601)
              at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:470)
              at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
              at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:311)
              at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:291)
              at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:658)
              ... 7 more
      Caused by: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
              at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
              at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
              at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
              at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
              at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
              at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
              at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
              ... 18 more
      2014-02-26 05:19:09,693 [impl.TabletServerBatchReaderIterator] TRACE: Failed to execute multiscans against 1 tablets, retrying...
      

      This appears to be the batch scanner the master uses to read the root table.

      The tablet server hosting the root tablet is being scanned successfully more than 24 times a second, presumably by clients.

      There are no errors in the tserver logs.
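
      The IOException above bottoms out in a socket read that gave up after 120000 ms while the tablet server logged nothing. A minimal sketch using plain `java.net` sockets (not Thrift or Hadoop's `SocketIOWithTimeout`) mimics that failure mode, assuming a server that accepts the connection but never replies: `read()` blocks until the `setSoTimeout` deadline passes and then throws `SocketTimeoutException`. The `ReadTimeoutSketch` name and the 200 ms timeout are illustrative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutSketch {

  /** Returns true if a read from a silent server times out after timeoutMillis. */
  static boolean readTimesOut(int timeoutMillis) throws IOException {
    InetAddress loopback = InetAddress.getLoopbackAddress();
    // The kernel completes the TCP handshake via the listen backlog, so the client
    // connects even though accept() is never called and nothing is ever written.
    try (ServerSocket server = new ServerSocket(0, 1, loopback);
         Socket client = new Socket(loopback, server.getLocalPort())) {
      client.setSoTimeout(timeoutMillis); // analogous to the 120000 ms client timeout
      InputStream in = client.getInputStream();
      try {
        in.read(); // blocks waiting for a response that never arrives
        return false;
      } catch (SocketTimeoutException e) {
        return true;
      }
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(readTimesOut(200));
  }
}
```

      This only illustrates how the client-side timeout surfaces; it does not explain why the tablet server never answered the master's multiscan.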


      People

        Assignee: Eric C. Newton
        Reporter: Eric C. Newton
        Votes: 0
        Watchers: 3
