Description
During a nightly integration test run, BigRootTabletIT failed, timing out after 4 minutes:
java.lang.Exception: test timed out after 240000 milliseconds
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
    at org.apache.accumulo.core.client.admin.TableOperationsImpl.addSplits(TableOperationsImpl.java:437)
    at org.apache.accumulo.test.functional.BigRootTabletIT.test(BigRootTabletIT.java:50)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
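The trace shows the test thread parked in a timed CountDownLatch.await called from addSplits. As a minimal sketch of how such a wait can outlast a test timeout (this is not the TableOperationsImpl code; the class LatchWaitSketch, the method runAndWait, and its parameters are hypothetical names made up for illustration), the following Java example polls a latch that worker tasks count down only once they make progress:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LatchWaitSketch {

  // Hypothetical helper: starts `tasks` workers and polls a latch until all of
  // them finish. Returns true if every worker counted down before
  // `deadlineMillis` elapsed, false if the deadline passed first.
  public static boolean runAndWait(int tasks, boolean workersMakeProgress, long deadlineMillis) {
    CountDownLatch latch = new CountDownLatch(tasks);
    ExecutorService pool = Executors.newFixedThreadPool(tasks);
    try {
      for (int i = 0; i < tasks; i++) {
        pool.execute(() -> {
          try {
            // A worker that never makes progress stands in for an operation
            // that keeps retrying; it only stops when interrupted.
            while (!workersMakeProgress) {
              Thread.sleep(10);
            }
            latch.countDown();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      long end = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(deadlineMillis);
      // Timed await: the waiting thread parks for short intervals, which is
      // what a parkNanos frame under CountDownLatch.await looks like.
      while (!latch.await(50, TimeUnit.MILLISECONDS)) {
        if (System.nanoTime() >= end) {
          return false;
        }
      }
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      pool.shutdownNow(); // interrupts any worker still looping
    }
  }

  public static void main(String[] args) {
    System.out.println(runAndWait(2, true, 1000));
    System.out.println(runAndWait(2, false, 300));
  }
}
```

In this sketch the caller gives up at its own deadline. In the failed test, the only deadline was the JUnit timeout of 240000 milliseconds.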
Looking at the logs, the root tablet is assigned successfully:
2014-02-26 05:17:09,414 [state.ZooTabletStateStore] DEBUG: Returning root tablet state: +r<<@(tserver1:9997[1446db2884a0002],null,null)
2014-02-26 05:17:09,596 [master.EventCoordinator] INFO : tablet +r<< was loaded on tserver1:9997
No other tablets are assigned for the next four minutes.
The logs are full of "Failed to bin" errors:
2014-02-26 05:19:09,613 [impl.ThriftTransportPool] TRACE: Using existing connection to tserver1:9997
2014-02-26 05:19:09,615 [impl.ThriftTransportPool] TRACE: Returned connection tserver1:9997 (120000) ioCount : 562
2014-02-26 05:19:09,615 [metadata.MetadataLocationObtainer] TRACE: tid=28 oid=3448 Got 2 results from +r<< in 0.002 secs
2014-02-26 05:19:09,615 [impl.TabletLocatorImpl] TRACE: tid=28 oid=3446 Binned 1 ranges for table !0 to 0 tservers in 0.003 secs
2014-02-26 05:19:09,616 [impl.TabletServerBatchReaderIterator] TRACE: Failed to bin 1 ranges, tablet locations were null, retrying in 100ms
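The "Binned 1 ranges for table !0 to 0 tservers" and "Failed to bin" lines suggest that the range could not be mapped to any tablet server because its tablet had no location. A rough sketch of that pattern follows, under loud assumptions: BinningSketch, binRanges, and binWithRetry are hypothetical names, ranges and servers are plain strings, the lookup is keyed directly by range, and the retry is bounded so the example terminates. None of this is taken from the Accumulo client code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BinningSketch {

  // Groups each range under the server hosting its tablet. Ranges whose tablet
  // has no known location (null lookup) are appended to `failures` instead.
  public static Map<String, List<String>> binRanges(List<String> ranges,
      Map<String, String> locations, List<String> failures) {
    Map<String, List<String>> binned = new HashMap<>();
    for (String range : ranges) {
      String server = locations.get(range); // assumption: lookup keyed by range
      if (server == null) {
        failures.add(range);
      } else {
        binned.computeIfAbsent(server, k -> new ArrayList<>()).add(range);
      }
    }
    return binned;
  }

  // Retries binning up to maxAttempts times, sleeping between attempts.
  // Returns the attempt number that succeeded, or -1 if every attempt left
  // at least one range unbinned (or the thread was interrupted).
  public static int binWithRetry(List<String> ranges, Map<String, String> locations,
      int maxAttempts, long sleepMillis) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      List<String> failures = new ArrayList<>();
      binRanges(ranges, locations, failures);
      if (failures.isEmpty()) {
        return attempt;
      }
      try {
        Thread.sleep(sleepMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return -1;
      }
    }
    return -1;
  }
}
```

If the location map never gains an entry for the range, every attempt fails in the same way, which matches the repeating "Failed to bin" entries in the log.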
There is also an IOException from an attempted batch read:
2014-02-26 05:19:09,687 [impl.TabletServerBatchReaderIterator] DEBUG: Server : tserver1:9997 msg : java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
2014-02-26 05:19:09,689 [impl.TabletServerBatchReaderIterator] DEBUG: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
java.io.IOException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
    at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:713)
    at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:372)
    at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
    at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
    at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.accumulo.core.client.impl.ThriftTransportPool$CachedTTransport.readAll(ThriftTransportPool.java:270)
    at org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:601)
    at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:470)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:311)
    at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:291)
    at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:658)
    ... 7 more
Caused by: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
    ... 18 more
2014-02-26 05:19:09,693 [impl.TabletServerBatchReaderIterator] TRACE: Failed to execute multiscans against 1 tablets, retrying...
This would appear to be the batch scanner used to read the root table in the master.
The tablet server hosting the root tablet is being successfully scanned more than 24 times a second, presumably by clients.
There are no errors in the tserver logs.
Issue Links
- duplicates ACCUMULO-1861 MetadataSplitIT test failed (Resolved)