Details
- Bug
- Status: Closed
- Major
- Resolution: Duplicate
- 1.11.0
- None
- None
Description
I ran tpcds_sf100-query2 against a parquet table with 10 concurrent threads on a single-node drillbit cluster (Drill built with the DRILL-5599 fix) and hit a resource leak. The query hung in the CANCELLATION_REQUESTED state.
Steps to reproduce:
1) Start ConcurrencyTest.java (attached) with tpcds_sf100-query2 on a parquet table.
2) Wait 3-5 seconds and press Ctrl+C to kill the client.
3) Repeat steps 1) and 2) several times until some queries show "CANCELLATION_REQUESTED".
The affected queries hang until the drillbit is restarted. Running "top" shows that the drillbit is still using CPU.
Jstack example:
"26af36b2-7a44-5af8-e0c3-95a4f132fc7a:frag:14:1" #1268 daemon prio=10 os_prio=0 tid=0x00007f25a5afa800 nid=0x16f2 runnable [0x00007f2535a5a000]
   java.lang.Thread.State: RUNNABLE
    at java.lang.Throwable.fillInStackTrace(Native Method)
    at java.lang.Throwable.fillInStackTrace(Throwable.java:783)
    - locked <0x0000000728ca82b0> (a java.lang.InterruptedException)
    at java.lang.Throwable.<init>(Throwable.java:250)
    at java.lang.Exception.<init>(Exception.java:54)
    at java.lang.InterruptedException.<init>(InterruptedException.java:57)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
    at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:439)
    at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.clear(AsyncPageReader.java:301)
    at org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.clear(ColumnReader.java:147)
    at org.apache.drill.exec.store.parquet.columnreaders.ReadState.close(ReadState.java:179)
    at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.close(ParquetRecordReader.java:318)
    at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:209)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
    at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
    at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:133)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105)
    at org.apache.drill.exec.physical.impl.broadcastsender.BroadcastSenderRootExec.innerNext(BroadcastSenderRootExec.java:95)
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95)
    at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234)
    at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227)
    at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
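The trace shows a RUNNABLE fragment thread constructing an InterruptedException inside LinkedBlockingQueue.take(), called from AsyncPageReader.clear(). One possible explanation, which I have not confirmed against the Drill source, is a cleanup loop that catches InterruptedException, restores the interrupt flag and retries: take() then fails again immediately and the thread spins. The sketch below reproduces only that JDK behavior in isolation. The class and method names are hypothetical and none of this is Drill code.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical standalone sketch; not taken from the Drill code base.
public class InterruptedDrainSketch {

    /**
     * Counts consecutive take() failures when the catch block restores the
     * interrupt flag before retrying. The attempt cap exists only so that the
     * demo terminates; without it the loop would never exit.
     */
    static int failedTakesWhenFlagRestored(int maxAttempts) {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("page");
        Thread.currentThread().interrupt(); // simulate the interrupt sent on cancellation
        int failures = 0;
        while (!queue.isEmpty() && failures < maxAttempts) {
            try {
                queue.take(); // throws at once: lockInterruptibly() sees the flag
            } catch (InterruptedException e) {
                failures++;
                Thread.currentThread().interrupt(); // flag is set again, so the next take() throws too
            }
        }
        Thread.interrupted(); // clear the flag so the caller is unaffected
        return failures;
    }

    /** Drains with the non-blocking poll(), which does not throw InterruptedException. */
    static int drainWithPoll() {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("page");
        Thread.currentThread().interrupt();
        int drained = 0;
        while (queue.poll() != null) {
            drained++;
        }
        Thread.interrupted(); // clear the flag so the caller is unaffected
        return drained;
    }

    public static void main(String[] args) {
        System.out.println("failed take() calls: " + failedTakesWhenFlagRestored(1000));
        System.out.println("items drained by poll(): " + drainWithPoll());
    }
}
```

With the cap at 1000, every take() attempt fails and the queued item is never removed, while the poll() variant drains the single item even though the thread is interrupted. Whether clear() actually loops this way would need to be checked in AsyncPageReader.java around line 301.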
I have attached drillbit.log and the full jstack log.
Attachments
Issue Links
- duplicates DRILL-5420 all cores at 100% of all servers (Resolved)