Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Not A Bug
-
Impala 2.12.0
-
None
-
ghx-label-4
Description
I've been chasing a flaky test that I saw in test_basic_runtime_filters when running against https://gerrit.cloudera.org/#/c/8966/ (the scanner buffer pool changes).
I think it is a latent bug that has started reproducing more frequently. What I've found is:
- Different queries fail with CANCELLED. I can repro it on my branch ~3/4 times by running: impala-py.test tests/query_test/test_runtime_filters.py -n8 --verbose --maxfail 1 -k basic . It happens with a variety of queries and file formats.
- It seems to happen when all files are pruned out by runtime filters
- Logging reveals IssueInitialRanges() fails with a CANCELLED status, which propagates up to the query status:
if (!initial_ranges_issued_) { // We do this in GetNext() to maximise the amount of work we can do while waiting for // runtime filters to show up. The scanner threads have already started (in Open()), // so we need to tell them there is work to do. // TODO: This is probably not worth splitting the organisational cost of splitting // initialisation across two places. Move to before the scanner threads start. Status status = IssueInitialScanRanges(state); if (!status.ok()) LOG(INFO) << runtime_state_->fragment_instance_id() << " IssueInitialRanges() failed with status: " << status.GetDetail() << " " << (void*) this;
- It appears that the CANCELLED comes from DiskIoMgr::AddScanRanges().
- That function returned cancelled because a scanner thread noticed that the scan was complete here and cancelled the RequestContext:
// Done with range and it completed successfully if (progress_.done()) { // All ranges are finished. Indicate we are done. LOG(INFO) << runtime_state_->fragment_instance_id() << " All ranges done " << (void*) this; SetDone(); break; }
Attachments
Issue Links
- blocks
-
IMPALA-4835 HDFS scans should operate with a constrained number of I/O buffers
- Resolved