Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-6564

Queries randomly fail with "CANCELLED" due to a race with IssueInitialRanges()

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Blocker
    • Resolution: Not A Bug
    • Impala 2.12.0
    • None
    • Backend

    Description

      I've been chasing a flaky test that I saw in test_basic_runtime_filters when running against https://gerrit.cloudera.org/#/c/8966/ (the scanner buffer pool changes).

      I think it is a latent bug that has started reproducing more frequently. What I've found is:

      • Different queries fail with CANCELLED. I can repro it on my branch ~3/4 times by running: impala-py.test tests/query_test/test_runtime_filters.py -n8 --verbose --maxfail 1 -k basic . It happens with a variety of queries and file formats.
      • It seems to happen when all files are pruned out by runtime filters
      • Logging reveals IssueInitialRanges() fails with a CANCELLED status, which propagates up to the query status:
          if (!initial_ranges_issued_) {
            // We do this in GetNext() to maximise the amount of work we can do while waiting for
            // runtime filters to show up. The scanner threads have already started (in Open()),
            // so we need to tell them there is work to do.
            // TODO: This is probably not worth splitting the organisational cost of splitting
            // initialisation across two places. Move to before the scanner threads start.
            Status status = IssueInitialScanRanges(state);
            if (!status.ok()) LOG(INFO) << runtime_state_->fragment_instance_id() << " IssueInitialRanges() failed with status: " << status.GetDetail()  << " " << (void*) this;
        
      • It appears that the CANCELLED comes from DiskIoMgr::AddScanRanges().
      • That function returned cancelled because a scanner thread noticed that the scan was complete here and cancelled the RequestContext:
            // Done with range and it completed successfully
            if (progress_.done()) {
              // All ranges are finished.  Indicate we are done.
              LOG(INFO) << runtime_state_->fragment_instance_id() << " All ranges done " << (void*) this;
              SetDone();
              break;
            }
        

      Attachments

        Issue Links

          Activity

            People

              tarmstrong Tim Armstrong
              tarmstrong Tim Armstrong
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: