Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
The concurrent_select.py process starts multiple subprocesses, called query runners, to run the queries. It also starts two threads: the query producer thread and the query consumer thread. The query producer thread adds queries to a query queue; the query consumer thread pulls queries off that queue and feeds them to the query runners.
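The producer/consumer arrangement described above can be sketched roughly as follows. This is a minimal illustration, not the stress test's code: threads stand in for the query runner subprocesses, and the second queue (`runner_queue`), the `None` sentinel, and the `completed` list are assumptions made for the example.

```python
import queue
import threading

NUM_RUNNERS = 3                 # hypothetical runner count
query_queue = queue.Queue()     # filled by the producer thread
runner_queue = queue.Queue()    # assumed hand-off from consumer to runners
completed = []
completed_lock = threading.Lock()

def producer(queries):
    """Add queries to the query queue, then signal the end with a sentinel."""
    for q in queries:
        query_queue.put(q)
    query_queue.put(None)

def consumer():
    """Pull queries off the query queue and feed them to the runners."""
    while True:
        q = query_queue.get()
        if q is None:
            # Tell every runner to stop once the producer is done.
            for _ in range(NUM_RUNNERS):
                runner_queue.put(None)
            return
        runner_queue.put(q)

def query_runner():
    """Stand-in for a runner subprocess: records each query it receives."""
    while True:
        q = runner_queue.get()
        if q is None:
            return
        with completed_lock:
            completed.append(q)

queries = [f"SELECT {i}" for i in range(10)]
threads = [threading.Thread(target=producer, args=(queries,)),
           threading.Thread(target=consumer)]
threads += [threading.Thread(target=query_runner) for _ in range(NUM_RUNNERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every query ends up in `completed` exactly once, although the order depends on thread scheduling.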
The query runner, once it gets queries, does the following (pseudocode; the real code is at https://github.com/apache/impala/blob/d49f629c447ea59ad73ceeb0547fde4d41c651d1/tests/stress/concurrent_select.py#L583-L595):

    with _submit_query_lock:
        increment(num_queries_started)
    run_query()  # One runner crashes here.
    increment(num_queries_finished)
One of the runners crashes inside run_query() and therefore never increments num_queries_finished.
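The resulting counter imbalance can be reproduced with a small sketch. It makes several assumptions: threads replace the runner subprocesses, the crash is simulated with an exception rather than a real process death, and the `Counter` class is a hypothetical stand-in for the shared counters in concurrent_select.py.

```python
import threading

class Counter:
    """Hypothetical thread-safe counter standing in for the shared values."""
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value

submit_query_lock = threading.Lock()
num_queries_started = Counter()
num_queries_finished = Counter()

def run_query(should_crash):
    if should_crash:
        raise RuntimeError("simulated runner crash")

def query_runner(should_crash):
    with submit_query_lock:
        num_queries_started.increment()
    try:
        run_query(should_crash)
    except RuntimeError:
        # Simulates the runner dying: the increment below is never reached.
        return
    num_queries_finished.increment()

# Four runners; only the first one crashes.
runners = [threading.Thread(target=query_runner, args=(i == 0,))
           for i in range(4)]
for t in runners:
    t.start()
for t in runners:
    t.join()

num_running = num_queries_started.value - num_queries_finished.value
print(num_running)  # prints 1: one query looks like it is running forever
```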
Another thread, which is supposed to check for memory leaks (but actually doesn't), periodically acquires '_submit_query_lock' and waits for the number of running queries to reach 0 before releasing the lock:
https://github.com/apache/impala/blob/d49f629c447ea59ad73ceeb0547fde4d41c651d1/tests/stress/concurrent_select.py#L449-L511
However, in the above case, the number of running queries never reaches 0, because the crashed query runner exited without incrementing 'num_queries_finished'. The poll_mem_usage() function therefore holds the lock indefinitely, so no new queries are submitted and the stress test never completes.
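The hang can be illustrated with another hedged sketch. The function `wait_for_idle` is a hypothetical stand-in for the waiting loop in poll_mem_usage(), and the deadline exists only so the example terminates; per the description above, the real loop has no such escape, which is why it blocks forever.

```python
import threading
import time

submit_query_lock = threading.Lock()
# State left behind by the crash: two queries started, only one finished.
counters = {"started": 2, "finished": 1}

def wait_for_idle(deadline_secs):
    """Hold the submit lock until no queries are running.

    Returns True if the system went idle, False if the (assumed) deadline
    passed first.
    """
    with submit_query_lock:
        deadline = time.monotonic() + deadline_secs
        while counters["started"] - counters["finished"] > 0:
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.01)
        return True

result = {}
poller = threading.Thread(
    target=lambda: result.update(idle=wait_for_idle(0.3)))
poller.start()
time.sleep(0.1)  # give the poller time to take the lock

# A runner trying to submit a new query cannot get the lock meanwhile.
acquired = submit_query_lock.acquire(timeout=0.05)
submission_blocked = not acquired
if acquired:
    submit_query_lock.release()

poller.join()
```

Here `result["idle"]` ends up False and `submission_blocked` ends up True: the poller never sees zero running queries, and submissions stall for as long as it holds the lock.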
Issue Links
- is caused by: IMPALA-6326 segfault during impyla HiveServer2Cursor.cancel_operation() over SSL (Resolved)
- is superseded by: IMPALA-6681 Refactor query producer consumer model in stress runner (Open)