Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Version: Impala 2.6.0
- Labels: ghx-label-9
Description
While doing concurrency testing as part of competitive benchmarking, I noticed that it is very difficult to saturate all CPUs at 100%.
Below is a snapshot from htop during a concurrency run; the state shown closely mimics steady state. Note that CPUs 41-60 are noticeably less busy than CPUs 1-20.
I then ran the command below, which dumps all impalad threads and the processor each is currently assigned to, for reference:
for i in $(pgrep impalad); do ps -mo pid,tid,fname,user,psr -p $i;done
From the man page for ps :
psr PSR processor that process is currently assigned to.
The output showed that a large number of threads are running on core 61. Not surprisingly, those ~1K threads are all thrift-server threads, so I wonder whether this concentration is skewing the kernel's ability to distribute the remaining threads evenly across cores.
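The raw per-thread listing above can be hard to eyeball; aggregating it into a thread count per logical CPU makes the skew obvious. A minimal sketch, assuming Linux procps ps: it uses the current shell's PID ($$) purely for illustration, so substitute $(pgrep impalad) to reproduce the actual check against the Impala daemon.

```shell
# Count threads per logical CPU for a process (sketch).
# $$ is used for illustration only; replace with $(pgrep impalad)
# to inspect impalad as in the command above.
ps -mo tid,psr -p $$ --no-headers \
  | awk '$2 ~ /^[0-9]+$/ { n[$2]++ }
         END { for (c in n) printf "cpu %s: %d thread(s)\n", c, n[c] }'
```

A heavily skewed distribution (e.g. hundreds of threads reported for one CPU) would corroborate the core-61 observation above.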
I did a follow-up experiment, profiling different core ranges on the system:
Run 80 concurrent queries dominated by shuffle exchange
Profile cores 01-20 to foo_01-20
Profile cores 41-60 to foo_41-60
Results showed that:
Cores 01-20 retired 50% more instructions
Cores 01-20 showed significantly more contention in pthread_cond_wait, base::internal::SpinLockDelay, and __lll_lock_wait
The skew is dominated by DataStreamSender
ScannerThread(s) also show significant skew
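The per-core-range comparison can be approximated with perf stat, counting instructions retired on each range while the workload runs. This is a hedged sketch, not the exact commands used in the experiment: htop numbers CPUs from 1 while perf numbers them from 0 (so cores 01-20 correspond to -C 0-19), perf must be installed, perf_event_paranoid must permit per-CPU counting, and `sleep 5` stands in for the measurement window during the 80-query run.

```shell
# Sketch: compare instructions retired on two core ranges while a
# workload runs (assumes Linux perf; core ranges mirror the experiment).
if command -v perf >/dev/null 2>&1; then
  perf stat -C 0-19  -e instructions -- sleep 5 2> stat_cores_01-20.txt || true
  perf stat -C 40-59 -e instructions -- sleep 5 2> stat_cores_41-60.txt || true
  grep -H instructions stat_cores_01-20.txt stat_cores_41-60.txt \
    || echo "perf counters unavailable on this machine"
else
  echo "perf not installed"
fi
```

A large gap in instructions retired between the two ranges, as reported here, points at uneven thread placement rather than uneven per-query work.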
Attachments
Issue Links
- is related to:
  - IMPALA-4923 Operators running on top of selective Parquet scans spend a lot of time calling impala::MemPool::FreeAll on empty batches (Resolved)
- relates to:
  - IMPALA-5302 tcmalloc contention limits CPU utilization on machines with >40 logical processors (Resolved)
  - IMPALA-4923 Operators running on top of selective Parquet scans spend a lot of time calling impala::MemPool::FreeAll on empty batches (Resolved)