Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Impala 4.1.0
None
Patch, Important
Description
This issue was observed when Impala queries large datasets residing in Kudu. Even when a single ImpalaD scans multiple Kudu tablets in parallel, data retrieval is slow. The reason is that ImpalaD uses a single Kudu client for all of its scans, and the KuduScanner::NextBatch calls from all of these scans are served by that client's single RPC reactor thread. That thread can use at most one CPU core, so it becomes the bottleneck for all of the parallel scans.
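The funnelling described above can be sketched in Python to show the routing problem: many scanner threads issue requests independently, but every response is processed by the same single thread. This is a hypothetical model, not Impala or Kudu code. `handle_batch` and `run_scans` are illustrative names, and a one-worker `ThreadPoolExecutor` stands in for the shared client's reactor thread. It demonstrates only which thread does the work, not CPU usage (Python's GIL would blur any timing comparison).

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle_batch(scanner_id, batch_no):
    """Hypothetical stand-in for processing one NextBatch response.

    Returns the name of the thread that did the work.
    """
    return threading.current_thread().name


def run_scans(num_scanners, batches_per_scanner):
    """Run parallel 'scans' that share one client with one reactor thread.

    Returns the set of thread names that processed any batch.
    """
    # Assumed model of the shared client: exactly one reactor thread.
    with ThreadPoolExecutor(max_workers=1, thread_name_prefix="reactor") as reactor, \
            ThreadPoolExecutor(max_workers=num_scanners,
                               thread_name_prefix="scanner") as scanners:

        def scan(scanner_id):
            # Each scanner thread issues its requests independently...
            futures = [reactor.submit(handle_batch, scanner_id, b)
                       for b in range(batches_per_scanner)]
            # ...but waits until the single reactor thread processes them.
            return [f.result() for f in futures]

        per_scanner = list(scanners.map(scan, range(num_scanners)))
    return {name for names in per_scanner for name in names}


if __name__ == "__main__":
    print(run_scans(num_scanners=8, batches_per_scanner=100))  # → {'reactor_0'}
```

However many scanner threads are started, the returned set contains a single reactor thread name, which mirrors the claim that one reactor thread serves every parallel scan.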
This behaviour prevents Impala clusters that scan Kudu from scaling vertically to the full performance (all cores) of a node.
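The vertical-scaling limit can be illustrated with a toy cost model. Everything in it is assumed for illustration: `scan_makespan` and its parameters are hypothetical, the per-batch reactor cost is a made-up figure, and the `num_reactors > 1` case is included only for contrast, not as a description of how Kudu or Impala behave or should be changed.

```python
import math


def scan_makespan(num_scanners, batches_per_scanner, reactor_cost_s, num_reactors=1):
    """Lower bound on the time (seconds) to drain all scanners.

    Hypothetical assumptions, not taken from Impala or Kudu:
    - every batch needs reactor_cost_s seconds of CPU on a reactor thread;
    - a reactor thread uses at most one core and handles one batch at a time;
    - scanners are spread round-robin across reactors.
    Scanner-side work is ignored.
    """
    if min(num_scanners, batches_per_scanner, num_reactors) < 1:
        raise ValueError("counts must be positive")
    # The busiest reactor serves ceil(S / R) scanners and must process
    # all of their batches one after another.
    scanners_on_busiest = math.ceil(num_scanners / num_reactors)
    return scanners_on_busiest * batches_per_scanner * reactor_cost_s


if __name__ == "__main__":
    total_batches = 8 * 1000
    for reactors in (1, 2, 4, 8):
        t = scan_makespan(num_scanners=8, batches_per_scanner=1000,
                          reactor_cost_s=0.002, num_reactors=reactors)
        print(f"{reactors} reactor(s): {t:.1f} s, {total_batches / t:.0f} batches/s")
```

With these made-up numbers, one reactor needs 16 s for 8 scanners, while eight reactors would need 2 s. With a single reactor, adding scanners only adds work to the same thread, so throughput stays at 1 / reactor_cost_s batches per second no matter how many cores the node has.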
Please refer to the screenshots from the Kudu Slack channel for more information.