Details
- Improvement
- Status: Resolved
- Major
- Resolution: Duplicate
- Impala 4.3.0
- None
Description
IMPALA-11604 added the ProcessingCost (PC) concept, which measures the cost for a distinct PlanNode / DataSink / PlanFragment to process its input rows globally across all of its instances.
We should investigate whether row width should be considered when computing PC for more operators, and whether doing so makes the PC model more accurate. The code in IMPALA-11604 has a materialization cost parameter to accommodate PC where row width should factor in. Currently, the PC of ScanNode, ExchangeNode, and DataStreamSink has row width factored in through the materialization parameter.
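To make the role of the materialization parameter concrete, here is a minimal sketch. It assumes a simple model in which total cost is the input cardinality multiplied by the sum of a per-row expression cost and a per-row materialization cost; the actual formula in IMPALA-11604 may differ. The class name `ProcessingCostSketch` and its fields are hypothetical and are not Impala's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingCostSketch:
    """Hypothetical stand-in for a cost object; not Impala's ProcessingCost class."""
    cardinality: int                    # input rows across all instances
    expr_cost: float                    # per-row cost of evaluating expressions
    materialization_cost: float = 0.0   # per-row term that can carry row width

    def total_cost(self) -> float:
        # Assumed model: every input row pays both per-row terms.
        return self.cardinality * (self.expr_cost + self.materialization_cost)


# Width-agnostic operator: only the expression cost contributes.
narrow = ProcessingCostSketch(cardinality=1_000, expr_cost=1.0)

# Width-aware operator: the materialization term is assumed to grow with row bytes.
wide = ProcessingCostSketch(cardinality=1_000, expr_cost=1.0, materialization_cost=64.0)
```

Under this assumed model, two operators with the same cardinality and expression cost differ only through the materialization term, which is where a row-width estimate would enter.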
For VARCHAR, we can use average width stats, if available. For fixed-width columns, we just use the column width. In both cases, the unit should be bytes. The idea of including width in costing is to make the outcome more precise and less error-prone.
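The width rule described above could be sketched as follows. This is an illustration only: the `Column` type, the `FIXED_WIDTH_BYTES` table, and the fallback `DEFAULT_VAR_WIDTH_BYTES` used when no average-width stats exist are all assumptions, not names or values taken from Impala.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Byte widths for a few fixed-width types (illustrative subset).
FIXED_WIDTH_BYTES = {
    "TINYINT": 1,
    "SMALLINT": 2,
    "INT": 4,
    "BIGINT": 8,
    "FLOAT": 4,
    "DOUBLE": 8,
}

# Hypothetical fallback when a variable-width column has no stats.
DEFAULT_VAR_WIDTH_BYTES = 16.0


@dataclass(frozen=True)
class Column:
    type_name: str
    avg_width_bytes: Optional[float] = None  # average width from stats, if any


def column_width_bytes(col: Column) -> float:
    """Fixed-width types use the type width; others use average-width stats."""
    if col.type_name in FIXED_WIDTH_BYTES:
        return float(FIXED_WIDTH_BYTES[col.type_name])
    if col.avg_width_bytes is not None:
        return col.avg_width_bytes
    return DEFAULT_VAR_WIDTH_BYTES


def row_width_bytes(cols: Iterable[Column]) -> float:
    """Estimated row width in bytes: the sum of the per-column estimates."""
    return sum(column_width_bytes(c) for c in cols)


row = [Column("INT"), Column("BIGINT"), Column("VARCHAR", avg_width_bytes=20.5)]
estimated = row_width_bytes(row)  # 4 + 8 + 20.5 bytes
```

Keeping the estimate in bytes for both kinds of column means the result can be passed to a width-aware cost term without any unit conversion.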
Issue Links
- relates to: IMPALA-12657 Improve ProcessingCost of ScanNode and NonGroupingAggregator (Resolved)