Details
-
Sub-task
-
Status: Resolved
-
Blocker
-
Resolution: Fixed
-
Impala 3.1.0
-
None
-
None
-
ghx-label-7
Description
AggregationNode's memory estimates are calculated based on the input cardinality of the node, without accounting for the division of input data across fragment instances. This results in very high memory estimates. In reality, the nodes often use only a part of this memory.
Example query:
[localhost:21000] default> select distinct * from tpch.lineitem limit 5;
Summary:
+--------------+--------+----------+----------+-------+------------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail | +--------------+--------+----------+----------+-------+------------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | 04:EXCHANGE | 1 | 21.24us | 21.24us | 5 | 5 | 48.00 KB | 16.00 KB | UNPARTITIONED | | 03:AGGREGATE | 3 | 5.11s | 5.15s | 15 | 5 | 576.21 MB | 1.62 GB | FINALIZE | | 02:EXCHANGE | 3 | 709.75ms | 728.91ms | 6.00M | 6.00M | 5.46 MB | 10.78 MB | HASH(tpch.lineitem.l_orderkey,tpch.lineitem.l_partkey,tpch.lineitem.l_suppkey,tpch.lineitem.l_linenumber,tpch.lineitem.l_quantity,tpch.lineitem.l_extendedprice,tpch.lineitem.l_discount,tpch.lineitem.l_tax,tpch.lineitem.l_returnflag,tpch.lineitem.l_linestatus,tpch.lineitem.l_shipdate,tpch.lineitem.l_commitdate,tpch.lineitem.l_receiptdate,tpch.lineitem.l_shipinstruct,tpch.lineitem.l_shipmode,tpch.lineitem.l_comment) | | 01:AGGREGATE | 3 | 4.37s | 4.70s | 6.00M | 6.00M | 36.77 MB | 1.62 GB | STREAMING | | 00:SCAN HDFS | 3 | 437.14ms | 480.60ms | 6.00M | 6.00M | 65.51 MB | 264.00 MB | tpch.lineitem | +--------------+--------+----------+----------+-------+------------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
The plan estimates 3.50 GB memory per host but the query ends up with a peak memory usage of 682.07 MB.