Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Version: 1.11.0
None
None
Description
When aggregating a non-nullable column (such as sum(l_partkey) in the query below), code generation creates an extra value vector, in addition to the actual "sum" vector, that is used as a "nonNullCount".
This extra vector is useless because the underlying column is non-nullable, and it wastes considerable memory: 8 bytes * 64K entries = 512 KB per batch for each such aggregated value!
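The arithmetic above can be checked with a short calculation. This is a minimal sketch that assumes 64K (65,536) entries per batch, an 8-byte BIGINT slot per entry, and one nonNullCount vector per aggregate over a non-nullable column; the class and method names are hypothetical and not part of Drill.

```java
// Hypothetical back-of-the-envelope estimate (not Drill code) of the memory
// taken by nonNullCount vectors, assuming 8-byte BIGINT slots, 64K entries
// per batch, and one nonNullCount vector per aggregate.
public class NonNullCountOverhead {
    static final int BYTES_PER_BIGINT = 8;          // width of a BIGINT entry
    static final int ENTRIES_PER_BATCH = 64 * 1024; // 65,536 entries

    /** Bytes spent per batch on nonNullCount vectors for the given number of aggregates. */
    static long wastedBytesPerBatch(int aggregates) {
        return (long) BYTES_PER_BIGINT * ENTRIES_PER_BATCH * aggregates;
    }

    public static void main(String[] args) {
        long bytes = wastedBytesPerBatch(1);
        // prints "524288 bytes = 512 KB"
        System.out.println(bytes + " bytes = " + (bytes / 1024) + " KB");
    }
}
```

Under the same assumptions, a query with several such SUMs would pay this cost once per aggregate.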
Example query:
select sum(l_partkey) as slpk from cp.`tpch/lineitem.parquet` group by l_orderkey;
As the generated code below shows, the BIGINT value vector vv5 is used only to hold a flag value of 1 that marks the group as "not null":
public void updateAggrValuesInternal(int incomingRowIdx, int htRowIdx) throws SchemaChangeException {
  {
    IntHolder out11 = new IntHolder();
    {
      out11.value = vv8.getAccessor().get((incomingRowIdx));
    }
    IntHolder in = out11;
    work0.value = vv1.getAccessor().get((htRowIdx));
    BigIntHolder value = work0;
    work4.value = vv5.getAccessor().get((htRowIdx));
    BigIntHolder nonNullCount = work4;
    SumFunctions$IntSum_add: {
      nonNullCount.value = 1;
      value.value += in.value;
    }
    work0 = value;
    vv1.getMutator().set((htRowIdx), work0.value);
    work4 = nonNullCount;
    vv5.getMutator().set((htRowIdx), work4.value);
  }
}
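To illustrate why the flag carries no information for a non-nullable input, here is a simplified model of the per-group update, using plain arrays in place of value vectors. It is a sketch under stated assumptions, not Drill's implementation: the class and method names are hypothetical, and it assumes (as in a hash aggregate) that a group only exists once at least one incoming row has been assigned to it. For a non-nullable input, every such row is non-null, so the flag is always 1 for every existing group and the specialized update produces the same sums without the second vector.

```java
import java.util.Arrays;

// Hypothetical model (not Drill's generated classes) of SUM over an INT column.
public class SumAggSketch {
    // Mirrors the current generated code: a sum vector plus a parallel flag vector.
    static void updateWithFlag(long[] sums, long[] nonNullCount, int[] in, int[] groupOf) {
        for (int row = 0; row < in.length; row++) {
            int g = groupOf[row];
            nonNullCount[g] = 1;   // only records "this group saw a non-null value"
            sums[g] += in[row];
        }
    }

    // Possible specialization when the input column is non-nullable:
    // the flag vector is omitted entirely.
    static void updateNonNullable(long[] sums, int[] in, int[] groupOf) {
        for (int row = 0; row < in.length; row++) {
            sums[groupOf[row]] += in[row];
        }
    }

    public static void main(String[] args) {
        int[] in = {5, 7, 11, 13};
        int[] groupOf = {0, 1, 0, 1}; // group (hash table row) of each incoming row
        long[] sumsA = new long[2], flags = new long[2], sumsB = new long[2];
        updateWithFlag(sumsA, flags, in, groupOf);
        updateNonNullable(sumsB, in, groupOf);
        // prints "[16, 20] [1, 1] [16, 20]"
        System.out.println(Arrays.toString(sumsA) + " " + Arrays.toString(flags) + " " + Arrays.toString(sumsB));
    }
}
```

The design choice being illustrated is to decide at code-generation time, from the input column's nullability, whether the nonNullCount workspace is needed at all.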