[DRILL-5728] Hash Aggregate: Useless bigint value vector in the values batch - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 1.11.0
Fix Version/s: None
Component/s: Execution - Codegen
Labels: None

Description

When aggregating a non-nullable column (like sum(l_partkey) below), code generation creates an extra value vector, in addition to the actual "sum" vector, which is used as a "nonNullCount".
This vector is useless because the underlying column is non-nullable, and it wastes considerable memory (8 bytes * 64K rows = 512K per batch for each such aggregate value!).
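
To give a rough sense of scale, the arithmetic in the parenthetical above can be written out as a small Java sketch. It assumes 64K (65,536) rows per batch and 8 bytes per BIGINT entry, as stated in this issue. It ignores any rounding or extra buffers the allocator may add. The class and method names are hypothetical and are not part of Drill.

```java
/**
 * Back-of-the-envelope estimate of the memory taken by redundant
 * nonNullCount vectors. Hypothetical helper, not part of Drill.
 */
public class NonNullCountOverhead {
    // Assumption: 64K rows per batch, as stated in the issue.
    static final int ROWS_PER_BATCH = 64 * 1024;
    // A BIGINT entry occupies 8 bytes.
    static final int BIGINT_WIDTH_BYTES = 8;

    /** Bytes taken by one extra BIGINT vector in a single batch. */
    static long extraBytesPerBatch(int rowsPerBatch) {
        return (long) rowsPerBatch * BIGINT_WIDTH_BYTES;
    }

    /** Total waste for several non-nullable aggregates across several batches. */
    static long extraBytesTotal(int rowsPerBatch, int aggregates, int batches) {
        return extraBytesPerBatch(rowsPerBatch) * aggregates * batches;
    }

    public static void main(String[] args) {
        long perBatch = extraBytesPerBatch(ROWS_PER_BATCH);
        // Prints: 524288 bytes (512 KiB) per aggregate per batch
        System.out.println(perBatch + " bytes (" + perBatch / 1024 + " KiB) per aggregate per batch");
        // Hypothetical workload: 3 non-nullable sums held in 10 batches.
        long total = extraBytesTotal(ROWS_PER_BATCH, 3, 10);
        // Prints: 15 MiB in total
        System.out.println(total / (1024 * 1024) + " MiB in total");
    }
}
```

The cost grows linearly with both the number of non-nullable aggregates and the number of batches that the hash aggregate keeps in memory.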

Example query:

select sum(l_partkey) as slpk from cp.`tpch/lineitem.parquet` group by l_orderkey;

As the generated code below shows, the bigint value vector vv5 is used only to hold a flag value of 1 meaning "not null":

        public void updateAggrValuesInternal(int incomingRowIdx, int htRowIdx)
            throws SchemaChangeException
        {
            {
                IntHolder out11 = new IntHolder();
                {
                    out11 .value = vv8 .getAccessor().get((incomingRowIdx));
                }
                IntHolder in = out11;
                work0 .value = vv1 .getAccessor().get((htRowIdx));
                BigIntHolder value = work0;
                work4 .value = vv5 .getAccessor().get((htRowIdx));
                BigIntHolder nonNullCount = work4;
                 
                SumFunctions$IntSum_add: {
                    nonNullCount.value = 1;
                    value.value += in.value;
                }
 
                work0 = value;
                vv1 .getMutator().set((htRowIdx), work0 .value);
                work4 = nonNullCount;
                vv5 .getMutator().set((htRowIdx), work4 .value);
            }
        }
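
The claim that vv5 carries no information can be checked with a small stand-alone Java simulation. This is a hypothetical model, not Drill code. The two per-group vectors (vv1 for value, vv5 for nonNullCount) are modeled as plain long arrays indexed by htRowIdx. The add() method copies the body of the SumFunctions$IntSum_add block above. A hash-table group is created only when an incoming row lands in it, and with a non-nullable input the add body runs for every row. As a result, every populated group ends up with nonNullCount == 1.

```java
import java.util.Arrays;

/**
 * Hypothetical simulation (not Drill's API) of the per-group vectors used by
 * the generated code: value (vv1) and nonNullCount (vv5), modeled as arrays.
 */
public class NonNullCountSimulation {
    final long[] value;
    final long[] nonNullCount;

    NonNullCountSimulation(int groups) {
        value = new long[groups];
        nonNullCount = new long[groups];
    }

    /** Mirrors SumFunctions$IntSum_add for one incoming (non-nullable) row. */
    void add(int htRowIdx, int in) {
        nonNullCount[htRowIdx] = 1; // always written, so it is always 1
        value[htRowIdx] += in;
    }

    public static void main(String[] args) {
        // Hypothetical non-nullable input: {group index, l_partkey} pairs.
        int[][] rows = { {0, 10}, {1, 20}, {0, 5}, {2, 7}, {1, 1} };
        NonNullCountSimulation agg = new NonNullCountSimulation(3);
        for (int[] r : rows) {
            agg.add(r[0], r[1]);
        }
        // Prints: sums         = [15, 21, 7]
        System.out.println("sums         = " + Arrays.toString(agg.value));
        // Prints: nonNullCount = [1, 1, 1]
        System.out.println("nonNullCount = " + Arrays.toString(agg.nonNullCount));
    }
}
```

With a nullable input, the same flag presumably serves to tell groups that saw only nulls (whose sum should be null) apart from the rest. For a non-nullable input that case cannot occur, so the flag's vector could be skipped.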

People

Assignee: Unassigned
Reporter: Boaz Ben-Zvi
Votes: 0
Watchers: 1

Dates

Created: 17/Aug/17 23:09
Updated: 17/Aug/17 23:42