Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Version: v1.6.0
Description
Problem
When the query server needs to handle millions of records, CubeTupleConverter can become a performance bottleneck.
An experiment shows that converting 5 million records takes ~11s, which accounts for 50% of the total query time.
Motivation
Records returned from each storage partition are guaranteed to be ordered. Therefore we could reduce the number of records passed to CubeTupleConverter by:
- merging sorted records from all partitions, similar to what we have done in KYLIN-1787
- using a stream aggregate algorithm on the merged stream to aggregate records with the same key
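To illustrate the second step: because the merged stream is sorted by key, equal keys are adjacent, so groups can be emitted as soon as the key changes, with O(1) extra memory. The sketch below is not the actual Kylin implementation; the class name, `long[]` record layout, and SUM-only aggregation are illustrative assumptions.

```java
import java.util.*;

// Hypothetical sketch of stream aggregation over a key-sorted input.
// Each record is a {key, value} pair; consecutive records with the same
// key are folded into one output record by summing their values.
public class StreamAggregateSketch {
    public static List<long[]> aggregate(List<long[]> sortedRecords) {
        List<long[]> out = new ArrayList<>();
        long curKey = 0, curSum = 0;
        boolean groupOpen = false;
        for (long[] rec : sortedRecords) {
            if (groupOpen && rec[0] != curKey) {
                // Key changed: the previous group is complete, emit it.
                out.add(new long[]{curKey, curSum});
                curSum = 0;
            }
            curKey = rec[0];
            curSum += rec[1];
            groupOpen = true;
        }
        if (groupOpen) out.add(new long[]{curKey, curSum});
        return out;
    }
}
```

Unlike a hash aggregate, this never materializes more than one group at a time, and its output is itself ordered and streamable.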
Proposal
- Add a new physical operator, GTStreamAggregateScanner, which implements the stream aggregate algorithm
- Refine SortedIteratorMergerWithLimit, which is used to merge-sort records from different partitions. The previous implementation had performance issues (KYLIN-2483) due to expensive record cloning
- Leverage GTStreamAggregateScanner to aggregate records on the merged stream
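The merge step above can be sketched with a k-way merge driven by a priority queue holding only each partition's current head. This is a simplified illustration, not the SortedIteratorMergerWithLimit code; using integer values and indexing by position (instead of cloning records) are assumptions made for brevity.

```java
import java.util.*;

// Hypothetical sketch: merge k sorted partitions into one sorted stream.
// The heap holds one entry per partition: {head value, partition index,
// position within that partition}. Only indices move, so no record is cloned.
public class MergeSortedPartitions {
    public static List<Integer> merge(List<List<Integer>> partitions) {
        PriorityQueue<int[]> heap =
            new PriorityQueue<int[]>((a, b) -> Integer.compare(a[0], b[0]));
        for (int p = 0; p < partitions.size(); p++) {
            if (!partitions.get(p).isEmpty()) {
                heap.add(new int[]{partitions.get(p).get(0), p, 0});
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            merged.add(top[0]);
            // Advance only the partition that produced the smallest head.
            int nextPos = top[2] + 1;
            List<Integer> part = partitions.get(top[1]);
            if (nextPos < part.size()) {
                heap.add(new int[]{part.get(nextPos), top[1], nextPos});
            }
        }
        return merged;
    }
}
```

Feeding this merged stream into the stream aggregator keeps the whole pipeline ordered end to end, which is what lets the aggregation run without buffering.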
Scope
Stream aggregation has some good properties, such as low memory usage and streamable, ordered output, making it better than hash- or sort-based alternatives when the input is already sorted. So I bet the new GTStreamAggregateScanner operator can also be used to accelerate cubing and coprocessor aggregation in certain cases. I'll focus on the query server in this JIRA and leave those optimizations as future work.