[KAFKA-13842] Add per-aggregation step before repartitioning - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: streams
Labels:
- needs-kip

Description

Kafka Streams follows a continuous refinement model for aggregation. For this reason, we never implement a pre-aggregation step before data repartitioning, because it won't help much to reduce repartition cost (there is no natural boundary when a pre-aggregation is finished and when to emit it downstream for the actual aggregation roll-up).

With https://issues.apache.org/jira/browse/KAFKA-13785 we introduce a per-aggregation "emit final" feature (different to suppress()) that changes the continuous refinement model and thus it seems to be a good optimization to add a pre-aggregation step if this new feature is used.

We might want to give user control over inserting the pre-aggregation step because there is no free lunch... If we have X distinct keys, pre-aggregation implies that the upstream RocksDB store will need to store up to X rows to hold the pre-aggregate. Thus, given N input partitions, we need to hold N*X rows (upstream) plus X rows (in the final donwstream aggregation). – In contrast, a direct repartition step will only require to hold X rows downstream. It's a tradeoff between (much) higher disk usage vs network/Kafka traffic.

Attachments

Issue Links

requires

KAFKA-13785 Support emit final result for windowed aggregation

Closed

Activity

People

Assignee:: Unassigned

Reporter:: Matthias J. Sax

Votes:: 1 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 20/Apr/22 23:38

Updated:: 20/Apr/22 23:59