Description
I have a job that consumes from 10,000+ partitions. After SAMZA-123, it switched to the GroupBySystemStreamPartition strategy, which means it has 10,000+ tasks and therefore sends 10,000+ checkpoint messages every minute.
To keep the checkpoint topic from getting too large, we enabled log compaction on the Kafka topic, but we discovered that the topic still grew to be very large. This happened because we were sending compressed messages to the Kafka checkpoint topic.
Based on KAFKA-1374, it appears that we can't use compressed checkpoint topics with log compaction.
I'm mostly opening this ticket as a placeholder for KAFKA-1374. Once that ticket is resolved, we can update the Samza code to default the checkpoint topics to be log compacted (with a small segment size), and not worry about the compression anymore.
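For context, the effect compaction is expected to have on the checkpoint topic can be sketched with a toy model. This is only an illustration, not Kafka's LogCleaner: the `compact` helper, the `task-N` key names, and the one-hour window are hypothetical, and it assumes each checkpoint message is keyed by its task so that only the latest checkpoint per task needs to be retained.

```python
# Toy model of key-based log compaction: keep only the latest record per key.
# NOT Kafka's LogCleaner; all names and numbers here are illustrative assumptions.

def compact(log):
    """Return the log with only the last record for each key, in offset order."""
    last_offset = {}
    for offset, (key, _value) in enumerate(log):
        last_offset[key] = offset  # later records supersede earlier ones
    return [record for offset, record in enumerate(log)
            if last_offset[record[0]] == offset]

NUM_TASKS = 10_000  # one task per partition, as in the job described above
MINUTES = 60        # one checkpoint per task per minute, over one hour

# Each checkpoint is keyed by a (hypothetical) task name; the value stands in
# for the checkpointed offsets.
log = [(f"task-{t}", minute) for minute in range(MINUTES) for t in range(NUM_TASKS)]

compacted = compact(log)
print(len(log))        # 600000 records written in one hour
print(len(compacted))  # 10000 records left after compaction
```

Without effective compaction the topic keeps every one of those records, which matches the growth described above when the cleaner cannot process the compressed messages.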
Attachments
Issue Links
- depends upon
  - KAFKA-1374 LogCleaner (compaction) does not support compressed topics (Resolved)
- is related to
  - SAMZA-399 Make checkpoint topic log segment size configurable (Resolved)