Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
We can reduce the memory footprint for workloads with thousands of active partitions between checkpoints. Such workloads are typical when the checkpoint interval is wide. More specifically, an active partition here is a special case of an active fileId.
The write client holds a map of write handles so that it can create a ReplaceHandle between checkpoints. Because each write handle is a large object, this leads to OutOfMemoryError on such workloads.
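The pattern behind the improvement can be sketched as follows. This is a hedged illustration, not Hudi's actual code: the class names (`FakeWriteHandle`, `HandleRecord`) are hypothetical stand-ins. The idea is to retain, per fileId, only the small metadata later needed to build the replace handle, instead of the full heavyweight handle object.

```java
import java.util.HashMap;
import java.util.Map;

public class HandleFootprintSketch {

    // Stand-in for a heavy write handle: real handles carry buffers,
    // writers and file-system state. The 1 MiB buffer simulates that weight.
    static class FakeWriteHandle {
        final byte[] buffer = new byte[1024 * 1024];
        final String fileId;
        final String partitionPath;
        FakeWriteHandle(String fileId, String partitionPath) {
            this.fileId = fileId;
            this.partitionPath = partitionPath;
        }
    }

    // Lightweight record retained between checkpoints instead of the handle.
    static class HandleRecord {
        final String fileId;
        final String partitionPath;
        HandleRecord(String fileId, String partitionPath) {
            this.fileId = fileId;
            this.partitionPath = partitionPath;
        }
    }

    public static void main(String[] args) {
        Map<String, HandleRecord> records = new HashMap<>();
        // One active fileId per partition, as in the datagen repro below.
        for (int i = 0; i < 3000; i++) {
            String fileId = "file-" + i;
            FakeWriteHandle handle = new FakeWriteHandle(fileId, "part" + i);
            // Keep only what the replace handle needs; the heavy object
            // becomes garbage immediately instead of living until checkpoint.
            records.put(fileId, new HandleRecord(handle.fileId, handle.partitionPath));
        }
        System.out.println(records.size());
    }
}
```

With 3000 active fileIds, retaining full handles would pin roughly 3 GiB of buffers until the next checkpoint; retaining only the records keeps a few strings per fileId.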
The workload can be reproduced with the following Flink SQL:

create table source (
  `id` int,
  `data` string
) with (
  'connector' = 'datagen',
  'rows-per-second' = '100',
  'fields.id.kind' = 'sequence',
  'fields.id.start' = '0',
  'fields.id.end' = '3000'
);

create table sink (
  `id` int primary key,
  `data` string,
  `part` string
) partitioned by (`part`) with (
  'connector' = 'hudi',
  'path' = '/tmp/sink',
  'write.batch.size' = '0.001',      -- 1024 bytes
  'write.task.max.size' = '101.001', -- 101.001MB
  'write.merge.max_memory' = '1'     -- 1024 bytes
);

insert into sink
select `id`, `data`, concat('part', cast(`id` as string)) as `part`
from source;