Description
Currently, the background deletion services in OM and SCM send one Ratis request per iteration. If a task's limits are high, the resulting protos can be very large, and Ratis may reject them because of their size. HDDS-8977 tried to fix this by checking the Ratis message size and falling back to a hardcoded limit of 1000 entries if the message is too big, but this fallback is too conservative and will over-throttle the service under heavy deletion load.
On each run, these services should collect as many items as their configured limit allows and then split them across multiple Ratis requests. This way, the per-task limit can be set purely based on the time taken to gather the entries, not on the size of the resulting proto.
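A minimal sketch of the splitting step, assuming a greedy size-based batcher. The class `DeletionBatcher`, the method `split`, and the `maxBatchBytes` parameter are hypothetical names used for illustration only and are not existing Ozone APIs. In a real service the size function would presumably be the protobuf message's `getSerializedSize()`, and the byte limit would be derived from the Ratis message size configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

/** Hypothetical helper: splits one run's collected items into size-bounded batches. */
final class DeletionBatcher {
  private DeletionBatcher() { }

  /**
   * Greedily packs items, in order, into batches whose total size stays
   * at or below maxBatchBytes. An item that is larger than the limit on
   * its own is placed in a batch by itself rather than dropped.
   */
  static <T> List<List<T>> split(List<T> items, ToIntFunction<T> sizeOf,
      int maxBatchBytes) {
    if (maxBatchBytes <= 0) {
      throw new IllegalArgumentException("maxBatchBytes must be positive");
    }
    List<List<T>> batches = new ArrayList<>();
    List<T> current = new ArrayList<>();
    long currentBytes = 0;
    for (T item : items) {
      int size = sizeOf.applyAsInt(item);
      // Close the current batch if adding this item would exceed the limit.
      if (!current.isEmpty() && currentBytes + size > maxBatchBytes) {
        batches.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(item);
      currentBytes += size;
    }
    // Flush the final, partially filled batch.
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }
}
```

Each returned batch would then be submitted as its own Ratis request. With this shape, the collection limit only bounds how long a run spends gathering entries, while the byte limit alone bounds the request size.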
This Jira may optionally be split into subtasks for each service:
- OM open key cleanup
- OM file delete
- OM directory delete
- SCM block delete
Issue Links
- is required by: HDDS-11514 Set optimal default values for delete configurations based on live cluster testing (Open)
- relates to: HDDS-8977 Ratis crash if a lot of directories deleted at once (Resolved)