Details
-
Task
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
Description
Currently task history pruning is configured with three main settings: maximum terminal tasks per job, maximum time to retain terminal tasks and a minimum time to retain terminal tasks.
There are times where a combination of bad actors in the cluster and the minimum time to retain terminal tasks can lead to incredibly bloated task store sizes, leading to serious problems with GC pressure during task store queries, and also when creating and persisting snapshots.
At Twitter we've run into this and have had to respond by redeploying the Scheduler with more aggressive task pruning settings - which affects every user in the cluster.
What we'd like is an endpoint in aurora_admin that accepts a TaskQuery and will prune all inactive tasks that match. This should allow us to limit the pruning by role, environment and also limit the number of tasks pruned.