Details
Type: Task
Status: Resolved
Priority: Minor
Resolution: Fixed
Description
Aurora's maintenance primitives, whilst great, can be frustrating to use on large clusters, primarily because draining hosts is slow. The host_drain feature does accept a grouping function that can be used to drain hosts in batches, but for large clusters we typically don't want to divide the cluster arbitrarily into groups/batches. We would prefer instead to drain everything that was requested, where possible, without violating the SLA.
For example, with 100 hosts in need of maintenance, each running 1 task (of many) from 100 different jobs, all 100 hosts could be drained simultaneously without violating the SLA.
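To make the request concrete, here is a minimal sketch of SLA-aware host selection. It is not Aurora's implementation or API: the function name `select_drainable_hosts`, the host-to-jobs mapping, and the per-job "tasks allowed down at once" budget are all hypothetical stand-ins for whatever the scheduler actually tracks. It reads the example above as one task per host, each from a different job.

```python
from collections import Counter


def select_drainable_hosts(requested_hosts, tasks_by_host, allowed_down_by_job):
    """Greedily pick hosts that can be drained together within each job's budget.

    requested_hosts: host names, in the order they should be considered.
    tasks_by_host: dict of host -> list of job keys, one entry per task on the host.
    allowed_down_by_job: dict of job key -> tasks that may be down at once
        (hypothetical stand-in for the job's SLA; missing jobs get a budget of 0).
    Returns (selected, deferred) lists of hosts.
    """
    down = Counter()  # tasks per job that the selected hosts would take down
    selected, deferred = [], []
    for host in requested_hosts:
        on_host = Counter(tasks_by_host.get(host, []))
        fits = all(
            down[job] + count <= allowed_down_by_job.get(job, 0)
            for job, count in on_host.items()
        )
        if fits:
            down.update(on_host)
            selected.append(host)
        else:
            deferred.append(host)  # retry once earlier drains have completed
    return selected, deferred


# The scenario from the description: 100 hosts, each running one task,
# each task belonging to a different job that tolerates one task down.
hosts = [f"host-{i}" for i in range(100)]
tasks = {host: [f"job-{i}"] for i, host in enumerate(hosts)}
budget = {f"job-{i}": 1 for i in range(100)}
selected, deferred = select_drainable_hosts(hosts, tasks, budget)
print(len(selected), len(deferred))  # 100 0
```

A greedy pass like this does not necessarily find the largest possible set of hosts, but it never exceeds a job's budget, and deferred hosts can be reconsidered after the first wave finishes.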