Description
We recently had to patch hosts. In our situation we have a couple of services that run only 2 to 5 instances, with production = true and tier = preferred as shown in the default example documentation.
As we understand it, host_drain is not configurable with respect to the minimum job instance count; the default is 10. We therefore tried to use aurora_admin sla_list_safe_domain to compile a list of the hosts running these services, so that we could feed host_drain an unsafe_hosts_file.
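For context, the workflow we were attempting looks roughly like the sketch below; the file names are illustrative, we assume the host list is written to stdout, and the final host_drain step reflects our reading of the docs rather than a verified invocation:

# Step 1: ask the scheduler which hosts can be drained without violating job
# SLAs, lowering the instance-count threshold so our 2-5 instance
# production/preferred services are included in the check.
aurora_admin sla_list_safe_domain --min_job_instance_count=2 devcluster 95 1m > safe_hosts.txt

# Step 2: hosts that do not appear in safe_hosts.txt are collected into an
# unsafe-hosts file (all_hosts.txt is a hypothetical inventory of the hosts
# we need to patch).
comm -23 <(sort all_hosts.txt) <(sort safe_hosts.txt) > unsafe_hosts.txt

# Step 3: drain the hosts reported as safe; the exact host_drain option for
# passing along the unsafe-hosts file is omitted here because we are not
# certain of its spelling.
aurora_admin host_drain --filename=safe_hosts.txt devcluster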
When we ran aurora_admin sla_list_safe_domain --min_job_instance_count=2 devcluster 95 1m, the scheduler returned:
INFO] Response from scheduler: OK (message: )
It is as if there are no hosts. We tried changing the percentage and duration to see whether anything would be returned, but we never received a different response.
To rule out the client as the cause, we used the 0.16.0 client against a 0.14.0 cluster; that cluster does report hosts that are safe to kill without violating job SLAs.
To rule out a faulty cluster setup on our part, we started the Vagrant sandbox and launched a task with 3 instances, tier = preferred and production = True (roughly as sketched below).
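A minimal reproduction sketch, assuming the hello_world example from the Aurora documentation with the instance count, production, and tier fields adjusted; the job key, file path, resource sizes, and quota values are illustrative rather than the exact ones we used:

vagrant up
vagrant ssh
# (the commands below are run inside the VM)

# A 3-instance service based on the documented hello_world example.
cat > /vagrant/hello_world.aurora <<'EOF'
hello = Process(
  name = 'hello',
  cmdline = 'while true; do echo hello; sleep 10; done')

task = SequentialTask(
  processes = [hello],
  resources = Resources(cpu = 0.1, ram = 16*MB, disk = 16*MB))

jobs = [Service(
  cluster = 'devcluster',
  environment = 'prod',
  role = 'www-data',
  name = 'hello',
  task = task,
  instances = 3,
  production = True,
  tier = 'preferred')]
EOF

# Production jobs need resource quota for the role; skip this if the sandbox
# role already has enough.
aurora_admin set_quota devcluster www-data 1.0 512MB 1024MB

aurora job create devcluster/www-data/prod/hello /vagrant/hello_world.aurora
aurora job status devcluster/www-data/prod/hello   # expect 3 active instances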
Commands used:
aurora_admin sla_list_safe_domain --min_job_instance_count=2 devcluster 20 50m
aurora_admin sla_list_safe_domain --min_job_instance_count=2 devcluster 90 5m
Using -l, or varying the time and percentage, never changes the outcome.
Changing --min_job_instance_count to a higher number does not change the output either.