Description
We recently had to patch hosts. In our situation we have a couple of services that run only 2 to 5 instances, with production = true and tier = preferred as shown in the default example documentation.
As we understand it, host_drain is not configurable with respect to the minimum job instance count; the default is 10. We therefore tried to use aurora_admin sla_list_safe_domain to compile a list of the hosts running these services, so that we could feed host_drain an unsafe_hosts_file.
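For context, the workflow we were attempting looks roughly like the sketch below; the file names are illustrative, we assume the host list is written to stdout, and the final host_drain step reflects our reading of the docs rather than a verified invocation:

# Step 1: ask the scheduler which hosts can be drained without violating job
# SLAs, lowering the instance-count threshold so our 2-5 instance
# production/preferred services are included in the check.
aurora_admin sla_list_safe_domain --min_job_instance_count=2 devcluster 95 1m > safe_hosts.txt

# Step 2: hosts that do not appear in safe_hosts.txt are collected into an
# unsafe-hosts file (all_hosts.txt is a hypothetical inventory of the hosts
# we need to patch).
comm -23 <(sort all_hosts.txt) <(sort safe_hosts.txt) > unsafe_hosts.txt

# Step 3: drain the hosts reported as safe; the exact host_drain option for
# passing along the unsafe-hosts file is omitted here because we are not
# certain of its spelling.
aurora_admin host_drain --filename=safe_hosts.txt devcluster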
When we ran aurora_admin sla_list_safe_domain --min_job_instance_count=2 devcluster 95 1m, the scheduler returned:
INFO] Response from scheduler: OK (message: )
It is as if there are no hosts. We tried changing the percentage and duration to see whether anything would be returned, but we never received a different response.
To rule out the client as the cause, we used the 0.16.0 client against a 0.14.0 cluster; that cluster does report hosts that are safe to kill without violating job SLAs.
To rule out a faulty cluster setup on our part, we started the Vagrant sandbox and launched a task with 3 instances, tier = preferred and production = True (roughly as sketched below).
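A minimal reproduction sketch, assuming the hello_world example from the Aurora documentation with the instance count, production, and tier fields adjusted; the job key, file path, resource sizes, and quota values are illustrative rather than the exact ones we used:

vagrant up
vagrant ssh
# (the commands below are run inside the VM)

# A 3-instance service based on the documented hello_world example.
cat > /vagrant/hello_world.aurora <<'EOF'
hello = Process(
  name = 'hello',
  cmdline = 'while true; do echo hello; sleep 10; done')

task = SequentialTask(
  processes = [hello],
  resources = Resources(cpu = 0.1, ram = 16*MB, disk = 16*MB))

jobs = [Service(
  cluster = 'devcluster',
  environment = 'prod',
  role = 'www-data',
  name = 'hello',
  task = task,
  instances = 3,
  production = True,
  tier = 'preferred')]
EOF

# Production jobs need resource quota for the role; skip this if the sandbox
# role already has enough.
aurora_admin set_quota devcluster www-data 1.0 512MB 1024MB

aurora job create devcluster/www-data/prod/hello /vagrant/hello_world.aurora
aurora job status devcluster/www-data/prod/hello   # expect 3 active instances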
Commands used:
aurora_admin sla_list_safe_domain --min_job_instance_count=2 devcluster 20 50m
aurora_admin sla_list_safe_domain --min_job_instance_count=2 devcluster 90 5m
Using -l, or varying the time and percentage, never changes the outcome.
Changing --min_job_instance_count to a higher number does not change the output either.