Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
None
-
None
-
None
Description
The PreemptionVictimFilterImpl uses a comparator to sort ResourceBags in order to preempt the biggest tasks first when searching for a victim. However, the current implementation causes an exception which causes the Scheduler to fail:
SEVERE: Service PreemptorService [FAILED] has failed in the RUNNING state. java.lang.IllegalArgumentException: Comparison method violates its general contract! at java.util.TimSort.mergeLo(TimSort.java:777) at java.util.TimSort.mergeAt(TimSort.java:514) at java.util.TimSort.mergeCollapse(TimSort.java:441) at java.util.TimSort.sort(TimSort.java:245) at java.util.Arrays.sort(Arrays.java:1438) at com.google.common.collect.Ordering.immutableSortedCopy(Ordering.java:882) at org.apache.aurora.scheduler.preemptor.PreemptionVictimFilter$PreemptionVictimFilterImpl.filterPreemptionVictims(PreemptionVictimFilter.java:210) at org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.lambda$run$0(PendingTaskProcessor.java:178) at org.apache.aurora.scheduler.storage.db.DbStorage.read(DbStorage.java:147) at org.mybatis.guice.transactional.TransactionalMethodInterceptor.invoke(TransactionalMethodInterceptor.java:101) at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83) at org.apache.aurora.scheduler.storage.log.LogStorage.read(LogStorage.java:562) at org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.read(CallOrderEnforcingStorage.java:113) at org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.run(PendingTaskProcessor.java:135) at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83) at org.apache.aurora.scheduler.preemptor.PreemptorModule$PreemptorService.runOneIteration(PreemptorModule.java:205) at com.google.common.util.concurrent.AbstractScheduledService$ServiceDelegate$Task.run(AbstractScheduledService.java:188) at com.google.common.util.concurrent.Callables$4.run(Callables.java:122) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
Looking at the code, it seems it violates transitivity:
@VisibleForTesting static final Ordering<ResourceBag> ORDER = new Ordering<ResourceBag>() { @Override public int compare(ResourceBag left, ResourceBag right) { Set<ResourceType> types = ImmutableSet.<ResourceType>builder() .addAll(left.streamResourceVectors().map(e -> e.getKey()).iterator()) .addAll(right.streamResourceVectors().map(e -> e.getKey()).iterator()) .build(); boolean allZero = true; boolean allGreaterOrEqual = true; boolean allLessOrEqual = true; for (ResourceType type : types) { int compare = left.valueOf(type).compareTo(right.valueOf(type)); if (compare != 0) { allZero = false; } if (compare < 0) { allGreaterOrEqual = false; } if (compare > 0) { allLessOrEqual = false; } } if (allZero) { return 0; } if (allGreaterOrEqual) { return 1; } if (allLessOrEqual) { return -1; } return 0; } };
The example below illustrates the error:
Resource: X Y Z Bag A: 2 0 2 Bag B: 1 2 1 Bag C: 2 2 1
We can see that A = B, B < C, and C = A which would cause an exception.
There are a couple of routes we can take to solve this. What we really want is to be able to define partial ordering (comparator does total ordering) so we can do a topological sort. Additionally, we can give priority to particular resources (Dominant Resource Fairness).
Attachments
Issue Links
- is related to
-
AURORA-1943 PreemptionVictimFilter does not handle varied ResourceTypes
- Resolved