Description
We have noticed an occasional scheduler failover when host maintenance is in effect:
To index multiple values under a key, use Multimaps.index. at com.google.common.collect.Maps.uniqueIndex(Maps.java:1215) at com.google.common.collect.Maps.uniqueIndex(Maps.java:1173) at org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.lambda$run$224(PendingTaskProcessor.java:130) at org.apache.aurora.scheduler.storage.db.DbStorage.read(DbStorage.java:138) at org.mybatis.guice.transactional.TransactionalMethodInterceptor.invoke(TransactionalMethodInterceptor.java:101) at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83) at org.apache.aurora.scheduler.storage.log.LogStorage.read(LogStorage.java:570) at org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.read(CallOrderEnforcingStorage.java:113) at org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.run(PendingTaskProcessor.java:119)
Diffing colliding HostOffer objects revealed the only difference is in HostAttributes maintenance mode value:
mode=NONE vs. mode=DRAINING
Upon examination it appears that it's quite possible to have duplicate HostOffer instances (same offer, same slave, different maintenance mode) due to the way offers are accessed as unmodifiable view over underlying ConcurrentSkipListSet. Here is the possible sequence:
- Pending task processor starts building unique index and the offers iterator pulls OfferA with mode=None
- A host drain operation is initiated, a HostAttributesChanged event is raised
- OfferManager processes HostAttributeChanged event and atomically swaps OfferA with OfferA' (mode=DRAINING)
- iterator.next() inside of the uniqueIndex routine pulls OfferA' and the error is raised.
We should either copy inside a synchronized getOffers() implementation or deal with possible duplicates at call site. I tend to think copying on access is a better approach. The only consumer of getOffers() is PendingTaskProcessor with a relatively infrequent run loop (1 minute), so the perf impact of making a copy of all offers within a synchronized method should be acceptable. The alternative implies leaking the abstraction of host maintenance mode into the preemptor, which is less than ideal.