Details
Bug
Status: Closed
Major
Resolution: Fixed
0.16.0
None
None
Description
When we tried to declare network bandwidth as a custom resource in Mesos, Aurora crashed with the following stack trace:
Jul 18, 2018 1:35:19 PM com.google.common.util.concurrent.ServiceManager$ServiceListener failed
SEVERE: Service SlotSizeCounterService [FAILED] has failed in the RUNNING state.
java.lang.NullPointerException: Unknown Mesos resource: name: "network_bandwidth"
type: SCALAR
scalar {
  value: 2000.0
}
role: "*"
11: "\n\adefault"
    at java.util.Objects.requireNonNull(Objects.java:228)
    at org.apache.aurora.scheduler.resources.ResourceType.fromResource(ResourceType.java:355)
    at org.apache.aurora.scheduler.resources.ResourceManager.lambda$static$0(ResourceManager.java:52)
    at com.google.common.collect.Iterators$7.computeNext(Iterators.java:675)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at java.util.Iterator.forEachRemaining(Iterator.java:115)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.apache.aurora.scheduler.resources.ResourceManager.bagFromResources(ResourceManager.java:274)
    at org.apache.aurora.scheduler.resources.ResourceManager.bagFromMesosResources(ResourceManager.java:239)
    at org.apache.aurora.scheduler.stats.AsyncStatsModule$OfferAdapter.get(AsyncStatsModule.java:153)
    at org.apache.aurora.scheduler.stats.SlotSizeCounter.run(SlotSizeCounter.java:168)
    at org.apache.aurora.scheduler.stats.AsyncStatsModule$SlotSizeCounterService.runOneIteration(AsyncStatsModule.java:130)
    at com.google.common.util.concurrent.AbstractScheduledService$ServiceDelegate$Task.run(AbstractScheduledService.java:189)
    at com.google.common.util.concurrent.Callables$3.run(Callables.java:100)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
E0718 13:35:19.240 [SlotSizeCounterService RUNNING, GuavaUtils$LifecycleShutdownListener:55] Service: SlotSizeCounterService [FAILED] faile
I0718 13:35:19.240 [SlotSizeCounterService RUNNING, Lifecycle:84] Shutting down application
I0718 13:35:19.240 [SlotSizeCounterService RUNNING, ShutdownRegistry$ShutdownRegistryImpl:77] Executing 4 shutdown commands.
I0718 13:35:19.243 [SlotSizeCounterService RUNNING, StateMachine$Builder:389] SchedulerLifecycle state machine transition ACTIVE -> DEAD
I0718 13:35:19.249073 331 sched.cpp:2021] Asked to stop the driver
I0718 13:35:19.249344 30748 sched.cpp:1203] Stopping framework 2a905643-b76f-4f17-a406-524d406f49f8-0000
I0718 13:35:19.249 [SlotSizeCounterService RUNNING, StateMachine$Builder:389] storage state machine transition READY -> STOPPED
I0718 13:35:19.250 [BlockingDriverJoin, SchedulerLifecycle$6:267] Driver exited, terminating lifecycle.
I0718 13:35:19.250 [BlockingDriverJoin, StateMachine$Builder:389] SchedulerLifecycle state machine transition DEAD -> DEAD
I0718 13:35:19.250 [BlockingDriverJoin, SchedulerLifecycle$7:287] Shutdown already invoked, ignoring extra call.
I0718 13:35:19.255 [CronLifecycle STOPPING, CronLifecycle:90] Shutting down Quartz cron scheduler.
I0718 13:35:19.255 [CronLifecycle STOPPING, QuartzScheduler:694] Scheduler QuartzScheduler_$_aurora-cron-1 shutting down.
I0718 13:35:19.255 [CronLifecycle STOPPING, QuartzScheduler:613] Scheduler QuartzScheduler_$_aurora-cron-1 paused.
I0718 13:35:19.255 [CronLifecycle STOPPING, QuartzScheduler:771] Scheduler QuartzScheduler_$_aurora-cron-1 shutdown complete.
E0718 13:35:19.945 [AsyncProcessor-0, AsyncUtil:159] java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Driver is no
It would be great if Aurora could handle custom resources, or at least not crash when an offer contains one.
We are using version 0.16.0.
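To illustrate the failure mode in the trace (a `requireNonNull` on a resource-name lookup) and the "at least not crash" behaviour requested above, here is a minimal sketch. It is not Aurora's code: the class `CustomResourceSketch`, the `KnownType` enum, and the methods `strictLookup`, `tolerantLookup`, and `knownTypes` are hypothetical names chosen for illustration. Only the Mesos resource names `cpus`, `mem`, `disk`, and `ports` and the `network_bandwidth` name from the report are taken as given.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical sketch: all names here are illustrative, not Aurora's actual API.
final class CustomResourceSketch {
    // Assumed set of resource types the scheduler knows about.
    enum KnownType { CPUS, RAM_MB, DISK_MB, PORTS }

    // Maps Mesos resource names to the known types.
    private static final Map<String, KnownType> BY_MESOS_NAME = new HashMap<>();
    static {
        BY_MESOS_NAME.put("cpus", KnownType.CPUS);
        BY_MESOS_NAME.put("mem", KnownType.RAM_MB);
        BY_MESOS_NAME.put("disk", KnownType.DISK_MB);
        BY_MESOS_NAME.put("ports", KnownType.PORTS);
    }

    // Strict lookup: an unmapped name throws NullPointerException,
    // which mirrors the exception type seen in the stack trace.
    static KnownType strictLookup(String mesosName) {
        return Objects.requireNonNull(
            BY_MESOS_NAME.get(mesosName),
            "Unknown Mesos resource: " + mesosName);
    }

    // Tolerant lookup: an unmapped name yields an empty Optional instead.
    static Optional<KnownType> tolerantLookup(String mesosName) {
        return Optional.ofNullable(BY_MESOS_NAME.get(mesosName));
    }

    // Keeps only the resources that map to a known type; unknown ones are skipped.
    static List<KnownType> knownTypes(List<String> offeredNames) {
        return offeredNames.stream()
            .map(CustomResourceSketch::tolerantLookup)
            .filter(Optional::isPresent)
            .map(Optional::get)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> offer = Arrays.asList("cpus", "mem", "network_bandwidth");
        System.out.println(knownTypes(offer)); // prints [CPUS, RAM_MB]
        try {
            strictLookup("network_bandwidth");
        } catch (NullPointerException e) {
            System.out.println(e.getMessage()); // prints Unknown Mesos resource: network_bandwidth
        }
    }
}
```

Skipping unknown resources is only one possible policy; a scheduler could equally log them or track them as opaque scalars. The sketch just shows that a stats-collection path does not need to fail when an offer carries a name it does not recognise.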
https://mesos.slack.com/archives/C1KR1PRP1/p1532013001000626