Description
The Mesos agent crashes with the following stack trace on newer Linux kernels (>= 5.8.x) when started with MESOS_ISOLATION=linux/capabilities.
Tested on 5.7.19, where it runs fine; it fails on 5.8.18, 5.9.11 and 5.10.
Dec 13 05:08:28 mesosbox mesos-agent[465]: sh: hadoop: command not found
Dec 13 05:08:28 mesosbox mesos-agent[466]: I1213 05:08:28.234824 458 fetcher.cpp:66] Skipping URI fetcher plugin 'hadoop' as it could not be created: Failed to create HDFS client: Hadoop client is not available, exit status: 32512
Dec 13 05:08:28 mesosbox mesos-agent[466]: Reached unreachable statement at linux/capabilities.cpp:497
Dec 13 05:08:28 mesosbox mesos-agent[466]: *** Aborted at 1607836108 (unix time) try "date -d @1607836108" if you are using GNU date ***
Dec 13 05:08:28 mesosbox mesos-agent[466]: PC: @ 0x7f875bd62387 __GI_raise
Dec 13 05:08:28 mesosbox mesos-agent[466]: *** SIGABRT (@0x1ca) received by PID 458 (TID 0x7f8760ddca00) from PID 458; stack trace: ***
Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875c626630 (unknown)
Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875bd62387 __GI_raise
Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875bd63a78 __GI_abort
Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875e60f237 (unknown)
Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875ef6e7c1 (unknown)
Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875ef723cc (unknown)
Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875ef70c96 (unknown)
Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875f05389d (unknown)
Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875ed837fc (unknown)
Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875ed72332 (unknown)
Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875ecf54c6 (unknown)
Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x55f5d9c1a256 (unknown)
Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875bd4e555 __libc_start_main
Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x55f5d9c1d10f (unknown)
Dec 13 05:08:28 mesosbox kernel: audit: type=1701 audit(1607836108.250:274): auid=4294967295 uid=0 gid=0 ses=4294967295 subj==unconfined pid=4772 comm="mesos-agent" exe="/usr/sbin/mesos-agent" sig=6 res=1
Looking further, I found that the abort is raised from linux/capabilities.cpp, in the operator that converts capability enum values to human-readable names:
ostream& operator<<(ostream& stream, const Capability& capability)
{
  switch (capability) {
    case CHOWN: return stream << "CHOWN";
    case DAC_OVERRIDE: return stream << "DAC_OVERRIDE";
    case AUDIT_READ: return stream << "AUDIT_READ";
    ...
    ...
    case MAX_CAPABILITY: UNREACHABLE(); // !!! Crash site
  }
  UNREACHABLE();
}
MAX_CAPABILITY is defined as 38, but newer Linux kernels have introduced capabilities at and above that value, namely:
- CAP_PERFMON=38 // (since Linux 5.8) - Employ various performance-monitoring mechanisms
- CAP_BPF=39 // (since Linux 5.8) - Employ privileged BPF operations
- CAP_CHECKPOINT_RESTORE=40 // (since Linux 5.9) - Allow checkpoint/restore related operations
ref: https://github.com/torvalds/linux/blob/master/include/uapi/linux/capability.h
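To check which side of the mismatch a given host is on, the highest capability number the running kernel supports can be read from /proc/sys/kernel/cap_last_cap. The following is a minimal sketch of that check, not Mesos code: the helper names parseLastCapability and readLastCapability are hypothetical, and -1 is used as an ad hoc failure value.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: parses the content of /proc/sys/kernel/cap_last_cap,
// which holds a single decimal number. Returns -1 if it cannot be parsed.
int parseLastCapability(const std::string& content)
{
  try {
    int value = std::stoi(content);
    return value >= 0 ? value : -1;
  } catch (const std::exception&) {
    // std::stoi throws on empty or non-numeric input.
    return -1;
  }
}

// Hypothetical helper: reads the highest capability number supported by the
// running kernel. Returns -1 if the file is missing or unreadable.
int readLastCapability(
    const std::string& path = "/proc/sys/kernel/cap_last_cap")
{
  std::ifstream file(path);
  if (!file) {
    return -1;
  }

  std::string content;
  std::getline(file, content);
  return parseLastCapability(content);
}
```

If readLastCapability() returns a value of 38 or more, the kernel knows capabilities that the enum quoted above does not, which matches the kernels where the crash was observed.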
The Mesos code above does not seem to account for such kernel changes, so any new capability added to the kernel will break the isolator.
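One possible direction, shown here only as a sketch and not as the actual Mesos fix, is to make the stream operator tolerate values it does not know instead of aborting. The enum below is a trimmed-down, hypothetical stand-in for the real Capability enum (only the three names quoted above are included), and describe is a hypothetical helper added for illustration.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the Mesos Capability enum. The fixed underlying
// type makes it well-defined to hold values the enum does not name.
enum Capability : int {
  CHOWN = 0,
  DAC_OVERRIDE = 1,
  AUDIT_READ = 37,
  MAX_CAPABILITY = 38
};

std::ostream& operator<<(std::ostream& stream, const Capability& capability)
{
  switch (capability) {
    case CHOWN: return stream << "CHOWN";
    case DAC_OVERRIDE: return stream << "DAC_OVERRIDE";
    case AUDIT_READ: return stream << "AUDIT_READ";
    case MAX_CAPABILITY: break; // Fall through to the generic name below.
  }

  // Capabilities unknown to this build are printed by number instead of
  // aborting the process.
  return stream << "UNKNOWN(" << static_cast<int>(capability) << ")";
}

// Hypothetical helper: renders a raw capability number reported by the kernel.
std::string describe(int value)
{
  std::ostringstream out;
  out << static_cast<Capability>(value);
  return out.str();
}
```

With this shape, describe(38) yields "UNKNOWN(38)" rather than hitting UNREACHABLE(). Whether the isolator should ignore, log, or pass through such capabilities is a separate design question that this sketch does not settle.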