Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.7.0
Fix Version/s: None
Component/s: None
Environment: Linux 4.4.0-1069-aws #79-Ubuntu SMP x86_64 x86_64 x86_64 GNU/Linux
Mesos 1.7.0
Description
I noticed that when I set both LIBPROCESS_IP and LIBPROCESS_ADVERTISE_IP for my mesos-slave, only LIBPROCESS_IP is propagated to mesos-docker-executor. This matters because I have to set both to avoid a hostname lookup, which doesn't work in my environment. LIBPROCESS_IP is set to 0.0.0.0 so that the slave binds to every IP address (and is still reachable locally on port 5051 for metrics gathering), while LIBPROCESS_ADVERTISE_IP is set to my externally reachable IP address so the rest of the cluster can talk to it. Lo and behold, with this setup, my slave's executor processes were failing with the dreaded hostname lookup.
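The failure mode can be modeled as a simple precedence rule. The sketch below is a simplified, hypothetical model and not libprocess code: the function `advertise_source` and its return strings are illustrative. It assumes that a libprocess-based process advertises LIBPROCESS_ADVERTISE_IP when it is set, otherwise a concrete (non-0.0.0.0) LIBPROCESS_IP, and only falls back to resolving the local hostname when it is bound to 0.0.0.0 with no advertise address. The sample values are taken from the agent and executor environment listings in this report.

```python
from typing import Mapping

# Simplified, HYPOTHETICAL model of how a libprocess-based process chooses
# the address it advertises to peers. This is NOT Mesos/libprocess source.
ANY_ADDRESS = "0.0.0.0"


def advertise_source(env: Mapping[str, str]) -> str:
    """Return which input (assumed) determines the advertised address."""
    # Assumption: an explicit advertise address takes precedence.
    if env.get("LIBPROCESS_ADVERTISE_IP"):
        return "LIBPROCESS_ADVERTISE_IP"
    # Assumption: a concrete bind address can be advertised as-is.
    bind_ip = env.get("LIBPROCESS_IP")
    if bind_ip and bind_ip != ANY_ADDRESS:
        return "LIBPROCESS_IP"
    # Bound to 0.0.0.0 (or unset) with no advertise address:
    # fall back to resolving the local hostname.
    return "hostname lookup"


# Values from the agent and executor listings in this report.
agent_env = {"LIBPROCESS_IP": "0.0.0.0", "LIBPROCESS_ADVERTISE_IP": "10.33.15.130"}
executor_env = {"LIBPROCESS_IP": "0.0.0.0", "LIBPROCESS_PORT": "0"}
print(advertise_source(agent_env))     # LIBPROCESS_ADVERTISE_IP
print(advertise_source(executor_env))  # hostname lookup
```

Under these assumptions the agent never needs a hostname lookup, while the executor, which inherits only LIBPROCESS_IP=0.0.0.0, falls straight into one.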
I notice there is code to inject LIBPROCESS_IP into the executor environment, but no mention of LIBPROCESS_ADVERTISE_IP:
https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L9974-L9983
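One possible shape for a fix is to forward LIBPROCESS_ADVERTISE_IP alongside LIBPROCESS_IP whenever the agent has it set. The sketch below is a hypothetical illustration of that idea only, not a patch to slave.cpp: `build_executor_env` and `FORWARDED_VARS` are made-up names, and the real agent code may build the executor environment differently.

```python
from typing import Dict, Mapping

# HYPOTHETICAL sketch: when building an executor's environment, forward the
# agent's libprocess address settings, including LIBPROCESS_ADVERTISE_IP.
# These names are illustrative and are not Mesos APIs.
FORWARDED_VARS = ("LIBPROCESS_IP", "LIBPROCESS_ADVERTISE_IP")


def build_executor_env(agent_env: Mapping[str, str],
                       base: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `base` plus the agent's libprocess address variables."""
    env = dict(base)
    for name in FORWARDED_VARS:
        # Forward only the variables the agent actually has set.
        if name in agent_env:
            env[name] = agent_env[name]
    return env


agent_env = {
    "LIBPROCESS_IP": "0.0.0.0",
    "LIBPROCESS_ADVERTISE_IP": "10.33.15.130",
    "LANG": "en_US.UTF-8",
}
executor_env = build_executor_env(agent_env, {"MESOS_AGENT_ENDPOINT": "10.33.15.130:5051"})
print(executor_env["LIBPROCESS_ADVERTISE_IP"])  # 10.33.15.130
```

Forwarding only an explicit allow-list (rather than the agent's whole environment) keeps unrelated agent settings such as LANG out of the executor, which matches the executor listing below, where those variables do not appear.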
Here's the command line and environment for my slave:
LIBPROCESS_IP=0.0.0.0
MASTER=zk://10.33.13.250:2181,10.33.9.108:2181,10.33.7.6:2181/mesos
LC_ALL=en_US.UTF-8
LOGS=/var/log/mesos
LIBPROCESS_ADVERTISE_IP=10.33.15.130
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
LANG=en_US.UTF-8
SHLVL=0
ULIMIT=-n 8192
/usr/sbin/mesos-slave --master=zk://10.33.13.250:2181,10.33.9.108:2181,10.33.7.6:2181/mesos --log_dir=/var/log/mesos --containerizers=docker,mesos --executor_registration_timeout=5mins --work_dir=/mesos
And here's the command line and environment for the executor process it attempted to run:
LIBPROCESS_IP=0.0.0.0
LIBPROCESS_PORT=0
MESOS_AGENT_ENDPOINT=10.33.15.130:5051
MESOS_CHECKPOINT=0
MESOS_DIRECTORY=/mesos/slaves/7c587a36-c4ed-48ce-bfa2-2b0d6e8274b2-S3864/frameworks/dummy_sleep-func-dadkins-d84e56b1a9/executors/dummy_sleep-func-dadkins-d84e56b1a9-func_0/runs/6b5adff6-c745-49ce-93c3-682bf7a23aca
MESOS_EXECUTOR_ID=dummy_sleep-func-dadkins-d84e56b1a9-func_0
MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD=5secs
MESOS_FRAMEWORK_ID=dummy_sleep-func-dadkins-d84e56b1a9
MESOS_HTTP_COMMAND_EXECUTOR=0
MESOS_NATIVE_JAVA_LIBRARY=/usr/lib/libmesos-1.7.0.so
MESOS_NATIVE_LIBRARY=/usr/lib/libmesos-1.7.0.so
MESOS_SLAVE_ID=7c587a36-c4ed-48ce-bfa2-2b0d6e8274b2-S3864
MESOS_SLAVE_PID=slave(1)@10.33.15.130:5051
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
mesos-docker-executor --cgroups_enable_cfs=false --container=mesos-6b5adff6-c745-49ce-93c3-682bf7a23aca --docker=docker --docker_socket=/var/run/docker.sock --help=false --initialize_driver_logging=true --launcher_dir=/usr/libexec/mesos --logbufsecs=0 --logging_level=INFO --mapped_directory=/mnt/mesos/sandbox --quiet=false --sandbox_directory=/mesos/slaves/7c587a36-c4ed-48ce-bfa2-2b0d6e8274b2-S3864/frameworks/dummy_sleep-func-dadkins-d84e56b1a9/executors/dummy_sleep-func-dadkins-d84e56b1a9