Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.2.1, 2.3.0
None
OS: Ubuntu 16.0.4
Spark: 2.3.0
Mesos: 1.5.0
Description
This seems to be a bug in Spark's MesosClusterDispatcher. To reproduce it, you need Mesos and the Mesos cluster dispatcher running.
I'm currently running Mesos 1.5 and Spark 2.3.0 (I also tried 2.2.1, with the same result).
If you launch the following program:
spark-submit --master mesos://127.0.1.1:7077 --deploy-mode cluster --class org.apache.spark.examples.SparkPi --name "my favorite task (myId = 123-456)" /home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar 100
then the task fails with the following output:
I0409 11:00:35.360352 22726 fetcher.cpp:551] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/tiboun","items":[{"action":"BYPASS_CACHE","uri":{"cache":false,"extract":true,"value":"\/home\/tiboun\/tools\/spark\/examples\/jars\/spark-examples_2.11-2.3.0.jar"}}],"sandbox_directory":"\/var\/lib\/mesos\/slaves\/0262246c-14a3-4408-9b74-5e3b65dc1344-S0\/frameworks\/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014\/executors\/driver-20180409110035-0004\/runs\/8ac20902-74e1-45c4-9ab6-c52a79940189","user":"tiboun"}
I0409 11:00:35.363119 22726 fetcher.cpp:450] Fetching URI '/home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar'
I0409 11:00:35.363143 22726 fetcher.cpp:291] Fetching directly into the sandbox directory
I0409 11:00:35.363168 22726 fetcher.cpp:225] Fetching URI '/home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar'
W0409 11:00:35.366839 22726 fetcher.cpp:330] Copying instead of extracting resource from URI with 'extract' flag, because it does not seem to be an archive: /home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar
I0409 11:00:35.366873 22726 fetcher.cpp:603] Fetched '/home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar' to '/var/lib/mesos/slaves/0262246c-14a3-4408-9b74-5e3b65dc1344-S0/frameworks/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014/executors/driver-20180409110035-0004/runs/8ac20902-74e1-45c4-9ab6-c52a79940189/spark-examples_2.11-2.3.0.jar'
I0409 11:00:35.366878 22726 fetcher.cpp:608] Successfully fetched all URIs into '/var/lib/mesos/slaves/0262246c-14a3-4408-9b74-5e3b65dc1344-S0/frameworks/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014/executors/driver-20180409110035-0004/runs/8ac20902-74e1-45c4-9ab6-c52a79940189'
I0409 11:00:35.438725 22733 exec.cpp:162] Version: 1.5.0
I0409 11:00:35.440770 22734 exec.cpp:236] Executor registered on agent 0262246c-14a3-4408-9b74-5e3b65dc1344-S0
I0409 11:00:35.441388 22733 executor.cpp:171] Received SUBSCRIBED event
I0409 11:00:35.441586 22733 executor.cpp:175] Subscribed executor on tiboun-Dell-Precision-M3800
I0409 11:00:35.441643 22733 executor.cpp:171] Received LAUNCH event
I0409 11:00:35.441767 22733 executor.cpp:638] Starting task driver-20180409110035-0004
I0409 11:00:35.445050 22733 executor.cpp:478] Running '/usr/libexec/mesos/mesos-containerizer launch <POSSIBLY-SENSITIVE-DATA>'
I0409 11:00:35.445770 22733 executor.cpp:651] Forked command at 22743
sh: 1: Syntax error: "(" unexpected
I0409 11:00:35.538661 22736 executor.cpp:938] Command exited with status 2 (pid: 22743)
I0409 11:00:36.541016 22739 process.cpp:887] Failed to accept socket: future discarded
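The `sh: 1: Syntax error: "(" unexpected` message is what a POSIX shell prints when a bare parenthesis reaches it, which suggests the task name was passed to `sh` without quoting. Below is a minimal Python sketch of that behaviour, not the dispatcher's actual code: `run_in_sh` is a hypothetical helper, and `printf` stands in for the real driver command.

```python
import shlex
import subprocess

NAME = "my favorite task (myId = 123-456)"  # the --name value from the report

def run_in_sh(command):
    """Run a command string through `sh -c`; return (exit status, stdout)."""
    proc = subprocess.run(["sh", "-c", command], capture_output=True, text=True)
    return proc.returncode, proc.stdout

# printf is a stand-in for the real driver command (assumption for illustration).
naive = "printf '%s\\n' " + NAME                # name pasted in unquoted
quoted = "printf '%s\\n' " + shlex.quote(NAME)  # name wrapped in single quotes

rc_naive, _ = run_in_sh(naive)
rc_quoted, out_quoted = run_in_sh(quoted)

print(rc_naive)                       # non-zero: the shell rejects the bare "("
print(rc_quoted, out_quoted.strip())  # succeeds and echoes the full name back
```

With the name quoted once, the shell treats it as a single word and the parentheses lose their special meaning.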
If you remove the parentheses, you get the following result:
I0409 11:03:02.023701 23085 fetcher.cpp:551] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/tiboun","items":[{"action":"BYPASS_CACHE","uri":{"cache":false,"extract":true,"value":"\/home\/tiboun\/tools\/spark\/examples\/jars\/spark-examples_2.11-2.3.0.jar"}}],"sandbox_directory":"\/var\/lib\/mesos\/slaves\/0262246c-14a3-4408-9b74-5e3b65dc1344-S0\/frameworks\/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014\/executors\/driver-20180409110301-0006\/runs\/f887c0ab-b48f-4382-850c-383c1c944269","user":"tiboun"}
I0409 11:03:02.028268 23085 fetcher.cpp:450] Fetching URI '/home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar'
I0409 11:03:02.028302 23085 fetcher.cpp:291] Fetching directly into the sandbox directory
I0409 11:03:02.028336 23085 fetcher.cpp:225] Fetching URI '/home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar'
W0409 11:03:02.031209 23085 fetcher.cpp:330] Copying instead of extracting resource from URI with 'extract' flag, because it does not seem to be an archive: /home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar
I0409 11:03:02.031250 23085 fetcher.cpp:603] Fetched '/home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar' to '/var/lib/mesos/slaves/0262246c-14a3-4408-9b74-5e3b65dc1344-S0/frameworks/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014/executors/driver-20180409110301-0006/runs/f887c0ab-b48f-4382-850c-383c1c944269/spark-examples_2.11-2.3.0.jar'
I0409 11:03:02.031258 23085 fetcher.cpp:608] Successfully fetched all URIs into '/var/lib/mesos/slaves/0262246c-14a3-4408-9b74-5e3b65dc1344-S0/frameworks/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014/executors/driver-20180409110301-0006/runs/f887c0ab-b48f-4382-850c-383c1c944269'
I0409 11:03:02.090797 23095 exec.cpp:162] Version: 1.5.0
I0409 11:03:02.095283 23092 exec.cpp:236] Executor registered on agent 0262246c-14a3-4408-9b74-5e3b65dc1344-S0
I0409 11:03:02.096693 23095 executor.cpp:171] Received SUBSCRIBED event
I0409 11:03:02.097040 23095 executor.cpp:175] Subscribed executor on tiboun-Dell-Precision-M3800
I0409 11:03:02.097141 23095 executor.cpp:171] Received LAUNCH event
I0409 11:03:02.097357 23095 executor.cpp:638] Starting task driver-20180409110301-0006
I0409 11:03:02.101521 23095 executor.cpp:478] Running '/usr/libexec/mesos/mesos-containerizer launch <POSSIBLY-SENSITIVE-DATA>'
I0409 11:03:02.102332 23095 executor.cpp:651] Forked command at 23100
Error: Cannot load main class from JAR file:/var/lib/mesos/slaves/0262246c-14a3-4408-9b74-5e3b65dc1344-S0/frameworks/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014/executors/driver-20180409110301-0006/runs/f887c0ab-b48f-4382-850c-383c1c944269/favorite
Run with --help for usage help or --verbose for debug output
I0409 11:03:02.792325 23090 executor.cpp:938] Command exited with status 1 (pid: 23100)
I0409 11:03:03.794505 23098 process.cpp:887] Failed to accept socket: future discarded
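Interestingly, the driver launch tries to load the main class from a file called "favorite", which is the second word of the task name. That suggests the name is split on whitespace before it reaches spark-submit: "my" is consumed as the value of --name and "favorite" is taken as the application JAR.
I've tried launching spark-shell with the same name and it works fine. There, the task names take the driver's name and append a sequence number.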
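The "favorite" symptom can be reproduced with plain shell-style tokenization. The sketch below uses Python's `shlex` to show how an unquoted name shifts the positional arguments; `build_command`, `first_positional`, and the JAR path are hypothetical and only illustrate the splitting, they are not taken from Spark's scheduler code.

```python
import shlex

NAME = "my favorite task"            # the --name value with the parentheses removed
JAR = "/path/to/spark-examples.jar"  # placeholder path (assumption)

def build_command(name, quote):
    """Assemble a spark-submit-like command line as a single shell string."""
    name_arg = shlex.quote(name) if quote else name
    return f"spark-submit --name {name_arg} {JAR}"

def first_positional(command):
    """Return the token that follows the --name value, i.e. what is read as the JAR."""
    tokens = shlex.split(command)
    return tokens[tokens.index("--name") + 2]

unquoted = first_positional(build_command(NAME, quote=False))
quoted = first_positional(build_command(NAME, quote=True))

print(unquoted)  # favorite
print(quoted)    # /path/to/spark-examples.jar
```

When the name is not quoted, only "my" is bound to `--name`, so the next word lands where the application JAR is expected, matching the "Cannot load main class from JAR file:.../favorite" error.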
Issue Links
- is duplicated by
  - SPARK-23464 MesosClusterScheduler double-escapes parameters to bash command (Resolved)
  - SPARK-24380 argument quoting/escaping broken in mesos cluster scheduler (Closed)
- links to