Project: Spark, Issue: SPARK-24599

SPARK_MOUNTED_CLASSPATH contains incorrect semicolon on Windows


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.3.0, 2.3.1
    • Fix Version/s: None
    • Component/s: Kubernetes, Spark Core, Windows
    • Labels: None

    Description

      When spark-submit is run in cluster mode on Kubernetes from a Windows machine, the environment variable SPARK_MOUNTED_CLASSPATH incorrectly contains a semicolon:

      $ echo $SPARK_MOUNTED_CLASSPATH
      
      /opt/spark/examples/jars/spark-examples_2.11-2.3.1.jar;/opt/spark/examples/jars/spark-examples_2.11-2.3.1.jar
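A minimal sketch of the presumed cause, assuming the client joins the jar paths with the path separator of the OS it runs on (';' on Windows, ':' on Linux, which is what Java's File.pathSeparator evaluates to). The join_paths helper is hypothetical and only illustrates the two separators side by side; it is not Spark code.

```shell
#!/usr/bin/env bash
# Hypothetical illustration, not Spark code.
# Assumption: the submitting client joins classpath entries with the
# separator of its own OS, while the Linux container expects ':'.
join_paths() {
  local sep="$1"; shift
  local out="" p
  for p in "$@"; do
    # Prepend the separator only when something was already appended.
    out+="${out:+$sep}$p"
  done
  printf '%s\n' "$out"
}

jar=/opt/spark/examples/jars/spark-examples_2.11-2.3.1.jar
join_paths ';' "$jar" "$jar"   # separator a Windows client would use
join_paths ':' "$jar" "$jar"   # separator the Linux container expects
```

The first call reproduces the value shown by echo $SPARK_MOUNTED_CLASSPATH in the report; the second is the form the container's shell and JVM can split correctly.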
      

      As a result, the driver aborts on startup:

      ./bin/spark-submit.cmd --master k8s://https://localhost:6445 --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf spark.executor.instances=1 --conf spark.kubernetes.container.image=spark:k8s-spark1 local:///opt/spark/examples/jars/spark-examples_2.11-2.3.1.jar
      kubectl logs spark-pi-b12d0501f2fc309d89e8634937b7f52c-driver
      ++ id -u
      + myuid=0
      ++ id -g
      + mygid=0
      ++ getent passwd 0
      + uidentry=root:x:0:0:root:/root:/bin/ash
      + '[' -z root:x:0:0:root:/root:/bin/ash ']'
      + SPARK_K8S_CMD=driver
      + '[' -z driver ']'
      + shift 1
      + SPARK_CLASSPATH=':/opt/spark/jars/*'
      + env
      + grep SPARK_JAVA_OPT_
      + sed 's/[^=]*=\(.*\)/\1/g'
      + sort -t_ -k4 -n
      + readarray -t SPARK_JAVA_OPTS
      + '[' -n '/opt/spark/examples/jars/spark-examples_2.11-2.3.1.jar;/opt/spark/examples/jars/spark-examples_2.11-2.3.1.jar' ']'
      + SPARK_CLASSPATH=':/opt/spark/jars/*:/opt/spark/examples/jars/spark-examples_2.11-2.3.1.jar;/opt/spark/examples/jars/spark-examples_2.11-2.3.1.jar'
      + '[' -n '' ']'
      + case "$SPARK_K8S_CMD" in
      + CMD=(${JAVA_HOME}/bin/java "${SPARK_JAVA_OPTS[@]}" -cp "$SPARK_CLASSPATH" -Xms$SPARK_DRIVER_MEMORY -Xmx$SPARK_DRIVER_MEMORY -Dspark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS $SPARK_DRIVER_CLASS $SPARK_DRIVER_ARGS)
      + exec /sbin/tini -s -- /usr/lib/jvm/java-1.8-openjdk/bin/java -Dspark.kubernetes.driver.pod.name=spark-pi-b12d0501f2fc309d89e8634937b7f52c-driver -Dspark.jars=/opt/spark/examples/jars/spark-examples_2.11-2.3.1.jar,/opt/spark/examples/jars/spark-examples_2.11-2.3.1.jar -Dspark.app.name=spark-pi -Dspark.submit.deployMode=cluster -Dspark.driver.blockManager.port=7079 -Dspark.kubernetes.executor.podNamePrefix=spark-pi-b12d0501f2fc309d89e8634937b7f52c -Dspark.executor.instances=1 -Dspark.app.id=spark-65f2c8cc3ccf462694a67c18e947158c -Dspark.driver.port=7078 -Dspark.master=k8s://https://localhost:6445 -Dspark.kubernetes.container.image=spark:k8s-spark1 -Dspark.driver.host=spark-pi-b12d0501f2fc309d89e8634937b7f52c-driver-svc.default.svc -cp ':/opt/spark/jars/*:/opt/spark/examples/jars/spark-examples_2.11-2.3.1.jar;/opt/spark/examples/jars/spark-examples_2.11-2.3.1.jar' -Xms1g -Xmx1g -Dspark.driver.bindAddress=10.1.0.150 org.apache.spark.examples.SparkPi
      Error: Could not find or load main class org.apache.spark.examples.SparkPi

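      (Note the semicolon in the last part of the SPARK_CLASSPATH=... line. Inside the Linux container the JVM splits the -cp value on ':', so the two jar paths joined by ';' form a single path that does not exist, and the examples jar containing SparkPi never ends up on the classpath.)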
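The splitting can be reproduced with a short bash sketch. It splits the -cp value from the driver log on ':', the classpath separator the JVM uses on Linux, and prints each entry in brackets. It only mimics the splitting step; it does not model how the JVM expands wildcards or treats empty entries.

```shell
#!/usr/bin/env bash
# -cp value copied verbatim from the driver log above.
SPARK_CLASSPATH=':/opt/spark/jars/*:/opt/spark/examples/jars/spark-examples_2.11-2.3.1.jar;/opt/spark/examples/jars/spark-examples_2.11-2.3.1.jar'

# Split on ':' only (IFS applies to this read command alone).
# The leading ':' yields an empty first entry.
IFS=':' read -r -a entries <<< "$SPARK_CLASSPATH"

for entry in "${entries[@]}"; do
  printf '[%s]\n' "$entry"
done
```

Three entries come out, and the last one is the whole "jar;jar" string, which is not a file that exists in the image.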

      As a workaround, you can override SPARK_MOUNTED_CLASSPATH in $SPARK_HOME/kubernetes/dockerfiles/spark/entrypoint.sh, removing the semicolon and the duplicate path after it, and then rebuild the Docker image with $SPARK_HOME/bin/docker-image-tool.sh. After that, spark-submit succeeds.
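A sketch of the workaround described above, assuming the override is placed in entrypoint.sh before the script appends SPARK_MOUNTED_CLASSPATH to SPARK_CLASSPATH (the trace shows that append right after the [ -n ... ] check). The example value is copied from the report so the snippet runs standalone; in the container that assignment would be left out because spark-submit already sets the variable. The ';'-to-':' line is a more general variant that does not come from the report.

```shell
#!/usr/bin/env bash
# Example value copied from the report. Do NOT put this line in
# entrypoint.sh; there the variable is already set by spark-submit.
SPARK_MOUNTED_CLASSPATH='/opt/spark/examples/jars/spark-examples_2.11-2.3.1.jar;/opt/spark/examples/jars/spark-examples_2.11-2.3.1.jar'

# Workaround (assumed placement: before SPARK_CLASSPATH is extended).
# "%%;*" removes the longest suffix that starts with ';', i.e. the
# first semicolon and everything after it.
SPARK_MOUNTED_CLASSPATH="${SPARK_MOUNTED_CLASSPATH%%;*}"

# Alternative not taken from the report: turn every ';' into ':' so
# that distinct jars are all kept.
#   SPARK_MOUNTED_CLASSPATH="${SPARK_MOUNTED_CLASSPATH//;/:}"

echo "$SPARK_MOUNTED_CLASSPATH"
```

Truncating at the first ';' is only lossless here because both entries are the same jar; with several different jars, the ';'-to-':' variant would keep them all.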

      See also Stack Overflow: https://stackoverflow.com/questions/49728170/spark-submit-from-windows-vs-linux

People

    • Assignee: Unassigned
    • Reporter: Tobias Munk (tobias-hd)
    • Votes: 0
    • Watchers: 3
