  Spark / SPARK-13587 Support virtualenv in PySpark / SPARK-19095

virtualenv example does not work in yarn cluster mode


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate

    Description

      Spark version: 2
      Steps:

      • Install virtualenv on all nodes.
      • Create /tmp/requirements1.txt containing the single line "numpy" (e.g. echo "numpy" > /tmp/requirements1.txt).
      • Run the kmeans.py application in yarn-cluster mode:
        spark-submit --master yarn --deploy-mode cluster \
          --conf "spark.pyspark.virtualenv.enabled=true" \
          --conf "spark.pyspark.virtualenv.type=native" \
          --conf "spark.pyspark.virtualenv.requirements=/tmp/requirements1.txt" \
          --conf "spark.pyspark.virtualenv.bin.path=/usr/bin/virtualenv" \
          --jars /usr/hdp/current/hadoop-client/lib/hadoop-lzo.jar \
          kmeans.py /tmp/in/kmeans_data.txt 3

        The application fails to find numpy.

        LogType:stdout
        Log Upload Time:Thu Jan 05 20:05:49 +0000 2017
        LogLength:134
        Log Contents:
        Traceback (most recent call last):
          File "kmeans.py", line 27, in <module>
            import numpy as np
        ImportError: No module named numpy
        
        End of LogType:stdout
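
To narrow down a failure like the one in the log above, it can help to submit a small probe script in place of kmeans.py and read its output from the same stdout log. The sketch below is not part of the original report. The function name probe_module is hypothetical, and it assumes only the Python standard library. It reports which interpreter the driver is running under and whether numpy can be imported, which should indicate whether the virtualenv interpreter or the system interpreter was picked up.

```python
from __future__ import print_function

import sys


def probe_module(module_name="numpy"):
    """Report the running interpreter and whether module_name is importable.

    Hypothetical helper for illustration; uses only the standard library so
    it runs even when the virtualenv was not set up correctly.
    """
    info = {
        "executable": sys.executable,  # path of the Python binary in use
        "prefix": sys.prefix,          # points inside the virtualenv if one is active
        "module": module_name,
        "found": False,
        "location": None,
    }
    try:
        module = __import__(module_name)
    except ImportError:
        return info
    info["found"] = True
    info["location"] = getattr(module, "__file__", None)
    return info


if __name__ == "__main__":
    # Print one "key: value" line per field so the result is easy to read in the log.
    for key, value in sorted(probe_module().items()):
        print("{}: {}".format(key, value))
```

If "found" is False and "executable" points at a system Python rather than a path inside a virtualenv, that would suggest the driver did not start inside the virtualenv.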
        

            People

              Assignee: Unassigned
              Reporter: Yesha Vora (yeshavora)
              Votes: 0
              Watchers: 2
