Details
- Type: Sub-task
- Status: Resolved
- Priority: Critical
- Resolution: Duplicate
Description
Spark version: 2
Steps:
- Install virtualenv on all nodes.
- Create /tmp/requirements1.txt containing "numpy" (e.g., echo numpy > /tmp/requirements1.txt).
- Run the kmeans.py application in yarn-cluster mode (a sketch of the script follows the command below):
spark-submit --master yarn --deploy-mode cluster \
  --conf "spark.pyspark.virtualenv.enabled=true" \
  --conf "spark.pyspark.virtualenv.type=native" \
  --conf "spark.pyspark.virtualenv.requirements=/tmp/requirements1.txt" \
  --conf "spark.pyspark.virtualenv.bin.path=/usr/bin/virtualenv" \
  --jars /usr/hdp/current/hadoop-client/lib/hadoop-lzo.jar \
  kmeans.py /tmp/in/kmeans_data.txt 3
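For reference, kmeans.py is presumably along the lines of the Spark MLlib example script (the exact contents of the script used here are an assumption; the traceback below points at its numpy import):

from __future__ import print_function

import sys
import numpy as np  # the import that fails in the traceback below

from pyspark import SparkContext
from pyspark.mllib.clustering import KMeans


def parseVector(line):
    # Each input line is a space-separated list of floats
    return np.array([float(x) for x in line.split(' ')])


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: kmeans <file> <k>", file=sys.stderr)
        sys.exit(-1)
    sc = SparkContext(appName="KMeans")
    data = sc.textFile(sys.argv[1]).map(parseVector)
    model = KMeans.train(data, int(sys.argv[2]))
    print("Final centers: " + str(model.clusterCenters))
    sc.stop()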
The application fails to find numpy.
LogType:stdout
Log Upload Time:Thu Jan 05 20:05:49 +0000 2017
LogLength:134
Log Contents:
Traceback (most recent call last):
  File "kmeans.py", line 27, in <module>
    import numpy as np
ImportError: No module named numpy
End of LogType:stdout
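One way to isolate the problem (a hypothetical diagnostic, not part of the original report) is to submit a trivial script with the same virtualenv configuration and check whether numpy is importable inside executor tasks:

from pyspark import SparkContext

def probe(_):
    # Attempt the import inside an executor task and report the result
    try:
        import numpy
        yield "numpy %s" % numpy.__version__
    except ImportError as e:
        yield "ImportError: %s" % e

if __name__ == "__main__":
    sc = SparkContext(appName="NumpyProbe")
    # Two partitions so at least two tasks attempt the import
    print(sc.parallelize(range(2), 2).mapPartitions(probe).collect())
    sc.stop()

If this also reports ImportError, the virtualenv with the requirements file was never provisioned on the executors, which is consistent with the failure above.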
Issue Links
- duplicates
  - SPARK-13587 Support virtualenv in PySpark (In Progress)
- is duplicated by
  - SPARK-19096 Kmeans.py application fails with virtualenv and due to parse error (Resolved)
  - SPARK-19097 virtualenv example failed with conda due to ImportError: No module named ruamel.yaml.comments (Resolved)