[SPARK-19095] virtualenv example does not work in yarn cluster mode

Details

Type: Sub-task
Status: Resolved
Priority: Critical
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

Spark version: 2
Steps:

1. Install virtualenv on all nodes.
2. Create a requirements file containing the single line "numpy" (passed below as /tmp/requirements1.txt).
3. Run the kmeans.py application in yarn-cluster mode:

spark-submit --master yarn --deploy-mode cluster \
  --conf "spark.pyspark.virtualenv.enabled=true" \
  --conf "spark.pyspark.virtualenv.type=native" \
  --conf "spark.pyspark.virtualenv.requirements=/tmp/requirements1.txt" \
  --conf "spark.pyspark.virtualenv.bin.path=/usr/bin/virtualenv" \
  --jars /usr/hdp/current/hadoop-client/lib/hadoop-lzo.jar \
  kmeans.py /tmp/in/kmeans_data.txt 3

The application fails because the driver cannot import numpy:

LogType:stdout
Log Upload Time:Thu Jan 05 20:05:49 +0000 2017
LogLength:134
Log Contents:
Traceback (most recent call last):
  File "kmeans.py", line 27, in <module>
    import numpy as np
ImportError: No module named numpy

End of LogType:stdout
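
To narrow down a failure like the one above, it can help to log which Python interpreter the driver is actually running and whether the module is importable from it. The following is only a minimal diagnostic sketch, not part of the reported application: the helper name describe_environment is hypothetical, and it assumes the snippet is placed at the top of the submitted script, before the failing import.

```python
import sys


def describe_environment(module_name="numpy"):
    """Report the running interpreter and whether module_name can be imported.

    Hypothetical helper for illustration only.
    """
    try:
        module = __import__(module_name)
        found = True
        location = getattr(module, "__file__", None)
    except ImportError:
        found = False
        location = None
    return {
        "executable": sys.executable,   # interpreter path actually in use
        "prefix": sys.prefix,           # root of the active Python installation or virtualenv
        "module_found": found,
        "module_location": location,
    }


if __name__ == "__main__":
    # Printed output goes to the driver's stdout log in cluster mode.
    print(describe_environment("numpy"))
```

If the reported executable is the system interpreter rather than one inside a virtualenv directory, that would suggest the virtualenv was not created or not used for the driver process; this interpretation is an assumption and is not confirmed by the report.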

Issue Links

duplicates

SPARK-13587 Support virtualenv in PySpark

In Progress

is duplicated by

SPARK-19096 Kmeans.py application fails with virtualenv and due to parse error

Resolved

SPARK-19097 virtualenv example failed with conda due to ImportError: No module named ruamel.yaml.comments

Resolved

People

Assignee: Unassigned

Reporter: Yesha Vora

Votes: 0

Watchers: 2

Dates

Created: 06/Jan/17 01:50

Updated: 06/Jan/17 09:42

Resolved: 06/Jan/17 09:17