Details
- Type: New Feature
- Status: In Progress
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 1.6.3, 2.0.2, 2.1.2, 2.2.1, 2.3.0
- Fix Version/s: None
- Component/s: None
Description
Currently, it is not easy for users to add third-party Python packages in PySpark.
- One way is to use --py-files (suitable for simple dependencies, but not for complicated ones, especially those with transitive dependencies); see the sketch after this list.
- Another way is to install the packages manually on each node (time-consuming, and it is not easy to switch to a different environment).
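As a rough illustration of the first approach and its manual packaging burden, here is a minimal sketch; the archive name deps.zip and the module my_dependency are assumptions used only for illustration:
{code:python}
from pyspark import SparkContext

# Minimal sketch of the current --py-files / addPyFile approach.
# "deps.zip" is an assumed, manually built archive of the needed packages;
# anything those packages import in turn must also be packed in by hand,
# which is what makes transitive dependencies painful.
sc = SparkContext(appName="py-files-example")
sc.addPyFile("deps.zip")   # same effect as: spark-submit --py-files deps.zip

import my_dependency       # hypothetical module shipped inside deps.zip
{code}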
Python now has two different virtualenv implementations: the native virtualenv and conda. This JIRA aims to bring these two tools to the distributed environment, so that executors can run inside an isolated environment with the required packages installed.
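To make the intent concrete, below is a hypothetical sketch of what per-executor environment bootstrapping could look like; the function name, arguments, and file layout are illustrative assumptions, not the API proposed in this JIRA:
{code:python}
import os
import subprocess


def bootstrap_env(work_dir, requirements, use_conda=False):
    """Create an isolated Python environment under work_dir and return its interpreter path.

    Illustrative only: a real implementation would be driven by Spark
    configuration and run before the Python worker process starts.
    """
    env_dir = os.path.join(work_dir, "pyspark_env")
    if use_conda:
        # conda flavour: create an env at a known prefix, then install the requirements file
        subprocess.check_call(["conda", "create", "--prefix", env_dir, "--yes", "python"])
        subprocess.check_call(["conda", "install", "--prefix", env_dir, "--yes",
                               "--file", requirements])
    else:
        # native virtualenv flavour: create the env, then pip install the requirements
        subprocess.check_call(["virtualenv", env_dir])
        subprocess.check_call([os.path.join(env_dir, "bin", "pip"),
                               "install", "-r", requirements])
    # The executor would launch its Python worker with this interpreter instead
    # of the system python, so the job sees only the isolated environment.
    return os.path.join(env_dir, "bin", "python")
{code}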
Attachments
Issue Links
- is depended upon by:
  - ZEPPELIN-2233 Virtualenv support for PySparkInterpreter (Open)
- is duplicated by:
  - SPARK-19095 virtualenv example does not work in yarn cluster mode (Resolved)
  - SPARK-16367 Wheelhouse Support for PySpark (Resolved)
- is related to:
  - SPARK-16367 Wheelhouse Support for PySpark (Resolved)
- is superseded by:
  - SPARK-20001 Support PythonRunner executing inside a Conda env (Resolved)
- relates to:
  - SPARK-17428 SparkR executors/workers support virtualenv (Resolved)
  - SPARK-6764 Add wheel package support for PySpark (Resolved)
- links to:
  1. virtualenv example does not work in yarn cluster mode | Resolved | Unassigned
  2. Kmeans.py application fails with virtualenv and due to parse error | Resolved | Unassigned
  3. virtualenv example failed with conda due to ImportError: No module named ruamel.yaml.comments | Resolved | Unassigned