Details
- Brainstorming
- Status: Resolved
- Critical
- Resolution: Incomplete
- 2.2.0
- None
Description
Per the recent discussion on the dev list, this JIRA is for discussing how we can make the MLlib DataFrame-based APIs more extensible, especially for writing 3rd-party libraries whose APIs extend the MLlib APIs (custom Transformers, Estimators, etc.).
- For people who have written such libraries, what issues have you run into?
- What APIs are not public or extensible enough? Do they require changes before being made more public?
- Are APIs for non-Scala languages such as Java and Python friendly or extensive enough?
The easy answer is to make everything public, but that would of course be terrible in the long term. Let's discuss what is needed and how we can present stable, sufficient, and easy-to-use APIs for 3rd-party developers.
Issue Links
- is duplicated by
  - SPARK-19717 Expanding Spark ML under Different Namespace (Resolved)
- is related to
  - SPARK-17048 ML model read for custom transformers in a pipeline does not work (Resolved)
  - SPARK-8515 Improve ML attribute API (Resolved)
  - SPARK-5874 How to improve the current ML pipeline API? (Resolved)
  - SPARK-7146 Should ML sharedParams be a public API? (Resolved)
  - SPARK-10817 ML abstraction umbrella (Resolved)
  - SPARK-8984 Developer documentation for ML Pipelines (Resolved)