[SPARK-19498] Discussion: Making MLlib APIs extensible for 3rd party libraries

Details

Type: Brainstorming
Status: Resolved
Priority: Critical
Resolution: Incomplete
Affects Version/s: 2.2.0
Fix Version/s: None
Component/s: ML
Labels:
- bulk-closed

Description

Per the recent discussion on the dev list, this JIRA is for discussing how we can make MLlib DataFrame-based APIs more extensible, especially for the purpose of writing 3rd-party libraries with APIs extended from the MLlib APIs (for custom Transformers, Estimators, etc.).

For people who have written such libraries, what issues have you run into?
What APIs are not public or extensible enough? Do they require changes before being made more public?
Are the APIs for non-Scala languages such as Java and Python friendly and extensible enough?

The easy answer is to make everything public, but that would of course be terrible in the long term. Let's discuss what is needed and how we can present stable, sufficient, and easy-to-use APIs for 3rd-party developers.

Issue Links

is duplicated by
- SPARK-19717 Expanding Spark ML under Different Namespace (Resolved)

is related to
- SPARK-17048 ML model read for custom transformers in a pipeline does not work (Resolved)
- SPARK-8515 Improve ML attribute API (Resolved)
- SPARK-5874 How to improve the current ML pipeline API? (Resolved)
- SPARK-7146 Should ML sharedParams be a public API? (Resolved)
- SPARK-10817 ML abstraction umbrella (Resolved)
- SPARK-8984 Developer documentation for ML Pipelines (Resolved)

People

Assignee: Unassigned
Reporter: Joseph K. Bradley
Votes: 2
Watchers: 13

Dates

Created: 07/Feb/17 17:09
Updated: 08/Oct/19 05:44
Resolved: 08/Oct/19 05:44