Details
- Type: Brainstorming
- Priority: Major
- Status: Resolved
- Resolution: Incomplete
Description
The Pipelines API (spark.ml package) now includes abstractions for single-label prediction: Predictor, Classifier, Regressor. These abstractions assume models are local, so that single-Row prediction methods can be used as UDFs. We need to think about how to support distributed models in these abstractions.
Should the abstractions be modified somehow? Or should there be parallel (or inheriting) abstractions, or a mix-in?
Motivation: We may start supporting distributed models since linear models, random forests, and other models can get large enough to merit distributed storage and computation.
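To make the design question concrete, here is a minimal, purely illustrative Scala sketch of one of the options mentioned above (a mix-in trait). The trait name and method below are hypothetical and are not part of spark.ml; they only show the shape a distributed-prediction path might take alongside the existing single-Row predict contract.

```scala
import org.apache.spark.sql.{DataFrame, Dataset}

// Hypothetical mix-in (not an existing spark.ml API): a distributed model
// would implement a batch, DataFrame-to-DataFrame prediction path instead of
// relying on a single-Row predict method wrapped in a UDF over broadcast
// model parameters.
trait DistributedPredictionSupport {
  // Computes predictions via joins/aggregations against distributed model
  // state (e.g. a DataFrame of coefficients or trees) rather than collecting
  // that state to the driver.
  def transformDistributed(dataset: Dataset[_]): DataFrame
}
```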
Issue Links
- is contained by: SPARK-10817 ML abstraction umbrella (Resolved)
- is related to: SPARK-6233 Should spark.ml Models be distributed by default? (Closed)