[SPARK-7131] Move tree,forest implementation from spark.mllib to spark.ml - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.4.0
Fix Version/s: 1.5.0
Component/s: ML, MLlib
Labels: None
Target Version/s: 1.5.0

Description

We want to change and improve the spark.ml API for trees and ensembles, but we cannot change the old API in spark.mllib. To support the changes we want to make, we should move the implementation from spark.mllib to spark.ml. We will generalize and modify it, but will also ensure that we do not change the behavior of the old API.

There are several steps to this:
1. Copy the implementation over to spark.ml and change the spark.ml classes to use that copy, rather than calling the spark.mllib implementation. The current spark.ml tests will ensure that the two implementations learn exactly the same models. Note: This should include performance testing to make sure the updated code does not introduce any regressions. UPDATE: I have run tests using spark-perf, and there were no regressions.
2. Remove the spark.mllib implementation, and make the spark.mllib APIs wrappers around the spark.ml implementation. The spark.ml tests will again ensure that we do not change any behavior.
3. Move the unit tests to spark.ml, and change the spark.mllib unit tests to verify model equivalence.
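
The wrapper-plus-equivalence idea in steps 2 and 3 can be sketched in a few lines. This is only an illustration in plain Python, not Spark code: `Stump`, `train_new`, and `train_legacy` are hypothetical names, and the one-split "tree" learner is a toy stand-in for the real tree and forest implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

Row = Tuple[List[float], float]  # (features, label)


@dataclass(frozen=True)
class Stump:
    """Toy one-split model; a hypothetical stand-in for a learned tree."""
    feature: int
    threshold: float
    left: float   # prediction when x[feature] <= threshold
    right: float  # prediction otherwise


def _mean(values: List[float]) -> float:
    return sum(values) / len(values)


def _sse(values: List[float]) -> float:
    # Sum of squared errors around the mean.
    m = _mean(values)
    return sum((v - m) ** 2 for v in values)


def train_new(data: List[Row]) -> Stump:
    """Stand-in for the implementation after it has moved to the new package."""
    best = None
    for f in range(len(data[0][0])):
        values = sorted({x[f] for x, _ in data})
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for x, y in data if x[f] <= t]
            right = [y for x, y in data if x[f] > t]
            cost = _sse(left) + _sse(right)
            if best is None or cost < best[0]:
                best = (cost, Stump(f, t, _mean(left), _mean(right)))
    if best is None:
        raise ValueError("need at least two distinct values in some feature")
    return best[1]


def train_legacy(data: List[Row]) -> Stump:
    """Old entry point kept unchanged for callers; delegates to train_new (step 2)."""
    return train_new(data)


# Step 3 in miniature: the legacy test only checks model equivalence.
data = [([1.0, 5.0], 0.0), ([2.0, 4.0], 0.0), ([3.0, 9.0], 1.0), ([4.0, 1.0], 1.0)]
assert train_legacy(data) == train_new(data)
```

Because the old entry point only delegates, an equality check on the learned models is enough to show that the old API's behavior has not changed.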

This JIRA is now for step 1 only. Steps 2 and 3 will be in separate JIRAs.

After these updates, we can more safely generalize and improve the spark.ml implementation.

Issue Links

blocks
SPARK-3727 Trees and ensembles: More prediction functionality (Resolved)
SPARK-3155 Support DecisionTree pruning (Resolved)
SPARK-7130 spark.ml RandomForest* should always do bootstrapping (Closed)

is blocked by
SPARK-6113 Stabilize DecisionTree and ensembles APIs (Resolved)

is related to
SPARK-12183 Remove spark.mllib tree, forest implementations and use spark.ml (Resolved)
SPARK-12326 Move GBT implementation from spark.mllib to spark.ml (Resolved)
SPARK-10232 Decide whether spark.ml Decision Tree and Random Forest can replace spark.mllib implementation (Resolved)

links to
[Github] Pull Request #7294 (jkbradley)

People

Assignee: Joseph K. Bradley
Reporter: Joseph K. Bradley
Shepherd: Xiangrui Meng
Votes: 0
Watchers: 6

Dates

Created: 24/Apr/15 19:28
Updated: 15/Mar/18 20:28
Resolved: 17/Jul/15 05:27

Time Tracking

Estimated: 168h
Remaining: 168h
Logged: Not Specified