Description
Issue: The APIs for DecisionTree and tree ensembles (RandomForests and GradientBoostedTrees) have been Experimental for a long time. The API has become convoluted because trees and ensembles have many variants, some of which were added incrementally without a long-term design.
Proposal: This JIRA is for discussing the changes required to finalize the APIs. After the discussion, I will open a PR that updates the APIs and makes them non-Experimental. This will require many breaking changes; see the design doc for details.
Design doc: This outlines current issues and the proposed API.
Issue Links
- blocks
  - SPARK-3727 Trees and ensembles: More prediction functionality (Resolved)
  - SPARK-7126 For spark.ml Classifiers, automatically index labels if they are not yet indexed (Resolved)
  - SPARK-7131 Move tree,forest implementation from spark.mllib to spark.ml (Resolved)
  - SPARK-7127 Broadcast spark.ml tree ensemble models for predict (Resolved)
  - SPARK-7132 Add fit with validation set to spark.ml GBT (Resolved)
- contains
  - SPARK-3164 Store DecisionTree Split.categories as Set (Resolved)
- is blocked by
  - SPARK-5886 Add StringIndexer (Resolved)
  - SPARK-4081 Categorical feature indexing (Resolved)
- is duplicated by
  - SPARK-5399 tree Losses strings should match loss names (Resolved)
- relates to
  - SPARK-7047 Model parent should be Optional (Resolved)