Details
- Improvement
- Status: Resolved
- Major
- Resolution: Incomplete
- 1.4.0
- None
Description
Now that we have StringIndexer, we could have spark.ml.classification.Classifier (the abstraction) automatically handle label indexing if the labels are not yet indexed.
This would require a bit of design:
- Should predict() output the original labels or the indices?
- How should we notify users that the labels are being automatically indexed?
- How should we expose the resulting label-to-index mapping to users?
- If multiple parts of a Pipeline automatically index labels, what do we need to do to make sure they are consistent?
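To make the questions above concrete, here is a rough, plain-Python sketch of one possible behavior. It is not Spark code and not a proposed API: `AutoIndexingClassifier`, `build_label_index`, `is_indexed`, and the majority-class baseline are all hypothetical names invented for illustration. The sketch assumes labels are converted to strings before indexing and that the most frequent label gets index 0 (as StringIndexer does), emits a warning when it indexes automatically, exposes the mapping as an attribute, and has `predict()` return the original labels.

```python
import warnings
from collections import Counter


def build_label_index(labels):
    """Map each distinct label (as a string) to an index, most frequent first.

    Ties are broken by the label string so the mapping is deterministic.
    """
    counts = Counter(str(label) for label in labels)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: index for index, label in enumerate(ordered)}


def is_indexed(labels):
    """Assumption: labels count as indexed if they are non-negative whole numbers."""
    return all(
        isinstance(label, (int, float))
        and not isinstance(label, bool)
        and label >= 0
        and float(label).is_integer()
        for label in labels
    )


class AutoIndexingClassifier:
    """Hypothetical wrapper: indexes labels on fit, restores them on predict."""

    def __init__(self, fit_indexed, predict_indexed):
        # fit_indexed(features, indexed_labels) -> model (any object)
        # predict_indexed(model, features) -> list of label indices
        self._fit_indexed = fit_indexed
        self._predict_indexed = predict_indexed
        self.label_index = None  # exposed so users can inspect the mapping
        self._model = None

    def fit(self, features, labels):
        if is_indexed(labels):
            self.label_index = None
            indexed = [int(label) for label in labels]
        else:
            # Notify the user that indexing is happening implicitly.
            warnings.warn("Labels are not indexed; indexing them automatically.")
            self.label_index = build_label_index(labels)
            indexed = [self.label_index[str(label)] for label in labels]
        self._model = self._fit_indexed(features, indexed)
        return self

    def predict(self, features):
        indices = self._predict_indexed(self._model, features)
        if self.label_index is None:
            return indices  # labels were already indices; nothing to restore
        inverse = {index: label for label, index in self.label_index.items()}
        return [inverse[index] for index in indices]


# Hypothetical stand-in for a real learner: always predicts the majority class.
def fit_majority(features, indexed_labels):
    return Counter(indexed_labels).most_common(1)[0][0]


def predict_majority(model, features):
    return [model for _ in features]
```

The consistency question is not addressed here: each wrapper builds its own mapping, so two stages fitted on different data could disagree unless the mapping is shared or stored with the data.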
Issue Links
- is blocked by
  - SPARK-6113 Stabilize DecisionTree and ensembles APIs (Resolved)
  - SPARK-6965 StringIndexer should convert input to Strings (Resolved)
- relates to
  - SPARK-11106 Should ML Models contains single models or Pipelines? (Resolved)
  - SPARK-14862 Tree and ensemble classification: do not require label metadata (Resolved)
- supersedes
  - SPARK-2206 Automatically infer the number of classification classes in multiclass classification (Resolved)