Description
Improvement: computation
For categorical features with many categories, it could be more efficient to store Split.categories as a Set rather than a List, as it is now. A Set should scale better, since a membership lookup takes constant or logarithmic time depending on the Set implementation, whereas a List requires a linear scan. Benchmarks would be needed to confirm that a Set does not incur much more overhead than a List.
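The trade-off can be sketched outside Spark. The snippet below is a minimal illustration in Python, not Spark's implementation: `ListSplit`, `SetSplit`, and `goes_left` are hypothetical names that stand in for a categorical split that checks whether a feature value belongs to its category collection. It also shows the kind of micro-benchmark the proposal calls for.

```python
import timeit
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ListSplit:
    """Hypothetical split that keeps its categories in a list."""
    categories: List[float]

    def goes_left(self, value: float) -> bool:
        # Linear scan: cost grows with the number of categories.
        return value in self.categories


@dataclass(frozen=True)
class SetSplit:
    """Hypothetical split that keeps its categories in a hash set."""
    categories: FrozenSet[float]

    def goes_left(self, value: float) -> bool:
        # Hash lookup: expected constant time, regardless of size.
        return value in self.categories


# 500 categories (the even values below 1000) are routed to the left child.
left_categories = [float(c) for c in range(0, 1000, 2)]
list_split = ListSplit(left_categories)
set_split = SetSplit(frozenset(left_categories))

# Probe with a value that is absent, which forces a full scan of the list.
probe = 999.0
list_time = timeit.timeit(lambda: list_split.goes_left(probe), number=10_000)
set_time = timeit.timeit(lambda: set_split.goes_left(probe), number=10_000)
print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```

Both variants return the same answers, so only the storage type differs. Timings depend on the machine and on the number of categories, and the overhead for features with few categories would have to be measured separately.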
Issue Links
- Is contained by: SPARK-6113 Stabilize DecisionTree and ensembles APIs (Resolved)