[SPARK-21088] CrossValidator, TrainValidationSplit should collect all models when fitting: Python API - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.2.0
Fix Version/s: 2.4.0
Component/s: ML, PySpark
Labels: None

Description

In PySpark:
Add a parameter that controls whether the full model list is collected during CrossValidator/TrainValidationSplit training. It defaults to not collecting, to avoid OOM caused by this change.
Add a method to CrossValidatorModel/TrainValidationSplitModel that lets the user get the model list.
Add an "option" to CrossValidatorModelWriter that lets the user control whether the model list is persisted to disk.
Note: when persisting the model list, use indices as the sub-model paths.
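
The note about index-based paths can be illustrated with a small pure-Python sketch. This is not the Spark implementation: the helper name `sub_model_paths`, the `subModels` directory and the `fold<i>` naming are assumptions chosen for illustration. It only shows how each sub-model could get a path derived from its fold index and parameter-map index.

```python
import posixpath


def sub_model_paths(base_path, num_param_maps, num_folds=None):
    """Return index-based save paths for sub-models (hypothetical layout).

    With num_folds set (cross-validation), returns one list per fold, each
    holding one path per parameter map. Without it (train-validation split),
    returns a flat list with one path per parameter map.
    """
    # "subModels" is an assumed directory name, used here for illustration.
    root = posixpath.join(base_path, "subModels")
    if num_folds is None:
        return [posixpath.join(root, str(j)) for j in range(num_param_maps)]
    return [
        [posixpath.join(root, "fold%d" % i, str(j)) for j in range(num_param_maps)]
        for i in range(num_folds)
    ]


# Cross-validation: 3 folds x 2 parameter maps.
cv_paths = sub_model_paths("/tmp/cv_model", num_param_maps=2, num_folds=3)
# Train-validation split: 2 parameter maps, no folds.
tvs_paths = sub_model_paths("/tmp/tvs_model", num_param_maps=2)
```

Using indices rather than model names or UIDs keeps the paths predictable, so a reader can locate the model for a given fold and parameter map without extra metadata.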

Issue Links

blocks

SPARK-22005 CrossValidator, TrainValidationSplit dump sub models to disk when fitting: Python API (Resolved)

is blocked by

SPARK-21911 Parallel Model Evaluation for ML Tuning: PySpark (Resolved)

links to

[Github] Pull Request #19627 (WeichenXu123)

People

Assignee: Weichen Xu

Reporter: Joseph K. Bradley

Votes: 0

Watchers: 5

Dates

Created: 14/Jun/17 01:57

Updated: 16/Apr/18 16:31

Resolved: 16/Apr/18 16:31