[SPARK-18060] Avoid unnecessary standardization in multinomial logistic regression training - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.1.0
Component/s: ML
Labels: None
Target Version/s: 2.1.0

Description

The multinomial logistic regression (MLOR) implementation in spark.ml trains the model in the standardized feature space by dividing each feature value by its column standard deviation in every iteration. To achieve a sequential memory access pattern when computing the gradients, we currently perform this division many more times than necessary. We can have both - sequential access patterns and reduced computation - if we use a column-major layout for the coefficients.
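
The trade-off described above can be illustrated with a minimal sketch. This is not the spark.ml code: the function names, the flat-list coefficient storage, and the division counter are all assumptions made for illustration. It compares computing per-class margins for one instance with a row-major coefficient layout (classes in the outer loop) against a column-major layout (features in the outer loop), both reading the coefficients sequentially.

```python
def margins_row_major(coefs, features, features_std, num_classes):
    """Hypothetical row-major layout: coefs[j * num_features + i].

    Looping class by class keeps coefficient access sequential, but each
    feature value is re-standardized once per class.
    """
    num_features = len(features)
    margins = [0.0] * num_classes
    divisions = 0  # counts standardization divisions, for comparison only
    for j in range(num_classes):
        for i, value in enumerate(features):
            if value != 0.0 and features_std[i] != 0.0:
                std_value = value / features_std[i]
                divisions += 1
                margins[j] += coefs[j * num_features + i] * std_value
    return margins, divisions


def margins_col_major(coefs, features, features_std, num_classes):
    """Hypothetical column-major layout: coefs[i * num_classes + j].

    Looping feature by feature is still sequential in memory, and each
    feature value is standardized only once.
    """
    margins = [0.0] * num_classes
    divisions = 0  # counts standardization divisions, for comparison only
    for i, value in enumerate(features):
        if value != 0.0 and features_std[i] != 0.0:
            std_value = value / features_std[i]
            divisions += 1
            base = i * num_classes
            for j in range(num_classes):
                margins[j] += coefs[base + j] * std_value
    return margins, divisions
```

Under these assumptions both functions return the same margins, while the row-major version performs one division per (class, non-zero feature) pair and the column-major version performs one per non-zero feature.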

Issue Links

is related to: SPARK-18456 Use matrix abstraction for LogisticRegression coefficients during training (Resolved)

links to: [Github] Pull Request #15593 (sethah)

People

Assignee: Seth Hendrickson
Reporter: Seth Hendrickson
Shepherd: DB Tsai
Votes: 0
Watchers: 4

Dates

Created: 21/Oct/16 22:59
Updated: 15/Nov/16 23:04
Resolved: 12/Nov/16 01:43