[SPARK-28295] Is there a way of getting feature names from pyspark.ml.regression GeneralizedLinearRegression? - ASF JIRA

Details

Type: Request
Status: Resolved
Priority: Minor
Resolution: Invalid
Affects Version/s: 2.3.1
Fix Version/s: 2.3.1
Component/s: Build
Labels:
- features

Target Version/s: 2.3.1

Description

Using pyspark.ml.regression, when I fit a GeneralizedLinearRegression like this:
from pyspark.ml.regression import GeneralizedLinearRegression

glr = GeneralizedLinearRegression(family="gaussian", link="identity",
                                  regParam=0.3, maxIter=10)
model = glr.fit(someData)

There seems to be no way to match the feature names to their coefficients or standard errors. I am currently using an ugly workaround like this:

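# Read the private coefficientsWithStatistics field of the JVM summary object via reflection.
field = model.summary._call_java('getClass').getDeclaredField("coefficientsWithStatistics")
object2 = model._call_java('summary')
field.setAccessible(True)
value = field.get(object2)

# Map each feature name to its coefficient.
coef_value = {}

for i in range(0, len(value)):
    # Split the row's string form on commas: first item is the name, second the coefficient.
    row = value[i].toString()
    values = row.split(',')
    coef_value[values[0].replace('(', '').replace(')', '')] = float(values[1])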

Am I missing something?
If not, I'd like to request a method similar to model.coefficients that returns the feature names in the matching order, for example model.features.
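One possible alternative to the reflection workaround, sketched here under assumptions: when the features column was built with VectorAssembler (or otherwise carries ML attribute metadata), the names can usually be read from the column's schema metadata under the "ml_attr" key. The helper names feature_names_from_metadata and coefficients_by_name are hypothetical, not part of PySpark, and the example metadata is made up to mirror the shape VectorAssembler typically attaches.

```python
from typing import Any, Dict, List, Mapping, Sequence


def feature_names_from_metadata(metadata: Mapping[str, Any]) -> List[str]:
    """Return feature names ordered by vector index.

    Assumes the metadata has the shape
    {"ml_attr": {"attrs": {<type>: [{"idx": int, "name": str}, ...]}}}.
    """
    attrs = metadata.get("ml_attr", {}).get("attrs", {})
    # Attributes are grouped by type (numeric, binary, nominal); flatten them.
    indexed = [(a["idx"], a["name"]) for group in attrs.values() for a in group]
    return [name for _, name in sorted(indexed)]


def coefficients_by_name(metadata: Mapping[str, Any],
                         coefficients: Sequence[float]) -> Dict[str, float]:
    """Pair each feature name with the coefficient at the same index."""
    names = feature_names_from_metadata(metadata)
    if len(names) != len(coefficients):
        raise ValueError("metadata does not describe every coefficient")
    return dict(zip(names, (float(c) for c in coefficients)))


# Made-up metadata for illustration only.
example_metadata = {
    "ml_attr": {
        "attrs": {
            "numeric": [{"idx": 0, "name": "age"}, {"idx": 2, "name": "income"}],
            "binary": [{"idx": 1, "name": "is_member"}],
        },
        "num_attrs": 3,
    }
}

print(coefficients_by_name(example_metadata, [0.5, -1.2, 0.03]))
```

With a fitted model, the inputs would come from something like someData.schema["features"].metadata and model.coefficients.toArray(), assuming the column is named "features". If the column has no attribute metadata, the helper raises ValueError rather than guessing names.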

People

Assignee: Unassigned

Reporter: Nils Skotara

Votes: 2

Watchers: 1

Dates

Created: 08/Jul/19 08:44

Updated: 12/Dec/22 18:10

Resolved: 09/Jul/19 01:14