Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: None
Fix Version/s: None
Description
getModelFeatures in ml.api.r.SparkRWrapper cannot always recover the original column names. Take the HouseVotes84 data set as an example:
  case m: XXXModel =>
    val attrs = AttributeGroup.fromStructField(
      m.summary.predictions.schema(m.summary.featuresCol))
    attrs.attributes.get.map(_.name.get)
The code above reads the feature names from the attribute metadata of the features column. That column is usually generated by RFormula, which contains a VectorAssembler, so the output attribute names differ from the original column names.
For example, we want to recover the HouseVotes84 feature names "V1, V2, ..., V16", but with RFormula we only get "V1_n, V2_y, ..., V16_y", because the transform function of VectorAssembler appends suffixes to the column names.
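To make the naming concrete, here is a minimal, Spark-free Scala sketch. It assumes the "<column>_<level>" scheme seen in the example output, and that the last level of each categorical column is dropped (Spark's OneHotEncoder defaults to dropLast = true). The object and method names are hypothetical and are not part of SparkRWrapper. Original names are recovered by prefix-matching slot names against a known list of input columns.

```scala
// Spark-free sketch: models the "<column>_<level>" slot names described above
// and one heuristic way to map them back to the original column names.
// All names here (SlotNames, slotNames, sourceColumn, originalColumns) are hypothetical.
object SlotNames {

  // Build slot names for one categorical column.
  // Assumption: the last level is dropped, as with OneHotEncoder's default dropLast = true.
  def slotNames(column: String, levels: Seq[String]): Seq[String] =
    levels.dropRight(1).map(level => s"${column}_$level")

  // Map one slot name back to its source column: choose the longest original
  // column name c such that the slot starts with c + "_"; None if nothing matches.
  // The trailing "_" keeps "V10_y" from matching column "V1".
  def sourceColumn(slot: String, columns: Seq[String]): Option[String] = {
    val candidates = columns.filter(c => slot.startsWith(c + "_"))
    if (candidates.isEmpty) None else Some(candidates.maxBy(_.length))
  }

  // Recover the distinct original column names, in first-seen slot order.
  def originalColumns(slots: Seq[String], columns: Seq[String]): Seq[String] =
    slots.flatMap(s => sourceColumn(s, columns).toList).distinct

  def main(args: Array[String]): Unit = {
    val columns = (1 to 16).map(i => s"V$i")
    // Hypothetical level orderings, chosen only to reproduce "V1_n" and "V2_y".
    val slots = slotNames("V1", Seq("n", "y")) ++ slotNames("V2", Seq("y", "n"))
    println(slots.mkString(", "))                           // V1_n, V2_y
    println(originalColumns(slots, columns).mkString(", ")) // V1, V2
  }
}
```

Prefix matching is only a heuristic: if column names themselves contain underscores (say `a` and `a_b`), a slot such as `a_b_c` is ambiguous, so a robust fix would likely need to keep the original column names rather than reconstruct them from attribute names.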
Issue Links
- Is contained by: SPARK-15540 RFormula and R feature processing improvement umbrella (Resolved)
- Is related to: SPARK-13449 Naive Bayes wrapper in SparkR (Resolved)