Details
- Type: Improvement
- Status: Resolved
- Priority: Minor
- Resolution: Incomplete
Description
Currently, many feature transformers in ml use SchemaUtils.checkColumnType(schema, ..., new VectorUDT) to check that the column type is a (ml.linalg) vector.
The resulting error message contains "instance" info for the VectorUDT (the class name followed by an @ and a hash code), i.e. something like this:
java.lang.IllegalArgumentException: requirement failed: Column features must be of type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 but was actually StringType.
One solution would be to amend SchemaUtils.checkColumnType so that it prints the error message using getClass.getName. Another would be to create a private[spark] case object VectorUDT extends VectorUDT for convenience, since it is used so often. Incidentally, the second option would also make it easier to put VectorUDT into lists of data types (e.g. for schema validation, UDAFs, etc.).
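To make the two options concrete, here is a minimal, self-contained sketch. It does not use Spark: DataType, StringType, VectorUDT and the two message helpers are stand-ins defined only for illustration, and the real SchemaUtils.checkColumnType may be structured differently.

```scala
object ErrorMessageSketch {
  // Stand-in types for illustration only; these are NOT Spark's classes.
  trait DataType
  case object StringType extends DataType

  // A plain class with no toString override, so interpolating an instance
  // falls back to Object.toString, i.e. "<class name>@<hash code>".
  class VectorUDT extends DataType

  // Option 2 from the description: a singleton extending the class.
  // A case object gets a generated toString that returns its name.
  case object VectorUDT extends VectorUDT

  // Behaviour described in the issue: the expected type is interpolated directly.
  def messageWithInstance(col: String, expected: DataType, actual: DataType): String =
    s"Column $col must be of type $expected but was actually $actual."

  // Option 1 from the description: print the class name instead of the instance.
  def messageWithClassName(col: String, expected: DataType, actual: DataType): String =
    s"Column $col must be of type ${expected.getClass.getName} but was actually $actual."

  def main(args: Array[String]): Unit = {
    // Instance of the class: the message ends up containing "@<hash code>".
    println(messageWithInstance("features", new VectorUDT, StringType))
    // Option 1: class name only, no hash code.
    println(messageWithClassName("features", new VectorUDT, StringType))
    // Option 2: the case object prints as "VectorUDT".
    println(messageWithInstance("features", VectorUDT, StringType))
  }
}
```

Option 1 keeps every call site unchanged but prints a fully qualified class name. Option 2 yields the shortest message and gives callers a single shared value to pass around, at the cost of adding a new object to the API surface.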
Issue Links
- is superseded by: SPARK-16075 Make VectorUDT/MatrixUDT singleton under spark.ml package (Resolved)
- relates to: SPARK-15668 ml.feature: update check schema to avoid confusion when user use MLlib.vector as input type (Resolved)