Spark / SPARK-15746

SchemaUtils.checkColumnType with VectorUDT prints instance details in error message


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ML

    Description

      Currently, many feature transformers in ml use SchemaUtils.checkColumnType(schema, ..., new VectorUDT) to check that a column is of (ml.linalg) vector type.

      The resulting error message contains the default toString of the VectorUDT instance (class name plus identity hash code), i.e. something like this:

      java.lang.IllegalArgumentException: requirement failed: Column features must be of type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 but was actually StringType.
      

      One solution would be to amend SchemaUtils.checkColumnType to print the error message using getClass.getName. Alternatively, a private[spark] case object VectorUDT extends VectorUDT could be created for convenience, since the type is used so often; incidentally, this would also make it easier to put VectorUDT into lists of data types (e.g. for schema validation, UDAFs, etc.).
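      The behaviour described above can be sketched without a Spark dependency. In this sketch, FakeVectorUDT is a hypothetical stand-in for org.apache.spark.ml.linalg.VectorUDT, and the two message builders mirror the shape of SchemaUtils.checkColumnType's error string rather than Spark's actual code; they show why interpolating the UDT instance leaks an identity hash code, and how printing getClass.getName avoids it.

```scala
// FakeVectorUDT stands in for org.apache.spark.ml.linalg.VectorUDT.
// It does not override toString, so the default Object.toString is used:
// "FakeVectorUDT@<identity-hash-code>".
class FakeVectorUDT

object SchemaUtilsSketch {
  // Current behaviour: string interpolation calls toString on the
  // instance, embedding the identity hash code in the error message.
  def messageWithInstance(dataType: AnyRef, actual: String): String =
    s"Column features must be of type $dataType but was actually $actual."

  // Proposed fix: print the class name instead of the instance.
  def messageWithClassName(dataType: AnyRef, actual: String): String =
    s"Column features must be of type ${dataType.getClass.getName} but was actually $actual."

  def main(args: Array[String]): Unit = {
    val udt = new FakeVectorUDT
    // Contains an "@<hash>" suffix, like the message in this issue:
    println(messageWithInstance(udt, "StringType"))
    // Stable, readable class name, no hash code:
    println(messageWithClassName(udt, "StringType"))
  }
}
```

      The alternative suggested above would be similar in effect: a case object would get a compiler-generated toString ("VectorUDT") and a stable singleton identity, so both the message and membership tests against lists of data types would behave predictably.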

    People

      Assignee: Unassigned
      Reporter: Nicholas Pentreath (mlnick)
      Votes: 0
      Watchers: 3

    Dates

      Created:
      Updated:
      Resolved: