Spark / SPARK-15746

SchemaUtils.checkColumnType with VectorUDT prints instance details in error message


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ML

    Description

      Currently, many feature transformers in ml use SchemaUtils.checkColumnType(schema, ..., new VectorUDT) to check that a column is of (ml.linalg) vector type.

      The resulting error message contains the default toString of the VectorUDT instance (class name plus identity hash code), i.e. something like this:

      java.lang.IllegalArgumentException: requirement failed: Column features must be of type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 but was actually StringType.
      

      One solution would be to amend SchemaUtils.checkColumnType to print the error message using getClass.getName. Alternatively, a private[spark] case object VectorUDT extends VectorUDT could be created for convenience, since the type is used so often; incidentally, this would also make it easier to put VectorUDT into lists of data types (e.g. for schema validation, UDAFs, etc.).
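      The behaviour described above can be sketched without a Spark dependency. In this sketch, FakeVectorUDT is a hypothetical stand-in for org.apache.spark.ml.linalg.VectorUDT, and the two message builders mirror the shape of SchemaUtils.checkColumnType's error string rather than Spark's actual code; they show why interpolating the UDT instance leaks an identity hash code, and how printing getClass.getName avoids it.

```scala
// FakeVectorUDT stands in for org.apache.spark.ml.linalg.VectorUDT.
// It does not override toString, so the default Object.toString is used:
// "FakeVectorUDT@<identity-hash-code>".
class FakeVectorUDT

object SchemaUtilsSketch {
  // Current behaviour: string interpolation calls toString on the
  // instance, embedding the identity hash code in the error message.
  def messageWithInstance(dataType: AnyRef, actual: String): String =
    s"Column features must be of type $dataType but was actually $actual."

  // Proposed fix: print the class name instead of the instance.
  def messageWithClassName(dataType: AnyRef, actual: String): String =
    s"Column features must be of type ${dataType.getClass.getName} but was actually $actual."

  def main(args: Array[String]): Unit = {
    val udt = new FakeVectorUDT
    // Contains an "@<hash>" suffix, like the message in this issue:
    println(messageWithInstance(udt, "StringType"))
    // Stable, readable class name, no hash code:
    println(messageWithClassName(udt, "StringType"))
  }
}
```

      The alternative suggested above would be similar in effect: a case object would get a compiler-generated toString ("VectorUDT") and a stable singleton identity, so both the message and membership tests against lists of data types would behave predictably.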

    People

      Assignee: Unassigned
      Reporter: Nicholas Pentreath (mlnick)
      Votes: 0
      Watchers: 3

    Dates

      Created:
      Updated:
      Resolved: