Spark / SPARK-15204

Improve nullability inference for Aggregator


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.0
    • Component/s: SQL
    • Labels: None
    • Environment: spark-2.0.0-SNAPSHOT

    Description

      // A custom typed aggregator that sums the Int in column 1 of each Row.
      object SimpleSum extends Aggregator[Row, Int, Int] {
        def zero: Int = 0                                    // initial buffer value
        def reduce(b: Int, a: Row) = b + a.getInt(1)         // fold one input row into the buffer
        def merge(b1: Int, b2: Int) = b1 + b2                // combine partial buffers
        def finish(b: Int) = b                               // produce the final result
        def bufferEncoder: Encoder[Int] = Encoders.scalaInt
        def outputEncoder: Encoder[Int] = Encoders.scalaInt
      }
      
      val df = List(("a", 1), ("a", 2), ("a", 3)).toDF("k", "v")
      val df1 = df.groupBy("k").agg(SimpleSum.toColumn as "v1")
      df1.printSchema
      df1.show
      
      root
       |-- k: string (nullable = true)
       |-- v1: integer (nullable = true)
      
      +---+---+
      |  k| v1|
      +---+---+
      |  a|  6|
      +---+---+
      

      Notice how v1 has nullable set to true. The default (and expected) behavior for Spark SQL is to give an int column nullable = false. For example, if I had used a built-in aggregate like "sum" instead, it would have reported nullable = false.
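For comparison, a minimal sketch of the built-in aggregate the reporter refers to, assuming the same `df` built above in a spark-shell session; per the report, this path infers the tighter nullability from the column type, unlike the custom Aggregator:

```scala
import org.apache.spark.sql.functions.sum

// Same grouping as above, but using Spark SQL's built-in sum instead of
// the custom SimpleSum Aggregator. The reporter states that this route
// yields nullable = false for the aggregated column.
val df2 = df.groupBy("k").agg(sum("v") as "v1")
df2.printSchema
```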


          People

            Assignee: koertkuipers Koert Kuipers
            Reporter: koert koert kuipers
            Votes: 0
            Watchers: 5
