Details
Description
import org.apache.spark.sql.{Encoder, Encoders, Row}
import org.apache.spark.sql.expressions.Aggregator
import spark.implicits._

object SimpleSum extends Aggregator[Row, Int, Int] {
  def zero: Int = 0
  def reduce(b: Int, a: Row) = b + a.getInt(1)
  def merge(b1: Int, b2: Int) = b1 + b2
  def finish(b: Int) = b
  def bufferEncoder: Encoder[Int] = Encoders.scalaInt
  def outputEncoder: Encoder[Int] = Encoders.scalaInt
}

val df = List(("a", 1), ("a", 2), ("a", 3)).toDF("k", "v")
val df1 = df.groupBy("k").agg(SimpleSum.toColumn as "v1")
df1.printSchema
df1.show

root
 |-- k: string (nullable = true)
 |-- v1: integer (nullable = true)

+---+---+
|  k| v1|
+---+---+
|  a|  6|
+---+---+
Notice how v1 has nullable set to true. The default (and expected) behavior for Spark SQL is to report nullable = false for an Int column. For example, if I had used a built-in aggregator like "sum" instead, it would have reported nullable = false.
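For comparison, a minimal sketch of the built-in path, assuming the same `df` as above and an active SparkSession with `spark.implicits._` in scope:

```scala
import org.apache.spark.sql.functions.sum

// Same grouping, but using the built-in "sum" aggregate instead of the
// custom Aggregator. Per the report above, the schema printed here marks
// the aggregated column as nullable = false, unlike the custom
// Aggregator's nullable = true for the same data.
val df2 = df.groupBy("k").agg(sum("v") as "v2")
df2.printSchema
```

Note that the built-in sum still computes the same value (1 + 2 + 3 = 6); only the reported nullability of the output column differs.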