Spark / SPARK-15204

Improve nullability inference for Aggregator


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.0
    • Component/s: SQL
    • Labels: None
    • Environment: spark-2.0.0-SNAPSHOT

    Description

      // A custom typed aggregator that sums the Int in column 1 of each Row.
      object SimpleSum extends Aggregator[Row, Int, Int] {
        def zero: Int = 0                                    // initial buffer value
        def reduce(b: Int, a: Row) = b + a.getInt(1)         // fold one input row into the buffer
        def merge(b1: Int, b2: Int) = b1 + b2                // combine partial buffers
        def finish(b: Int) = b                               // produce the final result
        def bufferEncoder: Encoder[Int] = Encoders.scalaInt
        def outputEncoder: Encoder[Int] = Encoders.scalaInt
      }
      
      val df = List(("a", 1), ("a", 2), ("a", 3)).toDF("k", "v")
      val df1 = df.groupBy("k").agg(SimpleSum.toColumn as "v1")
      df1.printSchema
      df1.show
      
      root
       |-- k: string (nullable = true)
       |-- v1: integer (nullable = true)
      
      +---+---+
      |  k| v1|
      +---+---+
      |  a|  6|
      +---+---+
      

      Notice how v1 has nullable set to true. The default (and expected) behavior for Spark SQL is to give an int column nullable = false. For example, if I had used a built-in aggregate like "sum" instead, it would have reported nullable = false.
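For comparison, a minimal sketch of the built-in aggregate the reporter refers to, assuming the same `df` built above in a spark-shell session; per the report, this path infers the tighter nullability from the column type, unlike the custom Aggregator:

```scala
import org.apache.spark.sql.functions.sum

// Same grouping as above, but using Spark SQL's built-in sum instead of
// the custom SimpleSum Aggregator. The reporter states that this route
// yields nullable = false for the aggregated column.
val df2 = df.groupBy("k").agg(sum("v") as "v1")
df2.printSchema
```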


          People

            Assignee: koertkuipers Koert Kuipers
            Reporter: koert koert kuipers
            Votes: 0
            Watchers: 5
