Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.1.1, 3.2.0
Environment: Ubuntu 20, OSX 11.6; OpenJDK 11, Spark 3.2
Description
The sequence function produces unexpected results when called with dates and a step interval in months.
Here is a sample using Spark 3.2 (though the behavior is the same in 3.1.1 and presumably earlier):
scala> spark.sql("select sequence(date '2021-01-01', date '2022-01-01', interval '3' month) x, date '2021-01-01' + interval '3' month y").collect()
res1: Array[org.apache.spark.sql.Row] = Array([WrappedArray(2021-01-01, 2021-03-31, 2021-06-30, 2021-09-30, 2022-01-01),2021-04-01])
The expected result of adding 3 months to 2021-01-01 is 2021-04-01 (column y), but sequence returns 2021-03-31 as its second element.
At the same time, sequence over timestamps works as expected:
scala> spark.sql("select sequence(timestamp '2021-01-01 00:00', timestamp '2022-01-01 00:00', interval '3' month) x").collect()
res2: Array[org.apache.spark.sql.Row] = Array([WrappedArray(2021-01-01 00:00:00.0, 2021-04-01 00:00:00.0, 2021-07-01 00:00:00.0, 2021-10-01 00:00:00.0, 2022-01-01 00:00:00.0)])
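For reference, the month stepping I would expect from sequence over dates can be reproduced outside Spark with plain java.time calendar arithmetic. This is only a sketch of the expected behavior, not Spark's implementation; monthSequence is a hypothetical helper written for this report.

```scala
import java.time.LocalDate

// Hypothetical helper (not part of Spark): walk from start to stop
// in steps of whole calendar months, inclusive of stop.
def monthSequence(start: LocalDate, stop: LocalDate, stepMonths: Int): Seq[LocalDate] = {
  require(stepMonths > 0, "stepMonths must be positive")
  // Always add i * stepMonths to the original start date, so any
  // end-of-month clamping does not accumulate from one step to the next.
  Iterator.from(0)
    .map(i => start.plusMonths(i.toLong * stepMonths))
    .takeWhile(d => !d.isAfter(stop))
    .toList
}

val expected = monthSequence(LocalDate.of(2021, 1, 1), LocalDate.of(2022, 1, 1), 3)
expected.foreach(println)
```

This prints 2021-01-01, 2021-04-01, 2021-07-01, 2021-10-01 and 2022-01-01, which matches the dates in the timestamp-based result above.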
A similar issue was reported in the past: SPARK-31654, "sequence producing inconsistent intervals for month step".
It is marked resolved, but the problem has either resurfaced or was never actually fixed.