Description
There is an issue with the DateTimeUtils.daysToMillis implementation. It affects DateTimeUtils.toJavaDate and ultimately CatalystTypeConverter, i.e. the conversion of a date stored in InternalRow as an Int (days since the epoch) to the java.sql.Date of the Row returned to the user.
The issue can be reproduced with this test (all of the following tests run in my default timezone, Europe/Moscow):
$ sbt -Duser.timezone=Europe/Moscow catalyst/console
scala> java.util.Calendar.getInstance().getTimeZone
res0: java.util.TimeZone = sun.util.calendar.ZoneInfo[id="Europe/Moscow",offset=10800000,dstSavings=0,useDaylight=false,transitions=79,lastRule=null]
scala> import org.apache.spark.sql.catalyst.util.DateTimeUtils._
import org.apache.spark.sql.catalyst.util.DateTimeUtils._
scala> for (days <- 0 to 20000 if millisToDays(daysToMillis(days)) != days) yield days
res23: scala.collection.immutable.IndexedSeq[Int] = Vector(4108, 4473, 4838, 5204, 5568, 5932, 6296, 6660, 7024, 7388, 8053, 8487, 8851, 9215, 9586, 9950, 10314, 10678, 11042, 11406, 11777, 12141, 12505, 12869, 13233, 13597, 13968, 14332, 14696, 15060)
For example, day 4108 since the epoch should map to 1981-04-01, but that date is never returned:
scala> DateTimeUtils.toJavaDate(4107)
res25: java.sql.Date = 1981-03-31
scala> DateTimeUtils.toJavaDate(4108)
res26: java.sql.Date = 1981-03-31
scala> DateTimeUtils.toJavaDate(4109)
res27: java.sql.Date = 1981-04-02
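One plausible mechanism for the 4108 result can be sketched as follows. This is only an illustration under stated assumptions: the helper names naiveDaysToMillis and naiveMillisToDays are hypothetical, and the sketch assumes a conversion that looks up the zone offset at UTC midnight rather than at local midnight. It also relies on the JDK time zone data placing the Europe/Moscow switch to summer time at local midnight on 1981-04-01, so that local midnight does not exist on that day.

```scala
import java.util.TimeZone

object OffsetRoundTrip {
  val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // Fixed zone so the result does not depend on the JVM default.
  val zone: TimeZone = TimeZone.getTimeZone("Europe/Moscow")

  // Hypothetical offset-based conversion (an assumption, not confirmed
  // to be the Spark code): the offset is evaluated at UTC midnight of
  // the given day, which can differ from the offset at local midnight.
  def naiveDaysToMillis(days: Int): Long = {
    val millisUtc = days.toLong * MillisPerDay
    millisUtc - zone.getOffset(millisUtc)
  }

  // Inverse direction: shift the instant by the offset in effect at that
  // instant, then take the floor of the day count.
  def naiveMillisToDays(millisUtc: Long): Int = {
    val millisLocal = millisUtc + zone.getOffset(millisUtc)
    Math.floorDiv(millisLocal, MillisPerDay).toInt
  }

  def main(args: Array[String]): Unit = {
    // Days whose round trip does not return the original value.
    val broken = (4100 to 4115).filter(d => naiveMillisToDays(naiveDaysToMillis(d)) != d)
    println(broken)
  }
}
```

Under these assumptions the offset subtracted for day 4108 is the summer-time offset, which lands the result in the last hour of the previous local day, so the round trip falls back to 4107.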
There was a previous unsuccessful attempt to work around the problem in SPARK-11415. The issue seems to involve flaws in the Java date implementation, and I don't see how it can be fixed without third-party libraries.
I was not able to identify the date-time library of choice for Spark. The following implementation uses JSR-310:
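import java.time.{Instant, LocalDate, ZoneId}

// SQLDate is the Int alias (days since the epoch) defined in DateTimeUtils.
def millisToDays(millisUtc: Long): SQLDate = {
  val instant = Instant.ofEpochMilli(millisUtc)
  val zonedDateTime = instant.atZone(ZoneId.systemDefault)
  zonedDateTime.toLocalDate.toEpochDay.toInt
}

def daysToMillis(days: SQLDate): Long = {
  val localDate = LocalDate.ofEpochDay(days)
  // atStartOfDay returns the first valid instant of the day, even when
  // local midnight falls into a DST gap.
  val zonedDateTime = localDate.atStartOfDay(ZoneId.systemDefault)
  zonedDateTime.toInstant.toEpochMilli
}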
It produces correct results:
scala> for (days <- 0 to 20000 if millisToDays(daysToMillis(days)) != days) yield days
res37: scala.collection.immutable.IndexedSeq[Int] = Vector()
scala> new java.sql.Date(daysToMillis(4108))
res36: java.sql.Date = 1981-04-01
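The same round-trip check can be made independent of the JVM default timezone by passing the zone explicitly. The following is a minimal standalone sketch, not the Spark code: the object name Jsr310RoundTrip is hypothetical, Int stands in for SQLDate, and the zone is fixed to Europe/Moscow instead of ZoneId.systemDefault.

```scala
import java.time.{Instant, LocalDate, ZoneId}

object Jsr310RoundTrip {
  // Explicit zone (an assumption for reproducibility; the proposal above
  // uses ZoneId.systemDefault).
  val zone: ZoneId = ZoneId.of("Europe/Moscow")

  // Instant -> local date in the zone -> days since the epoch.
  def millisToDays(millisUtc: Long): Int =
    Instant.ofEpochMilli(millisUtc).atZone(zone).toLocalDate.toEpochDay.toInt

  // Days since the epoch -> first valid instant of that local day.
  def daysToMillis(days: Int): Long =
    LocalDate.ofEpochDay(days.toLong).atStartOfDay(zone).toInstant.toEpochMilli

  def main(args: Array[String]): Unit = {
    // Days whose round trip does not return the original value.
    val broken = (0 to 20000).filter(d => millisToDays(daysToMillis(d)) != d)
    println(broken)
    // Calendar date that day 4108 represents.
    println(LocalDate.ofEpochDay(4108L))
  }
}
```

The round trip holds by construction here: atStartOfDay always yields an instant whose local date in the zone is the requested date, so converting back recovers the same day number.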
Issue Links
- is related to SPARK-16788 Investigate JSR-310 & scala-time alternatives to our own datetime utils (Resolved)