Details
Description
Reading JSON with dates and timestamps is currently limited to a few predetermined string formats or to long values.
1) It should be possible to set an option on the JSON datasource to parse dates and timestamps using a custom string format.
2) It should be possible to change how long values are interpreted as time since the epoch. This could support different precisions such as days, seconds, milliseconds, microseconds and nanoseconds.
Something along the lines of:

object Precision extends Enumeration {
  val days, seconds, milliseconds, microseconds, nanoseconds = Value
}

def convertWithPrecision(time: Long, from: Precision.Value, to: Precision.Value): Long = ...

...

val dateFormat = parameters.getOrElse("dateFormat", "").trim
val timestampFormat = parameters.getOrElse("timestampFormat", "").trim
val longDatePrecision = parameters.getOrElse("longDatePrecision", "days")
val longTimestampPrecision = parameters.getOrElse("longTimestampPrecision", "milliseconds")
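As an illustration, `convertWithPrecision` could route every precision through a common nanosecond base. This is only a sketch of one possible implementation; the `unitNanos` table and its layout are assumptions of this example, not existing Spark code:

```scala
object Precision extends Enumeration {
  val days, seconds, milliseconds, microseconds, nanoseconds = Value
}

// Hypothetical helper table: express each precision in nanoseconds so any
// pair of precisions can be converted through a common base unit.
val unitNanos: Map[Precision.Value, Long] = Map(
  Precision.days         -> 86400L * 1000000000L,
  Precision.seconds      -> 1000000000L,
  Precision.milliseconds -> 1000000L,
  Precision.microseconds -> 1000L,
  Precision.nanoseconds  -> 1L
)

def convertWithPrecision(time: Long, from: Precision.Value, to: Precision.Value): Long = {
  val fromNanos = unitNanos(from)
  val toNanos = unitNanos(to)
  // Multiply when converting to a finer unit, divide (truncating) when
  // converting to a coarser one.
  if (fromNanos >= toNanos) time * (fromNanos / toNanos)
  else time / (toNanos / fromNanos)
}
```

Truncating division means sub-unit remainders are dropped when converting to a coarser precision, which matches how a long-to-DATE conversion would discard the time-of-day part.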
and
case (VALUE_STRING, DateType) =>
  val stringValue = parser.getText
  val days = if (configOptions.dateFormat.nonEmpty) {
    // User-defined format; make sure it complies with the SQL DATE
    // representation (number of days since the epoch).
    val sdf = new SimpleDateFormat(configOptions.dateFormat) // Not thread safe.
    DateTimeUtils.convertWithPrecision(
      sdf.parse(stringValue).getTime, Precision.milliseconds, Precision.days)
  } else if (stringValue.forall(_.isDigit)) {
    DateTimeUtils.convertWithPrecision(
      stringValue.toLong, configOptions.longDatePrecision, Precision.days)
  } else {
    // The format of this string will probably be "yyyy-mm-dd".
    DateTimeUtils.convertWithPrecision(
      DateTimeUtils.stringToTime(stringValue).getTime, Precision.milliseconds, Precision.days)
  }
  days.toInt

case (VALUE_NUMBER_INT, DateType) =>
  DateTimeUtils.convertWithPrecision(
    parser.getLongValue, configOptions.longDatePrecision, Precision.days).toInt
With similar handling for Timestamps.
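The timestamp path would normalize every input to microseconds since the epoch, which is Spark SQL's internal TimestampType representation. A self-contained sketch of that logic (the names `parseTimestampMicros` and `microsPerUnit` are illustrative, not Spark internals, and the fallback branch is stubbed out):

```scala
import java.text.SimpleDateFormat
import java.util.TimeZone

// Hypothetical multipliers standing in for convertWithPrecision; a
// "nanoseconds" precision would instead divide the value by 1000.
val microsPerUnit: Map[String, Long] = Map(
  "seconds"      -> 1000000L,
  "milliseconds" -> 1000L,
  "microseconds" -> 1L
)

def parseTimestampMicros(value: String, timestampFormat: String, longPrecision: String): Long = {
  if (timestampFormat.nonEmpty) {
    // User-defined format. SimpleDateFormat is not thread safe, so a real
    // implementation would keep one instance per partition or per thread.
    val sdf = new SimpleDateFormat(timestampFormat)
    sdf.setTimeZone(TimeZone.getTimeZone("UTC"))
    sdf.parse(value).getTime * 1000L // milliseconds -> microseconds
  } else if (value.forall(_.isDigit)) {
    // Bare long: interpret with the configured epoch precision.
    value.toLong * microsPerUnit(longPrecision)
  } else {
    // A real implementation would fall back to the default parse here
    // (DateTimeUtils.stringToTime); omitted in this sketch.
    sys.error(s"Unsupported timestamp string: $value")
  }
}
```

The VALUE_NUMBER_INT case would call the same long-with-precision branch directly on `parser.getLongValue`.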
Issue Links
- is related to SPARK-17914: Spark SQL casting to TimestampType with nanosecond results in incorrect timestamp (Resolved)