Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
2.4.0
Description
ORC 1.4.4 includes nine fixes. One of them addresses a `Timestamp` bug (ORC-306) that occurs when the `native` ORC vectorized reader reads the `time` and `nanos` sub-vectors of an ORC timestamp column vector. ORC-306 fixes this according to the original definition, and the linked PR includes the updated interpretation of ORC column vectors. Note that the `hive` ORC reader and the ORC MR reader are not affected.
```scala
scala> spark.version
res0: String = 2.3.0

scala> spark.sql("set spark.sql.orc.impl=native")

scala> Seq(java.sql.Timestamp.valueOf("1900-05-05 12:34:56.000789")).toDF().write.orc("/tmp/orc")

scala> spark.read.orc("/tmp/orc").show(false)
+--------------------------+
|value                     |
+--------------------------+
|1900-05-05 12:34:55.000789|
+--------------------------+
```
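With the `native` reader in Spark 2.3.0, the timestamp `1900-05-05 12:34:56.000789` is read back one second early, as `1900-05-05 12:34:55.000789`. This issue aims to update Apache Spark to use ORC 1.4.4.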
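The `time`/`nanos` split is easy to misread. As documented for Hive's `TimestampColumnVector`, `time` holds the value of `Timestamp.getTime()` (milliseconds since the epoch, which already includes the millisecond part of the fraction) and `nanos` holds `Timestamp.getNanos()` (the whole fractional second in nanoseconds, always non-negative). The sketch below uses a hypothetical helper object, `TimestampPairSketch`, that is not Spark's or ORC's code and is not necessarily the exact mechanism behind ORC-306. It shows one consistent way to combine the pair into microseconds since the epoch, and how rounding negative milliseconds toward zero shifts a pre-1970 value by one second.

```scala
// Hypothetical sketch, not Spark's or ORC's implementation.
// Assumes the documented TimestampColumnVector meanings:
//   time  = java.sql.Timestamp.getTime()  (epoch millis, includes the millis of the fraction)
//   nanos = java.sql.Timestamp.getNanos() (fractional second in nanos, 0..999999999)
object TimestampPairSketch {

  /** Floors millis to whole seconds, so the seconds part agrees with non-negative nanos. */
  def toMicros(timeMillis: Long, nanos: Int): Long =
    java.lang.Math.floorDiv(timeMillis, 1000L) * 1000000L + nanos / 1000

  /** For comparison only: `/` truncates toward zero, which is wrong for negative millis. */
  def toMicrosTruncating(timeMillis: Long, nanos: Int): Long =
    (timeMillis / 1000L) * 1000000L + nanos / 1000

  def main(args: Array[String]): Unit = {
    // 500 ms before the epoch, i.e. 1969-12-31T23:59:59.500Z
    val ts = new java.sql.Timestamp(-500L)
    println(toMicros(ts.getTime, ts.getNanos))           // -500000
    println(toMicrosTruncating(ts.getTime, ts.getNanos)) // 500000
  }
}
```

Using `floorDiv` keeps the seconds part consistent with the non-negative `nanos` field for pre-epoch values, while plain `/` moves the result by one whole second. That error has the same magnitude as the one in the 1900 example, although the actual ORC-306 code path may differ in detail.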
FULL LIST
| ID | Title |
|---|---|
| | Fix compiler warnings from clang 5.0 |
| | `extractFileTail` should open a file in `try` statement |
| | Fix TestRecordReaderImpl to not fail with new storage-api |
| | Fix incorrect workaround for bug in java.sql.Timestamp |
| | Add support for ARM and PPC arch |
| | Remove unnecessary Hive artifacts from root pom |
| | Add syntax version to orc_proto.proto |
| | Remove avro and parquet dependency management entries |
| | Implement error checking on subtype fields in Java |
Issue Links
- blocks: SPARK-20901 Feature parity for ORC with Parquet (Open)
- links to