[SPARK-24322] Upgrade Apache ORC to 1.4.4 - ASF JIRA


Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.4.0
Fix Version/s: 2.3.1, 2.4.0
Component/s: Build
Labels:
- correctness

Description

ORC 1.4.4 includes nine fixes. One of them is a `Timestamp` bug (~~ORC-306~~) that occurs when the `native` ORC vectorized reader reads the `times` and `nanos` sub-vectors of an ORC column vector. ~~ORC-306~~ fixes this according to the original definition of those sub-vectors, and the linked PR updates Spark's interpretation of ORC column vectors to match. Note that the `hive` ORC reader and the ORC MR reader are not affected.

```scala
scala> spark.version
res0: String = 2.3.0
scala> spark.sql("set spark.sql.orc.impl=native")
scala> Seq(java.sql.Timestamp.valueOf("1900-05-05 12:34:56.000789")).toDF().write.orc("/tmp/orc")
scala> spark.read.orc("/tmp/orc").show(false)
+--------------------------+
|value                     |
+--------------------------+
|1900-05-05 12:34:55.000789|
+--------------------------+
```

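The value read back (`12:34:55.000789`) is one second earlier than the value written (`12:34:56.000789`). This issue aims to upgrade Apache Spark to ORC 1.4.4 to pick up these fixes.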
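The bug depends on how the two timestamp sub-vectors are interpreted. As a minimal sketch, assume they mirror `java.sql.Timestamp`: one array holds `Timestamp.getTime()` (milliseconds since the epoch, already including the millisecond part of the fraction) and the other holds `Timestamp.getNanos()` (the full fraction of the second). The Scala code below converts such a pair to microseconds since the epoch, the unit Spark uses internally for `TimestampType`, and back. `OrcTimestampSketch`, `toMicros`, and `fromMicros` are hypothetical names; this is not the code changed by ORC-306 or by the linked pull request.

```scala
import java.sql.Timestamp
import java.time.{LocalDateTime, ZoneOffset}

// Hypothetical helper, not Spark or ORC code.
object OrcTimestampSketch {
  // Assumption: `time` is Timestamp.getTime() (millis since the epoch, already
  // including the millisecond part of the fraction) and `nanos` is
  // Timestamp.getNanos() (the full fraction of the second, 0..999999999).
  def toMicros(time: Long, nanos: Int): Long = {
    // floorDiv, not `/`: for pre-1970 values `time` is negative and
    // truncating division would round the seconds toward zero.
    val seconds = Math.floorDiv(time, 1000L)
    // The millisecond part is taken only from `nanos`, so it is not counted twice.
    seconds * 1000000L + nanos / 1000
  }

  // Inverse mapping: microseconds since the epoch back to a (time, nanos) pair.
  def fromMicros(micros: Long): (Long, Int) = {
    val seconds = Math.floorDiv(micros, 1000000L)
    val nanos = (Math.floorMod(micros, 1000000L) * 1000L).toInt
    (seconds * 1000L + nanos / 1000000, nanos)
  }

  def main(args: Array[String]): Unit = {
    // Build the value from the reproduction in UTC so the result does not
    // depend on the JVM default time zone.
    val instant = LocalDateTime.of(1900, 5, 5, 12, 34, 56, 789000).toInstant(ZoneOffset.UTC)
    val ts = Timestamp.from(instant)
    val micros = toMicros(ts.getTime, ts.getNanos)
    println(micros == instant.getEpochSecond * 1000000L + 789L) // prints true
    println(fromMicros(micros) == ((ts.getTime, ts.getNanos)))  // prints true
  }
}
```

Both details in `toMicros` matter for pre-1970 values such as the one in the reproduction: a negative millisecond count needs floor division to recover the correct second, and the sub-second part must come from exactly one of the two arrays.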

FULL LIST

| ID | Title |
|---|---|
| ~~ORC-281~~ | Fix compiler warnings from clang 5.0 |
| ~~ORC-301~~ | `extractFileTail` should open a file in `try` statement |
| ~~ORC-304~~ | Fix TestRecordReaderImpl to not fail with new storage-api |
| ~~ORC-306~~ | Fix incorrect workaround for bug in java.sql.Timestamp |
| ~~ORC-324~~ | Add support for ARM and PPC arch |
| ~~ORC-330~~ | Remove unnecessary Hive artifacts from root pom |
| ~~ORC-332~~ | Add syntax version to orc_proto.proto |
| ~~ORC-336~~ | Remove avro and parquet dependency management entries |
| ~~ORC-360~~ | Implement error checking on subtype fields in Java |

Issue Links

blocks: SPARK-20901 Feature parity for ORC with Parquet (Open)

links to: [Github] Pull Request #21372 (dongjoon-hyun)

People

Assignee: Dongjoon Hyun

Reporter: Dongjoon Hyun

Votes: 0

Watchers: 4

Dates

Created: 19/May/18 18:20

Updated: 24/May/18 03:38

Resolved: 24/May/18 03:38