Description
The ORC reader fails with a NegativeArraySizeException if the stripe size is >2GB. Even though the default stripe size is 64MB, there are cases where the stripe grows past 2GB before the memory manager can kick in to check memory usage. For example, when inserting 500KB strings (mostly unique), by the time we reach 5,000 rows the stripe size is already over 2GB (5,000 x 500KB is roughly 2.5GB). For such cases the reader will have to chunk the disk range reads instead of reading the stripe as one whole blob.
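A minimal sketch of the failure mode and the chunked-read idea, assuming the stripe is read through a Hadoop FSDataInputStream; the class and member names here (ChunkedRangeReader, MAX_CHUNK_SIZE, readWholeRange, readChunkedRange) are illustrative and are not the actual RecordReaderUtils code:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.FSDataInputStream;

public class ChunkedRangeReader {
  // Largest single buffer we allocate; Java array lengths are ints, so a range
  // larger than Integer.MAX_VALUE bytes cannot fit in one byte[].
  private static final int MAX_CHUNK_SIZE = Integer.MAX_VALUE - 1024;

  // Problematic pattern: casting a >2GB long length to int overflows to a
  // negative value, and new byte[negative] throws NegativeArraySizeException.
  static byte[] readWholeRange(FSDataInputStream in, long offset, long end)
      throws IOException {
    byte[] buffer = new byte[(int) (end - offset)]; // overflows when end - offset > 2GB
    in.readFully(offset, buffer, 0, buffer.length);
    return buffer;
  }

  // Workaround sketch: split the disk range into chunks that each fit in a byte[].
  static List<byte[]> readChunkedRange(FSDataInputStream in, long offset, long end)
      throws IOException {
    List<byte[]> chunks = new ArrayList<>();
    long pos = offset;
    while (pos < end) {
      int len = (int) Math.min(MAX_CHUNK_SIZE, end - pos);
      byte[] chunk = new byte[len];
      in.readFully(pos, chunk, 0, len);
      chunks.add(chunk);
      pos += len;
    }
    return chunks;
  }
}

The cast in readWholeRange mirrors the kind of int truncation that produces the negative array size; keeping each chunk below Integer.MAX_VALUE bytes avoids asking the JVM for a single buffer it cannot index.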
Exception thrown when reading such files:
2018-10-12 21:43:58,833 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.NegativeArraySizeException
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderUtils.readDiskRanges(RecordReaderUtils.java:272)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readPartialDataStreams(RecordReaderImpl.java:1007)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:835)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:1029)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1062)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1085)