Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 1.2.0, 1.3.0, 2.0.0
- Fix Version/s: None
- Component/s: None
- Labels: None
Estimate columnar projection when generating ORC splits
Description
Currently, ORC generates splits based on stripe offset + stripe length.
This means that the splits for all columnar projections are exactly the same size, even though the footer, which is already read, gives the estimated size of each column.
This is a holdover from FileSplit, which uses getLen() as the I/O cost of reading a file in a map task.
RCFile didn't have a footer with column statistics, but for ORC this information would be extremely useful for reducing task overhead when processing extremely wide tables with highly selective column projections.
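A minimal sketch of the idea, not Hive's or ORC's actual code: assuming the per-column size estimates from the footer are available as a plain array, a split's length could be scaled by the fraction of bytes that belong to the projected columns. The class name, method names, and the `columnSizes` array are hypothetical and used only for illustration.

```java
import java.util.Arrays;

public class ProjectedSplitEstimate {

    /**
     * Fraction of the file's bytes that belong to the projected columns.
     * columnSizes holds hypothetical per-column size estimates taken from the footer.
     */
    static double projectionRatio(long[] columnSizes, int[] projectedColumns) {
        long total = Arrays.stream(columnSizes).sum();
        if (total == 0) {
            return 1.0; // no size information: fall back to the full length
        }
        long selected = 0;
        for (int column : projectedColumns) {
            selected += columnSizes[column];
        }
        return (double) selected / total;
    }

    /** Stripe length scaled down to the share the projection is expected to read. */
    static long estimatedSplitLength(long stripeLength, long[] columnSizes, int[] projectedColumns) {
        return Math.round(stripeLength * projectionRatio(columnSizes, projectedColumns));
    }

    public static void main(String[] args) {
        long[] columnSizes = {600, 300, 100};   // assumed estimates for three columns
        long stripeLength = 64L * 1024 * 1024;  // assumed 64 MiB stripe
        // Projecting only the smallest column yields a much smaller estimate
        // than the stripe offset + length approach would report.
        System.out.println(estimatedSplitLength(stripeLength, columnSizes, new int[] {2}));
        System.out.println(estimatedSplitLength(stripeLength, columnSizes, new int[] {0, 1, 2}));
    }
}
```

With an estimate like this, a split grouper could pack many narrow projections into one task instead of treating every stripe as a full-size read.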
Issue Links
- is blocked by
  - HIVE-10497 Upgrade hive branch to latest Tez (Resolved)
- is duplicated by
  - HIVE-10397 LLAP: Implement Tez SplitSizeEstimator for Orc (Resolved)
- relates to
  - HIVE-11546 Projected columns read size should be scaled to split size for ORC Splits (Closed)
  - TEZ-1993 Implement a pluggable InputSizeEstimator for grouping fairly (Closed)
- supercedes
  - HIVE-10397 LLAP: Implement Tez SplitSizeEstimator for Orc (Resolved)