Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
Description
Let's say a stripe of an ORC file is 256 MB and we set the split size for an MR job to 64 MB. Right now, splits are created based on byte ranges.
Here is an example:
|<- The start of a stripe               |<- The end of a stripe
v                                       v
|---------------------------------------|
        ^                         ^
        |<- The start of a split  |<- The end of a split
So, for some Mappers, it is possible that no stripe starts within the byte range of their split. Those Mappers will process 0 records. We can improve how splits are created for ORC.
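The effect can be reproduced with a small model. The sketch below is only an illustration, not Hive's actual split code: it assumes a hypothetical 512 MB file holding two 256 MB stripes, cuts it into 64 MB byte-range splits, and assigns each stripe to the split that contains its start offset, as described above. The function names are made up for this example. The last function shows one possible alternative (one split per stripe), which is an assumption about how an improvement could look.

```python
MB = 1024 * 1024


def byte_range_splits(file_length, split_size):
    """Cut [0, file_length) into consecutive (start, end) byte ranges."""
    return [(start, min(start + split_size, file_length))
            for start in range(0, file_length, split_size)]


def stripes_per_split(stripe_starts, splits):
    """For each split, list the stripes whose start offset falls inside it."""
    return [[s for s in stripe_starts if start <= s < end]
            for start, end in splits]


def stripe_aligned_splits(stripe_starts, file_length):
    """Hypothetical alternative: one split per stripe, so no split is empty."""
    ends = stripe_starts[1:] + [file_length]
    return list(zip(stripe_starts, ends))


# Assumed layout: two 256 MB stripes in a 512 MB file, 64 MB split size.
stripe_starts = [0, 256 * MB]
file_length = 512 * MB

splits = byte_range_splits(file_length, 64 * MB)
assignment = stripes_per_split(stripe_starts, splits)

# Indices of splits (Mappers) that contain no stripe start and read 0 records.
empty = [i for i, stripes in enumerate(assignment) if not stripes]

print(len(splits), empty)  # prints: 8 [1, 2, 3, 5, 6, 7]
```

In this model six of the eight Mappers have nothing to read, while the two remaining Mappers each process a full 256 MB stripe. With `stripe_aligned_splits(stripe_starts, file_length)` the same file yields two splits, each covering exactly one stripe.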
Issue Links
- duplicates
- HIVE-5102 ORC getSplits should create splits based the stripes
- Resolved