Details
Bug
Status: Resolved
Major
Resolution: Fixed
0.11.0, 0.12.0
None
Description
Currently, ORC inherits getSplits from FileInputFormat, which creates one split per HDFS block. This can produce too little parallelism; it would be better for getSplits to read the file footer and create splits based on the stripes.
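As a rough illustration of the proposal, the sketch below builds splits from stripe boundaries instead of block boundaries. It is not Hive's implementation: the `Stripe` and `Split` classes, the `splitsFromStripes` method, and the `maxSplitSize` grouping rule are all hypothetical stand-ins, and the stripe offsets and lengths are assumed to have already been read from the file footer.

```java
import java.util.ArrayList;
import java.util.List;

public class StripeSplits {
    /** Hypothetical stand-in for one stripe entry read from the file footer. */
    public static final class Stripe {
        public final long offset;
        public final long length;
        public Stripe(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Hypothetical stand-in for an input split: a byte range of the file. */
    public static final class Split {
        public final long start;
        public final long length;
        public Split(long start, long length) {
            this.start = start;
            this.length = length;
        }
    }

    /**
     * Groups consecutive stripes into splits. A split always begins and ends
     * on a stripe boundary, and a new split is started when adding the next
     * stripe would push the current split past maxSplitSize (an assumed
     * tuning knob, not something named in the issue).
     */
    public static List<Split> splitsFromStripes(List<Stripe> stripes, long maxSplitSize) {
        List<Split> splits = new ArrayList<>();
        long start = -1;   // -1 means no split is currently open
        long length = 0;
        for (Stripe s : stripes) {
            long end = s.offset + s.length;
            // Close the open split if this stripe would make it too large.
            if (start >= 0 && end - start > maxSplitSize) {
                splits.add(new Split(start, length));
                start = -1;
            }
            if (start < 0) {
                start = s.offset;
            }
            length = end - start;
        }
        if (start >= 0) {
            splits.add(new Split(start, length));
        }
        return splits;
    }
}
```

With a small `maxSplitSize`, each stripe becomes its own split, so a file with many stripes yields many splits regardless of how many HDFS blocks it occupies. A larger value merges adjacent stripes into fewer, bigger splits.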