Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
1.6.0
None
Description
The calculation of the end of a split from the file metadata was broken by PARQUET-108. That change updated the calculation to use the requested schema, so that the end of a block would be the end of the last projected column. But the end logic actually computes the total number of bytes that are selected, which is a size rather than an end offset.
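To make the mismatch concrete, here is a minimal sketch of the two quantities. It is not the parquet-mr code: `ColumnChunk`, `selected_bytes`, and `end_of_last_projected` are hypothetical names, and the offsets and sizes are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ColumnChunk:
    # Hypothetical stand-in for a column chunk's metadata.
    path: str
    start: int       # byte offset of the chunk within the file
    total_size: int  # size of the chunk in bytes


def selected_bytes(chunks, projected):
    """Sum of the sizes of the projected chunks (what the end logic computes)."""
    return sum(c.total_size for c in chunks if c.path in projected)


def end_of_last_projected(chunks, projected):
    """File offset just past the last projected chunk (the intended end)."""
    return max(c.start + c.total_size for c in chunks if c.path in projected)


# One block with three column chunks; only "a" and "c" are projected.
chunks = [
    ColumnChunk("a", 1000, 200),
    ColumnChunk("b", 1200, 300),
    ColumnChunk("c", 1500, 500),
]
projected = {"a", "c"}

print(selected_bytes(chunks, projected))         # 700
print(end_of_last_projected(chunks, projected))  # 2000
```

In this example the summed size (700) is smaller than the block's start offset (1000), so it cannot be a valid end position for the block.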
The end of a split is only used to select row groups when a block has no row group offsets, which doesn't happen when the constructor that uses the broken method is called. However, the broken calculation should still be removed.
After 1.6.0, I want to move Hive to pass FileSplits directly rather than wrapping them in ParquetInputSplit. The internal reader code can handle mapping row groups to splits because it already needs to do so for PARQUET-84.
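As a rough illustration of what mapping row groups to splits could look like, the sketch below assigns each row group to the split whose byte range contains the row group's midpoint. The midpoint rule is an assumption chosen for the example, not something this ticket specifies, and `RowGroup` and `row_groups_for_split` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    # Hypothetical stand-in for row group metadata.
    start: int  # byte offset of the row group within the file
    size: int   # size of the row group in bytes


def row_groups_for_split(row_groups, split_start, split_length):
    """Return the row groups whose midpoint lies in [split_start, split_end)."""
    split_end = split_start + split_length
    return [
        rg for rg in row_groups
        if split_start <= rg.start + rg.size // 2 < split_end
    ]


# Three row groups with midpoints at 54, 154, and 254.
row_groups = [RowGroup(4, 100), RowGroup(104, 100), RowGroup(204, 100)]

print(len(row_groups_for_split(row_groups, 0, 128)))    # 1
print(len(row_groups_for_split(row_groups, 128, 128)))  # 2
print(len(row_groups_for_split(row_groups, 256, 48)))   # 0
```

Because each midpoint falls in exactly one split, every row group is read once even when split boundaries cut through it.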