Parquet / PARQUET-207

ParquetInputSplit end calculation bug

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.6.0
    • Component/s: parquet-mr
    • Labels: None

    Description

      The calculation of a split's end from the file metadata was broken by PARQUET-108. That change updated the calculation to use the requested schema so that the end of a block would be the end of the last projected column. However, the end logic actually computes the total number of bytes selected by the projection, which is a size rather than an offset.
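      To make the difference concrete, here is a minimal sketch in Java. It is not the parquet-mr code: the `Chunk` class, the method names, and the offsets are all hypothetical, chosen only to contrast "total bytes selected" with "end offset of the last projected column" for one row group.

```java
import java.util.Arrays;
import java.util.List;

public class SplitEndSketch {
    /** Hypothetical stand-in for a column chunk: its column name, start offset, and size in bytes. */
    static final class Chunk {
        final String column;
        final long startOffset;
        final long totalSize;

        Chunk(String column, long startOffset, long totalSize) {
            this.column = column;
            this.startOffset = startOffset;
            this.totalSize = totalSize;
        }
    }

    /** What the description says the logic computes: the sum of the projected chunk sizes. */
    static long selectedBytes(List<Chunk> chunks, List<String> projected) {
        long total = 0;
        for (Chunk c : chunks) {
            if (projected.contains(c.column)) {
                total += c.totalSize;
            }
        }
        return total;
    }

    /** What the description says was intended: the end offset of the last projected chunk. */
    static long endOfLastProjectedColumn(List<Chunk> chunks, List<String> projected) {
        long end = 0;
        for (Chunk c : chunks) {
            if (projected.contains(c.column)) {
                end = Math.max(end, c.startOffset + c.totalSize);
            }
        }
        return end;
    }

    public static void main(String[] args) {
        // Illustrative row group: three contiguous column chunks starting at byte 100.
        List<Chunk> rowGroup = Arrays.asList(
                new Chunk("a", 100, 50),
                new Chunk("b", 150, 200),
                new Chunk("c", 350, 25));
        List<String> projected = Arrays.asList("a", "c");

        System.out.println(selectedBytes(rowGroup, projected));            // 75
        System.out.println(endOfLastProjectedColumn(rowGroup, projected)); // 375
    }
}
```

      With these made-up numbers the byte total (75) is smaller than the start offset of the row group itself (100), so it cannot serve as an end position.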

      The end of a split is only used to select row groups when a block has no row group offsets, which does not happen when the constructor that uses the broken method is called, so the bug is not currently triggered. However, the broken calculation should still be removed.

      After 1.6.0, I want to move Hive to pass FileSplits directly rather than wrapping them in ParquetInputSplit. The internal reader code can handle mapping row groups to splits because it needs to for PARQUET-84.

          People

            Assignee: Unassigned
            Reporter: Ryan Blue (rdblue)
            Votes: 0
            Watchers: 2
