IMPALA-10910

Iceberg scans don't apply runtime filters at Parquet row group level


Details


    Description

      In a performance test on TPC-DS 3000 run by rizaon, we noticed that on Iceberg tables runtime filters are only applied at row level.

      It is already known that runtime filters are not applied at file/partition level on Iceberg tables (IMPALA-10453). However, they could still be applied at Parquet row group level, and I think achieving this is much easier than fixing IMPALA-10453.

      For example, here is a snippet of the runtime profile of TPC-DS q49:

              Filter 0 (8.00 KB) [108 instances]:
                 - Files processed: 0 (0)
                 - Files rejected: 0 (0)
                 - Files total: 0 (0)
                 - InactiveTotalTime: 0.000ns
                 - RowGroups processed: 0 (0)
                 - RowGroups rejected: 0 (0)
                 - RowGroups total: 0 (0)
                 - Rows processed: 19.34M (19335783)
                 - Rows rejected: 19.32M (19323695)
                 - Rows total: 20.00M (19999711)
                 - Splits processed: 0 (0)
                 - Splits rejected: 0 (0)
                 - Splits total: 0 (0)
                 - TotalTime: 0.000ns
      

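      In this profile, Filter 0 rejected 19.32M of the 19.34M rows it processed (about 99.9%), yet all of its RowGroups counters are 0: the filter never pruned a row group, so the data was read before the filter was evaluated. We could save a lot of IO by applying the filters at row group level.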
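      The row-group-level check can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Impala's scanner code: it assumes a min/max runtime filter on an integer join column and per-row-group min/max statistics from the Parquet footer, and the names `MinMaxFilter`, `RowGroupMeta`, `CanSkipRowGroup` and `BytesSkippable` are hypothetical. A bloom filter cannot be tested against min/max statistics this way.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical min/max runtime filter on an integer join key: every build-side
// key that can produce a match lies in [min, max].
struct MinMaxFilter {
  int64_t min;
  int64_t max;
};

// Hypothetical per-row-group metadata for the filtered column, as it could be
// read from the Parquet footer before touching any data pages.
struct RowGroupMeta {
  bool has_stats;            // false if the writer stored no min/max statistics
  int64_t min;
  int64_t max;
  int64_t compressed_bytes;  // bytes that scanning this row group would read
};

// A row group can be skipped only if its statistics are known and its value
// range does not overlap the filter's range; otherwise it must be read.
bool CanSkipRowGroup(const MinMaxFilter& filter, const RowGroupMeta& rg) {
  assert(filter.min <= filter.max);  // a well-formed filter range
  if (!rg.has_stats) return false;
  return rg.max < filter.min || rg.min > filter.max;
}

// Sums the bytes a scan would avoid reading by evaluating the filter against
// row group statistics instead of only against individual rows.
int64_t BytesSkippable(const MinMaxFilter& filter,
                       const std::vector<RowGroupMeta>& row_groups) {
  int64_t skipped = 0;
  for (const RowGroupMeta& rg : row_groups) {
    if (CanSkipRowGroup(filter, rg)) skipped += rg.compressed_bytes;
  }
  return skipped;
}
```

      With a filter range of [100, 200], row groups whose statistics span [0, 50] or [201, 400] would be skipped without reading their data pages, while a row group spanning [150, 300], or one without statistics, still has to be read and filtered row by row.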

            People

              tmate Tamas Mate
              boroknagyz Zoltán Borók-Nagy
              Votes: 0
              Watchers: 4
