Hive / HIVE-15055 Column pruning for nested fields in Parquet / HIVE-13873

Support column pruning for struct fields in select statement

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.0
    • Component/s: Logical Optimizer
    • Labels: None

    Description

      This is the groundwork for nested column pruning in Hive for the Parquet format. This patch addresses struct types in select statements. In particular, for queries such as:

      select s.a from tbl
      

      where tbl has schema:

      s:struct<a:int, b:boolean, c:array<int>>
      

      then only field a should be scanned by the Parquet reader; fields b and c can be ignored.
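
The intended effect can be illustrated with a minimal sketch. This is not Hive's or Parquet's actual code: the dict-based schema model and the `prune_schema` helper are hypothetical, introduced only to show how a selected path such as `s.a` narrows the schema handed to a reader.

```python
# Illustrative sketch only; not Hive's optimizer or the Parquet reader.
# A struct is modelled as a dict of field name -> type; other types are strings.

def prune_schema(schema, paths):
    """Return a copy of `schema` keeping only fields named by dotted `paths`."""
    # Group the requested sub-paths by their top-level field name.
    wanted = {}
    for path in paths:
        head, _, rest = path.partition(".")
        if head not in schema:
            raise KeyError(f"unknown field: {head}")
        wanted.setdefault(head, []).append(rest)

    pruned = {}
    for name, subpaths in wanted.items():
        field_type = schema[name]
        if isinstance(field_type, dict) and all(subpaths):
            # Every reference reaches into the struct: keep only those children.
            pruned[name] = prune_schema(field_type, subpaths)
        else:
            # The field itself was selected (or it is not a struct): keep it whole.
            pruned[name] = field_type
    return pruned


# Schema from the description: s:struct<a:int, b:boolean, c:array<int>>
tbl = {"s": {"a": "int", "b": "boolean", "c": "array<int>"}}

# select s.a from tbl
projected = prune_schema(tbl, ["s.a"])
print(projected)  # {'s': {'a': 'int'}}
```

Under this model, selecting `s` itself keeps the whole struct, since every field is then needed.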

      Future work includes supporting other types of statements, as well as more combinations of types (e.g., selecting fields of an array type inside a struct type).

      Attachments

        1. HIVE-13873.wip.patch
          38 kB
          Ferdinand Xu
        2. HIVE-13873.patch
          53 kB
          Ferdinand Xu
        3. HIVE-13873.6.patch
          63 kB
          Ferdinand Xu
        4. HIVE-13873.5.patch
          64 kB
          Ferdinand Xu
        5. HIVE-13873.4.patch
          54 kB
          Ferdinand Xu
        6. HIVE-13873.3.patch
          52 kB
          Ferdinand Xu
        7. HIVE-13873.2.patch
          55 kB
          Ferdinand Xu
        8. HIVE-13873.1.patch
          54 kB
          Ferdinand Xu

          People

            Assignee: Ferdinand Xu (Ferd)
            Reporter: Xuefu Zhang (xuefuz)
            Votes: 2
            Watchers: 14

            Dates

              Created:
              Updated:
              Resolved: