Details
- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
This is the groundwork for nested column pruning in Hive for the Parquet format. This patch addresses struct types in select statements. In particular, consider a query such as:
select s.a from tbl
where tbl has schema:
s:struct<a:int, b:boolean, c:array<int>>
Here only field a needs to be scanned by the Parquet reader; fields b and c can be skipped.
Future work includes supporting other types of statements, as well as more combinations of types (e.g., selecting fields of an array type nested inside a struct type).
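
To make the intended behavior concrete, here is a minimal sketch of pruning a nested schema down to the selected struct fields. It is an illustration only and does not use Hive's or Parquet's actual classes: the dict-based schema model and the names prune_schema and tbl_schema are assumptions introduced for this example. Only struct fields can be selected into, which matches the scope of this patch.

```python
# Toy model (an assumption for illustration, not Hive's API):
# a struct is a dict of field name -> type; leaf types are plain strings.

def prune_schema(schema, selected_paths):
    """Keep only the fields named by dotted paths such as "s.a".

    Selecting a struct itself (e.g. "s") keeps all of its fields.
    """
    # Group the remainder of each path by its first component.
    groups = {}
    for path in selected_paths:
        head, _, rest = path.partition(".")
        if head not in schema:
            raise KeyError(f"unknown field: {head!r}")
        groups.setdefault(head, []).append(rest)

    pruned = {}
    for name, field_type in schema.items():  # preserve declared field order
        if name not in groups:
            continue  # field not referenced by the query: skip it
        rests = groups[name]
        if "" in rests:
            # The whole field was selected, so keep it unchanged.
            pruned[name] = field_type
        elif isinstance(field_type, dict):
            # Only some sub-fields were selected: recurse into the struct.
            pruned[name] = prune_schema(field_type, rests)
        else:
            # Selecting inside arrays or other non-struct types is out of scope.
            raise ValueError(f"cannot select inside non-struct field {name!r}")
    return pruned


# Schema from the description: s:struct<a:int, b:boolean, c:array<int>>
tbl_schema = {"s": {"a": "int", "b": "boolean", "c": "array<int>"}}

# "select s.a from tbl" only needs s.a.
print(prune_schema(tbl_schema, ["s.a"]))  # {'s': {'a': 'int'}}
```

A reader handed the pruned schema would then request only the column for s.a from the file, leaving the columns for s.b and s.c unread.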