Commit fb6d96e001c1a04475a8fd01f757dd0605cf3279 in impala's branch refs/heads/master from skyyws
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=fb6d96e ]
IMPALA-9741: Support querying Iceberg table by impala
This patch mainly realizes the querying of iceberg table through impala,
we can use the following sql to create an external iceberg table:
CREATE EXTERNAL TABLE default.iceberg_test (
level string,
event_time timestamp,
message string,
)
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
Or just including table name and location like this:
CREATE EXTERNAL TABLE default.iceberg_test
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
'iceberg_file_format' is the file format in iceberg, currently only
support PARQUET, other format would be supported in the future. And
if you don't specify this property in your SQL, default file format
is PARQUET.
We achieved this function by treating the iceberg table as normal
unpartitioned hdfs table. When querying iceberg table, we pushdown
partition column predicates to iceberg to decide which data files
need to be scanned, and then transfer this information to BE to
do the real scan operation.
Testing:
- Unit test for Iceberg in FileMetadataLoaderTest
- Create table tests in functional_schema_template.sql
- Iceberg table query test in test_scanners.py
Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Reviewed-on: http://gerrit.cloudera.org:8080/16143
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Commit efc627d050caeb9947af2dfd3fc8a02236c44d0e in impala's branch refs/heads/master from Fang-Yu Rao
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=efc627d ]
IMPALA-10158: Set timezone to UTC for Iceberg-related E2E testsWe found that the tests of test_iceberg_query and test_iceberg_profile
fail after the patch for
IMPALA-9741has been merged and that it is dueto the default timezone of Impala not being UTC. This patch fixes the
issue by adding "SET TIMEZONE=UTC;" before those test queries are run.
Testing:
test_iceberg_query and test_iceberg_profile could pass after applying
this patch.
Change-Id: Ie985519e8ded04f90465e141488bd2dda78af6c3
Reviewed-on: http://gerrit.cloudera.org:8080/16425
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>