Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
2.3.2
None
I used an AWS CloudFormation template from the AWS Big Data Blog [1]. The EMR AMI uses Hive 2.3.3 and Apache Ranger 1.0.0.
Source Table:
CREATE EXTERNAL TABLE analyst1.lineitem_partitioned (
`l_orderkey` int,
`l_partkey` int,
`l_suppkey` int,
`l_linenumber` int,
`l_quantity` double,
`l_extendedprice` double,
`l_discount` double,
`l_tax` double,
`l_returnflag` string,
`l_linestatus` string,
`l_commitdate` string,
`l_receiptdate` string,
`l_shipinstruct` string,
`l_shipmode` string,
`l_comment` string
) PARTITIONED BY (`l_shipdate` string)
STORED AS PARQUET
LOCATION '/user/analyst1/tpch/sf100/lineitem';

Destination Table:
CREATE EXTERNAL TABLE analyst1.test1(
l_commitdate string,
l_receiptdate string
) PARTITIONED BY (`l_shipdate` string)
STORED AS PARQUET
LOCATION '/user/analyst1/tpch/sf100/lineitem_parq_partitioned';

Query:
insert overwrite table analyst1.test1 PARTITION (l_shipdate)
select l_commitdate, l_receiptdate, l_shipdate
from default.lineitem_parq_partitioned
where l_shipdate = '1992-01-02';

Ranger Masking Rule:
Hive Database: analyst1
Hive Table: lineitem_partitioned
Mask Condition Option: Custom: "XXXXXX" (this replaces the column with a static string for simplicity; our actual use case uses a complex UDF).

[1] https://aws.amazon.com/blogs/big-data/implementing-authorization-and-auditing-using-apache-ranger-on-amazon-emr/
Description
I have a partitioned table with a Ranger masking policy on a non-partition column. When I query the table and the query includes the masked column, partition pruning no longer occurs.
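As a rough illustration of how masking can get in the way of pruning: when a masking policy applies, Hive replaces the reference to the masked table with a subquery that projects the mask expression in place of the masked column, then re-analyzes the rewritten query (the linked HIVE-17639 concerns that re-parsing step). The query below is a hand-written approximation of that rewrite for the reproduction query, assuming the policy masks l_commitdate with the constant 'XXXXXX'. It is not the exact text Hive generates. Note that the l_shipdate filter now sits outside the subquery, so pruning depends on the planner carrying that predicate down to the scan of analyst1.lineitem_partitioned.

```sql
-- Hand-written approximation (not Hive's generated text) of the query after
-- the masking rewrite, assuming l_commitdate is masked with 'XXXXXX'.
-- Every other column, including the partition column l_shipdate, passes
-- through the subquery unchanged.
INSERT OVERWRITE TABLE analyst1.test1 PARTITION (l_shipdate)
SELECT l_commitdate, l_receiptdate, l_shipdate
FROM (
  SELECT l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity,
         l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus,
         'XXXXXX' AS l_commitdate,   -- masked column
         l_receiptdate, l_shipinstruct, l_shipmode, l_comment,
         l_shipdate                  -- partition column, not masked
  FROM analyst1.lineitem_partitioned
) lineitem_partitioned
-- The partition filter is applied outside the subquery.
WHERE l_shipdate = '1992-01-02';
```

Comparing the EXPLAIN output of this hand-written query with that of the original query under the masking policy may help tell whether the problem is specific to the masking rewrite or also affects ordinary subqueries.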
To reproduce:
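Create two partitioned tables. I used TPC-H tables because they are publicly available; the schemas and queries I used are listed above. Enable a Ranger masking policy on a non-partition column of the first table (I used l_commitdate). Then insert into the second table from the first, for example: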
insert overwrite table analyst1.test1 PARTITION (l_shipdate)
select l_commitdate, l_receiptdate, l_shipdate
from analyst1.lineitem_partitioned
where l_shipdate = '1992-01-02';
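The insert above uses a fully dynamic partition spec (no static value for l_shipdate). Hive's default dynamic-partition mode is strict, which requires at least one static partition column, so depending on the cluster's defaults the session may need these settings before the insert will run:

```sql
-- Allow dynamic partition inserts (enabled by default in recent Hive versions).
SET hive.exec.dynamic.partition=true;
-- Permit inserts where every partition column is dynamic.
SET hive.exec.dynamic.partition.mode=nonstrict;
```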
I have attached the EXPLAIN plans for this query with the masking rule on l_commitdate enabled and with it disabled.
After digging into this, I found that the partition pruning expression is not being set when the masking rule is enabled.
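A lighter-weight way to compare pruning with and without the policy is Hive's EXPLAIN DEPENDENCY, which prints the input tables and partitions a query would read as JSON. The sketch below assumes it is run as a user covered by the masking policy, once with the policy enabled and once with it disabled, and assumes that EXPLAIN DEPENDENCY goes through the same masking rewrite as the real query. It uses the SELECT part of the reproduction query so nothing is written.

```sql
-- Prints JSON with "input_tables" and "input_partitions".
-- With pruning working, only the l_shipdate=1992-01-02 partition of
-- analyst1.lineitem_partitioned should be listed under "input_partitions".
EXPLAIN DEPENDENCY
SELECT l_commitdate, l_receiptdate, l_shipdate
FROM analyst1.lineitem_partitioned
WHERE l_shipdate = '1992-01-02';
```

If the pruning expression is missing, one would expect the masked run to list every partition of the table rather than just the one matching the filter.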
Attachments
Issue Links
is caused by: HIVE-17639 don't reuse planner context when re-parsing the query (Closed)