Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
Description
Given the following plan for an EXPORT TABLE query:
+----------------------------------------------------+ | Explain | +----------------------------------------------------+ | STAGE DEPENDENCIES: | | Stage-2 is a root stage | | Stage-3 depends on stages: Stage-2 | | Stage-1 depends on stages: Stage-3 | | Stage-6 depends on stages: Stage-1 | | Stage-5 depends on stages: Stage-6 | | Stage-7 depends on stages: Stage-5 | | | | STAGE PLANS: | | Stage: Stage-2 | | Tez | | DagId: hive_20230124103803_b865e63c-4565-4b95-8664-8c983938697a:7 | | DagName: hive_20230124103803_b865e63c-4565-4b95-8664-8c983938697a:7 | | Vertices: | | Map 1 | | Map Operator Tree: | | TableScan | | alias: distcptest | | Statistics: Num rows: 2 Data size: 182658 Basic stats: COMPLETE Column stats: COMPLETE | | Select Operator | | expressions: id (type: int), txt (type: string) | | outputColumnNames: _col0, _col1 | | Statistics: Num rows: 2 Data size: 182658 Basic stats: COMPLETE Column stats: COMPLETE | | File Output Operator | | compressed: false | | Statistics: Num rows: 2 Data size: 182658 Basic stats: COMPLETE Column stats: COMPLETE | | table: | | input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat | | output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat | | serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde | | name: bigd35368.distcptest_ced9015c_852b_49a4_8c28_4d284d598523 | | Execution mode: vectorized | | | | Stage: Stage-3 | | Dependency Collection | | | | Stage: Stage-1 | | Move Operator | | tables: | | replace: false | | table: | | input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat | | output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat | | serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde | | name: bigd35368.distcptest_ced9015c_852b_49a4_8c28_4d284d598523 | | | | Stage: Stage-6 | | Set Properties | | table name: bigd35368.distcptest_ced9015c_852b_49a4_8c28_4d284d598523 | | properties: | | transactional true | | | | Stage: Stage-5 | | Export Work | | | | Stage: Stage-7 | | Drop Table | | table: bigd35368.distcptest_ced9015c_852b_49a4_8c28_4d284d598523 | | | +----------------------------------------------------+
here, Stage-5 Export Work starts MR distcp jobs
the problem and solution are described in code comment, pasted below:
/** * This method ensures if there is an explicit tez.queue.name set, the hadoop shim will submit jobs * to the same yarn queue. This solves a security issue where e.g settings have the following values: * tez.queue.name=sample * hive.server2.tez.queue.access.check=true * In this case, when a query submits Tez DAGs, the tez client layer checks whether the end user has access to * the yarn queue 'sample' via YarnQueueHelper, but this is not respected in case of MR jobs that run * even if the query execution engine is Tez. E.g. an EXPORT TABLE can submit DistCp MR jobs at some stages when * certain criteria are met. We tend to restrict the setting of mapreduce.job.queuename in order to bypass this * security flaw, and even the default queue is unexpected if we explicitly set tez.queue.name. * Under the hood the desired behavior is to have DistCp jobs in the same yarn queue as other parts * of the query. Most of the time, the user isn't aware that a query involves DistCp jobs, hence isn't aware * of these details. */
after the fix, the MR job went into the same yarn queue: https://issues.apache.org/jira/secure/attachment/13055215/image%20%281%29.png
Attachments
Attachments
Issue Links
- links to