Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
Description
The correlated query below will not produce any result no matter the content of the table.
create table t1 (id int, val varchar(10)) stored as orc TBLPROPERTIES ('transactional'='true'); create table t2 (id int, val varchar(10)) stored as orc TBLPROPERTIES ('transactional'='true'); EXPLAIN CBO SELECT id FROM t1 WHERE NULL IN (SELECT NULL FROM t2 where t1.id = t2.id);
The CBO is able to derive that part of the query is empty and ends up with the following plan.
CBO PLAN: HiveProject(id=[$0]) LogicalCorrelate(correlation=[$cor0], joinType=[semi], requiredColumns=[{}]) HiveTableScan(table=[[default, t1]], table:alias=[t1]) HiveValues(tuples=[[]])
The presence of LogicalCorrelate is first redundant but also problematic since many parts of the optimizer assume that queries are decorrelated and do not know how to handle the LogicalCorrelate.
In the presence of views the same query can lead to the following exception during compilation.
CREATE MATERIALIZED VIEW v1 AS SELECT id FROM t2; EXPLAIN CBO SELECT id FROM t1 WHERE NULL IN (SELECT NULL FROM t2 where t1.id = t2.id);
org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are not enough rules to produce a node with desired properties: convention=HIVE, sort=[], dist=any. All the inputs have relevant nodes, however the cost is still infinite. Root: rel#185:RelSubset#3.HIVE.[].any Original rel: HiveProject(id=[$0]): rowcount = 4.0, cumulative cost = {20.0 rows, 13.0 cpu, 0.0 io}, id = 178 LogicalCorrelate(correlation=[$cor0], joinType=[semi], requiredColumns=[{}]): rowcount = 4.0, cumulative cost = {16.0 rows, 9.0 cpu, 0.0 io}, id = 176 HiveTableScan(table=[[default, t1]], table:alias=[t1]): rowcount = 4.0, cumulative cost = {4.0 rows, 5.0 cpu, 0.0 io}, id = 111 HiveValues(tuples=[[]]): rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io}, id = 139 Sets: Set#0, type: RecordType(INTEGER id, VARCHAR(10) val, BIGINT BLOCK__OFFSET__INSIDE__FILE, VARCHAR(2147483647) INPUT__FILE__NAME, RecordType(BIGINT writeid, INTEGER bucketid, BIGINT rowid) ROW__ID, BOOLEAN ROW__IS__DELETED) rel#180:RelSubset#0.HIVE.[].any, best=rel#111 rel#111:HiveTableScan.HIVE.[].any(table=[default, t1],htColumns=[0, 1, 2, 3, 4, 5],insideView=false,plKey=default.t1;,table:alias=t1,tableScanTrait=null), rowcount=4.0, cumulative cost={4.0 rows, 5.0 cpu, 0.0 io} Set#1, type: RecordType(NULL _o__c0) rel#181:RelSubset#1.HIVE.[].any, best=rel#139 rel#139:HiveValues.HIVE.[].any(type=RecordType(NULL _o__c0),tuples=[]), rowcount=1.0, cumulative cost={1.0 rows, 1.0 cpu, 0.0 io} Set#2, type: RecordType(INTEGER id, VARCHAR(10) val, BIGINT BLOCK__OFFSET__INSIDE__FILE, VARCHAR(2147483647) INPUT__FILE__NAME, RecordType(BIGINT writeid, INTEGER bucketid, BIGINT rowid) ROW__ID, BOOLEAN ROW__IS__DELETED) rel#183:RelSubset#2.NONE.[].any, best=null rel#182:LogicalCorrelate.NONE.[].any(left=RelSubset#180,right=RelSubset#181,correlation=$cor0,joinType=semi,requiredColumns={}), rowcount=4.0, cumulative cost={inf} Set#3, type: RecordType(INTEGER id) rel#185:RelSubset#3.HIVE.[].any, best=null rel#184:HiveProject.HIVE.[].any(input=RelSubset#183,inputs=0,synthetic=false), rowcount=4.0, cumulative cost={inf} Graphviz: digraph G { root [style=filled,label="Root"]; subgraph cluster0{ label="Set 0 RecordType(INTEGER id, VARCHAR(10) val, BIGINT BLOCK__OFFSET__INSIDE__FILE, VARCHAR(2147483647) INPUT__FILE__NAME, RecordType(BIGINT writeid, INTEGER bucketid, BIGINT rowid) ROW__ID, BOOLEAN ROW__IS__DELETED)"; rel111 [label="rel#111:HiveTableScan\ntable=[default, t1],htColumns=[0, 1, 2, 3, 4, 5],insideView=false,plKey=default.t1;,table:alias=t1,tableScanTrait=null\nrows=4.0, cost={4.0 rows, 5.0 cpu, 0.0 io}",color=blue,shape=box] subset180 [label="rel#180:RelSubset#0.HIVE.[].any"] } subgraph cluster1{ label="Set 1 RecordType(NULL _o__c0)"; rel139 [label="rel#139:HiveValues\ntype=RecordType(NULL _o__c0),tuples=[]\nrows=1.0, cost={1.0 rows, 1.0 cpu, 0.0 io}",color=blue,shape=box] subset181 [label="rel#181:RelSubset#1.HIVE.[].any"] } subgraph cluster2{ label="Set 2 RecordType(INTEGER id, VARCHAR(10) val, BIGINT BLOCK__OFFSET__INSIDE__FILE, VARCHAR(2147483647) INPUT__FILE__NAME, RecordType(BIGINT writeid, INTEGER bucketid, BIGINT rowid) ROW__ID, BOOLEAN ROW__IS__DELETED)"; rel182 [label="rel#182:LogicalCorrelate\nleft=RelSubset#180,right=RelSubset#181,correlation=$cor0,joinType=semi,requiredColumns={}\nrows=4.0, cost={inf}",shape=box] subset183 [label="rel#183:RelSubset#2.NONE.[].any"] } subgraph cluster3{ label="Set 3 RecordType(INTEGER id)"; rel184 [label="rel#184:HiveProject\ninput=RelSubset#183,inputs=0,synthetic=false\nrows=4.0, cost={inf}",shape=box] subset185 [label="rel#185:RelSubset#3.HIVE.[].any"] } root -> subset185; subset180 -> rel111[color=blue]; subset181 -> rel139[color=blue]; subset183 -> rel182; rel182 -> subset180[label="0"]; rel182 -> subset181[label="1"]; subset185 -> rel184; rel184 -> subset183; } at org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:742) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(RelSubset.java:365) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:520) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyMaterializedViewRewriting(CalcitePlanner.java:2058) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1722) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1591) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1343) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:570) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12820) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:326) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:326) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:227) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257) ~[hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) ~[hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) ~[hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425) ~[hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356) ~[hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:733) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:703) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:115) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) ~[test-classes/:?] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_261] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_261] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_261] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_261] at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) ~[junit-4.13.2.jar:4.13.2] at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) ~[junit-4.13.2.jar:4.13.2] at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) ~[junit-4.13.2.jar:4.13.2] at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) ~[junit-4.13.2.jar:4.13.2] at org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) ~[junit-4.13.2.jar:4.13.2] at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) ~[junit-4.13.2.jar:4.13.2] at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) ~[junit-4.13.2.jar:4.13.2] at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) ~[junit-4.13.2.jar:4.13.2] at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) ~[junit-4.13.2.jar:4.13.2] at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) ~[junit-4.13.2.jar:4.13.2] at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) ~[junit-4.13.2.jar:4.13.2] at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) ~[junit-4.13.2.jar:4.13.2] at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) ~[junit-4.13.2.jar:4.13.2] at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) ~[junit-4.13.2.jar:4.13.2] at org.junit.runners.ParentRunner.run(ParentRunner.java:413) ~[junit-4.13.2.jar:4.13.2] at org.junit.runners.Suite.runChild(Suite.java:128) ~[junit-4.13.2.jar:4.13.2] at org.junit.runners.Suite.runChild(Suite.java:27) ~[junit-4.13.2.jar:4.13.2] at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) ~[junit-4.13.2.jar:4.13.2] at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) ~[junit-4.13.2.jar:4.13.2] at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) ~[junit-4.13.2.jar:4.13.2] at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) ~[junit-4.13.2.jar:4.13.2] at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) ~[junit-4.13.2.jar:4.13.2] at org.apache.hadoop.hive.cli.control.CliAdapter$1$1.evaluate(CliAdapter.java:95) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.junit.rules.RunRules.evaluate(RunRules.java:20) ~[junit-4.13.2.jar:4.13.2] at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) ~[junit-4.13.2.jar:4.13.2] at org.junit.runners.ParentRunner.run(ParentRunner.java:413) ~[junit-4.13.2.jar:4.13.2] at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) ~[surefire-junit4-3.0.0-M4.jar:3.0.0-M4] at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) ~[surefire-junit4-3.0.0-M4.jar:3.0.0-M4] at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) ~[surefire-junit4-3.0.0-M4.jar:3.0.0-M4] at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) ~[surefire-junit4-3.0.0-M4.jar:3.0.0-M4] at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:377) ~[surefire-booter-3.0.0-M4.jar:3.0.0-M4] at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:138) ~[surefire-booter-3.0.0-M4.jar:3.0.0-M4] at org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:465) ~[surefire-booter-3.0.0-M4.jar:3.0.0-M4] at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:451) ~[surefire-booter-3.0.0-M4.jar:3.0.0-M4]
The goal of this ticket is to get rid of the redundant correlation to avoid compilation failures but also for unlocking further simplifications and improving plan readability.
The plan can be simplified further based on the following observations.
If the right side of the correlate is empty then the whole correlate is empty when joinType is SEMI/INNER. Moreover if correlate type is LEFT then we can also drop the correlate and use t1 padded with nulls for the right side. Lastly, if the type is ANTI then result is the entire t1 so the correlate can also be dropped. RIGHT and FULL correlations are invalid and should never appear in the plan.
If the left side of the correlate is empty the result is empty and the correlation can be dropped for every legal joinType (INNER/SEMI/ANTI/LEFT).
Attachments
Issue Links
- is related to
-
HIVE-27296 HiveRelDecorrelator does not handle correlation with Values
- Open
- links to