Apache Drill / DRILL-1788

Conflicting column names in join

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels: None

    Description

      Drill doesn't support multiple columns with the same name within a batch. When a join has matching column names on both sides, the planner inserts a Project to rename one of the columns and avoid the conflict.

      However, something in this code path appears to match column names case-sensitively, because the rewrite does not happen when the conflicting names differ only in case.

      For example, this query does get the rename (see the Project at 01-03):

      0: jdbc:drill:> explain plan for select n3.n_name from (select n2.n_name from cp.`tpch/nation.parquet` n1, cp.`tpch/nation.parquet` n2 where n1.n_name = n2.n_name) n3 join cp.`tpch/nation.parquet` n4 on n3.n_name = n4.n_name;

      +------------+------------+
      |    text    |    json    |
      +------------+------------+
      | 00-00    Screen
      00-01      UnionExchange
      01-01        Project(n_name=[$0])
      01-02          HashJoin(condition=[=($0, $1)], joinType=[inner])
      01-04            HashToRandomExchange(dist0=[[$0]])
      02-01              Project(n_name=[$1])
      02-02                HashJoin(condition=[=($0, $1)], joinType=[inner])
      02-04                  HashToRandomExchange(dist0=[[$0]])
      04-01                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`n_name`]]])
      02-03                  Project(n_name0=[$0])
      02-05                    HashToRandomExchange(dist0=[[$0]])
      05-01                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`n_name`]]])
      01-03            Project(n_name0=[$0])
      01-05              HashToRandomExchange(dist0=[[$0]])
      03-01                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`n_name`]]])
      

      But if I change one letter in one of the identifiers to uppercase (n1.N_name in the inner join condition), the rename at 01-03 goes away:

      0: jdbc:drill:> explain plan for select n3.n_name from (select n2.n_name from cp.`tpch/nation.parquet` n1, cp.`tpch/nation.parquet` n2 where n1.N_name = n2.n_name) n3 join cp.`tpch/nation.parquet` n4 on n3.n_name = n4.n_name;
      +------------+------------+
      |    text    |    json    |
      +------------+------------+
      | 00-00    Screen
      00-01      UnionExchange
      01-01        Project(n_name=[$0])
      01-02          HashJoin(condition=[=($0, $1)], joinType=[inner])
      01-04            HashToRandomExchange(dist0=[[$0]])
      02-01              Project(n_name=[$1])
      02-02                HashJoin(condition=[=($0, $1)], joinType=[inner])
      02-04                  HashToRandomExchange(dist0=[[$0]])
      04-01                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`N_name`]]])
      02-03                  Project(N_name0=[$0])
      02-05                    HashToRandomExchange(dist0=[[$0]])
      05-01                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`N_name`]]])
      01-03            HashToRandomExchange(dist0=[[$0]])
      03-01              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`N_name`]]])
      

      Running this query without the rewrite results in failure:

      java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
      at java.util.ArrayList.rangeCheck(ArrayList.java:604) ~[na:1.7.0_21]
      at java.util.ArrayList.get(ArrayList.java:382) ~[na:1.7.0_21]
      at org.apache.drill.exec.record.VectorContainer.getValueAccessorById(VectorContainer.java:252) ~[drill-java-exec-0.7.0-incubating-SNAPSHOT-rebuffed.jar:0.7.0-incubating-SNAPSHOT]
      at org.apache.drill.exec.record.AbstractRecordBatch.getValueAccessorById(AbstractRecordBatch.java:153) ~[drill-java-exec-0.7.0-incubating-SNAPSHOT-rebuffed.jar:0.7.0-incubating-SNAPSHOT]
      at org.apache.drill.exec.test.generated.HashJoinProbeGen249.doSetup(HashJoinProbeTemplate.java:46) ~[na:na]
      at org.apache.drill.exec.test.generated.HashJoinProbeGen249.setupHashJoinProbe(HashJoinProbeTemplate.java:97) ~[na:na]
      at org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:226) ~[drill-java-exec-0.7.0-incubating-SNAPSHOT-rebuffed.jar:0.7.0-incubating-SNAPSHOT]
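
To make the expected behavior concrete, here is a minimal sketch of a rename step that detects conflicts while ignoring case. It is not Drill's planner code: the class `JoinColumnRenamer` and the method `renameRightConflicts` are hypothetical names chosen for illustration, and the numeric suffix only mimics the `n_name0` style seen in the plans above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not part of the Drill code base.
public class JoinColumnRenamer {

    // Returns the right-side column names, renaming any name that collides,
    // ignoring case, with a left-side name or an earlier right-side name.
    public static List<String> renameRightConflicts(List<String> left, List<String> right) {
        // Track names in lower case so that n_name and N_name count as the same column.
        Set<String> used = new HashSet<>();
        for (String name : left) {
            used.add(name.toLowerCase(Locale.ROOT));
        }

        List<String> result = new ArrayList<>();
        for (String name : right) {
            String candidate = name;
            int suffix = 0;
            // Set.add returns false when the lower-cased name is already taken,
            // so keep appending a counter until the name is unique.
            while (!used.add(candidate.toLowerCase(Locale.ROOT))) {
                candidate = name + suffix++;
            }
            result.add(candidate);
        }
        return result;
    }

    public static void main(String[] args) {
        // Left side outputs n_name, right side scans N_name (the failing case).
        System.out.println(renameRightConflicts(
                Arrays.asList("n_name"), Arrays.asList("N_name"))); // prints [N_name0]
    }
}
```

A case-sensitive comparison would treat `n_name` and `N_name` as distinct and skip the rename, which matches the second plan, where the Project at 01-03 is missing.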

          People

            amansinha100 Aman Sinha
            sphillips Steven Phillips