Description
Here is the example code:
import sql.implicits._
val objs = sc.parallelize(Seq(("1", "um"), ("2", "dois"), ("3", "tres"))).toDF.selectExpr("_1 as id", "_2 as name")
val rawj = sc.parallelize(Seq(("1", "2"), ("1", "3"), ("2", "3"), ("2", "1"))).toDF.selectExpr("_1 as id1", "_2 as id2")
val join1 = rawj.join(objs, objs("id") === rawj("id1"))
  .withColumnRenamed("id", "anything")

println("works...")
val join2a = join1.join(objs, 'id2 === 'id )
join2a.show()

println("works...")
val join2b = objs.join(join1, objs("id") === join1("id2"))
join2b.show()

println("do not works...")
val join2c = join1.join(objs, join1("id2") === objs("id") )
join2c.show()
The first two joins work, but the last one fails with this error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: resolved attribute(s) id#2 missing from anything#8,name#14,name#3,id1#6,id2#7,id#13 in operator !Join Inner, Some((id2#7 = id#2));
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:183)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:105)
Without the first column rename, the failure is silent: no exception is thrown, but the join returns an empty result:
import sql.implicits._
val objs = sc.parallelize(Seq(("1", "um"), ("2", "dois"), ("3", "tres"))).toDF.selectExpr("_1 as id", "_2 as name")
val rawj = sc.parallelize(Seq(("1", "2"), ("1", "3"), ("2", "3"), ("2", "1"))).toDF.selectExpr("_1 as id1", "_2 as id2")
val join1 = rawj.join(objs, objs("id") === rawj("id1"))
println("do not works...")
val join2c = join1.join(objs, join1("id2") === objs("id") )
join2c.show()
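A possible workaround, sketched here under assumptions rather than taken from this report, is to alias both sides of the second join and refer to the join columns by qualified name. This mirrors the working `'id2 === 'id` case above, where the condition is resolved by name instead of through `objs("id")`. The sketch assumes a Spark 2.x or later `SparkSession` (the report itself uses the older `sc` and `sql.implicits._`); the object name `AliasJoinSketch`, the method `aliasedJoin`, and the aliases `j` and `o` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper object; not part of the original report.
object AliasJoinSketch {

  // Rebuilds the data from the report and performs the second join
  // with explicit aliases instead of objs("id") / join1("id2").
  def aliasedJoin(spark: SparkSession): DataFrame = {
    import spark.implicits._

    val objs = Seq(("1", "um"), ("2", "dois"), ("3", "tres")).toDF("id", "name")
    val rawj = Seq(("1", "2"), ("1", "3"), ("2", "3"), ("2", "1")).toDF("id1", "id2")

    // First join, without the column rename (the silent-failure variant).
    val join1 = rawj.join(objs, objs("id") === rawj("id1"))

    // Assumption: qualifying the columns by alias lets the analyzer
    // resolve "o.id" against the right-hand side of the join only.
    join1.as("j").join(objs.as("o"), col("j.id2") === col("o.id"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("alias-join-sketch")
      .getOrCreate()
    try aliasedJoin(spark).show()
    finally spark.stop()
  }
}
```

The result keeps both `id` and both `name` columns, so later references to them would also need the `j.` or `o.` qualifier.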
Issue Links
- is related to SPARK-15666 Join on two tables generated from a same table throwing query analyzer issue (Resolved)