Description
For example,
val df = sqlContext.range(1, 10).select($"id", rand(0).as('r))
df.as("a").join(df.filter($"r" < 0.5).as("b"), $"a.id" === $"b.id").explain(true)
The plan is
== Physical Plan ==
ShuffledHashJoin [id#55323L], [id#55327L], BuildRight
 Exchange (HashPartitioning 200)
  Project [id#55323L,Rand 0 AS r#55324]
   PhysicalRDD [id#55323L], MapPartitionsRDD[42268] at range at <console>:37
 Exchange (HashPartitioning 200)
  Project [id#55327L,Rand 0 AS r#55325]
   Filter (LessThan)
    PhysicalRDD [id#55327L], MapPartitionsRDD[42268] at range at <console>:37
rand is evaluated twice instead of once.
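This happens because, when we push down a predicate, we replace the attribute reference in the predicate with the actual expression it refers to. For a nondeterministic expression such as rand, the pushed-down Filter then evaluates its own copy of the expression, separate from the one in the Project.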
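The substitution described above can be sketched with a toy expression tree. This is only an illustration: Expr, Attr, Lit, Rand, LessThan, substitute, deterministic and pushDown are hypothetical names invented here and are not Catalyst classes or the optimizer's actual code. The determinism guard at the end is one possible way to avoid the duplication, shown as an assumption rather than as the fix for this issue. The snippet is meant for the Scala REPL or a Scala script.

```scala
// Toy expression tree; hypothetical types, not Catalyst classes.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(value: Double) extends Expr
case class Rand(seed: Long) extends Expr // nondeterministic
case class LessThan(left: Expr, right: Expr) extends Expr

// Replace each attribute that has an alias definition with the aliased expression.
def substitute(e: Expr, aliases: Map[String, Expr]): Expr = e match {
  case Attr(n)        => aliases.getOrElse(n, e)
  case LessThan(l, r) => LessThan(substitute(l, aliases), substitute(r, aliases))
  case other          => other
}

// True when the expression contains no Rand node.
def deterministic(e: Expr): Boolean = e match {
  case Rand(_)        => false
  case LessThan(l, r) => deterministic(l) && deterministic(r)
  case _              => true
}

// Assumed guard: push the predicate down only if the inlined form is deterministic.
def pushDown(pred: Expr, aliases: Map[String, Expr]): Option[Expr] = {
  val inlined = substitute(pred, aliases)
  if (deterministic(inlined)) Some(inlined) else None
}

val aliases: Map[String, Expr] = Map("r" -> Rand(0)) // Project [id, Rand 0 AS r]
val predicate: Expr = LessThan(Attr("r"), Lit(0.5))  // Filter r < 0.5

// Naive pushdown inlines the alias, so the filter carries its own Rand node.
val naive = substitute(predicate, aliases)

// With the guard, the predicate is not pushed down and r is computed once.
val guarded = pushDown(predicate, aliases)
```

Here naive contains a second Rand node in addition to the one in the projection, which mirrors the duplicated evaluation in the plan above, while guarded is None, meaning the filter would stay above the projection.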