Spark / SPARK-8599 Improve non-deterministic expression handling / SPARK-9082

Filter using non-deterministic expressions should not be pushed down


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.0
    • Component/s: SQL
    • Labels: None

    Description

      For example,

      val df = sqlContext.range(1, 10).select($"id", rand(0).as('r))
      df.as("a").join(df.filter($"r" < 0.5).as("b"), $"a.id" === $"b.id").explain(true)
      

      The plan is

      == Physical Plan ==
      ShuffledHashJoin [id#55323L], [id#55327L], BuildRight
       Exchange (HashPartitioning 200)
        Project [id#55323L,Rand 0 AS r#55324]
         PhysicalRDD [id#55323L], MapPartitionsRDD[42268] at range at <console>:37
       Exchange (HashPartitioning 200)
        Project [id#55327L,Rand 0 AS r#55325]
         Filter (LessThan)
          PhysicalRDD [id#55327L], MapPartitionsRDD[42268] at range at <console>:37
      

The rand expression gets evaluated twice per row instead of once: once in the pushed-down Filter and again in the Project above it.
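The hazard can be sketched without Spark at all. In the following hypothetical Scala model (the `Rand` class here is just a stateful stand-in for a non-deterministic expression, not Spark's actual `Rand`), evaluating the expression once per row and reusing the value keeps the filter and the projected column consistent, while re-evaluating it in a pushed-down filter makes them disagree:

```scala
object DoubleEvalDemo {
  // Stateful stand-in for a non-deterministic expression: each
  // evaluation returns the next value of a sequence, so two
  // evaluations on the same row yield different values.
  final class Rand {
    private var n = 0
    def eval(): Double = { n += 1; (n % 10) / 10.0 }
  }

  val ids: List[Int] = (1 to 6).toList

  // Filter above Project: `r` is evaluated once per row, and the
  // filter reuses the projected value. Every surviving row really
  // satisfies r < 0.5.
  def evaluatedOnce: List[(Int, Double)] = {
    val r = new Rand
    ids.map(id => (id, r.eval())).filter(_._2 < 0.5)
  }

  // Predicate pushed below Project: the expression is evaluated once
  // in the filter and again in the projection, so the value the filter
  // tested and the value in the output column come from different draws.
  def evaluatedTwice: List[(Int, Double)] = {
    val r = new Rand
    ids.filter(_ => r.eval() < 0.5).map(id => (id, r.eval()))
  }

  def main(args: Array[String]): Unit = {
    println(s"once : $evaluatedOnce")
    println(s"twice: $evaluatedTwice")
  }
}
```

In the pushed-down version, rows that passed the filter can carry output values that violate the very predicate that admitted them.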

This is caused by predicate pushdown: when we push a predicate below a Project, we replace the attribute references in the predicate with the expressions they alias, so the non-deterministic Rand ends up being evaluated both in the pushed-down Filter and in the Project.
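A minimal sketch of the guard this implies, using hypothetical names rather than Catalyst's actual classes: substitute the project's aliases into the predicate, then push the Filter below the Project only when the substituted predicate is still deterministic; otherwise leave the plan unchanged.

```scala
object PushdownSketch {
  // Toy expression tree; `deterministic` mirrors the flag that a real
  // optimizer would consult before duplicating or moving an expression.
  sealed trait Expr { def deterministic: Boolean }
  case class Attr(name: String) extends Expr { val deterministic = true }
  case class Lit(v: Double) extends Expr { val deterministic = true }
  case class Rand(seed: Long) extends Expr { val deterministic = false }
  case class LessThan(l: Expr, r: Expr) extends Expr {
    def deterministic: Boolean = l.deterministic && r.deterministic
  }

  // Toy logical plan nodes.
  sealed trait Plan
  case class Scan(out: Seq[String]) extends Plan
  case class Project(aliases: Map[String, Expr], child: Plan) extends Plan
  case class Filter(cond: Expr, child: Plan) extends Plan

  // Replace attribute references in `e` with the project's expressions.
  def substitute(e: Expr, aliases: Map[String, Expr]): Expr = e match {
    case Attr(n)        => aliases.getOrElse(n, e)
    case LessThan(l, r) => LessThan(substitute(l, aliases), substitute(r, aliases))
    case other          => other
  }

  // Push a Filter through a Project only if the substituted predicate
  // stays deterministic; otherwise keep the Filter where it is.
  def pushThroughProject(plan: Plan): Plan = plan match {
    case Filter(cond, Project(aliases, child)) =>
      val pushed = substitute(cond, aliases)
      if (pushed.deterministic) Project(aliases, Filter(pushed, child))
      else plan
    case other => other
  }
}
```

With this guard, a predicate on the rand-backed column `r` stays above the Project (Rand is evaluated once), while a predicate on a plain column like `id` is still pushed down as before. The actual fix shipped in Spark 1.5.0 lives in Catalyst's optimizer rules; this sketch only illustrates the deterministic-predicate check.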


People

    Assignee: Wenchen Fan
    Reporter: Yin Huai
    Votes: 0
    Watchers: 5
