Spark / SPARK-8599 Improve non-deterministic expression handling / SPARK-9192

add initialization phase for nondeterministic expression


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.0
    • Component/s: SQL
    • Labels: None

    Description

      Currently, nondeterministic expressions are broken because there is no explicit initialization phase.

      Take `MonotonicallyIncreasingID` as an example. This expression needs mutable state to remember how many times it has been evaluated, so we use `@transient var count: Long` there. Because it is transient, `count` is reset to 0, and *only* to 0, when the expression is serialized and deserialized, since deserializing a transient variable yields its default value. There is no way to use another initial value for `count` until we add an explicit initialization phase.
      For now, no nondeterministic expression needs this feature, but in the future we may add new ones that need a different initial value for their mutable state.
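
      The reset-to-default behavior can be reproduced outside Spark. The following is a minimal sketch, assuming plain Java serialization; `Counter` and `roundTrip` are hypothetical names introduced for illustration and are not Spark classes. Even though the field is declared with an initial value of 100, the deserialized copy starts from 0, because the constructor does not run on deserialization.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for an expression with mutable state (not a Spark class).
class Counter extends Serializable {
  // The desired initial value is 100, but the field is transient.
  @transient var count: Long = 100L

  // Returns the current value, then increments the counter.
  def next(): Long = { val c = count; count += 1; c }
}

object TransientResetDemo {
  // Serializes an object to a byte array and reads it back (hypothetical helper).
  def roundTrip[T <: AnyRef](obj: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val original = new Counter
    println(original.next()) // 100: the constructor ran

    val copy = roundTrip(original)
    println(copy.next())     // 0: transient field fell back to its default
  }
}
```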

      Another use case is local execution for `LocalRelation`: there is no serialization and deserialization phase, so we can't reset mutable state for it.
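
      One possible shape for such an initialization phase is sketched below. This is an illustration under stated assumptions, not the implementation that resolved this issue: `StatefulExpression`, `initialize`, `initInternal`, `evalInternal`, and `IdFrom` are hypothetical names. The idea is that evaluation is refused until the state has been initialized, and that calling `initialize()` again resets the state without any serialization round trip, which covers the local execution case.

```scala
// Hypothetical trait; the names are illustrative and not Spark's API.
trait StatefulExpression {
  private var initialized = false

  // Subclasses set up their mutable state here.
  protected def initInternal(): Unit

  // Subclasses compute one value here, possibly mutating their state.
  protected def evalInternal(): Long

  // Explicit initialization phase: may be called again to reset the state.
  final def initialize(): Unit = {
    initInternal()
    initialized = true
  }

  final def eval(): Long = {
    require(initialized, "initialize() must be called before eval()")
    evalInternal()
  }
}

// Hypothetical expression that counts upward from an arbitrary start value.
class IdFrom(start: Long) extends StatefulExpression {
  private var count: Long = 0L

  protected def initInternal(): Unit = { count = start }

  protected def evalInternal(): Long = { val c = count; count += 1; c }
}

object InitPhaseDemo {
  def main(args: Array[String]): Unit = {
    val expr = new IdFrom(100L)
    expr.initialize()
    println(Seq.fill(3)(expr.eval()).mkString(", ")) // 100, 101, 102

    // Local re-execution: reset the state without serializing anything.
    expr.initialize()
    println(expr.eval()) // 100
  }
}
```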


            People

              Assignee: Wenchen Fan (cloud_fan)
              Reporter: Wenchen Fan (cloud_fan)