Spark / SPARK-10494

Multiple Python UDFs together with aggregation or sort merge join may cause OOM (failed to acquire memory)

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.5.0
    • Fix Version/s: 1.5.1, 1.6.0
    • Component/s: PySpark, SQL
    • Labels: None

    Description

      The RDD cache for Python UDFs was removed in 1.4, so N Python UDFs in one query stage could end up evaluating the upstream SparkPlan 2^N times.

      In 1.5, if the upstream SparkPlan contains an aggregation or a sort merge join, this repeated evaluation can cause an OOM (failed to acquire memory).
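
The 2^N growth can be illustrated with a small simulation. This is only a sketch under a stated assumption: each UDF operator reads its child's output twice when nothing is cached (once to feed the UDF, once to pair the results with the original rows). `Source`, `UdfNode`, and `count_upstream_evaluations` are hypothetical names for illustration, not Spark classes.

```python
class Source:
    """Stands in for the upstream plan (e.g. an aggregation); counts evaluations."""

    def __init__(self, rows):
        self.rows = rows
        self.evaluations = 0

    def execute(self):
        self.evaluations += 1
        return list(self.rows)


class UdfNode:
    """Hypothetical UDF operator that reads its child twice (assumption: no cache)."""

    def __init__(self, child, func):
        self.child = child
        self.func = func

    def execute(self):
        inputs = self.child.execute()     # first pass: rows fed to the UDF
        originals = self.child.execute()  # second pass: rows paired with the results
        return [row + (self.func(arg[0]),) for row, arg in zip(originals, inputs)]


def count_upstream_evaluations(n_udfs):
    """Stack n_udfs UDF operators on one source and count source evaluations."""
    source = Source([(1,), (2,), (3,)])
    plan = source
    for _ in range(n_udfs):
        plan = UdfNode(plan, lambda x: x + 1)
    plan.execute()
    return source.evaluations


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, count_upstream_evaluations(n))  # prints 2, 4, 8, 16 for n = 1..4
```

If the upstream plan holds memory while it runs, as an aggregation or sort merge join would, each extra evaluation in this model is another consumer competing for that memory, which would be consistent with the "failed to acquire memory" error reported here.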

            People

              Reynold Xin (rxin)
              Davies Liu (davies)
              Votes: 0
              Watchers: 2
