Simple cost estimation and auto selection of broadcast join

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: SQL
    • Labels: None

    Description

      Spark SQL should support cost estimation, a common query optimization technique. For example, each logical operator should be able to estimate its cardinality based on the estimates from its children.

      As a first step, a framework supporting this should be added. It might include an interface for the aforementioned cardinality estimation and/or other metrics.
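      To make the idea of per-operator estimation concrete, here is a minimal toy sketch in Python. It is not Spark's actual interface: the class names (`LogicalPlan`, `Scan`, `Filter`, `Join`), the `estimated_rows` method, the default selectivity, and the cross-product bound for joins are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


class LogicalPlan:
    """Hypothetical base class: every operator can estimate its output row count."""

    def estimated_rows(self) -> int:
        raise NotImplementedError


@dataclass
class Scan(LogicalPlan):
    table: str
    row_count: int  # assumed to be available from table metadata

    def estimated_rows(self) -> int:
        # Leaf operator: the estimate comes directly from metadata.
        return self.row_count


@dataclass
class Filter(LogicalPlan):
    child: LogicalPlan
    selectivity: float = 0.5  # assumed default, purely illustrative

    def estimated_rows(self) -> int:
        # Derived from the child's estimate, scaled by the assumed selectivity.
        return int(self.child.estimated_rows() * self.selectivity)


@dataclass
class Join(LogicalPlan):
    left: LogicalPlan
    right: LogicalPlan

    def estimated_rows(self) -> int:
        # Conservative upper bound: the size of the cross product.
        return self.left.estimated_rows() * self.right.estimated_rows()


# Estimates propagate bottom-up through the plan tree.
plan = Join(Filter(Scan("orders", 1000)), Scan("users", 50))
print(plan.estimated_rows())
```

      Each operator only consults its children, so adding a new operator means supplying one estimation rule rather than touching the rest of the tree.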

      As a first proof-of-concept use of this framework, a simple optimization strategy for certain equi-joins should be added: if one side of a qualifying join has a small estimated physical size (below some threshold), the planner should execute the join with a broadcast join physical plan, broadcasting the small side and streaming through the bigger side. A similar concept exists in Hive and is explained here.
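      The selection rule described above can be sketched as a small Python function. This is an illustrative sketch only, not the planner code that was added to Spark: the names `Relation`, `JoinChoice`, and `choose_join_strategy`, the 10 MB default threshold, and the "shuffle" fallback label are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed threshold for illustration; the issue only says "some threshold".
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


@dataclass
class Relation:
    name: str
    estimated_size_bytes: int  # estimated physical size of this join side


@dataclass
class JoinChoice:
    strategy: str                    # "broadcast" or "shuffle" (hypothetical labels)
    broadcast_side: Optional[str]    # side shipped to every worker, if any
    streamed_side: Optional[str]     # side streamed through, if any


def choose_join_strategy(
    left: Relation,
    right: Relation,
    is_equi_join: bool,
    threshold: int = BROADCAST_THRESHOLD_BYTES,
) -> JoinChoice:
    """Pick a broadcast join when the join qualifies and one side is small."""
    if not is_equi_join:
        # Only equi-joins qualify for this strategy in the sketch.
        return JoinChoice("shuffle", None, None)

    # Identify the smaller side by estimated physical size.
    if left.estimated_size_bytes <= right.estimated_size_bytes:
        small, big = left, right
    else:
        small, big = right, left

    if small.estimated_size_bytes < threshold:
        # Broadcast the small side and stream through the bigger side.
        return JoinChoice("broadcast", small.name, big.name)
    return JoinChoice("shuffle", None, None)


choice = choose_join_strategy(
    Relation("facts", 5 * 1024**3), Relation("dims", 2 * 1024**2), is_equi_join=True
)
print(choice)
```

      Comparing against the smaller side only keeps the rule symmetric: the decision does not depend on which relation appears on the left or right of the join.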

            People

              ConcreteVitamin Zongheng Yang
              marmbrus Michael Armbrust