IMPALA-13261

Consider the effect of NULL keys when choosing BROADCAST vs SHUFFLE join


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: Frontend

    Description

      Currently, partitioned (SHUFFLE) joins hash all NULL join keys to the same value, so every row with a NULL key is sent to a single fragment instance. This can cause data skew if the number of NULL keys is large.
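
The skew described above can be illustrated with a small simulation. This is only a sketch: `hash_key` and `partition_rows` are hypothetical names, and the hash function is a stand-in chosen for determinism, not Impala's actual hashing. The only property it relies on is that every NULL key maps to the same hash value.

```python
from collections import Counter

def hash_key(key):
    # Hypothetical stand-in hash: every NULL (None) key maps to one fixed value.
    return 0 if key is None else key

def partition_rows(keys, num_instances):
    """Count how many rows each fragment instance receives under hash partitioning."""
    counts = Counter()
    for key in keys:
        counts[hash_key(key) % num_instances] += 1
    return counts

# 1,000 distinct non-NULL keys plus 9,000 rows with a NULL key, on 10 instances.
keys = list(range(1000)) + [None] * 9000
counts = partition_rows(keys, 10)

# Instance 0 receives all 9,000 NULL rows on top of its share of the others.
print(sorted(counts.items()))
```

In this toy setup the non-NULL keys spread evenly, while the instance that owns the NULL hash value receives 91 times as many rows as any other instance.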

      The planner could give preference to BROADCAST in a LEFT OUTER JOIN when the number of NULL keys on the probe side is large, since a BROADCAST join does not redistribute probe-side rows.
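
One way to reason about such a preference is to compare the busiest instance under each strategy. The sketch below is a deliberately simplified model, not Impala's cost model: all function names and parameters are hypothetical, it counts rows only, and it assumes probe rows are evenly spread across instances before the join and that non-NULL keys hash evenly.

```python
def max_rows_shuffle(probe_rows, probe_null_fraction, build_rows, num_instances):
    """Rows handled by the busiest instance when both sides are hash partitioned."""
    null_rows = probe_rows * probe_null_fraction
    non_null_rows = probe_rows - null_rows
    # All NULL-key probe rows land on one instance; everything else is spread evenly.
    return null_rows + (non_null_rows + build_rows) / num_instances

def max_rows_broadcast(probe_rows, build_rows, num_instances):
    """Rows handled by each instance when the build side is broadcast."""
    # Probe rows stay where they are; every instance receives the full build side.
    return probe_rows / num_instances + build_rows

def prefer_broadcast(probe_rows, probe_null_fraction, build_rows, num_instances):
    shuffle = max_rows_shuffle(probe_rows, probe_null_fraction, build_rows, num_instances)
    broadcast = max_rows_broadcast(probe_rows, build_rows, num_instances)
    return broadcast < shuffle

# Half of the probe keys are NULL: the skewed instance dominates the shuffle plan.
skewed = prefer_broadcast(1_000_000, 0.5, 10_000, 10)
# No NULL keys: the shuffle plan is slightly cheaper under this model.
uniform = prefer_broadcast(1_000_000, 0.0, 10_000, 10)
print(skewed, uniform)
```

A real planner would also need to weigh network transfer and build-side memory, which this sketch ignores.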

      IMPALA-13260 proposes another potential solution to the same problem: sending rows with NULL keys to local fragment instances in this situation.
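
The local-routing idea can be sketched as follows. This is an illustration of the general approach only, not the design in IMPALA-13260: `route_row` and `simulate` are hypothetical names, and the sketch assumes a plain equality join condition, under which a probe row with a NULL key can never match a build row and so does not need to reach any particular instance.

```python
from collections import Counter

def route_row(key, local_instance, num_instances):
    """Pick the destination instance for a probe-side row."""
    if key is None:
        # A NULL key cannot match under '=', so the row can stay where it is.
        return local_instance
    # Hypothetical stand-in hash for non-NULL integer keys.
    return key % num_instances

def simulate(num_instances):
    """Each source instance scans 900 NULL-key rows and 100 distinct non-NULL keys."""
    counts = Counter()
    for src in range(num_instances):
        keys = [None] * 900 + list(range(src * 100, src * 100 + 100))
        for key in keys:
            counts[route_row(key, src, num_instances)] += 1
    return counts

counts = simulate(10)
# Every instance ends up with the same number of rows.
print(sorted(counts.items()))
```

With the NULL rows kept local, the row counts are balanced across instances in this toy setup, whereas hashing them would have sent all 9,000 to one instance.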


            People

              Assignee: Unassigned
              Reporter: Csaba Ringhofer (csringhofer)
              Votes: 0
              Watchers: 4
