Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • None
    • None
    • Component/s: tez
    • None

    Description

      Group by and join only require that records with the same key be grouped together; the keys do not need to be sorted. A Tez Input/Output implementation that does the grouping with a hash map (memory limits, spilling, etc. would have to be handled) could significantly speed up group by and join. Combiners on both the input and output side can also be fast if serialization/deserialization is not required, and they could be used instead of POPartialAgg.
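
      To make the idea concrete, here is a minimal in-memory sketch of hash-based grouping with a combiner applied on insert. It is an illustration only, not a Tez Input/Output implementation: the class name HashGroupCombiner and its methods are hypothetical, and memory limits and spilling are deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

/**
 * Hypothetical sketch: groups records by key with a hash map instead of sorting.
 * Values are combined as they arrive, so no serialization round trip is needed.
 * Memory accounting and spilling to disk are NOT handled here.
 */
class HashGroupCombiner<K, V> {
    private final Map<K, V> partials = new HashMap<>();
    private final BinaryOperator<V> combiner;

    HashGroupCombiner(BinaryOperator<V> combiner) {
        this.combiner = combiner;
    }

    /** Folds one record into the partial aggregate for its key. */
    void add(K key, V value) {
        partials.merge(key, value, combiner);
    }

    /** Returns the partial aggregates per key; iteration order is unspecified. */
    Map<K, V> result() {
        return partials;
    }
}
```

      For example, constructing it with Integer::sum and adding ("b", 2), ("a", 1), ("b", 3) leaves one entry per key, with the two "b" values combined, and the keys are never compared for ordering.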

          People

            Assignee: Unassigned
            Reporter: Rohini Palaniswamy (rohini)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated: