Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-988

mapjoin should throw an error if the input is too large

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 0.5.0
    • Query Processor
    • None
    • Reviewed

    Description

      If the input to the map join is larger than a specific threshold, it may lead to a very slow execution of the join.
      It is better to throw an error, and let the user redo his query as a non map-join query.

      However, the current map-reduce framework will retry the mapper 4 times before actually killing the job.
      Based on a offline discussion with Dhruba, Ning and myself, we came up with the following algorithm:

      Keep a threshold in the mapper for the number of rows to be processed for map-join. If the number of rows
      exceeds that threshold, set a counter and kill that mapper.

      The client (ExecDriver) monitors that job continuously - if this counter is set, it kills the job and also
      shows an appropriate error message to the user, so that he can retry the query without the map join.

      Attachments

        1. HIVE-988_2.patch
          408 kB
          Ning Zhang
        2. HIVE-988_3.patch
          408 kB
          Ning Zhang
        3. HIVE-988_4.patch
          409 kB
          Ning Zhang
        4. HIVE-988.patch
          407 kB
          Ning Zhang

        Activity

          People

            nzhang Ning Zhang
            namit Namit Jain
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: