  1. Hadoop Map/Reduce
  2. MAPREDUCE-427

Earlier key-value buffer from MapTask.java is still referenced even though it's not required anymore.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed

    Description

      Consider the following events for a map task
      Before HADOOP-1965:

      Stage   | Description                                                    | Buffers used                                    | Memory used
      Stage-1 | MapOutputBuffer simply collects                                | KeyVal1 (by collect)                            | io.sort.mb
      Stage-2 | KeyVal1 buffer is full and needs spilling so Sort-Spill starts | KeyVal1 (by Sort-Spill)                         | io.sort.mb
      Stage-3 | Sort-Spill finished                                            | KeyVal1 (referenced by comparator)              | io.sort.mb
      Stage-4 | MapOutputBuffer starts collecting                              | KeyVal2 (by collect) + KeyVal1 (by comparator)  | 2 * io.sort.mb
      Stage-5 | KeyVal2 buffer is full and needs spilling so Sort-Spill starts | KeyVal2 (by Sort-Spill)                         | io.sort.mb

      So between Stage-4 and Stage-5 the memory used becomes 2 * io.sort.mb. This can be avoided entirely by removing the comparator's reference to the earlier key-value buffer, which would cap the maximum memory usage at io.sort.mb.
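
      The accounting in the table above can be sketched as a small simulation. This is only an illustrative model, not code from MapTask.java: the class name PeakBufferUsage, the method peakRetained, and the flag comparatorReleasesBuffer are hypothetical, and each buffer is assumed to occupy exactly io.sort.mb.

```java
public class PeakBufferUsage {
    /**
     * Hypothetical model of the pre-HADOOP-1965 stages: one buffer is
     * collected into, then sorted and spilled, and the cycle repeats.
     * Returns the peak number of bytes kept reachable.
     */
    static long peakRetained(long bufferBytes, int spills,
                             boolean comparatorReleasesBuffer) {
        long peak = 0;
        boolean staleBufferHeld = false; // previous buffer still referenced?
        for (int i = 0; i < spills; i++) {
            // Collect phase (Stage-1 / Stage-4): a new buffer fills up while
            // the comparator may still reference the previous one.
            long live = bufferBytes + (staleBufferHeld ? bufferBytes : 0);
            peak = Math.max(peak, live);

            // Sort-spill phase (Stage-2 / Stage-5): the comparator now points
            // at the current buffer, so only one buffer is reachable.
            peak = Math.max(peak, bufferBytes);

            // After the spill (Stage-3) the comparator either drops its
            // reference or keeps the spilled buffer alive.
            staleBufferHeld = !comparatorReleasesBuffer;
        }
        return peak;
    }

    public static void main(String[] args) {
        long ioSortMb = 100; // assumed value, in MB, for illustration only
        System.out.println(peakRetained(ioSortMb, 2, false)); // prints 200
        System.out.println(peakRetained(ioSortMb, 2, true));  // prints 100
    }
}
```

      With the reference kept, the second collect phase sees two reachable buffers (2 * io.sort.mb); with the reference dropped after each spill, the peak stays at io.sort.mb.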

      After HADOOP-1965:

      Stage   | Description                                                                                                                                    | Buffers used                                         | Memory used
      Stage-1 | MapOutputBuffer simply collects                                                                                                                | KeyVal1 (by collect)                                 | io.sort.mb/2
      Stage-2 | KeyVal1 buffer is full and needs spilling, so Sort-Spill starts in parallel                                                                    | KeyVal1 (by Sort-Spill)                              | io.sort.mb/2
      Stage-3 | MapOutputBuffer simply collects + Sort-Spill                                                                                                   | KeyVal2 (by collect) + KeyVal1 (by Sort-Spill)       | io.sort.mb
      Stage-4 | MapOutputBuffer simply collects + Sort-Spill finishes; Sort-Impls are closed but the comparators still hold the reference to the KeyVal1 buffer | KeyVal2 (by collect) + KeyVal1 (referred by comparator) | io.sort.mb
      Stage-5 | KeyVal2 buffer is full and needs spilling, so Sort-Spill starts in parallel                                                                    | KeyVal2 (by Sort-Spill)                              | io.sort.mb/2

      So between Stage-4 and Stage-5 there is an unwanted reference to the KeyVal1 buffer, which prevents the GC from reclaiming it. However, the maximum memory usage will still be io.sort.mb.
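
      The kind of lingering reference described here can be illustrated with a minimal sketch. It does not reproduce the actual MapTask.java classes: BufferComparatorSketch, OffsetComparator, release, and holdsBuffer are hypothetical names, and records are simplified to single-byte keys addressed by offset.

```java
import java.util.Arrays;
import java.util.Comparator;

public class BufferComparatorSketch {
    /** Hypothetical comparator that orders record offsets by the bytes they point at. */
    static final class OffsetComparator implements Comparator<Integer> {
        // Strong reference: while this field is set, the buffer stays
        // reachable and the GC cannot reclaim it.
        private byte[] buffer;

        OffsetComparator(byte[] buffer) {
            this.buffer = buffer;
        }

        @Override
        public int compare(Integer a, Integer b) {
            return Byte.compare(buffer[a], buffer[b]);
        }

        /** Drops the reference once the sort-spill no longer needs the buffer. */
        void release() {
            buffer = null;
        }

        boolean holdsBuffer() {
            return buffer != null;
        }
    }

    public static void main(String[] args) {
        byte[] keyVal1 = {3, 1, 2};          // stands in for the KeyVal1 buffer
        Integer[] offsets = {0, 1, 2};
        OffsetComparator cmp = new OffsetComparator(keyVal1);

        Arrays.sort(offsets, cmp);           // the "sort" part of Sort-Spill
        System.out.println(Arrays.toString(offsets)); // prints [1, 2, 0]

        System.out.println(cmp.holdsBuffer()); // prints true: buffer still pinned
        cmp.release();                         // the step this issue asks for
        System.out.println(cmp.holdsBuffer()); // prints false
    }
}
```

      Once release() has run and no other reference remains, the buffer becomes eligible for collection; when the GC actually reclaims it is up to the JVM.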

            People

              Assignee: Unassigned
              Reporter: Amar Kamat (amar_kamat)