Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
Description
Consider the following sequence of events for a map task.
Before HADOOP-1965:
Stage | Description | Buffers used | Memory used |
---|---|---|---|
Stage-1 | MapOutputBuffer simply collects | KeyVal1 (by collect) | io.sort.mb |
Stage-2 | KeyVal1 buffer is full and needs spilling so Sort-Spill starts | KeyVal1 (by Sort-Spill) | io.sort.mb |
Stage-3 | Sort-Spill finished | KeyVal1 (referenced by comparator) | io.sort.mb |
Stage-4 | MapOutputBuffer starts collecting | KeyVal2 (by collect) + KeyVal1 (by comparator) | 2*io.sort.mb |
Stage-5 | KeyVal2 buffer is full and needs spilling so Sort-Spill starts | KeyVal2 (by Sort-Spill) | io.sort.mb |
So between Stage-4 and Stage-5 the memory used is 2 * io.sort.mb, which can be avoided entirely by removing the comparator's reference to the earlier key-value buffer. The maximum memory usage can then be capped at io.sort.mb.
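As a minimal sketch of the idea (not the actual Hadoop code), the comparator can drop its reference to the key-value buffer as soon as Sort-Spill finishes. The names `BufferBackedComparator`, `release()` and `holdsBuffer()` are hypothetical and used only for illustration.

```java
import java.util.Objects;

// Hypothetical stand-in for a comparator that sorts records stored in a shared byte buffer.
final class BufferBackedComparator {
    private byte[] buffer; // the key-value buffer this comparator reads from

    BufferBackedComparator(byte[] buffer) {
        this.buffer = Objects.requireNonNull(buffer);
    }

    // Compares two equal-length records by unsigned byte order.
    int compare(int offsetA, int offsetB, int length) {
        if (buffer == null) {
            throw new IllegalStateException("comparator already released its buffer");
        }
        for (int i = 0; i < length; i++) {
            int a = buffer[offsetA + i] & 0xff;
            int b = buffer[offsetB + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return 0;
    }

    // Drop the reference once Sort-Spill is done so the old buffer becomes collectable.
    void release() {
        buffer = null;
    }

    boolean holdsBuffer() {
        return buffer != null;
    }
}

final class ComparatorReleaseSketch {
    public static void main(String[] args) {
        byte[] keyVal1 = {3, 1, 2, 1};
        BufferBackedComparator cmp = new BufferBackedComparator(keyVal1);
        System.out.println(cmp.compare(0, 1, 1) > 0); // byte 3 sorts after byte 1
        cmp.release();                                // Sort-Spill finished
        System.out.println(cmp.holdsBuffer());        // no reference left to KeyVal1
    }
}
```

Once `release()` has been called, nothing in the comparator keeps KeyVal1 reachable, so the collector is free to reclaim it before KeyVal2 fills up.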
After HADOOP-1965:
Stage | Description | Buffers used | Memory used |
---|---|---|---|
Stage-1 | MapOutputBuffer simply collects | KeyVal1 (by collect) | io.sort.mb/2 |
Stage-2 | KeyVal1 buffer is full and needs spilling, so Sort-Spill starts in parallel | KeyVal1 (by Sort-Spill) | io.sort.mb/2 |
Stage-3 | MapOutputBuffer simply collects + Sort-Spill | KeyVal2 (by collect) + KeyVal1 (by Sort-Spill) | io.sort.mb |
Stage-4 | MapOutputBuffer simply collects + Sort-Spill finishes; the Sort-Impls are closed but the comparators still hold a reference to the KeyVal1 buffer | KeyVal2 (by collect) + KeyVal1 (referenced by comparator) | io.sort.mb |
Stage-5 | KeyVal2 buffer is full and needs spilling, so Sort-Spill starts in parallel | KeyVal2 (by Sort-Spill) | io.sort.mb/2 |
So between Stage-4 and Stage-5 there is an unwanted reference to the key-value buffer, which prevents the GC from reclaiming it. However, the maximum memory usage still does not exceed io.sort.mb.
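The peak figures in the two tables can be checked with a small calculation. This is a sketch under stated assumptions: `PeakMemorySketch` and its methods are hypothetical, the per-stage values are copied from the tables, and the only modelled effect of releasing the comparator's reference is that Stage-4 no longer counts the KeyVal1 buffer.

```java
final class PeakMemorySketch {
    // Returns the largest per-stage footprint.
    static double peak(double[] stages) {
        double max = 0;
        for (double s : stages) {
            max = Math.max(max, s);
        }
        return max;
    }

    // Stage-1..Stage-5 footprints from the "Before HADOOP-1965" table.
    // Assumption: with the reference released, Stage-4 holds only KeyVal2.
    static double[] beforeHadoop1965(double ioSortMb, boolean referenceReleased) {
        double stage4 = referenceReleased ? ioSortMb : 2 * ioSortMb;
        return new double[] {ioSortMb, ioSortMb, ioSortMb, stage4, ioSortMb};
    }

    // Stage-1..Stage-5 footprints from the "After HADOOP-1965" table.
    // Assumption: with the reference released, Stage-4 holds only KeyVal2.
    static double[] afterHadoop1965(double ioSortMb, boolean referenceReleased) {
        double half = ioSortMb / 2;
        double stage4 = referenceReleased ? half : ioSortMb;
        return new double[] {half, half, ioSortMb, stage4, half};
    }

    public static void main(String[] args) {
        double ioSortMb = 100; // example value only
        System.out.println(peak(beforeHadoop1965(ioSortMb, false))); // 2 * io.sort.mb
        System.out.println(peak(beforeHadoop1965(ioSortMb, true)));  // io.sort.mb
        System.out.println(peak(afterHadoop1965(ioSortMb, false)));  // io.sort.mb
        System.out.println(peak(afterHadoop1965(ioSortMb, true)));   // io.sort.mb (Stage-3)
    }
}
```

In the post-HADOOP-1965 layout the peak is set by Stage-3, where collection and Sort-Spill overlap, so releasing the reference lowers the Stage-4 footprint without changing the maximum.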
Issue Links
- is duplicated by HADOOP-2919 Create fewer copies of buffer data during sort/spill (Closed)