Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 2.7.3
- None
Description
Task containers can exceed their resource limits and be killed by the Node Manager. The MR AM is then notified of the container status and diagnostics through its heartbeat with the RM. However, the diagnostics may never be written to the .jhist file, so when the job completes, the diagnostics associated with the failed task attempts are empty. This makes it hard for users to find the root cause of job failures, which are often caused by memory leaks.
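The report does not say exactly how the diagnostics get dropped. As a toy illustration only, the sketch below assumes the history event stores a snapshot of whatever diagnostics the AM holds at the moment the failure is recorded, so a message that arrives afterwards never reaches the log. `AttemptRecord`, `HistoryLog`, the `ATTEMPT_FAILED` event name, and the attempt id are all hypothetical stand-ins, not Hadoop classes or the real .jhist format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AttemptRecord:
    """Hypothetical stand-in for the AM's in-memory view of one task attempt."""
    attempt_id: str
    diagnostics: List[str] = field(default_factory=list)


@dataclass
class HistoryLog:
    """Hypothetical stand-in for the .jhist file: an append-only event list."""
    events: List[dict] = field(default_factory=list)

    def write_failed_event(self, attempt: AttemptRecord) -> None:
        # Assumption: the event keeps a snapshot of the diagnostics known now.
        self.events.append({
            "type": "ATTEMPT_FAILED",  # illustrative event name
            "attempt_id": attempt.attempt_id,
            "diagnostics": "\n".join(attempt.diagnostics),
        })


def run(diagnostics_arrive_first: bool) -> str:
    """Return the diagnostics string that ends up in the toy history log."""
    attempt = AttemptRecord("attempt_0001")  # made-up id
    log = HistoryLog()
    message = "Container killed: memory limit exceeded"
    if diagnostics_arrive_first:
        attempt.diagnostics.append(message)
        log.write_failed_event(attempt)
    else:
        log.write_failed_event(attempt)
        attempt.diagnostics.append(message)  # arrives too late for the log
    return log.events[0]["diagnostics"]
```

Under this assumed model, `run(True)` returns the kill message, while `run(False)` returns an empty string, which matches the symptom described above: the failed attempt is recorded, but its diagnostics are blank.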
Attachments
Issue Links
- relates to: MAPREDUCE-4955 NM container diagnostics for excess resource usage can be lost if task fails while being killed (Open)