Flink / FLINK-887

Improve JobManager heap space memory configuration for YARN

Details

    Description

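      I just saw this while testing for the 0.5 release.
      The JM sometimes fails because I forgot to subtract a few percent from the JM heap space to leave room for extra JVM allocations. In the log below, the JVM is started with -Xmx1000M inside a container limited to 1 GB (1073741824 bytes), so non-heap JVM memory pushes the process to 1125031936 bytes and YARN kills the container.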

      2014-05-29 12:01:53,770 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_1401362984347_0002_01_000001 has processes older than 1 iteration running over the configured limit. Limit=1073741824, current usage = 1125031936
      2014-05-29 12:01:53,776 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=2477,containerID=container_1401362984347_0002_01_000001] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 1.7 GB of 5 GB virtual memory used. Killing container.
      Dump of the process-tree for container_1401362984347_0002_01_000001 :
      	|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
      	|- 2483 2477 2477 2477 (java) 5021 548 1711837184 274360 /usr/java/latest/bin/java -Xmx1000M -Dlog.file=/mnt/var/log/hadoop/userlogs/application_1401362984347_0002/container_1401362984347_0002_01_000001/jobmanager-log4j.log -Dlog4j.configuration=file:log4j.properties eu.stratosphere.yarn.ApplicationMaster 
      	|- 2477 2094 2477 2477 (bash) 0 1 117805056 306 /bin/bash -c /usr/java/latest/bin/java -Xmx1000M  -Dlog.file=/mnt/var/log/hadoop/userlogs/application_1401362984347_0002/container_1401362984347_0002_01_000001/jobmanager-log4j.log -Dlog4j.configuration=file:log4j.properties eu.stratosphere.yarn.ApplicationMaster  1>/mnt/var/log/hadoop/userlogs/application_1401362984347_0002/container_1401362984347_0002_01_000001/jobmanager-stdout.log 2>/mnt/var/log/hadoop/userlogs/application_1401362984347_0002/container_1401362984347_0002_01_000001/jobmanager-stderr.log 
      
      2014-05-29 12:01:53,777 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1401362984347_0002_01_000001 transitioned from RUNNING to KILLING
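
One way to avoid the kill shown in the log above is to derive the -Xmx value from the container size minus a reserved fraction, rather than passing nearly the full container size as heap. The following is a minimal sketch of that arithmetic, not the actual fix: the class name, the method names, and the 0.25 ratio are hypothetical and do not come from the Flink or Stratosphere code base.

```java
/**
 * Sketch: derive the JobManager's -Xmx value from the YARN container size.
 * All names and the ratio used in main() are illustrative assumptions.
 */
final class JobManagerHeapSketch {

    /**
     * Returns the heap size in MB after reserving a fraction of the
     * container for non-heap JVM memory (thread stacks, metaspace, native buffers).
     */
    static int heapSizeMb(int containerMb, double cutoffRatio) {
        if (containerMb <= 0) {
            throw new IllegalArgumentException("containerMb must be positive");
        }
        if (cutoffRatio < 0.0 || cutoffRatio >= 1.0) {
            throw new IllegalArgumentException("cutoffRatio must be in [0, 1)");
        }
        // Round the reserved part up so the heap never exceeds the intended share.
        int cutoffMb = (int) Math.ceil(containerMb * cutoffRatio);
        return containerMb - cutoffMb;
    }

    /** Formats the heap size as a JVM command-line flag. */
    static String xmxFlag(int containerMb, double cutoffRatio) {
        return "-Xmx" + heapSizeMb(containerMb, cutoffRatio) + "M";
    }

    public static void main(String[] args) {
        // 1024 MB container as in the log; 0.25 is an assumed, conservative ratio.
        System.out.println(xmxFlag(1024, 0.25));
    }
}
```

With a 1024 MB container and the assumed ratio of 0.25, this prints -Xmx768M, which leaves 256 MB of headroom below the container limit. A suitable ratio depends on the workload and JVM settings and would need to be determined by testing.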
      

      ---------------- Imported from GitHub ----------------
      Url: https://github.com/stratosphere/stratosphere/issues/887
      Created by: rmetzger
      Labels: bug, YARN
      Assignee: rmetzger
      Created at: Thu May 29 14:47:59 CEST 2014
      State: open

          People

            rmetzger Robert Metzger
            github-import GitHub Import