Description
The ResourceManager runs out of memory every 2-3 days in our dev cluster.
After analyzing the memory dump, it looks like HDFS is leaking Configuration objects, which causes the YARN RM to OOM.
GC Logs:
PSYoungGen total 52736K, used 37813K [0x00000000eab00000, 0x00000000eec80000, 0x0000000100000000)
eden space 39424K, 95% used [0x00000000eab00000,0x00000000ecfed620,0x00000000ed180000)
from space 13312K, 0% used [0x00000000edf80000,0x00000000edf80000,0x00000000eec80000)
to space 13824K, 0% used [0x00000000ed180000,0x00000000ed180000,0x00000000edf00000)
ParOldGen total 699392K, used 699329K [0x00000000c0000000, 0x00000000eab00000, 0x00000000eab00000)
object space 699392K, 99% used [0x00000000c0000000,0x00000000eaaf04a8,0x00000000eab00000)
Metaspace used 98178K, capacity 99932K, committed 100440K, reserved 1138688K
class space used 10481K, capacity 10829K, committed 10880K, reserved 1048576K
The dump contains more than 8K instances of the Hadoop Configuration class (org.apache.hadoop.conf.Configuration). The most frequent code path creating them goes through org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider, and all of these objects are retained in memory. See the attached screenshot for the path from the conf object to its GC root.
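To make the suspected retention pattern concrete, here is a minimal, self-contained sketch. It is not the actual Hadoop code: Settings, ProviderSketch, LeakSketch and the registry list are hypothetical stand-ins, and the assumption that each provider takes a private copy of the configuration and stays reachable from a long-lived structure is only an illustration of what the heap dump suggests, not a confirmed root cause.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a configuration object; NOT the Hadoop class.
final class Settings {
    private final Map<String, String> values;

    Settings() {
        this.values = new HashMap<>();
    }

    // Copy constructor: every copy owns its own map of values.
    Settings(Settings other) {
        this.values = new HashMap<>(other.values);
    }

    void set(String key, String value) {
        values.put(key, value);
    }
}

// Hypothetical provider that takes a private copy of the shared settings
// and registers itself in a long-lived list (assumed for illustration).
final class ProviderSketch implements AutoCloseable {
    private final Settings conf;
    private final List<ProviderSketch> registry;

    ProviderSketch(Settings shared, List<ProviderSketch> registry) {
        this.conf = new Settings(shared); // one new copy per provider
        this.registry = registry;
        registry.add(this);               // keeps the provider, and its copy, reachable
    }

    @Override
    public void close() {
        registry.remove(this);            // drops the long-lived reference
    }
}

final class LeakSketch {
    // Creates one provider per request and reports how many stay reachable.
    static int retainedAfter(int requests, boolean closeAfterUse) {
        List<ProviderSketch> registry = new ArrayList<>();
        Settings shared = new Settings();
        shared.set("example.key", "example-value");
        for (int i = 0; i < requests; i++) {
            ProviderSketch provider = new ProviderSketch(shared, registry);
            if (closeAfterUse) {
                provider.close();
            }
        }
        return registry.size();
    }

    public static void main(String[] args) {
        System.out.println("retained without close: " + retainedAfter(8000, false));
        System.out.println("retained with close:    " + retainedAfter(8000, true));
    }
}
```

In this sketch, every provider that is never closed pins one configuration copy, so the retained count grows linearly with the number of requests. Whether the RM case is the same shape (providers or clients that are never released) would need to be confirmed from the GC root path in the screenshot.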
Attachments
Issue Links
- relates to: HDFS-13848 Refactor NameNode failover proxy providers (Resolved)