Details
Description
Stack:
Thread 456 (Edit log tailer):
  State: RUNNABLE
  Blocked count: 1139
  Waited count: 12
  Stack:
    org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getNumLiveDataNodes(DatanodeManager.java:1259)
    org.apache.hadoop.hdfs.server.blockmanagement.BlockManagerSafeMode.areThresholdsMet(BlockManagerSafeMode.java:570)
    org.apache.hadoop.hdfs.server.blockmanagement.BlockManagerSafeMode.checkSafeMode(BlockManagerSafeMode.java:213)
    org.apache.hadoop.hdfs.server.blockmanagement.BlockManagerSafeMode.adjustBlockTotals(BlockManagerSafeMode.java:265)
    org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.completeBlock(BlockManager.java:1087)
    org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.forceCompleteBlock(BlockManager.java:1118)
    org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:1126)
    org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:468)
    org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:258)
    org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:161)
    org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:892)
    org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:321)
    org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
    org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:410)
    org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
    org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:414)
    org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
Thread 455 (pool-16-thread-1):
code:
private boolean areThresholdsMet() {
  assert namesystem.hasWriteLock();
  int datanodeNum = blockManager.getDatanodeManager().getNumLiveDataNodes();
  synchronized (this) {
    return blockSafe >= blockThreshold && datanodeNum >= datanodeThreshold;
  }
}
According to the code, every call to areThresholdsMet() computes datanodeNum. However, when datanodeThreshold is 0 (the default value of the configuration), the expression datanodeNum >= datanodeThreshold always evaluates to true, so the computation is wasted.
Calling getNumLiveDataNodes() is time-consuming on clusters at the scale of 10,000 datanodes. Therefore, we add a condition so that datanodeNum is computed only when datanodeThreshold is greater than 0, which improves performance greatly.
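The proposed guard can be sketched as follows. This is a minimal standalone illustration, not the committed patch: the class SafeModeThresholdSketch, the LiveDatanodeCounter interface, and setBlockSafe() are hypothetical stand-ins for BlockManagerSafeMode and DatanodeManager, and the namesystem write-lock assertion is omitted so the example runs on its own.

```java
// Hypothetical stand-in for BlockManagerSafeMode; names are illustrative only.
final class SafeModeThresholdSketch {

  /** Hypothetical stand-in for DatanodeManager.getNumLiveDataNodes(). */
  interface LiveDatanodeCounter {
    int getNumLiveDataNodes();
  }

  private final long blockThreshold;
  private final int datanodeThreshold;
  private final LiveDatanodeCounter counter;
  private long blockSafe;

  SafeModeThresholdSketch(long blockThreshold, int datanodeThreshold,
      LiveDatanodeCounter counter) {
    this.blockThreshold = blockThreshold;
    this.datanodeThreshold = datanodeThreshold;
    this.counter = counter;
  }

  synchronized void setBlockSafe(long blockSafe) {
    this.blockSafe = blockSafe;
  }

  boolean areThresholdsMet() {
    // Skip the expensive count when the threshold is 0 (the default):
    // in that case datanodeNum >= datanodeThreshold holds for any count.
    int datanodeNum = 0;
    if (datanodeThreshold > 0) {
      datanodeNum = counter.getNumLiveDataNodes();
    }
    synchronized (this) {
      return blockSafe >= blockThreshold && datanodeNum >= datanodeThreshold;
    }
  }
}
```

With datanodeThreshold equal to 0 the counter is never invoked, so the edit log tailer no longer pays for the live-datanode scan on every completed block. With a positive threshold the behavior is unchanged.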
The Call Tree graph is shown in the attached file.
Attachments
Issue Links
- is duplicated by
  - HDFS-14613 BlockManagerSafeMode should avoid to check datanode thresholds with default zero value. (Resolved)
- is superseded by
  - HDFS-14859 Prevent unnecessary evaluation of costly operation getNumLiveDataNodes when dfs.namenode.safemode.min.datanodes is not zero (Resolved)
- relates to
  - HDFS-12914 Block report leases cause missing blocks until next report (Resolved)
  - HDFS-14366 Improve HDFS append performance (Resolved)
  - HDFS-14632 Reduce useless #getNumLiveDataNodes call in SafeModeMonitor (Resolved)
  - HDFS-15594 Lazy calculate live datanodes in safe mode tip (Resolved)