Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 2.6.0
- None
Description
If rack failures leave only one rack available, BlockPlacementPolicyDefault#chooseRandom may get an InvalidTopologyException when calling NetworkTopology#chooseRandom. The exception then propagates all the way out to BlockManager's ReplicationMonitor thread and terminates the NN.
Log:
2016-02-24 09:22:01,514 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
2016-02-24 09:22:01,958 ERROR org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: ReplicationMonitor thread received Runtime exception.
org.apache.hadoop.net.NetworkTopology$InvalidTopologyException: Failed to find datanode (scope="" excludedScope="/rack_a5").
    at org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:729)
    at org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:694)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:635)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRemoteRack(BlockPlacementPolicyDefault.java:580)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:348)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:214)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:111)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.chooseTargets(BlockManager.java:3746)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.access$200(BlockManager.java:3711)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1400)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1306)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3682)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3634)
    at java.lang.Thread.run(Thread.java:745)
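The failure mode described above can be illustrated with a toy model. This is only a sketch and not the actual patch: RackChoiceSketch, NoCandidateException, chooseRandom and chooseRemoteOrNull are hypothetical names, and none of them are Hadoop APIs. It assumes the topology is a plain map from rack name to node names. When the excluded rack is the only rack left, the candidate set is empty and the chooser throws; a caller that catches the exception can report a placement shortfall instead of letting it escape and kill the calling thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Toy model only: these types and methods are hypothetical, not Hadoop code.
class RackChoiceSketch {

    /** Local stand-in for an "invalid topology" failure (hypothetical type). */
    static class NoCandidateException extends RuntimeException {
        NoCandidateException(String msg) {
            super(msg);
        }
    }

    /** Picks a random node outside excludedRack; throws if no such node exists. */
    static String chooseRandom(Map<String, List<String>> nodesByRack,
                               String excludedRack, Random rng) {
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : nodesByRack.entrySet()) {
            if (!e.getKey().equals(excludedRack)) {
                candidates.addAll(e.getValue());
            }
        }
        if (candidates.isEmpty()) {
            // With a single surviving rack that is also excluded, nothing is left.
            throw new NoCandidateException(
                "Failed to find datanode (excludedScope=\"" + excludedRack + "\")");
        }
        return candidates.get(rng.nextInt(candidates.size()));
    }

    /** Defensive caller: maps "no candidate" to null instead of propagating. */
    static String chooseRemoteOrNull(Map<String, List<String>> nodesByRack,
                                     String localRack, Random rng) {
        try {
            return chooseRandom(nodesByRack, localRack, rng);
        } catch (NoCandidateException e) {
            return null; // caller can log a shortfall and retry later
        }
    }

    public static void main(String[] args) {
        Random rng = new Random(0);

        // Only one rack survives, and it is the rack being excluded.
        Map<String, List<String>> oneRack =
            Collections.singletonMap("/rack_a5", Arrays.asList("dn1", "dn2"));
        System.out.println(chooseRemoteOrNull(oneRack, "/rack_a5", rng)); // prints "null"

        // A second rack exists, so a remote node can be chosen.
        Map<String, List<String>> twoRacks = new HashMap<>();
        twoRacks.put("/rack_a5", Arrays.asList("dn1", "dn2"));
        twoRacks.put("/rack_b1", Arrays.asList("dn3"));
        System.out.println(chooseRemoteOrNull(twoRacks, "/rack_a5", rng)); // prints "dn3"
    }
}
```

Returning null is just one way to signal the shortfall in this sketch; the point is that the "no candidate" condition is handled at the call site rather than reaching the thread's top-level handler.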
Attachments
Issue Links
- breaks
  - HADOOP-14369 NetworkTopology calls expensive toString() when logging (Resolved)
  - HADOOP-15317 Improve NetworkTopology chooseRandom's loop (Resolved)
- supercedes
  - HDFS-4937 ReplicationMonitor can infinite-loop in BlockPlacementPolicyDefault#chooseRandom() (Closed)