Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Duplicate
Description
We have seen multiple incidents at production sites in which DataNodes (DNs) take a long time to register with the NameNode (NN) when upgrading to a post-2.6 release.
Further investigation shows that the DN is blocked while upgrading to the storage layout introduced in HDFS-6482. The new layout requires creating up to 64K directories in the underlying file system. Unfortunately, the current implementation calls mkdirs() sequentially and upgrades the volumes one at a time.
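A minimal Java sketch of the alternative named in HDFS-8578's title, processing all storage directories in parallel: each volume gets its own thread instead of being upgraded one after another. The class and method names, the `subdirN` directory naming, and the `fanOut` parameter are hypothetical. Assuming a two-level fan-out of 256 per level gives the 256 × 256 = 65,536 (about 64K) directories mentioned above. This is an illustration, not the actual DataNode upgrade code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLayoutUpgrade {

    // Hypothetical per-volume step: ensures a fanOut x fanOut tree of
    // subdirI/subdirJ directories exists under the volume root.
    // Returns the number of leaf directories ensured (created if absent).
    static int createSubdirTree(Path volumeRoot, int fanOut) throws IOException {
        int ensured = 0;
        for (int i = 0; i < fanOut; i++) {
            for (int j = 0; j < fanOut; j++) {
                Files.createDirectories(volumeRoot.resolve("subdir" + i).resolve("subdir" + j));
                ensured++;
            }
        }
        return ensured;
    }

    // Runs the per-volume step for every volume on its own thread,
    // rather than looping over the volumes sequentially.
    static int upgradeAllVolumes(List<Path> volumeRoots, int fanOut) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, volumeRoots.size()));
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (Path root : volumeRoots) {
                results.add(pool.submit(() -> createSubdirTree(root, fanOut)));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                // get() rethrows a failure on any volume as an ExecutionException.
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempDirectory("dn-upgrade-demo");
        List<Path> volumes = List.of(tmp.resolve("disk0"), tmp.resolve("disk1"), tmp.resolve("disk2"));
        // Small fan-out for the demo; a fan-out of 256 would mean 65,536 leaves per volume.
        int ensured = upgradeAllVolumes(volumes, 8);
        System.out.println("ensured " + ensured + " leaf directories across " + volumes.size() + " volumes");
    }
}
```

Parallelizing across volumes helps most when the volumes sit on separate disks. Within each volume the mkdirs() calls in this sketch still run sequentially, so the slowest volume bounds the total time.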
As a result, upgrading a DN with many disks, or with blocks that have random block IDs, takes a long time (usually hours), and the DN does not register with the NN until it has finished upgrading all of its storage directories. These long delays confuse operations teams and break the assumptions behind rolling upgrades.
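Why random block IDs make the upgrade slower can be illustrated with a small Java sketch that counts how many leaf directories a set of block IDs would touch. The `blockDirFor` mapping (one byte of the block ID per directory level) is an assumption for illustration, not HDFS's exact function. Under it, every 256 consecutive IDs share one leaf, so 100,000 sequential IDs need only a few hundred directories, while the same number of random IDs spread across tens of thousands.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class BlockDirFanOut {

    // Hypothetical mapping: two directory levels, each chosen by one byte of
    // the block ID (bits 16-23 and bits 8-15), i.e. 256 x 256 possible leaves.
    static String blockDirFor(long blockId) {
        int d1 = (int) ((blockId >> 16) & 0xFF);
        int d2 = (int) ((blockId >> 8) & 0xFF);
        return "subdir" + d1 + "/subdir" + d2;
    }

    // Number of distinct leaf directories needed to hold the given block IDs.
    static int distinctDirs(long[] blockIds) {
        Set<String> dirs = new HashSet<>();
        for (long id : blockIds) {
            dirs.add(blockDirFor(id));
        }
        return dirs.size();
    }

    public static void main(String[] args) {
        int n = 100_000;
        long[] sequential = new long[n];
        long[] random = new long[n];
        Random rng = new Random(42);
        long base = 1_073_741_825L; // arbitrary starting ID for the sequential case
        for (int i = 0; i < n; i++) {
            sequential[i] = base + i;
            random[i] = rng.nextLong();
        }
        System.out.println("sequential IDs -> " + distinctDirs(sequential) + " dirs");
        System.out.println("random IDs     -> " + distinctDirs(random) + " dirs");
    }
}
```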
Issue Links
- is broken by
  - HDFS-6482 Use block ID-based block layout on datanodes (Closed)
- is duplicated by
  - HDFS-8578 On upgrade, Datanode should process all storage/data dirs in parallel (Closed)
- is related to
  - HDFS-8578 On upgrade, Datanode should process all storage/data dirs in parallel (Closed)
- relates to
  - HDFS-8791 block ID-based DN storage layout can be very slow for datanode on ext4 (Resolved)