Details
- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Fixed
- Fix Version/s: 0.2.0
- Component/s: None
- Labels: None
- Environment: ~30 node cluster
Description
DFS files are still rotting.
I suspect there's a problem with block accounting/detecting identical hosts in the namenode. I have 30 physical nodes with varying numbers of local disks, so my current 'bin/hadoop dfs -report' shows 80 nodes after a full restart. However, when I discovered the problem (which cost me about 500 GB of temporary data to missing blocks in some of the larger chunks), -report showed 96 nodes. I suspect extra datanodes were somehow running against the same paths, and the namenode counted those as replicated instances; the blocks then showed up as over-replicated, one datanode was told to delete its local copy, and the block was actually lost.
I will debug it more the next time the situation arises. This is at least the 5th time I've had a large amount of file data "rot" in DFS since January.
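For reference, the guard that the duplicate issue (HADOOP-94) asks for can be as simple as an exclusive lock file in each data directory, so a second datanode started against the same path fails fast instead of registering a phantom replica of every block with the namenode. Below is a minimal sketch of that idea; the class name, lock-file name, and placement are illustrative assumptions, not the actual patch.

{code:java}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

// Hypothetical guard, not the HADOOP-94 fix itself: hold an exclusive
// OS-level lock on a marker file inside the data directory so a second
// datanode started against the same directory refuses to start.
public class DataDirLock {
    private final RandomAccessFile file;
    private final FileLock lock;

    private DataDirLock(RandomAccessFile file, FileLock lock) {
        this.file = file;
        this.lock = lock;
    }

    /** Throws if another process already holds this data directory. */
    public static DataDirLock acquire(File dataDir) throws IOException {
        RandomAccessFile raf =
            new RandomAccessFile(new File(dataDir, "in_use.lock"), "rws");
        FileLock lock = raf.getChannel().tryLock();
        if (lock == null) {
            raf.close();
            throw new IOException(
                "Data directory " + dataDir + " is locked by another datanode");
        }
        return new DataDirLock(raf, lock);
    }

    public void release() throws IOException {
        lock.release();
        file.close();
    }
}
{code}

With a guard like this, the duplicate-registration scenario described above cannot arise, so the over-replication logic never deletes what is in fact the only physical copy of a block.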
Attachments
Issue Links
- duplicates HADOOP-94: disallow more than one datanode running on one computer sharing the same data directory (Closed)