Details
- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Fixed
- Fix Version/s: 0.2.0
- Component/s: None
- Labels: None
- Environment: ~30 node cluster
Description
DFS files are still rotting.
I suspect there's a problem with block accounting/detecting identical hosts in the namenode. I have 30 physical nodes with varying numbers of local disks, so my current 'bin/hadoop dfs -report' shows 80 nodes after a full restart. However, when I discovered the problem (which cost me about 500 GB of temporary data to missing blocks in some of the larger chunks), -report showed 96 nodes. I suspect extra datanodes were somehow running against the same paths, and the namenode counted those as replicated instances; the blocks then showed up as over-replicated, one datanode was told to delete its local copy, and the block was actually lost.
I will debug it more the next time the situation arises. This is at least the 5th time I've had a large amount of file data "rot" in DFS since January.
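For reference, the guard that the duplicate issue (HADOOP-94) asks for can be as simple as an exclusive lock file in each data directory, so a second datanode started against the same path fails fast instead of registering a phantom replica of every block with the namenode. Below is a minimal sketch of that idea; the class name, lock-file name, and placement are illustrative assumptions, not the actual patch.

{code:java}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

// Hypothetical guard, not the HADOOP-94 fix itself: hold an exclusive
// OS-level lock on a marker file inside the data directory so a second
// datanode started against the same directory refuses to start.
public class DataDirLock {
    private final RandomAccessFile file;
    private final FileLock lock;

    private DataDirLock(RandomAccessFile file, FileLock lock) {
        this.file = file;
        this.lock = lock;
    }

    /** Throws if another process already holds this data directory. */
    public static DataDirLock acquire(File dataDir) throws IOException {
        RandomAccessFile raf =
            new RandomAccessFile(new File(dataDir, "in_use.lock"), "rws");
        FileLock lock = raf.getChannel().tryLock();
        if (lock == null) {
            raf.close();
            throw new IOException(
                "Data directory " + dataDir + " is locked by another datanode");
        }
        return new DataDirLock(raf, lock);
    }

    public void release() throws IOException {
        lock.release();
        file.close();
    }
}
{code}

With a guard like this, the duplicate-registration scenario described above cannot arise, so the over-replication logic never deletes what is in fact the only physical copy of a block.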
Attachments
Issue Links
- duplicates HADOOP-94: disallow more than one datanode running on one computer sharing the same data directory (Closed)