[HADOOP-1652] Rebalance data blocks when new data nodes added or data nodes become full - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.13.0
Fix Version/s: 0.16.0
Component/s: None
Labels: None

Description

When a new data node joins an HDFS cluster, it does not hold much data. Any map task assigned to that machine will therefore most likely not read local data, which increases the use of network bandwidth. On the other hand, when some data nodes become full, new data blocks are placed only on the non-full data nodes, which reduces the read parallelism of those blocks.

This jira aims to find an approach to redistribute data blocks when imbalance occurs in the cluster. A solution should meet the following requirements:
1. It maintains data availability guarantees, in the sense that rebalancing does not reduce the number of replicas that a block has or the number of racks on which the block resides.
2. An administrator should be able to invoke and interrupt rebalancing from the command line.
3. Rebalancing should be throttled so that it neither makes the namenode too busy to serve incoming requests nor saturates the network.
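
Requirement 1 can be illustrated with a small check. This is only a sketch under stated assumptions, not the code in the attached patches: the function name `move_preserves_availability`, the node names, and the rack map are all hypothetical. It tests whether moving one replica of a block from a source node to a target node keeps the replica count and does not reduce the number of racks holding the block.

```python
from typing import Dict, Set


def move_preserves_availability(
    replica_nodes: Set[str],
    rack_of: Dict[str, str],
    source: str,
    target: str,
) -> bool:
    """Hypothetical helper: True if moving one replica from `source` to
    `target` keeps the replica count and does not reduce the rack count."""
    if source not in replica_nodes:
        return False  # the source holds no replica of this block
    if target in replica_nodes:
        return False  # two replicas would collapse onto one node

    racks_before = {rack_of[node] for node in replica_nodes}
    nodes_after = (replica_nodes - {source}) | {target}
    racks_after = {rack_of[node] for node in nodes_after}

    same_replica_count = len(nodes_after) == len(replica_nodes)
    racks_not_reduced = len(racks_after) >= len(racks_before)
    return same_replica_count and racks_not_reduced


# Illustrative topology (assumed, not from the issue).
rack_of = {
    "dn1": "/rackA",
    "dn2": "/rackA",
    "dn3": "/rackB",
    "dn4": "/rackB",
}

# Moving within the same rack keeps both racks covered.
ok = move_preserves_availability({"dn1", "dn3"}, rack_of, "dn3", "dn4")

# Moving the only rackB replica into rackA would drop a rack.
bad = move_preserves_availability({"dn1", "dn3"}, rack_of, "dn3", "dn2")
```

A move that fails this check would simply be skipped in favor of another candidate block or target node.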

Attachments

| File | Uploaded | Size | Uploaded by |
|---|---|---|---|
| Balancer.html | 13/Oct/09 21:31 | 20 kB | Ravi Phulari |
| balancer8.patch | 05/Dec/07 18:46 | 72 kB | Hairong Kuang |
| balancer7.patch | 05/Dec/07 00:52 | 72 kB | Hairong Kuang |
| balancer6.patch | 04/Dec/07 22:10 | 71 kB | Hairong Kuang |
| BalancerUserGuide2.pdf | 04/Dec/07 19:48 | 14 kB | Hairong Kuang |
| balancer5.patch | 04/Dec/07 18:50 | 71 kB | Hairong Kuang |
| balancer4.patch | 29/Nov/07 18:17 | 71 kB | Hairong Kuang |
| balancer3.patch | 26/Nov/07 21:18 | 75 kB | Hairong Kuang |
| BalancerAdminGuide1.pdf | 20/Nov/07 08:43 | 14 kB | Hairong Kuang |
| balancer2.patch | 20/Nov/07 08:38 | 71 kB | Hairong Kuang |
| balancer1.patch | 09/Nov/07 20:12 | 65 kB | Hairong Kuang |
| BalancerAdminGuide.pdf | 08/Nov/07 23:26 | 13 kB | Hairong Kuang |
| balancer.patch | 26/Oct/07 20:51 | 51 kB | Hairong Kuang |
| RebalanceDesign6.pdf | 23/Oct/07 22:48 | 50 kB | Hairong Kuang |
| RebalanceDesign5.pdf | 22/Aug/07 21:08 | 45 kB | Hairong Kuang |
| RebalanceDesign4.pdf | 10/Aug/07 22:55 | 47 kB | Hairong Kuang |

Issue Links

depends upon

- HADOOP-1846 DatanodeReport should distinguish live datanodes from dead datanodes (Closed)
- HADOOP-1912 Datanode should support block replacement (Closed)
- HADOOP-1914 HDFS should have a NamenodeProtocol to allow secondary namenodes and rebalancing processes to communicate with a primary namenode (Closed)
- HADOOP-1266 Remove DatanodeDescriptor dependency from NetworkTopology (Closed)

is depended upon by

- HBASE-57 [hbase] Master should allocate regions to regionservers based upon data locality and rack awareness (Closed)

People

Assignee: Hairong Kuang

Reporter: Hairong Kuang

Votes: 1

Watchers: 5

Dates

Created: 25/Jul/07 18:47

Updated: 25/Apr/17 20:31

Resolved: 05/Dec/07 19:49