[HBASE-18164] Much faster locality cost function and candidate generator - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.4.0, 2.0.0-alpha-2, 2.0.0
Component/s: Balancer
Labels: None

Hadoop Flags: Reviewed
Release Note:
New locality cost function and candidate generator that use caching and incremental computation to allow the stochastic load balancer to consider ~20x more cluster configurations for big clusters.
Flags: Patch

Description

We noticed that the stochastic load balancer was not scaling well with cluster size. On our smaller clusters (~17 tables, ~12 region servers, ~5k regions), the balancer considers ~100,000 cluster configurations per 60s balancer run, but only ~5,000 per 60s run on our bigger clusters (~82 tables, ~160 region servers, ~13k regions).

As a result, our bigger clusters do not converge on balance as quickly for things like table skew and region load, because the balancer does not have enough time to "think".

We have re-written the locality cost function to be incremental, meaning it only recomputes cost based on the most recent region move proposed by the balancer, rather than recomputing the cost across all regions/servers every iteration.
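To illustrate the incremental idea, here is a minimal sketch, not the code from the patch. The class `IncrementalLocalitySketch`, its `locality[region][server]` table, and the simplified cost (1 minus mean locality) are all assumptions made for the example. The point is only that a proposed move changes one term of the sum, so the cost can be updated in constant time instead of rescanning every region.

```java
/** Hypothetical sketch (not the HBase class): keeps a running locality total. */
final class IncrementalLocalitySketch {
  private final double[][] locality;   // assumed input: locality[region][server], each in [0, 1]
  private final int[] regionToServer;  // current assignment of each region
  private double total;                // sum of each region's locality on its current server

  IncrementalLocalitySketch(double[][] locality, int[] initialAssignment) {
    this.locality = locality;
    this.regionToServer = initialAssignment.clone();
    // The only full pass over all regions happens once, here.
    for (int r = 0; r < regionToServer.length; r++) {
      total += locality[r][regionToServer[r]];
    }
  }

  /** Applies one proposed move in O(1): only the moved region's term changes. */
  void regionMoved(int region, int newServer) {
    int oldServer = regionToServer[region];
    total += locality[region][newServer] - locality[region][oldServer];
    regionToServer[region] = newServer;
  }

  /** Simplified cost for the sketch: 1 minus mean locality (0 means fully local). */
  double cost() {
    return regionToServer.length == 0 ? 0.0 : 1.0 - total / regionToServer.length;
  }

  /** Non-incremental reference that rescans every region; used only to check the sketch. */
  double costFromScratch() {
    double sum = 0.0;
    for (int r = 0; r < regionToServer.length; r++) {
      sum += locality[r][regionToServer[r]];
    }
    return regionToServer.length == 0 ? 0.0 : 1.0 - sum / regionToServer.length;
  }
}
```

With thousands of regions, `regionMoved` touches two table entries per proposed move, while `costFromScratch` touches one entry per region on every iteration.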

Further, we also cache the locality of every region on every server at the beginning of the balancer's execution for both the LocalityBasedCostFunction and the LocalityCandidateGenerator to reference. This way, they need not collect all HDFS blocks of every region at each iteration of the balancer.
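The caching step can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: `LocalityCacheSketch` and the `LocalityLookup` interface are hypothetical names, and `LocalityLookup` stands in for whatever expensive HDFS block lookup produces a region's locality on a server. The table is filled once up front, and both the cost function and the candidate generator would then read from it.

```java
/** Hypothetical sketch: compute every (region, server) locality once, then only read. */
final class LocalityCacheSketch {
  /** Stand-in for the expensive HDFS block lookup (an assumption for this example). */
  interface LocalityLookup {
    double localityOf(int region, int server);
  }

  private final double[][] cache; // cache[region][server]

  LocalityCacheSketch(int numRegions, int numServers, LocalityLookup expensive) {
    cache = new double[numRegions][numServers];
    // One expensive call per (region, server) pair, made before balancing starts.
    for (int r = 0; r < numRegions; r++) {
      for (int s = 0; s < numServers; s++) {
        cache[r][s] = expensive.localityOf(r, s);
      }
    }
  }

  /** Cheap read used on every balancer iteration. */
  double get(int region, int server) {
    return cache[region][server];
  }

  /** Server with the highest cached locality for a region (ties go to the lowest index). */
  int bestServerFor(int region) {
    int best = 0;
    for (int s = 1; s < cache[region].length; s++) {
      if (cache[region][s] > cache[region][best]) {
        best = s;
      }
    }
    return best;
  }
}
```

The trade-off in this sketch is memory for time: the table holds one entry per region per server, and it reflects block placement as of the start of the run.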

The changes have been running in all 6 of our production clusters and all 4 QA clusters without issue. The speed improvements we noticed are massive. Our big clusters now consider 20x more cluster configurations.

One design decision I made is to define locality cost as the difference between the best locality that is possible given the current cluster state and the currently measured locality. The old computation measured locality cost as the difference between the current locality and 100% locality; the new computation instead takes, for each region, the difference between its current locality and the best locality available for that region anywhere in the cluster.
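The difference between the two definitions can be shown with a small sketch. The class `LocalityCostDefinitions`, the `locality[region][server]` table, and the choice to average the per-region gaps are assumptions for illustration; the description above does not say how the patch aggregates or normalizes the per-region differences.

```java
/** Hypothetical sketch comparing the old and new locality cost definitions. */
final class LocalityCostDefinitions {
  /** Old style: gap between each region's current locality and 100%, averaged. */
  static double costVersusPerfect(double[][] locality, int[] regionToServer) {
    double sum = 0.0;
    for (int r = 0; r < regionToServer.length; r++) {
      sum += 1.0 - locality[r][regionToServer[r]];
    }
    return regionToServer.length == 0 ? 0.0 : sum / regionToServer.length;
  }

  /** New style: gap between current locality and the best achievable on any server, averaged. */
  static double costVersusBest(double[][] locality, int[] regionToServer) {
    double sum = 0.0;
    for (int r = 0; r < regionToServer.length; r++) {
      double best = 0.0;
      for (double l : locality[r]) {
        best = Math.max(best, l);
      }
      sum += best - locality[r][regionToServer[r]];
    }
    return regionToServer.length == 0 ? 0.0 : sum / regionToServer.length;
  }
}
```

For example, a region whose blocks are at most 50% local on any server contributes 0.5 under the old definition even when it is already on its best server, but contributes 0 under the new one, so the balancer is not penalized for locality it cannot improve by moving regions.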

Attachments

HBASE-18164-00.patch (06/Jun/17 22:16, 26 kB, Kahlil Oppenheimer)
HBASE-18164-01.patch (07/Jun/17 18:03, 38 kB, Kahlil Oppenheimer)
HBASE-18164-02.patch (08/Jun/17 15:56, 31 kB, Kahlil Oppenheimer)
HBASE-18164-04.patch (16/Jun/17 15:29, 31 kB, Kahlil Oppenheimer)
HBASE-18164-05.patch (19/Jun/17 16:56, 31 kB, Kahlil Oppenheimer)
HBASE-18164-06.patch (26/Jun/17 14:00, 31 kB, Kahlil Oppenheimer)
HBASE-18164-07.patch (26/Jun/17 14:13, 2 kB, Kahlil Oppenheimer)
HBASE-18164-08.patch (26/Jun/17 16:45, 2 kB, Kahlil Oppenheimer)
18164.branch-1.addendum.txt (26/Jun/17 19:09, 1 kB, Ted Yu)

Issue Links

is related to: HBASE-21006 Balancer - data locality drops 30-40% across all nodes after every cluster-wide rolling restart, not migrating regions back to original RegionServers? (Resolved)

People

Assignee: Kahlil Oppenheimer
Reporter: Kahlil Oppenheimer
Votes: 0
Watchers: 17

Dates

Created: 05/Jun/17 17:46
Updated: 03/Aug/18 16:55
Resolved: 26/Jun/17 19:42