Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Won't Fix
Description
We were doing a cluster restart the other day, and some regionservers did not shut down cleanly. Upon restart, our locality went from 99% to 5%. Looking at the code, AssignmentManager.joinCluster() calls AssignmentManager.processDeadServersAndRegionsInTransition().
If the failover flag gets set for any reason, it seems we never call assignAllUserRegions(). The balancer then does the work of assigning those regions; since we don't use a locality-aware balancer, we lost our region locality.
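For context, here is a condensed, self-contained Java sketch of the startup flow as I read it. The method names mirror the real ones discussed above, but every body is a simplified stand-in, not the actual HBase source:

    import java.util.Collections;
    import java.util.Map;
    import java.util.Set;

    /**
     * Simplified sketch of the master startup flow described above.
     * NOT the actual HBase AssignmentManager; names mirror the real
     * methods, but the bodies are condensed stand-ins.
     */
    class AssignmentFlowSketch {
      /** Entry point when the master joins the cluster. */
      void joinCluster() {
        Set<String> deadServers = rebuildUserRegions();
        processDeadServersAndRegionsInTransition(deadServers);
      }

      void processDeadServersAndRegionsInTransition(Set<String> deadServers) {
        boolean failover = detectFailover(deadServers); // stand-in for the real checks
        if (failover) {
          // Failover path: dead servers' regions get reassigned piecemeal
          // (server-shutdown handling / balancer), ignoring old locality.
          processDeadServers(deadServers);
        } else {
          // Clean-startup path: bulk assignment with a retained-assignment
          // plan, which is what preserves locality.
          assignAllUserRegions(getAllRegions());
        }
      }

      // --- stand-ins so the sketch compiles on its own ---
      Set<String> rebuildUserRegions() { return Collections.emptySet(); }
      boolean detectFailover(Set<String> deadServers) { return !deadServers.isEmpty(); }
      void processDeadServers(Set<String> deadServers) { /* reassign regions */ }
      void assignAllUserRegions(Map<String, String> regions) { /* bulk assign */ }
      Map<String, String> getAllRegions() { return Collections.emptyMap(); }
    }

The key point is that the failover branch never goes through the bulk, locality-retaining assignment.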
I don't have a solid grasp on the reasoning for these checks, but there are some potential workarounds here.
1. After shutting down your cluster, move your WALs aside (replay later).
2. Clean up your znodes in ZooKeeper.
That seems to work, but it requires a lot of manual labor. Another solution, which I prefer, would be to have a flag for the startup script: ./start-hbase.sh --clean
If we start the master with that flag, we do a check in AssignmentManager.processDeadServersAndRegionsInTransition(): if the flag is set, we call assignAllUserRegions() regardless of the failover state.
I have a patch for the latter solution, assuming I am understanding the logic correctly.
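To illustrate the idea, here is a minimal sketch of that check. The property name hbase.master.startup.clean is invented for illustration (the real patch would wire the --clean flag through to the master configuration), and the stand-in methods only exist to make the sketch compile:

    import java.util.Properties;
    import java.util.Set;

    /**
     * Hypothetical sketch of the proposed clean-startup check.
     * "hbase.master.startup.clean" is an invented property, not an
     * existing HBase configuration key.
     */
    class CleanStartupSketch {
      private final Properties conf;

      CleanStartupSketch(Properties conf) {
        this.conf = conf;
      }

      void processDeadServersAndRegionsInTransition(Set<String> deadServers) {
        boolean failover = detectFailover(deadServers); // existing failover checks
        boolean cleanStartup = Boolean.parseBoolean(
            conf.getProperty("hbase.master.startup.clean", "false"));
        if (failover && !cleanStartup) {
          processDeadServers(deadServers); // failover path, locality lost
        } else {
          // Operator asserted a clean restart: always bulk-assign with the
          // retained assignment plan, regardless of the failover heuristics.
          assignAllUserRegions();
        }
      }

      // --- stand-ins so the sketch compiles on its own ---
      boolean detectFailover(Set<String> deadServers) { return !deadServers.isEmpty(); }
      void processDeadServers(Set<String> deadServers) { }
      void assignAllUserRegions() { }
    }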
Attachments
Issue Links
- is related to:
  - HBASE-18036 HBase 1.x : Data locality is not maintained after cluster restart or SSH (Resolved)
  - HBASE-15251 During a cluster restart, Hmaster thinks it is a failover by mistake (Resolved)
  - HBASE-17791 Locality should not be affected for non-faulty region servers at startup (Open)