Details
- Bug
- Status: Resolved
- Critical
- Resolution: Fixed
- 2.4.3
- None
- Reviewed
Description
When a region P splits into A and B and the master then fails over, the newly active master reports P as being in an inconsistent state. This appears to be a regression introduced in HBASE-25847 (cc andrew.purtell@gmail.com), which changed regionInfo.isParentSplit() to regionState.isSplit(). After the restart the region state of P is CLOSED rather than SPLIT, so both the region state and the region info should be checked, presumably with regionState.isSplit() || regionInfo.isSplit(). The situation resolves itself once a major compaction occurs and P is GCed, but a master that reports nonexistent inconsistencies is a serious problem: we had a large outage caused by a series of operator errors while our SRE team tried to fix an inconsistency that did not actually exist.
Thanks to Stack for helping look over this issue and to Vlad Hanciuta for root-causing the bug.
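
The check proposed in the description can be illustrated with a minimal, self-contained sketch. It does not use the real HBase classes: RegionInfo, RegionState, and State below are simplified, hypothetical stand-ins, and the method names isSplitParentStateOnly and isSplitParent are made up for illustration. The only assumption carried over from the report is that after a failover P's state is CLOSED while its region info still marks it as split.

```java
// Hypothetical stand-ins for HBase's RegionInfo and RegionState; not the real classes.
final class SplitParentCheck {

  /** Simplified region states; the real enum has many more values. */
  enum State { OPEN, CLOSED, SPLIT }

  /** Stand-in for the region descriptor, which carries the split flag. */
  static final class RegionInfo {
    private final boolean split;
    RegionInfo(boolean split) { this.split = split; }
    boolean isSplit() { return split; }
  }

  /** Stand-in for the master's in-memory view of a region. */
  static final class RegionState {
    private final State state;
    RegionState(State state) { this.state = state; }
    boolean isSplit() { return state == State.SPLIT; }
  }

  /** Behaviour described as the regression: only the region state is consulted. */
  static boolean isSplitParentStateOnly(RegionState rs, RegionInfo ri) {
    return rs.isSplit();
  }

  /** Check suggested in the report: either source marks the region as a split parent. */
  static boolean isSplitParent(RegionState rs, RegionInfo ri) {
    return rs.isSplit() || ri.isSplit();
  }

  public static void main(String[] args) {
    // Parent P after a master failover: state is CLOSED, info still flags the split.
    RegionState state = new RegionState(State.CLOSED);
    RegionInfo info = new RegionInfo(true);
    System.out.println("state-only check: " + isSplitParentStateOnly(state, info));
    System.out.println("combined check:   " + isSplitParent(state, info));
  }
}
```

With the state-only check, P is not recognised as a split parent and so would be flagged as inconsistent; the combined check recognises it from the region info even though the state is CLOSED.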
Issue Links
- is broken by: HBASE-25847 More DEBUG and TRACE level logging in CatalogJanitor and HbckChore (Resolved)