Description
Just saw an interesting issue where a cluster went down hard and 30 nodes had 1700 WALs to replay. Replay took almost an hour. It looks like it could run faster; much of the time is spent zk'ing and nn'ing (ZooKeeper and NameNode round trips).
Putting in 0.96 so it gets a look at least. Can always punt.
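A rough back-of-envelope, assuming the 1700 WALs were spread roughly evenly across the 30 servers and the hour is wall-clock time:

    1700 WALs / 30 servers ≈ 57 WALs per server
    3600 s / 57 WALs ≈ 63 s per WAL

If applying the edits themselves is only a small fraction of those ~63 s, the balance is per-WAL ZooKeeper and NameNode round trips, which is why this looks like it could run faster.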
Issue Links
- depends upon
  - HBASE-14028 DistributedLogReplay drops edits when ITBLL 125M (Closed)
- is related to
  - HBASE-8560 TestMasterShutdown failing in trunk 0.95/trunk -- "Unable to get data of znode /hbase/meta-region-server because node does not exist (not an error)" (Closed)
  - HBASE-8567 TestDistributedLogSplitting#testLogReplayForDisablingTable fails on hadoop 2.0 (Closed)
  - HBASE-7825 Retire non distributed log splitting related code (Closed)
- is required by
  - HBASE-5843 Improve HBase MTTR - Mean Time To Recover (Closed)
- relates to
  - HBASE-11280 Document distributed log replay and distributed log splitting (Closed)
  - HBASE-8701 distributedLogReplay need to apply wal edits in the receiving order of those edits (Closed)
  - HBASE-8729 distributedLogReplay may hang during chained region server failure (Closed)
  - HBASE-8568 Test case TestDistributedLogSplitting#testWorkerAbort failed intermittently (Closed)
  - HBASE-8573 Store last flushed sequence id for each store of region for Distributed Log Replay (Closed)
  - HBASE-8617 Introducing a new config to disable writes during recovering (Closed)
  - HBASE-8575 TestDistributedLogSplitting#testMarkRegionsRecoveringInZK fails intermittently due to lack of online region (Closed)
- supersedes
  - HBASE-6984 Serve writes during log split (Closed)