Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
2.2.2
None
Description
When a WriterThread runs into an exception (e.g. NotServingRegionException), the exception is stored in the controller. It is never removed and cannot be overwritten.
public void run() {
  try {
    doRun();
  } catch (Throwable t) {
    LOG.error("Exiting thread", t);
    controller.writerThreadError(t);
  }
}
Because of this, every call to PipelineController.checkForErrors() rethrows the same old exception.
For example, RegionReplicaReplicationEndpoint.replicate contains a while loop that performs the actual replication. On every iteration it calls checkForErrors(), catches the rethrown exception, logs it, and otherwise ignores it. In my experience this produces about 2 GB of log files in about 5 minutes.
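To make the failure mode concrete, here is a minimal, self-contained Java model. It is a simplified sketch, not HBase code: StickyErrorDemo, Controller and replicateLoop are hypothetical names, and it assumes the controller keeps the first error in an AtomicReference set with compareAndSet, which matches the "never removed and cannot be overwritten" behavior described above. The loop only counts where the real endpoint would log.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical, simplified model of the sticky-error behavior (not HBase code). */
public class StickyErrorDemo {

    /** Keeps the first writer error forever; later errors are silently dropped. */
    static class Controller {
        private final AtomicReference<Throwable> thrown = new AtomicReference<>();

        /** Only succeeds while no error is stored, so the first error wins. */
        void writerThreadError(Throwable t) {
            thrown.compareAndSet(null, t);
        }

        /** Rethrows the stored error (wrapped) on every call; nothing ever clears it. */
        void checkForErrors() throws IOException {
            Throwable t = thrown.get();
            if (t != null) {
                throw new IOException(t);
            }
        }
    }

    /**
     * Models the replicate loop: each iteration checks for errors, "logs"
     * the exception and carries on. Returns how many times the same stored
     * error was seen.
     */
    static int replicateLoop(Controller controller, int iterations) {
        Throwable first = null;
        int loggedSameError = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                controller.checkForErrors();
            } catch (IOException e) {
                // The real endpoint would write a log line here; we only count.
                if (first == null) {
                    first = e.getCause();
                }
                if (e.getCause() == first) {
                    loggedSameError++;
                }
            }
        }
        return loggedSameError;
    }

    public static void main(String[] args) {
        Controller controller = new Controller();
        controller.writerThreadError(new RuntimeException("simulated NotServingRegionException"));
        controller.writerThreadError(new RuntimeException("later error, dropped"));
        System.out.println(replicateLoop(controller, 1000)); // prints 1000
    }
}
```

Every iteration surfaces the same original exception, so log volume grows with the loop rate rather than with the number of actual failures.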
My proposal is to clear the stored exception once it reaches RegionReplicaReplicationEndpoint.replicate and to make sure the WriterThread that died throwing it is restarted.
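The proposal could look roughly like the sketch below. It is illustrative only, under stated assumptions: takeError(), startWriter(), awaitWriter(), checkAndRecover() and ClearAndRestartDemo are hypothetical names, not HBase API, and the issue text does not say how the actual fix was implemented. The idea is that the replicate loop consumes the stored error atomically (so it is reported once) and then replaces the dead writer with a fresh thread.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical sketch of "clear the error and restart the writer" (not HBase API). */
public class ClearAndRestartDemo {

    static class Controller {
        private final AtomicReference<Throwable> thrown = new AtomicReference<>();

        void writerThreadError(Throwable t) {
            thrown.compareAndSet(null, t);
        }

        /** Returns the stored error and clears it in one atomic step. */
        Throwable takeError() {
            return thrown.getAndSet(null);
        }
    }

    private final Controller controller = new Controller();
    private final Supplier<Runnable> writerFactory;
    private Thread writer;
    final AtomicInteger restarts = new AtomicInteger();

    ClearAndRestartDemo(Supplier<Runnable> writerFactory) {
        this.writerFactory = writerFactory;
    }

    /** Starts a writer that reports failures to the controller, like WriterThread.run(). */
    void startWriter() {
        Runnable body = writerFactory.get();
        writer = new Thread(() -> {
            try {
                body.run();
            } catch (Throwable t) {
                controller.writerThreadError(t);
            }
        });
        writer.start();
    }

    /** Waits for the current writer thread to exit. */
    void awaitWriter() {
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** One pass of the replicate loop: if the writer died, consume its error and restart it. */
    Throwable checkAndRecover() {
        Throwable t = controller.takeError();
        if (t != null) {
            awaitWriter(); // make sure the failed writer has fully exited
            restarts.incrementAndGet();
            startWriter();
        }
        return t;
    }

    public static void main(String[] args) {
        AtomicInteger attempts = new AtomicInteger();
        // First writer fails, later writers succeed.
        ClearAndRestartDemo demo = new ClearAndRestartDemo(() -> () -> {
            if (attempts.incrementAndGet() == 1) {
                throw new RuntimeException("simulated NotServingRegionException");
            }
        });
        demo.startWriter();
        demo.awaitWriter();
        System.out.println(demo.checkAndRecover() != null); // true: reported once
        demo.awaitWriter();
        System.out.println(demo.checkAndRecover() != null); // false: nothing left to rethrow
        System.out.println(demo.restarts.get());            // 1
    }
}
```

Using getAndSet(null) rather than a separate get() and set(null) keeps a new error from being lost between the read and the clear, and joining the failed thread before starting a replacement avoids two writers running at once.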