Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Version/s: 2.0.0, 1.0.2
Description
For some time I've been trying to track down the cause of issues where certain exceptions leave Storm workers in a zombified state. I believe I've isolated the bug to the behaviour of :report-error-and-die/reportErrorAndDie in the executor. Essentially:
    :report-error-and-die (fn [error]
                            (try
                              ((:report-error <>) error)
                              (catch Exception e
                                (log-message "Error while reporting error to cluster, proceeding with shutdown")))
                            (if (or (exception-cause? InterruptedException error)
                                    (exception-cause? java.io.InterruptedIOException error))
                              (log-message "Got interrupted excpetion shutting thread down...")
                              ((:suicide-fn <>))))
has the grouping of the if form slightly wrong. As written, it either logs (when the cause is an InterruptedException/InterruptedIOException) or dies (for every other error). It should log under that condition, and ALWAYS die.
Basically:
    :report-error-and-die (fn [error]
                            (try
                              ((:report-error <>) error)
                              (catch Exception e
                                (log-message "Error while reporting error to cluster, proceeding with shutdown")))
                            (if (or (exception-cause? InterruptedException error)
                                    (exception-cause? java.io.InterruptedIOException error))
                              (log-message "Got interrupted excpetion shutting thread down..."))
                            ((:suicide-fn <>)))
After digging into the Java port of this code, it looks like a different bug was introduced while porting:
    if (Utils.exceptionCauseIsInstanceOf(InterruptedException.class, e)
            || Utils.exceptionCauseIsInstanceOf(java.io.InterruptedIOException.class, e)) {
        LOG.info("Got interrupted exception shutting thread down...");
        suicideFn.run();
    }
That was how this was initially ported, and STORM-2142 changed it to:
    if (Utils.exceptionCauseIsInstanceOf(InterruptedException.class, e)
            || Utils.exceptionCauseIsInstanceOf(java.io.InterruptedIOException.class, e)) {
        LOG.info("Got interrupted exception shutting thread down...");
    } else {
        suicideFn.run();
    }
However, I believe the correct port is as described above:
    if (Utils.exceptionCauseIsInstanceOf(InterruptedException.class, e)
            || Utils.exceptionCauseIsInstanceOf(java.io.InterruptedIOException.class, e)) {
        LOG.info("Got interrupted exception shutting thread down...");
    }
    suicideFn.run();
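To make the difference between the three variants concrete, here is a minimal, self-contained sketch that runs each one against an interrupted error and an unrelated error and records whether the suicide function fired. It is not the Storm code: the class ReportErrorAndDieSketch, the Handler interface, and the helpers causeIsInstanceOf, isInterrupted, and dies are hypothetical names for illustration, causeIsInstanceOf is assumed to walk the cause chain the way Utils.exceptionCauseIsInstanceOf does, the error-reporting try/catch is omitted, and System.out stands in for LOG.info.

```java
import java.io.InterruptedIOException;
import java.util.concurrent.atomic.AtomicBoolean;

final class ReportErrorAndDieSketch {

    // Hypothetical stand-in for Utils.exceptionCauseIsInstanceOf:
    // true if the error or anything in its cause chain is an instance of klass.
    static boolean causeIsInstanceOf(Class<? extends Throwable> klass, Throwable error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (klass.isInstance(t)) {
                return true;
            }
        }
        return false;
    }

    static boolean isInterrupted(Throwable e) {
        return causeIsInstanceOf(InterruptedException.class, e)
                || causeIsInstanceOf(InterruptedIOException.class, e);
    }

    // Initial port: dies ONLY for interrupted errors.
    static void initialPort(Throwable e, Runnable suicideFn) {
        if (isInterrupted(e)) {
            System.out.println("Got interrupted exception shutting thread down...");
            suicideFn.run();
        }
    }

    // After STORM-2142 (same grouping as the Clojure original): logs OR dies.
    static void afterStorm2142(Throwable e, Runnable suicideFn) {
        if (isInterrupted(e)) {
            System.out.println("Got interrupted exception shutting thread down...");
        } else {
            suicideFn.run();
        }
    }

    // Proposed fix: log for interrupted errors, and ALWAYS die.
    static void proposedFix(Throwable e, Runnable suicideFn) {
        if (isInterrupted(e)) {
            System.out.println("Got interrupted exception shutting thread down...");
        }
        suicideFn.run();
    }

    // Hypothetical functional interface so the variants can be passed around.
    interface Handler {
        void handle(Throwable e, Runnable suicideFn);
    }

    // Runs a variant with a fake suicide function and reports whether it fired.
    static boolean dies(Handler handler, Throwable error) {
        AtomicBoolean died = new AtomicBoolean(false);
        handler.handle(error, () -> died.set(true));
        return died.get();
    }

    public static void main(String[] args) {
        // An InterruptedException wrapped in another exception, and an unrelated failure.
        Throwable interrupted = new RuntimeException(new InterruptedException("interrupted"));
        Throwable other = new IllegalStateException("some other failure");

        System.out.println("initial port, interrupted: " + dies(ReportErrorAndDieSketch::initialPort, interrupted));
        System.out.println("initial port, other:       " + dies(ReportErrorAndDieSketch::initialPort, other));
        System.out.println("STORM-2142, interrupted:   " + dies(ReportErrorAndDieSketch::afterStorm2142, interrupted));
        System.out.println("STORM-2142, other:         " + dies(ReportErrorAndDieSketch::afterStorm2142, other));
        System.out.println("proposed fix, interrupted: " + dies(ReportErrorAndDieSketch::proposedFix, interrupted));
        System.out.println("proposed fix, other:       " + dies(ReportErrorAndDieSketch::proposedFix, other));
    }
}
```

Under these assumptions, only the proposed variant calls the suicide function for both kinds of error. The initial port skips it for non-interrupted errors, and the STORM-2142 variant skips it for interrupted ones.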
I'll look into providing patches for the 1.x and 2.x branches shortly.
Attachments
Issue Links
- relates to: STORM-2440 Kafka outage can lead to lockup of topology (Resolved)
- links to