Description
I have around 40 consumers taking messages from a single queue. After a while, one or two consumers stop receiving any messages. Stopping the corresponding connection via JMX forces a reconnect, and messages are delivered again.
I reproduced it twice in the QA environment, and now it has happened in production. I tried instrumenting the code and setting the log level to debug, but that changed the timing and I could no longer reproduce it.
I suspect that the runtime association between the Queue and Consumer objects is lost on the broker side.
One suspect is the empty catch block in the RoundRobinDispatchPolicy class (line 64). It is possible that the CopyOnWrite array list gets into a bad state and the operation fails when the removed consumer is added back.
BTW, a CopyOnWrite list is good when you mostly read, but not so good when you write on every message delivery, and empty catch blocks are bad in any case.
if (firstMatchingConsumer != null) {
    // Rotate the consumer list.
    try {
        ...
    } catch (Throwable bestEffort) {
    }
}
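For illustration, here is a minimal sketch of how the rotation could report failures instead of swallowing them. It is not the broker's code: the class name RotationSketch, the rotate helper, and the use of java.util.logging are all assumptions made for the example. It also assumes the rotation is a remove followed by an add, as described above. If the remove succeeds but the add fails, the consumer silently drops out of the list, which would be consistent with a consumer that stops receiving messages until it reconnects.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch, not the actual RoundRobinDispatchPolicy implementation.
public class RotationSketch {
    private static final Logger LOG = Logger.getLogger(RotationSketch.class.getName());

    /**
     * Moves the given consumer to the end of the list.
     * Returns true if the list was rotated, false if the consumer was not in it.
     */
    static <T> boolean rotate(List<T> consumers, T first) {
        synchronized (consumers) {
            try {
                if (!consumers.remove(first)) {
                    // Nothing was removed, so do not add a duplicate entry.
                    LOG.warning("Consumer not in list, skipping rotation: " + first);
                    return false;
                }
                consumers.add(first);
                return true;
            } catch (RuntimeException e) {
                // Log and rethrow so a failed rotation is visible in the broker log.
                LOG.log(Level.SEVERE, "Failed to rotate consumer list", e);
                throw e;
            }
        }
    }

    public static void main(String[] args) {
        List<String> consumers =
                new CopyOnWriteArrayList<>(Arrays.asList("c1", "c2", "c3"));
        rotate(consumers, "c1");
        System.out.println(consumers); // prints [c2, c3, c1]
    }
}
```

Even with logging in place, each rotation on a CopyOnWriteArrayList copies the backing array twice (once for the remove, once for the add), so a plain list guarded by the same lock may be a better fit for a structure that is written on every dispatch.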