Details
Description
I'm running ActiveMQ 5.5.0 and clients using Net::Stomp 0.38_99 and I'm seeing infrequent NullPointerExceptions in TransportConnection:
Exception in thread "ActiveMQ Connection Dispatcher: /172.31.201.11:50607" java.lang.NullPointerException at org.apache.activemq.broker.TransportConnection.service(TransportConnection.java:327) at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:179) at org.apache.activemq.transport.TransportFilter.onCommand(TransportFilter.java:69) at org.apache.activemq.transport.stomp.StompTransportFilter.sendToActiveMQ(StompTransportFilter.java:81) at org.apache.activemq.transport.stomp.StompSubscription.onMessageDispatch(StompSubscription.java:79) at org.apache.activemq.transport.stomp.ProtocolConverter.onActiveMQCommand(ProtocolConverter.java:596) at org.apache.activemq.transport.stomp.StompTransportFilter.oneway(StompTransportFilter.java:58) at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:40) at org.apache.activemq.broker.TransportConnection.dispatch(TransportConnection.java:1270) at org.apache.activemq.broker.TransportConnection.processDispatch(TransportConnection.java:815) at org.apache.activemq.broker.TransportConnection.iterate(TransportConnection.java:851) at org.apache.activemq.thread.DedicatedTaskRunner.runTask(DedicatedTaskRunner.java:104) at org.apache.activemq.thread.DedicatedTaskRunner$1.run(DedicatedTaskRunner.java:42)
This seems to happen 1-2 times per month or so but the result is dire: new messages aren't delivered to the affected client (you can see the number of pending messages increasing in the admin web interface) until the client or ActiveMQ is restarted.
Relevant code snippet from TransportConnection.java,
326 if (context != null) { 327 if (context.isDontSendReponse()) {
implies that we are dealing with a race condition. I'm not familiar with the ActiveMQ code base but it looks like it grabs a lock (serviceLock) before entering that function, so not sure what's going on.
Since there's no timestamp associated with the stack trace I'm not completly sure what's going on on the client side. I've tried to reproduce it by writing a small script that uses Net::Stomp in a similar way to my real clients, but no luck so far.
No idea if it's relevant, but my affected clients have been both consuming and producing, and sending/receiving on both topics and queues.