Description
Under abnormally heavy database load we get a large number of transaction timeouts in our application, as one would expect. Our application uses XA with Postgres and ActiveMQ. The problem is that after the abnormal load subsides, the system does not recover.
During these failures we get an NPE that causes ActiveMQ to lose a database connection, and the connection is never returned to the connection pool (Hikari). After the abnormal load is removed and the database is responsive again, the system still never recovers, because the connection pool has run out of connections.
Through debugging, we believe the following exception causes the connection leak in ActiveMQ's handling:
Caused by: javax.jms.JMSException: java.lang.NullPointerException
at org.apache.activemq.util.JMSExceptionSupport.create(JMSExceptionSupport.java:54) ~[activemq-client-5.15.11.jar:5.15.11]
at org.apache.activemq.ActiveMQConnection.syncSendPacket(ActiveMQConnection.java:1403) ~[activemq-client-5.15.11.jar:5.15.11]
at org.apache.activemq.ActiveMQConnection.syncSendPacket(ActiveMQConnection.java:1436) ~[activemq-client-5.15.11.jar:5.15.11]
at org.apache.activemq.TransactionContext.rollback(TransactionContext.java:538) ~[activemq-client-5.15.11.jar:5.15.11]
... 134 more
Caused by: java.lang.NullPointerException
at org.apache.activemq.store.jdbc.JDBCPersistenceAdapter.commitRemove(JDBCPersistenceAdapter.java:795) ~[activemq-jdbc-store-5.15.11.jar:5.15.11]
at org.apache.activemq.store.jdbc.JdbcMemoryTransactionStore.rollback(JdbcMemoryTransactionStore.java:171) ~[activemq-jdbc-store-5.15.11.jar:5.15.11]
at org.apache.activemq.transaction.XATransaction.rollback(XATransaction.java:146) ~[activemq-broker-5.15.11.jar:5.15.11]
at org.apache.activemq.broker.TransactionBroker.rollbackTransaction(TransactionBroker.java:257) ~[activemq-broker-5.15.11.jar:5.15.11]
at org.apache.activemq.broker.BrokerFilter.rollbackTransaction(BrokerFilter.java:149) ~[activemq-broker-5.15.11.jar:5.15.11]
at org.apache.activemq.broker.BrokerFilter.rollbackTransaction(BrokerFilter.java:149) ~[activemq-broker-5.15.11.jar:5.15.11]
at org.apache.activemq.broker.TransportConnection.processRollbackTransaction(TransportConnection.java:553) ~[activemq-broker-5.15.11.jar:5.15.11]
at org.apache.activemq.command.TransactionInfo.visit(TransactionInfo.java:104) ~[activemq-client-5.15.11.jar:5.15.11]
at org.apache.activemq.broker.TransportConnection.service(TransportConnection.java:336) ~[activemq-broker-5.15.11.jar:5.15.11]
at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:200) ~[activemq-broker-5.15.11.jar:5.15.11]
at org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50) ~[activemq-client-5.15.11.jar:5.15.11]
at org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:125) ~[activemq-client-5.15.11.jar:5.15.11]
at org.apache.activemq.transport.AbstractInactivityMonitor.onCommand(AbstractInactivityMonitor.java:301) ~[activemq-client-5.15.11.jar:5.15.11]
at org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:83) ~[activemq-client-5.15.11.jar:5.15.11]
at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:233) ~[activemq-client-5.15.11.jar:5.15.11]
at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:215) ~[activemq-client-5.15.11.jar:5.15.11]
... 1 more
By overriding the method 'commitRemove(...)' in 'JDBCPersistenceAdapter' and converting the NullPointerException above into an IOException, the connection handling code behaves as expected: we see no connection leak, and the system recovers correctly after the abnormal load has passed.
A great many things go wrong when these NPEs occur, and it is nearly impossible for us (not being ActiveMQ experts) to see what the underlying cause of these exceptions is. For us, however, the most important thing is that the system recovers.
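To make the workaround concrete, here is a minimal, self-contained Java model of it. It does not use ActiveMQ: `StoreAdapter`, `GuardedStoreAdapter`, `Pool` and `rollback` are hypothetical stand-ins. The assumption that the caller returns its connection only on the success and IOException paths is ours. It was chosen to reproduce the symptom described above and is not taken from ActiveMQ's code. In the real broker, the guarded class would instead subclass `org.apache.activemq.store.jdbc.JDBCPersistenceAdapter` and override `commitRemove`, whose exact signature should be checked against the 5.15.11 sources.

```java
import java.io.IOException;

public class NpeToIoExceptionSketch {

    /** Hypothetical stand-in for the JDBC store adapter; NOT the ActiveMQ API. */
    static class StoreAdapter {
        void commitRemove(String ackId) throws IOException {
            if (ackId == null) {
                // Models the unexpected NullPointerException seen in the stack trace.
                throw new NullPointerException("ack has no message id");
            }
        }
    }

    /** Models the workaround: override commitRemove and rethrow NPEs as IOExceptions. */
    static class GuardedStoreAdapter extends StoreAdapter {
        @Override
        void commitRemove(String ackId) throws IOException {
            try {
                super.commitRemove(ackId);
            } catch (NullPointerException e) {
                // Wrap so callers that only handle IOException see a checked failure.
                throw new IOException("commitRemove failed: " + e.getMessage(), e);
            }
        }
    }

    /** Hypothetical pool that only counts leased connections. */
    static class Pool {
        int leased = 0;
    }

    /**
     * Hypothetical caller (assumption): it returns the connection on success and on
     * IOException, but has no handler for RuntimeExceptions, so an NPE escapes
     * with the connection still leased.
     */
    static String rollback(Pool pool, StoreAdapter store, String ackId) {
        pool.leased++; // borrow a connection
        try {
            store.commitRemove(ackId);
            pool.leased--; // success path returns it
            return "ok";
        } catch (IOException e) {
            pool.leased--; // checked-failure path returns it as well
            return "io-error";
        }
    }

    public static void main(String[] args) {
        Pool leaky = new Pool();
        try {
            rollback(leaky, new StoreAdapter(), null);
        } catch (NullPointerException e) {
            // The NPE escapes the caller; the connection was never returned.
        }
        System.out.println("plain adapter, leased after NPE: " + leaky.leased);

        Pool guarded = new Pool();
        String result = rollback(guarded, new GuardedStoreAdapter(), null);
        System.out.println("guarded adapter: " + result + ", leased: " + guarded.leased);
    }
}
```

Running `main` prints a lease count of 1 for the plain adapter (the leaked connection) and `io-error` with a lease count of 0 for the guarded one. Catching only `NullPointerException` keeps the workaround narrow; wrapping every `RuntimeException` would also mask unrelated bugs.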