Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: qpid-cpp-1.36.0
Fix Version/s: None
Environment: RedHat Enterprise Linux 6
Flags: Patch
Description
When doing HA testing we found that our application often crashed inside the Qpid Messaging library.
Our test:
- One ActiveMQ broker.
- Two proxies connecting to the AMQP port on the broker. At the start, only one of the proxies is running.
- Test program configured to use failover between the two proxies. Protocol is "amqp1.0". It reads messages in a loop using a transactional session. On error it closes the connection and opens a new one.
- Three queues are read from in parallel, each reader using its own connection in a thread. Nothing is shared between the threads in the client code.
- Send some messages and let the test program process them.
- Stop proxy1 and start proxy2.
- Send some more messages and let the test program process them.
- Stop proxy2 and start proxy1.
- And so on...
After a couple of switches the test program crashes, but not every time; the failure is timing-dependent.
A typical error message that we see before the crash:
Exception when trying to close the qpid connection: Transaction outcome unknown: transport failure
The reason for the crash is that the poller thread is still active when the connection is being deleted. The destructor of the qpid::messaging::ConnectionContext class deletes the TcpTransport instance at the same time as, or right before, the poller thread is calling a callback on it (qpid::messaging::amqp::TcpTransport::disconnected).
I have attached a patch to solve the issue, at least for this use case.
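To illustrate the lifetime problem described above, here is a minimal, self-contained sketch. It is not the Qpid code and not the attached patch: `Transport`, `Poller`, `Context` and `exerciseContext` are hypothetical stand-ins, and the real poller in Qpid may be structured quite differently. The sketch only shows the general rule that the owner must make sure no callback can still be running before it deletes the object the callback targets.

```cpp
#include <atomic>
#include <cassert>   // for callers that want to assert on the result
#include <chrono>
#include <functional>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical stand-in for the transport that receives poller callbacks.
class Transport {
public:
    void disconnected() { ++calls_; }
    int calls() const { return calls_.load(); }
private:
    std::atomic<int> calls_{0};
};

// Hypothetical stand-in for a poller thread: invokes a callback until stopped.
class Poller {
public:
    explicit Poller(std::function<void()> cb)
        : callback_(std::move(cb)), running_(true), thread_([this] { run(); }) {}
    ~Poller() { stop(); }

    // Returns only after the poller thread has exited, so no callback
    // can be in flight once stop() has returned.
    void stop() {
        running_.store(false);
        if (thread_.joinable()) thread_.join();
    }
private:
    void run() {
        while (running_.load()) {
            callback_();
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }
    // Declaration order matters: the thread must be constructed last.
    std::function<void()> callback_;
    std::atomic<bool> running_;
    std::thread thread_;
};

// Hypothetical owner, playing the role of the connection context.
class Context {
public:
    Context()
        : transport_(new Transport),
          poller_(new Poller([this] { transport_->disconnected(); })) {}

    ~Context() {
        poller_->stop();     // first: wait until the poller thread is done
        transport_.reset();  // only now is it safe to delete the transport
    }

    int callbacks() const { return transport_->calls(); }
private:
    std::unique_ptr<Transport> transport_;
    std::unique_ptr<Poller> poller_;
};

// Creates a context, waits for at least one callback, then destroys the
// context while the poller is still active. Returns the callback count.
int exerciseContext() {
    Context ctx;
    while (ctx.callbacks() == 0) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return ctx.callbacks();
}
```

Calling `exerciseContext()` repeatedly should never touch a deleted `Transport`, because the destructor joins the poller thread first. Deleting the transport without the preceding `stop()` would reintroduce the kind of race this report describes.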
I cannot test this on 1.37.0 because I cannot build that version on RHEL6: RHEL6 uses Python 2.6, which is no longer supported in 1.37.0. The code in question is identical in 1.36.0 and 1.37.0, though.