Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
3.4.0
None
ActiveMQ-CPP ver - 3.4.0
Broker 5.3.1
Machine: Linux mars 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
gcc version: 4.1.2 20080704 (Red Hat 4.1.2-44)
Description
Problem description:
We run a network of brokers (4 in total).
Broker URI: 'failover://(tcp://10.10.13.20:61616,tcp://10.10.13.22:61616,tcp://10.10.13.24:61616,tcp://10.10.13.26:61616)?randomize=true&connection.closeTimeout=10000&transport.soTimeout=3000&timeout=3000&connection.useAsyncSend=true&connection.alwaysSyncSend=false'
The producer loads the brokers with 1000 messages/sec. We are testing producer behavior during failover by restarting all 4 brokers in a row while messages are being sent, and we get the deadlock shown below.
Note: the problem was tested only with a network of brokers.
The backtrace (only relevant threads):
Thread 16 (process 26892):
#0 0x00000032ef00ce74 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00000032ef008874 in _L_lock_106 () from /lib64/libpthread.so.0
#2 0x00000032ef0082e0 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000000000dc5a04 in decaf::internal::util::concurrent::MutexImpl::lock (handle=0xfefdd38) at decaf/internal/util/concurrent/unix/MutexImpl.cpp:77
#4 0x0000000000bd9092 in decaf::util::concurrent::Mutex::lock (this=0xff54100) at decaf/util/concurrent/Mutex.cpp:111
#5 0x0000000000d51f3f in decaf::util::AbstractCollection<decaf::lang::Pointer<activemq::transport::Transport, decaf::util::concurrent::atomic::AtomicRefCounter> >::lock (this=0xff540f8) at ./decaf/util/AbstractCollection.h:331
#6 0x0000000000bd86c9 in decaf::util::concurrent::Lock::lock (this=0x4c7b9c90) at decaf/util/concurrent/Lock.cpp:54
#7 0x0000000000bd883a in Lock (this=0x4c7b9c90, object=0xff54188, intiallyLocked=true) at decaf/util/concurrent/Lock.cpp:32
#8 0x0000000000d47a77 in activemq::transport::failover::CloseTransportsTask::add (this=0xff540e8, transport=@0x4c7b9cf0) at activemq/transport/failover/CloseTransportsTask.cpp:46
#9 0x0000000000b1b748 in activemq::transport::failover::FailoverTransport::handleTransportFailure (this=0xffed498, error=@0x4c7b9ee0) at activemq/transport/failover/FailoverTransport.cpp:483
#10 0x0000000000b41a06 in activemq::transport::failover::FailoverTransportListener::onException (this=0xfde2e58, ex=@0x4c7b9ee0) at activemq/transport/failover/FailoverTransportListener.cpp:76
#11 0x0000000000d34813 in activemq::transport::TransportFilter::fire (this=0x10627498, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:54
#12 0x0000000000d34841 in activemq::transport::TransportFilter::onException (this=0x10627498, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:46
#13 0x0000000000d34813 in activemq::transport::TransportFilter::fire (this=0xfeeb558, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:54
#14 0x0000000000d34841 in activemq::transport::TransportFilter::onException (this=0xfeeb558, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:46
#15 0x0000000000d554c8 in activemq::transport::inactivity::InactivityMonitor::onException (this=0xfeeb558, ex=@0x4c7b9ee0) at activemq/transport/inactivity/InactivityMonitor.cpp:312
#16 0x0000000000d34813 in activemq::transport::TransportFilter::fire (this=0x1020c118, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:54
#17 0x0000000000d34841 in activemq::transport::TransportFilter::onException (this=0x1020c118, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:46
#18 0x0000000000d326f2 in activemq::transport::IOTransport::fire (this=0xdce10b8, ex=@0x4c7b9ee0) at activemq/transport/IOTransport.cpp:87
#19 0x0000000000d32982 in activemq::transport::IOTransport::run (this=0xdce10b8) at activemq/transport/IOTransport.cpp:264
#20 0x0000000000baad49 in decaf::lang::ThreadProperties::runCallback (properties=0x105871d8) at decaf/lang/Thread.cpp:137
#21 0x0000000000ba9068 in threadWorker (arg=0x105871d8) at decaf/lang/Thread.cpp:190
#22 0x00000032ef006367 in start_thread () from /lib64/libpthread.so.0
#23 0x00000032ee4d30ad in clone () from /lib64/libc.so.6
Thread 9 (process 14470):
#0 0x00000032ef00a899 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000dc54b3 in decaf::internal::util::concurrent::ConditionImpl::wait (condition=0x1072d2b8) at decaf/internal/util/concurrent/unix/ConditionImpl.cpp:101
#2 0x0000000000bd9033 in decaf::util::concurrent::Mutex::wait (this=0x105871d8) at decaf/util/concurrent/Mutex.cpp:126
#3 0x0000000000ba8538 in decaf::lang::Thread::join (this=0x12a4a418) at decaf/lang/Thread.cpp:452
#4 0x0000000000d32c28 in activemq::transport::IOTransport::close (this=0xdce10b8) at activemq/transport/IOTransport.cpp:222
#5 0x0000000000d34bfe in activemq::transport::TransportFilter::close (this=0x1020c118) at activemq/transport/TransportFilter.cpp:106
#6 0x0000000000b47d3a in activemq::transport::tcp::TcpTransport::close (this=0x1020c118) at activemq/transport/tcp/TcpTransport.cpp:74
#7 0x0000000000d34bfe in activemq::transport::TransportFilter::close (this=0xfeeb558) at activemq/transport/TransportFilter.cpp:106
#8 0x0000000000d554ec in activemq::transport::inactivity::InactivityMonitor::close (this=0xfeeb558) at activemq/transport/inactivity/InactivityMonitor.cpp:300
#9 0x0000000000d77867 in activemq::wireformat::openwire::OpenWireFormatNegotiator::close (this=0x10627498) at activemq/wireformat/openwire/OpenWireFormatNegotiator.cpp:248
#10 0x0000000000d478ff in activemq::transport::failover::CloseTransportsTask::iterate (this=0xff540e8) at activemq/transport/failover/CloseTransportsTask.cpp:75
#11 0x0000000000d25891 in activemq::threads::CompositeTaskRunner::iterate (this=0xddc0108) at activemq/threads/CompositeTaskRunner.cpp:173
#12 0x0000000000d25ae4 in activemq::threads::CompositeTaskRunner::run (this=0xddc0108) at activemq/threads/CompositeTaskRunner.cpp:107
#13 0x0000000000baad49 in decaf::lang::ThreadProperties::runCallback (properties=0xfeeb2b8) at decaf/lang/Thread.cpp:137
#14 0x0000000000ba9068 in threadWorker (arg=0xfeeb2b8) at decaf/lang/Thread.cpp:190
#15 0x00000032ef006367 in start_thread () from /lib64/libpthread.so.0
#16 0x00000032ee4d30ad in clone () from /lib64/libc.so.6
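As you can see, Thread 16 is in __lll_lock_wait, waiting for the synchronized( &transports ) lock in activemq::transport::failover::CloseTransportsTask::add.
That synchronized( &transports ) lock is held by Thread 9, which took it in the activemq::threads::CompositeTaskRunner::iterate call path. But Thread 9 is itself blocked in pthread_cond_wait inside decaf::lang::Thread::join (called from IOTransport::close on the same IOTransport, this=0xdce10b8, that Thread 16 is running), and that wait is only signalled when Thread 16 exits. Thread 16 cannot exit until it gets the lock, so neither thread can make progress.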
Kind regards,
Igor.
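
The cycle described above (one thread holds the list lock while joining a second thread, and that second thread needs the same lock before it can exit) can be shown with a small standalone sketch. This is not the ActiveMQ-CPP code and not necessarily how the issue was fixed: it uses std::thread and std::mutex instead of the decaf classes, and FakeTransport, CloseTask and runDemo are hypothetical names invented for illustration. The sketch shows one common way to avoid the cycle: swap the pending list out while holding the lock, release the lock, and only then close (join) the transports.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a transport: it owns a reader thread, and
// close() joins that thread (as IOTransport::close does in the backtrace).
class FakeTransport {
public:
    explicit FakeTransport(std::function<void()> readerBody)
        : reader_(std::move(readerBody)) {}
    ~FakeTransport() { close(); }

    void close() {
        if (reader_.joinable()) {
            reader_.join();
        }
    }

private:
    std::thread reader_;
};

// Hypothetical stand-in for a task that closes transports in the background.
class CloseTask {
public:
    // May be called from a reader thread (the role of Thread 16).
    void add(std::shared_ptr<FakeTransport> transport) {
        std::lock_guard<std::mutex> guard(mutex_);
        pending_.push_back(std::move(transport));
    }

    // Called from the task-runner thread (the role of Thread 9).
    // Returns the number of transports closed in this pass.
    std::size_t iterate() {
        std::vector<std::shared_ptr<FakeTransport>> batch;
        {
            // Hold the lock only long enough to take ownership of the list.
            std::lock_guard<std::mutex> guard(mutex_);
            batch.swap(pending_);
        }
        // The lock is released here, so a reader thread that calls add()
        // while we are joining it can still acquire mutex_ and then exit.
        for (auto& transport : batch) {
            transport->close();
        }
        return batch.size();
    }

private:
    std::mutex mutex_;
    std::vector<std::shared_ptr<FakeTransport>> pending_;
};

// Small driver: the reader thread of `failing` calls add() on the same task
// that is going to join it, which is the situation from the report.
std::size_t runDemo() {
    CloseTask task;
    auto spare = std::make_shared<FakeTransport>([] {});
    auto failing = std::make_shared<FakeTransport>(
        [&task, spare] { task.add(spare); });
    task.add(failing);

    std::size_t closed = task.iterate();  // joins the reader of `failing`
    closed += task.iterate();             // picks up anything added meanwhile
    return closed;
}
```

If iterate() instead kept mutex_ locked across the close() loop, a reader thread that reached add() after the lock was taken would block on the mutex while iterate() blocked in join(), which is the same shape as the Thread 16 / Thread 9 pair in the backtrace.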