[QPID-2983] Broker in cluster goes down - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Won't Fix
Affects Version/s: 0.8
Fix Version/s: None
Component/s: .NET Client, C++ Broker
Labels: None
Environment:

Broker: CentOS 5.3
Client: Windows 7

Description

I have two .NET clients. Both send messages to a fanout or direct exchange, and both consume the messages that were sent. From time to time one of the brokers goes down with the following in its log:

2010-12-20 23:54:19 debug guest@QPID.1228bb68f-12398-14441-1a789-13775359c9489: receiver marked completed: 32 incomplete: { } unknown-completed: { [1,32] }
2010-12-20 23:54:19 trace cluster(192.168.44.135:3927 READY) DLVR 6058: Frame[BEbe; channel=0; {ClusterConnectionDeliverDoOutputBody: limit=2048; }] control 192.168.44.134:457-267
2010-12-20 23:54:19 debug Sufficient credit for 15c72e661-1a54d-14663-1b3cf-1dfc4e7d90e0c on guest@QPID.15c72e661-1a54d-14663-1b3cf-1dfc4e7d90e0c, have bytes: 4294967295 msgs: 9884, need 147 bytes
2010-12-20 23:54:19 debug Credit allocated for 15c72e661-1a54d-14663-1b3cf-1dfc4e7d90e0c on guest@QPID.15c72e661-1a54d-14663-1b3cf-1dfc4e7d90e0c, was bytes: 4294967295 msgs: 9884 now bytes: 4294967295 msgs: 9883
2010-12-20 23:54:19 trace guest@QPID.15c72e661-1a54d-14663-1b3cf-1dfc4e7d90e0c: sent cmd 116: {MessageTransferBody: destination=15c72e661-1a54d-14663-1b3cf-1dfc4e7d90e0c; accept-mode=0; acquire-mode=0; }
2010-12-20 23:54:19 trace guest@QPID.15c72e661-1a54d-14663-1b3cf-1dfc4e7d90e0c: sent cmd 116: header (84 bytes); properties={{MessageProperties: message-id=49568fee-5825-49b6-beb7-7ca4ee942b44; content-type=SerializableObject; content-encoding=SerializableObject; application-headers={}; }{DeliveryProperties: exchange=test; routing-key=; }}
2010-12-20 23:54:19 trace guest@QPID.15c72e661-1a54d-14663-1b3cf-1dfc4e7d90e0c: sent cmd 116: content (63 bytes) \x00\x01\x00\x00\x00\xFF\xFF\xFF\xFF\x01\x00\x00\x00\x00\x00\x00\x00\x06\x01\x00\x00\x00'Message I...
2010-12-20 23:54:19 debug No messages to dispatch on queue '15c72e661-1a54d-14663-1b3cf-1dfc4e7d90e0c'
2010-12-20 23:54:19 trace cluster(192.168.44.135:3927 READY) DLVR 6059: Frame[BEbe; channel=1; {MessageAcceptBody: transfers={ [116,116] }; }] data 192.168.44.134:457-267 read-credit=1
2010-12-20 23:54:19 trace guest@QPID.15c72e661-1a54d-14663-1b3cf-1dfc4e7d90e0c: recv cmd 123: {MessageAcceptBody: transfers={ [116,116] }; }
2010-12-20 23:54:19 debug DeliveryRecord::setEnded() id=116
2010-12-20 23:54:19 debug Accepted 116
2010-12-20 23:54:19 debug guest@QPID.15c72e661-1a54d-14663-1b3cf-1dfc4e7d90e0c: receiver marked completed: 123 incomplete: { } unknown-completed: { [1,123] }
2010-12-20 23:54:20 trace Sending cluster timer wakeup ManagementAgent::periodicProcessing
2010-12-20 23:54:20 trace MCAST Event[192.168.44.135:3927-0 Frame[BEbe; channel=0; {ClusterTimerWakeupBody: name=ManagementAgent::periodicProcessing; }]]
2010-12-20 23:54:23 warning LinkRegistry timer woken up 2998ms late
2010-12-20 23:54:25 debug Exception constructed: Cannot mcast to CPG group klaster: library (2)
2010-12-20 23:54:25 critical Multicast error: Cannot mcast to CPG group klaster: library (2)
2010-12-20 23:54:25 notice cluster(192.168.44.135:3927 LEFT) leaving cluster klaster
2010-12-20 23:54:25 trace SEND raiseEvent (v1) class=org.apache.qpid.broker.clientDisconnect
2010-12-20 23:54:25 trace SEND raiseEvent (v1) class=org.apache.qpid.broker.clientDisconnect
2010-12-20 23:54:25 trace SEND raiseEvent (v1) class=org.apache.qpid.broker.clientDisconnect
2010-12-20 23:54:25 debug Shutting down CPG
2010-12-20 23:54:26 notice Shut down
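
Most of the excerpt above is trace and debug output; the lines that describe the failure are the few at notice level or higher. The following is a minimal sketch, not part of the original report, for pulling those lines out of a qpidd log. It assumes every entry has the shape seen above (date, time, level, message); the names `LINE_RE`, `KEEP` and `significant_entries` are hypothetical.

```python
import re

# Assumed line shape, taken from the excerpt above:
# "<date> <time> <level> <message>".
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*)$")

# Severity levels worth keeping; trace and debug entries are dropped.
KEEP = {"notice", "warning", "error", "critical"}


def significant_entries(lines):
    """Return (timestamp, level, message) for each line at notice level or above."""
    entries = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group(2) in KEEP:
            entries.append(match.groups())
    return entries


# A few lines copied from the log above.
sample = [
    "2010-12-20 23:54:20 trace Sending cluster timer wakeup ManagementAgent::periodicProcessing",
    "2010-12-20 23:54:23 warning LinkRegistry timer woken up 2998ms late",
    "2010-12-20 23:54:25 debug Exception constructed: Cannot mcast to CPG group klaster: library (2)",
    "2010-12-20 23:54:25 critical Multicast error: Cannot mcast to CPG group klaster: library (2)",
    "2010-12-20 23:54:25 notice cluster(192.168.44.135:3927 LEFT) leaving cluster klaster",
    "2010-12-20 23:54:26 notice Shut down",
]

for timestamp, level, message in significant_entries(sample):
    print(timestamp, level, message)
```

On the sample this leaves the late-timer warning, the multicast error, and the two notices about leaving the cluster and shutting down, which is the sequence the report is about.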

I run two C++ brokers in a cluster.
I always start them with:

qpidd --auth no --trace --log-to-file /var/log/QPIDLOG.log --daemon --cluster-name=klaster

Corosync config:

totem {
version: 2
secauth: off
threads: 0
interface {
ringnumber: 0
## You must change this address ##
bindnetaddr: 192.168.44.0
mcastaddr: 226.94.32.36
mcastport: 5405
}
}

logging {
debug: off
timestamp: on
to_file: yes
logfile: /tmp/aisexec.log
}

amf {
mode: disabled
}

Log from corosync:

Dec 20 21:56:47 corosync [MAIN ] Completed service synchronization, ready to provide service.
Dec 20 23:54:25 corosync [TOTEM ] Process pause detected for 3632 ms, flushing membership messages.
Dec 20 23:54:25 corosync [TOTEM ] A processor failed, forming new configuration.
Dec 20 23:54:27 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Dec 20 23:54:27 corosync [MAIN ] Completed service synchronization, ready to provide service.
Dec 20 23:54:29 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Dec 20 23:54:29 corosync [MAIN ] Completed service synchronization, ready to provide service.
Dec 21 00:03:04 corosync [TOTEM ] A processor failed, forming new configuration.
Dec 21 00:03:05 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Dec 21 00:03:05 corosync [MAIN ] Completed service synchronization, ready to provide service.
Dec 21 00:13:06 corosync [TOTEM ] A processor failed, forming new configuration.
Dec 21 00:13:07 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Dec 21 00:13:07 corosync [MAIN ] Completed service synchronization, ready to provide service.
Dec 21 03:13:24 corosync [TOTEM ] A processor failed, forming new configuration.
Dec 21 03:13:25 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Dec 21 03:13:25 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Dec 21 03:13:25 corosync [MAIN ] Completed service synchronization, ready to provide service.
Dec 21 07:30:59 corosync [TOTEM ] Process pause detected for 5544 ms, flushing membership messages.
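
The corosync log reports a process pause at Dec 20 23:54:25, the same second in which the broker logged the multicast error. A small sketch such as the following, which is not part of the original report, can list the pause and failure events so they can be compared with broker timestamps. It assumes the line layout shown above; `PAUSE_RE`, `FAILED_RE` and `summarize` are hypothetical names.

```python
import re

# Assumed prefix, taken from the excerpt above: "<Mon> <day> <time> corosync [TOTEM ] ...".
PREFIX = r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) corosync \[TOTEM\s*\] "
PAUSE_RE = re.compile(PREFIX + r"Process pause detected for (\d+) ms")
FAILED_RE = re.compile(PREFIX + r"A processor failed")


def summarize(lines):
    """Return ([(timestamp, pause_ms), ...], [failure_timestamp, ...])."""
    pauses, failures = [], []
    for line in lines:
        line = line.strip()
        match = PAUSE_RE.match(line)
        if match:
            pauses.append((match.group(1), int(match.group(2))))
            continue
        match = FAILED_RE.match(line)
        if match:
            failures.append(match.group(1))
    return pauses, failures


# A few lines copied from the corosync log above.
sample = [
    "Dec 20 23:54:25 corosync [TOTEM ] Process pause detected for 3632 ms, flushing membership messages.",
    "Dec 20 23:54:25 corosync [TOTEM ] A processor failed, forming new configuration.",
    "Dec 20 23:54:27 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.",
    "Dec 21 00:03:04 corosync [TOTEM ] A processor failed, forming new configuration.",
    "Dec 21 07:30:59 corosync [TOTEM ] Process pause detected for 5544 ms, flushing membership messages.",
]

pauses, failures = summarize(sample)
print("pauses:", pauses)
print("failures:", failures)
```

Whether the pauses are the cause of the broker leaving the cluster is not established by the logs alone; the sketch only makes the timing easier to line up.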
