Qpid / QPID-2983

Broker in cluster goes down

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.8
    • Fix Version/s: None
    • Component/s: .NET Client, C++ Broker
    • Labels: None
    • Environment: Broker: CentOS 5.3; Client: Windows 7

    Description

      I have 2 .NET clients. They send messages to a fanout or direct exchange, and both consume the messages that were sent. From time to time one of the brokers goes down with the following messages in the log:

      2010-12-20 23:54:19 debug guest@QPID.1228bb68f-12398-14441-1a789-13775359c9489: receiver marked completed: 32 incomplete: { } unknown-completed: { [1,32] }
      2010-12-20 23:54:19 trace cluster(192.168.44.135:3927 READY) DLVR 6058: Frame[BEbe; channel=0; {ClusterConnectionDeliverDoOutputBody: limit=2048; }] control 192.168.44.134:457-267
      2010-12-20 23:54:19 debug Sufficient credit for 15c72e661-1a54d-14663-1b3cf-1dfc4e7d90e0c on guest@QPID.15c72e661-1a54d-14663-1b3cf-1dfc4e7d90e0c, have bytes: 4294967295 msgs: 9884, need 147 bytes
      2010-12-20 23:54:19 debug Credit allocated for 15c72e661-1a54d-14663-1b3cf-1dfc4e7d90e0c on guest@QPID.15c72e661-1a54d-14663-1b3cf-1dfc4e7d90e0c, was bytes: 4294967295 msgs: 9884 now bytes: 4294967295 msgs: 9883
      2010-12-20 23:54:19 trace guest@QPID.15c72e661-1a54d-14663-1b3cf-1dfc4e7d90e0c: sent cmd 116: {MessageTransferBody: destination=15c72e661-1a54d-14663-1b3cf-1dfc4e7d90e0c; accept-mode=0; acquire-mode=0; }
      2010-12-20 23:54:19 trace guest@QPID.15c72e661-1a54d-14663-1b3cf-1dfc4e7d90e0c: sent cmd 116: header (84 bytes); properties={{MessageProperties: message-id=49568fee-5825-49b6-beb7-7ca4ee942b44; content-type=SerializableObject; content-encoding=SerializableObject; application-headers={}; }{DeliveryProperties: exchange=test; routing-key=; }}
      2010-12-20 23:54:19 trace guest@QPID.15c72e661-1a54d-14663-1b3cf-1dfc4e7d90e0c: sent cmd 116: content (63 bytes) \x00\x01\x00\x00\x00\xFF\xFF\xFF\xFF\x01\x00\x00\x00\x00\x00\x00\x00\x06\x01\x00\x00\x00'Message I...
      2010-12-20 23:54:19 debug No messages to dispatch on queue '15c72e661-1a54d-14663-1b3cf-1dfc4e7d90e0c'
      2010-12-20 23:54:19 trace cluster(192.168.44.135:3927 READY) DLVR 6059: Frame[BEbe; channel=1; {MessageAcceptBody: transfers={ [116,116] }; }] data 192.168.44.134:457-267 read-credit=1
      2010-12-20 23:54:19 trace guest@QPID.15c72e661-1a54d-14663-1b3cf-1dfc4e7d90e0c: recv cmd 123: {MessageAcceptBody: transfers={ [116,116] }; }
      2010-12-20 23:54:19 debug DeliveryRecord::setEnded() id=116
      2010-12-20 23:54:19 debug Accepted 116
      2010-12-20 23:54:19 debug guest@QPID.15c72e661-1a54d-14663-1b3cf-1dfc4e7d90e0c: receiver marked completed: 123 incomplete: { } unknown-completed: { [1,123] }
      2010-12-20 23:54:20 trace Sending cluster timer wakeup ManagementAgent::periodicProcessing
      2010-12-20 23:54:20 trace MCAST Event[192.168.44.135:3927-0 Frame[BEbe; channel=0; {ClusterTimerWakeupBody: name=ManagementAgent::periodicProcessing; }]]
      2010-12-20 23:54:23 warning LinkRegistry timer woken up 2998ms late
      2010-12-20 23:54:25 debug Exception constructed: Cannot mcast to CPG group klaster: library (2)
      2010-12-20 23:54:25 critical Multicast error: Cannot mcast to CPG group klaster: library (2)
      2010-12-20 23:54:25 notice cluster(192.168.44.135:3927 LEFT) leaving cluster klaster
      2010-12-20 23:54:25 trace SEND raiseEvent (v1) class=org.apache.qpid.broker.clientDisconnect
      2010-12-20 23:54:25 trace SEND raiseEvent (v1) class=org.apache.qpid.broker.clientDisconnect
      2010-12-20 23:54:25 trace SEND raiseEvent (v1) class=org.apache.qpid.broker.clientDisconnect
      2010-12-20 23:54:25 debug Shutting down CPG
      2010-12-20 23:54:26 notice Shut down
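
When triaging a trace-level log like the one above, it can help to pull out only the late-timer warnings and the critical errors. The following is a minimal sketch, not part of Qpid: the function name `summarize_qpidd_log` and both regular expressions are assumptions based solely on the line format shown in this report.

```python
import re

# Assumed line shape, taken from the excerpt above:
# "<YYYY-MM-DD> <HH:MM:SS> <level> <message>"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*)$")
# Matches warnings such as "LinkRegistry timer woken up 2998ms late".
LATE_RE = re.compile(r"timer woken up (\d+)ms late")


def summarize_qpidd_log(lines):
    """Return (late_timers, critical) collected from an iterable of log lines.

    late_timers: list of (timestamp, delay_ms) for late-timer warnings.
    critical:    list of (timestamp, message) for critical-level lines.
    Lines that do not start with a timestamp are skipped.
    """
    late_timers = []
    critical = []
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if not match:
            continue
        timestamp, level, message = match.groups()
        late = LATE_RE.search(message)
        if level == "warning" and late:
            late_timers.append((timestamp, int(late.group(1))))
        elif level == "critical":
            critical.append((timestamp, message))
    return late_timers, critical
```

Applied to `/var/log/QPIDLOG.log`, this would reduce the excerpt above to the 2998 ms late-timer warning at 23:54:23 and the multicast error at 23:54:25.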

      I have 2 C++ brokers running in a cluster.
      I always start them with:

      qpidd --auth no --trace --log-to-file /var/log/QPIDLOG.log --daemon --cluster-name=klaster

      Corosync config:

      totem {
        version: 2
        secauth: off
        threads: 0
        interface {
          ringnumber: 0
          ## You must change this address ##
          bindnetaddr: 192.168.44.0
          mcastaddr: 226.94.32.36
          mcastport: 5405
        }
      }

      logging {
        debug: off
        timestamp: on
        to_file: yes
        logfile: /tmp/aisexec.log
      }

      amf {
        mode: disabled
      }

      Log from corosync:

      Dec 20 21:56:47 corosync [MAIN ] Completed service synchronization, ready to provide service.
      Dec 20 23:54:25 corosync [TOTEM ] Process pause detected for 3632 ms, flushing membership messages.
      Dec 20 23:54:25 corosync [TOTEM ] A processor failed, forming new configuration.
      Dec 20 23:54:27 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
      Dec 20 23:54:27 corosync [MAIN ] Completed service synchronization, ready to provide service.
      Dec 20 23:54:29 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
      Dec 20 23:54:29 corosync [MAIN ] Completed service synchronization, ready to provide service.
      Dec 21 00:03:04 corosync [TOTEM ] A processor failed, forming new configuration.
      Dec 21 00:03:05 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
      Dec 21 00:03:05 corosync [MAIN ] Completed service synchronization, ready to provide service.
      Dec 21 00:13:06 corosync [TOTEM ] A processor failed, forming new configuration.
      Dec 21 00:13:07 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
      Dec 21 00:13:07 corosync [MAIN ] Completed service synchronization, ready to provide service.
      Dec 21 03:13:24 corosync [TOTEM ] A processor failed, forming new configuration.
      Dec 21 03:13:25 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
      Dec 21 03:13:25 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
      Dec 21 03:13:25 corosync [MAIN ] Completed service synchronization, ready to provide service.
      Dec 21 07:30:59 corosync [TOTEM ] Process pause detected for 5544 ms, flushing membership messages.
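
The corosync log above reports a process pause at Dec 20 23:54:25, the same second as the qpidd "Cannot mcast to CPG group" error. That coincidence does not by itself establish a cause, but it is easy to check against other incidents. The sketch below extracts every pause entry so it can be compared with broker shutdown times. The function name `find_pauses` and the regular expression are assumptions derived only from the log format shown here.

```python
import re

# Assumed line shape, taken from the corosync excerpt above:
# "<Mon> <day> <HH:MM:SS> corosync [TOTEM ] Process pause detected for <N> ms, ..."
PAUSE_RE = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) corosync \[TOTEM\s*\] "
    r"Process pause detected for (\d+) ms"
)


def find_pauses(lines):
    """Return a list of (timestamp, pause_ms) for each pause entry found."""
    pauses = []
    for raw in lines:
        match = PAUSE_RE.match(raw.strip())
        if match:
            pauses.append((match.group(1), int(match.group(2))))
    return pauses
```

For the excerpt in this report, the function finds two entries: a 3632 ms pause at Dec 20 23:54:25 and a 5544 ms pause at Dec 21 07:30:59.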

          People

            Assignee: Unassigned
            Reporter: Adam (adamka)