[AMQ-443] ReliableTransport / KeepAlive algorithm does not work properly. - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.2, 3.2.1
Fix Version/s: 4.0
Component/s: Broker, Transport
Labels:
None
Environment:

Solaris 8 / 10. JDK 1.5

Description

The current implementation of KeepAliveDaemon.java will sometimes force disconnections on well-behaved connections. The problem may arise when a connection goes away and the KeepAlive send to that channel blocks while attempting to reconnect. If the reconnection takes a while, other channels that were responding fine may have their connections broken. This happens because of the following code in KeepAliveDaemon.java:

if ((channel.getLastReceiptTimestamp() + channel.getKeepAliveTimeout() * 2) < System.currentTimeMillis()) {

or

else if ((channel.getLastReceiptTimestamp() + channel.getKeepAliveTimeout()) < System.currentTimeMillis()) {

The fact that the receipt timestamp is checked against System.currentTimeMillis() causes the code to break otherwise good connections. If a KeepAlive send (in examineChannel) for a broken channel takes longer than some good channel's KeepAliveTimeout, then the good connection gets broken.
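The effect described above can be reproduced with a small, deterministic simulation. This is only a sketch: `Channel`, `FakeClock`, and both sweep methods are hypothetical stand-ins, not the ActiveMQ classes, and the snapshot variant is one possible mitigation, not necessarily the logic in the attached fix. A blocking keep-alive send is modelled by advancing a fake clock.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class KeepAliveSweepSketch {

    /** Hypothetical stand-in for a transport channel (not the ActiveMQ class). */
    static final class Channel {
        final String name;
        final long lastReceiptTimestamp;
        final long keepAliveTimeout;
        final long sendBlocksForMillis; // simulated time a keep-alive send blocks

        Channel(String name, long lastReceipt, long timeout, long sendBlocks) {
            this.name = name;
            this.lastReceiptTimestamp = lastReceipt;
            this.keepAliveTimeout = timeout;
            this.sendBlocksForMillis = sendBlocks;
        }
    }

    /** Simulated clock so the example does not depend on real time. */
    static final class FakeClock {
        long now;
        FakeClock(long start) { this.now = start; }
        long currentTimeMillis() { return now; }
        void advance(long millis) { now += millis; }
    }

    /** One broken channel whose send blocks for 5 s, then one healthy channel. */
    static List<Channel> demoChannels() {
        return Arrays.asList(
            new Channel("broken", 8_500L, 1_000L, 5_000L),
            new Channel("good", 9_900L, 1_000L, 0L));
    }

    /** Reported pattern: every channel is compared with the live clock. */
    static List<String> sweepAgainstLiveClock(List<Channel> channels, FakeClock clock) {
        List<String> closed = new ArrayList<>();
        for (Channel c : channels) {
            long now = clock.currentTimeMillis(); // includes time spent blocked earlier
            if (c.lastReceiptTimestamp + c.keepAliveTimeout * 2 < now) {
                closed.add(c.name);
            } else if (c.lastReceiptTimestamp + c.keepAliveTimeout < now) {
                clock.advance(c.sendBlocksForMillis); // keep-alive send blocks
            }
        }
        return closed;
    }

    /** Assumed mitigation: compare every channel with the time the sweep began. */
    static List<String> sweepAgainstSnapshot(List<Channel> channels, FakeClock clock) {
        List<String> closed = new ArrayList<>();
        long sweepStart = clock.currentTimeMillis();
        for (Channel c : channels) {
            if (c.lastReceiptTimestamp + c.keepAliveTimeout * 2 < sweepStart) {
                closed.add(c.name);
            } else if (c.lastReceiptTimestamp + c.keepAliveTimeout < sweepStart) {
                clock.advance(c.sendBlocksForMillis); // still blocks, but is not charged to others
            }
        }
        return closed;
    }

    public static void main(String[] args) {
        // The healthy channel is closed because the clock moved while "broken" blocked.
        System.out.println(sweepAgainstLiveClock(demoChannels(), new FakeClock(10_000L))); // [good]
        // With a snapshot, the healthy channel survives the same sweep.
        System.out.println(sweepAgainstSnapshot(demoChannels(), new FakeClock(10_000L))); // []
    }
}
```

The snapshot only keeps one channel's blocking time from being charged to the others within a sweep. It does not remove the blocking send itself, which may still delay the next sweep.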

This can, in turn, cause some pretty bad behavior in the Broker. While testing and diagnosing this problem, I could get some brokers in a network of brokers stuck. The sequence of events during recovery, which gets interrupted when the connections are closed, would sometimes lead to the broker hanging while waiting for a receipt, such as during an addConsumer (which eventually calls syncSendWithReceipt).

I have redone the logic in KeepAliveDaemon.java (which required a small change to ReliableTransportChannel as well). This now seems to work.

I'm a bit concerned about the blocking calls, though. This may be a different issue / bug. It looked to me like there was a mechanism to cancel outstanding receipt waiters, but every once in a while that mechanism would not get called. As a result, the broker basically gets stuck and never really recovers.
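To illustrate the concern about blocked receipt waiters, here is a minimal sketch of a waiter registry in which every wait is bounded and closing a channel cancels all outstanding waiters. It assumes modern Java (`CompletableFuture`), and `ReceiptWaitersSketch` with its methods is a hypothetical name; it is not the ActiveMQ receipt mechanism.

```java
import java.util.Map;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ReceiptWaitersSketch {
    // Outstanding waiters, keyed by a hypothetical receipt id.
    private final Map<String, CompletableFuture<Void>> pending = new ConcurrentHashMap<>();

    /** Called before a synchronous send; the caller then waits on the future. */
    CompletableFuture<Void> register(String receiptId) {
        CompletableFuture<Void> waiter = new CompletableFuture<>();
        pending.put(receiptId, waiter);
        return waiter;
    }

    /** Called when the peer's receipt arrives. */
    void receiptArrived(String receiptId) {
        CompletableFuture<Void> waiter = pending.remove(receiptId);
        if (waiter != null) {
            waiter.complete(null);
        }
    }

    /** Called when the channel is closed, so that no waiter is left blocked. */
    void cancelAll() {
        for (String id : pending.keySet()) {
            CompletableFuture<Void> waiter = pending.remove(id);
            if (waiter != null) {
                waiter.cancel(false);
            }
        }
    }

    /** Bounded wait: returns even if cancelAll() is never called. */
    static String await(CompletableFuture<Void> waiter, long timeoutMillis) {
        try {
            waiter.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return "RECEIVED";
        } catch (CancellationException e) {
            return "CANCELLED";
        } catch (TimeoutException e) {
            return "TIMED_OUT";
        } catch (ExecutionException e) {
            return "FAILED";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return "INTERRUPTED";
        }
    }

    public static void main(String[] args) {
        ReceiptWaitersSketch waiters = new ReceiptWaitersSketch();

        CompletableFuture<Void> acked = waiters.register("r1");
        waiters.receiptArrived("r1");
        System.out.println(await(acked, 100)); // RECEIVED

        CompletableFuture<Void> dropped = waiters.register("r2");
        waiters.cancelAll(); // channel closed while r2 was outstanding
        System.out.println(await(dropped, 100)); // CANCELLED

        CompletableFuture<Void> lost = waiters.register("r3");
        System.out.println(await(lost, 50)); // TIMED_OUT
    }
}
```

The timeout acts as a backstop for the case described above, where the cancel path is occasionally not invoked: the caller still regains control and can decide whether to retry or fail.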

Attachments

ASF.LICENSE.NOT.GRANTED--KeepAliveDaemon.java (9 kB, 15/Dec/05 21:02, Kevin Yaussy)
ASF.LICENSE.NOT.GRANTED--ReliableTransportChannel.java (9 kB, 15/Dec/05 21:02, Kevin Yaussy)

People

Assignee: Unassigned

Reporter: Kevin Yaussy

Votes: 0

Watchers: 0

Dates

Created: 15/Dec/05 21:02

Updated: 15/Jun/06 19:42

Resolved: 15/Jun/06 10:47