Description
I want the ability to specify a threshold on a network connection such that a message will not be delivered if it is within N milliseconds of its expiration time. This will allow me to tell my ActiveMQ brokers that I expect latency to exist on a given network connection, so the broker can avoid using bandwidth to deliver messages that will be dead on arrival and instead use that bandwidth to deliver messages that will still be alive when they are received.
The setting should be on the networkConnector, to allow different values for different broker-to-broker links, and it should govern message delivery in both directions across that link. Since it's on the networkConnector, the setting can only be applied to broker-to-broker connections.
It might also be worth having a setting on the transportConnector that would apply to all connections to that connector, to simplify configuration if all networkConnectors into a given broker will have the same latency; that would also allow the setting to be applied to the delivery of messages to non-broker consumers (though there should probably be a flag for whether to apply it to non-broker consumers as well). The setting on a networkConnector should override the value on a transportConnector, since it is specific to a single connection. But having a setting on the transportConnector is a lower priority than having a setting on the networkConnector.
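The precedence described above could be resolved along these lines. This is only a sketch of the proposal: the class, method, and parameter names are hypothetical and do not correspond to any existing ActiveMQ API or configuration attribute.

```java
import java.util.OptionalLong;

// Hypothetical helper illustrating the proposed precedence rules.
// None of these names exist in ActiveMQ; they are assumptions for illustration.
class ThresholdConfig {
    /**
     * Returns the threshold (in ms) to use for one connection.
     *
     * @param networkConnectorMs     value set on the networkConnector, if any
     * @param transportConnectorMs   value set on the transportConnector, if any
     * @param consumerIsBroker       true for a broker-to-broker connection
     * @param applyToClientConsumers proposed flag: apply the transportConnector
     *                               value to non-broker consumers too
     */
    static long effectiveThreshold(OptionalLong networkConnectorMs,
                                   OptionalLong transportConnectorMs,
                                   boolean consumerIsBroker,
                                   boolean applyToClientConsumers) {
        if (consumerIsBroker) {
            // The more specific networkConnector value wins when present.
            if (networkConnectorMs.isPresent()) {
                return networkConnectorMs.getAsLong();
            }
            return transportConnectorMs.orElse(0L);
        }
        // Non-broker consumers only see the transportConnector value,
        // and only when the flag opts them in.
        return applyToClientConsumers ? transportConnectorMs.orElse(0L) : 0L;
    }
}
```

When neither value is set, the result is 0, which matches the proposed default.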
The default should be 0 (so all messages that haven't actually expired would be forwarded, just as they are now), but if I know that my network path has a certain latency, I should be able to configure the broker to not even try delivering messages that I know aren't likely to make it to an end consumer, so that messages that are likely to make it can be sent instead.
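A minimal sketch of the check being requested, assuming expiration is an absolute timestamp in milliseconds and that 0 means "never expires" (as with JMSExpiration). The class and method names are hypothetical, and the exact boundary behaviour (whether a message exactly N ms from expiry is still sent) is an assumption.

```java
// Hypothetical sketch of the proposed per-message check; not ActiveMQ code.
class ExpiryThreshold {
    /**
     * @param expirationMs absolute expiration time in ms; 0 means no expiration
     * @param nowMs        current time in ms
     * @param thresholdMs  configured threshold N in ms (0 disables the feature)
     * @return true if the message should still be forwarded
     */
    static boolean shouldForward(long expirationMs, long nowMs, long thresholdMs) {
        if (expirationMs <= 0) {
            return true; // message never expires
        }
        // Forward only if the message will still be alive thresholdMs from now.
        // With thresholdMs == 0 this is a plain "not yet expired" check.
        return nowMs + thresholdMs <= expirationMs;
    }
}
```

For example, with a threshold of 100 ms, a message that expires 50 ms from now would be held back, while one that expires 500 ms from now would be sent.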
It would be great to eventually determine this adaptively to allow the brokers to react to changing network conditions and to make configuration simpler, but for a first implementation, manual static configuration would be fine. That longer-term implementation would probably need to account for the full end-to-end RTT for messages from producers to consumers (because looking at only the next network link wouldn't guarantee that the message wouldn't get discarded at a second slow link later in the path), so I don't expect it to happen anytime soon, maybe ever.
— PROBLEM DESCRIPTION —
When my producer on one side of a high-latency WAN sends faster than our meager allocation of the WAN's bandwidth can carry, I quickly reach a point where every message fails to be delivered to the end consumer.
These are the three critical elements of the problem, which all have to be present for it to happen:
1. Messages have a TTL set (the same for all messages), so they'll eventually expire.
2. Producers are sending messages faster (in aggregate) than our bandwidth allocation on the WAN. This means we're guaranteed to not deliver some of the messages to the end consumer.
3. There is a non-trivial amount of latency across the WAN.
As messages are sent, they begin queuing on the sender-side broker. As time goes on, the messages that are still in the producer-side broker's message store get closer and closer to expiring, until eventually the message at the head of the message store is within the WAN's latency value (e.g. 100ms) of the message's expiration time. The amount of time it takes for this to happen depends on how long it takes messages to time out and on the difference between the producer's send rate and the WAN's bandwidth, but it will eventually happen. This message will be sent by the producer-side broker (because although it's really close to expiring, it hasn't expired yet), but when it's received by the consumer-side broker, an amount of time equal to the WAN latency has passed, so it's expired and gets discarded by the consumer-side broker instead of getting delivered to the consumer.
From this point onwards, no messages will get delivered to the consumer. As the messages in the producer-side broker's message store get closer to and eventually reach their expiration times, each message at the head of the message store will either be within the WAN latency of its expiration time or already past it. If the former, it will get sent across the WAN but discarded by the consumer-side broker; if the latter, it will get discarded by the producer-side broker, and that broker will find the next message in the message store that isn't yet expired (but will be by the time it arrives) and send it instead. As a result, every message from that point onward expires either on the producer-side broker or on the consumer-side broker. Even though there are lots of messages further back in the producer-side broker's message store that could be delivered successfully, ActiveMQ instead sends the first unexpired message in the message store, even though an outside observer knows it will just get thrown away.
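The behaviour described above can be reproduced with a small simulation. This is a toy model under stated assumptions, not ActiveMQ code: one message is produced every 1 ms, the WAN carries one message every 2 ms, the TTL is 1000 ms, the one-way latency is 100 ms, the message store is a strict FIFO, and all names are hypothetical.

```java
import java.util.ArrayDeque;

// Toy model of the producer-side broker's store draining over a slow WAN link.
// All constants are illustrative assumptions, in milliseconds.
class WanStarvationSim {
    static final long PRODUCE_EVERY_MS = 1;
    static final long SEND_EVERY_MS = 2;
    static final long TTL_MS = 1000;
    static final long LATENCY_MS = 100;

    /** Returns {delivered, deadOnArrival, droppedAtSender} after durationMs. */
    static long[] run(long thresholdMs, long durationMs) {
        ArrayDeque<Long> store = new ArrayDeque<>(); // expiration times, FIFO
        long delivered = 0, deadOnArrival = 0, droppedAtSender = 0;
        for (long now = 0; now < durationMs; now++) {
            if (now % PRODUCE_EVERY_MS == 0) {
                store.addLast(now + TTL_MS);
            }
            if (now % SEND_EVERY_MS != 0) {
                continue; // the link has no capacity in this millisecond
            }
            // Discard head messages that are expired, or (with a threshold)
            // too close to expiring to be worth sending.
            while (!store.isEmpty() && now + thresholdMs > store.peekFirst()) {
                store.removeFirst();
                droppedAtSender++;
            }
            if (store.isEmpty()) {
                continue;
            }
            long expiration = store.removeFirst();
            // The message reaches the consumer-side broker LATENCY_MS later.
            if (now + LATENCY_MS <= expiration) {
                delivered++;
            } else {
                deadOnArrival++;
            }
        }
        return new long[] {delivered, deadOnArrival, droppedAtSender};
    }

    public static void main(String[] args) {
        for (long threshold : new long[] {0, LATENCY_MS}) {
            long[] r = run(threshold, 10_000);
            System.out.println("threshold=" + threshold + "ms delivered=" + r[0]
                    + " deadOnArrival=" + r[1] + " droppedAtSender=" + r[2]);
        }
    }
}
```

In this model, the run with a threshold of 0 stops delivering anything after about 1.8 simulated seconds, and every send slot after that is spent on a message that is dead on arrival. With the threshold set equal to the latency, the link keeps delivering one live message per send slot.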
There should be a way to have ActiveMQ prioritize messages that are expected to reach an end consumer over ones that are expected to time out before they get there, to minimize wasteful use of scarce resources such as network links.