Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Won't Fix
Affects Version/s: 5.15.8
Fix Version/s: None
Component/s: None
Description
Today a colleague of mine showed me an odd behaviour with two brokers connected as Master/Slave via JDBC (on MySQL 5.7), and we looked into it a bit:
When the first broker was stopped, the second one didn't take over immediately but only 12 minutes later. We could reproduce this reliably.
When the first broker was then restarted, it immediately took the lock back from the second one if the second had already become Master, causing the second to go into slave mode again.
When the second one had not yet taken over, the first just took back the lock upon restart.
As we had multiple similar setups running and had not seen this behaviour anywhere else, we looked into what was happening on the DB used for the synchronization.
We discovered that the timestamp written into the activemq_lock table's TIME column was filled using the system time of the broker writing to it.
We could then verify that the NTP synchronization on the 2nd machine was off by exactly those 12 minutes. So the slave broker, living in its relative past, was working as designed and waiting in an orderly fashion until the lock had expired by its own clock.
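To illustrate why the skew shows up as a one-sided delay, here is a minimal model of the behaviour described above. It is only a sketch, not ActiveMQ's actual locking code: the `Broker` class, the `seconds_until_takeover` helper, and the 30-second lease are all assumptions made for the example. The holder stamps the expiry with its own clock, and the waiting broker compares that expiry against its own clock.

```python
from dataclasses import dataclass

# Assumed lease length for illustration only; the real interval is configurable.
LEASE_SECONDS = 30.0


@dataclass
class Broker:
    name: str
    clock_offset: float  # seconds this broker's clock deviates from true time


def seconds_until_takeover(holder: Broker, waiter: Broker) -> float:
    """True-time seconds the waiter needs after the holder stops.

    The holder wrote expiry = its_own_now + LEASE_SECONDS. The waiter treats
    the lock as expired once its_own_now >= expiry, so any difference between
    the two clocks is added to (or subtracted from) the lease.
    """
    delay = LEASE_SECONDS + holder.clock_offset - waiter.clock_offset
    return max(0.0, delay)


first = Broker("first", clock_offset=0.0)
second = Broker("second", clock_offset=-720.0)  # 12 minutes behind

print(seconds_until_takeover(first, second))  # lease plus the 12 minutes of skew
print(seconds_until_takeover(second, first))  # 0.0: the lock already looks expired
```

In this model the second broker waits the lease plus the full 12 minutes, while the first broker sees any lock written by the second as already expired and takes it back at once, which matches both observations.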
TL;DR:
The DB's system time should be used as the source of truth instead of each individual broker's time, so that the brokers only need to ask the DB whether the lock is still valid (setting it with TIME = SYSDATE + <validity interval>, and treating it as expired once TIME < SYSDATE).
A hint in the documentation to keep the brokers' clocks synchronized could probably also solve it.
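The first suggestion could look roughly like the following sketch. It uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs anywhere. The table layout (`id`, `time`, `broker_name`), the `try_acquire` helper, and the lease lengths are assumptions for illustration, not the actual ActiveMQ schema or API. The point is that both writing the expiry and checking it happen inside the database, so the brokers' own clocks never enter the comparison.

```python
import sqlite3

# Database clock in epoch seconds (SQLite syntax; on MySQL this would be
# an expression such as UNIX_TIMESTAMP()).
DB_NOW = "CAST(strftime('%s', 'now') AS INTEGER)"


def create_lock_table(conn: sqlite3.Connection) -> None:
    # Assumed minimal layout: one row whose `time` column holds the expiry.
    conn.execute(
        "CREATE TABLE activemq_lock ("
        "id INTEGER PRIMARY KEY, time INTEGER, broker_name TEXT)"
    )
    conn.execute(
        "INSERT INTO activemq_lock (id, time, broker_name) VALUES (1, 0, NULL)"
    )
    conn.commit()


def try_acquire(conn: sqlite3.Connection, broker: str, lease_seconds: int) -> bool:
    """Take or renew the lock; expiry is set and checked with the DB clock."""
    cur = conn.execute(
        f"UPDATE activemq_lock SET broker_name = ?, time = {DB_NOW} + ? "
        f"WHERE id = 1 AND (broker_name = ? OR time < {DB_NOW})",
        (broker, lease_seconds, broker),
    )
    conn.commit()
    # Exactly one updated row means this broker now holds the lock.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
create_lock_table(conn)
print(try_acquire(conn, "broker-1", 60))  # True: the initial row is expired
print(try_acquire(conn, "broker-2", 60))  # False: broker-1's lease is still valid
```

Because the update and the expiry check are a single statement, a broker with a wrong local clock can neither steal a valid lock nor wait longer than the lease.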