Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 1.13.1, 1.12.4
Description
In a streaming job with multiple RMQSources, a stop-with-savepoint request behaves unexpectedly. Regular checkpoints and savepoints complete successfully; the problem appears only with stop-with-savepoint requests.
Expected Behavior:
The stop-with-savepoint request stops the job with a FINISHED state.
Actual Behavior:
The stop-with-savepoint request either times out or hangs indefinitely unless, after the request is made, a message arrives in every queue that the job consumes from.
Current workaround:
Send a sentinel value to each queue the job consumes from, and have the deserialization schema detect that value in its isEndOfStream method. This is cumbersome and makes stateful upgrades difficult, because it requires coordination with another system.
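For illustration, the sentinel check described in the workaround might look roughly like the sketch below. It is not code from the affected job. `EndAwareSchema` is a hypothetical stand-in for the `deserialize` and `isEndOfStream` methods of Flink's `DeserializationSchema<T>`, used so the snippet compiles without Flink on the classpath, and the sentinel payload `__STOP__` is an assumed value.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the two relevant methods of Flink's
// DeserializationSchema<T>; a real job would implement the Flink interface.
interface EndAwareSchema<T> {
    T deserialize(byte[] message);

    boolean isEndOfStream(T nextElement);
}

final class SentinelStringSchema implements EndAwareSchema<String> {
    // Assumed sentinel payload; any value that never occurs in real data works.
    static final String SENTINEL = "__STOP__";

    @Override
    public String deserialize(byte[] message) {
        return new String(message, StandardCharsets.UTF_8);
    }

    @Override
    public boolean isEndOfStream(String nextElement) {
        // Signals end of stream only when the sentinel message is received.
        return SENTINEL.equals(nextElement);
    }
}

final class SentinelDemo {
    public static void main(String[] args) {
        SentinelStringSchema schema = new SentinelStringSchema();
        byte[] regular = "order-42".getBytes(StandardCharsets.UTF_8);
        byte[] stop = SentinelStringSchema.SENTINEL.getBytes(StandardCharsets.UTF_8);

        // A regular message does not end the stream; the sentinel does.
        System.out.println(schema.isEndOfStream(schema.deserialize(regular)));
        System.out.println(schema.isEndOfStream(schema.deserialize(stop)));
    }
}
```

With this approach, an external producer has to publish the sentinel to every consumed queue after the stop-with-savepoint request is issued, which is the coordination burden noted above.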
The TaskManager thread dump is attached.
Attachments
Issue Links
- is related to FLINK-23322 RMQSourceITCase.testStopWithSavepoint fails on azure due to timeout (Resolved)