Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Description
Commit ca683cb9e27bae76424a687bc6c3af5a73c501b9 is not backwards compatible. The last item in its commit message,
4. Modified the Health Checker and redefined the meaning initial_interval_secs.
has serious, unintended consequences.
Consider the following health check config:
initial_interval_secs: 10
interval_secs: 5
max_consecutive_failures: 1
On the 0.16.0 executor, no health checking occurs for the first 10 seconds, so the earliest the task can fail a health check is at the 10th second.
On master, health checking starts right away, so the task can fail within the first second because max_consecutive_failures is set to 1.
This is not backwards compatible and needs to be fixed.
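The difference can be illustrated with a small timing model. This is only a sketch, not the executor's code: the function name and the delay_first_check flag are hypothetical, and it assumes every check fails and that, as described above, a single failure is enough to fail the task.

```python
def earliest_failure_secs(initial_interval_secs, interval_secs,
                          failures_to_fail, delay_first_check):
    """Earliest time (seconds after task start) at which a task whose
    health checks always fail would be marked failed.

    delay_first_check=True models the 0.16.0 behaviour described above
    (no checks until initial_interval_secs has elapsed); False models
    checks that start immediately. Both are simplifying assumptions.
    """
    first_check = initial_interval_secs if delay_first_check else 0
    # Each additional failure needed costs one more check interval.
    return first_check + (failures_to_fail - 1) * interval_secs


# Config from this ticket, with one failure treated as fatal.
old = earliest_failure_secs(10, 5, 1, delay_first_check=True)
new = earliest_failure_secs(10, 5, 1, delay_first_check=False)
print(old, new)  # 10 0
```

Under these assumptions the same config gives the task 10 seconds of grace on 0.16.0 and none on master.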
I think a good solution would be to revert the change in meaning of initial_interval_secs and have the task transition into RUNNING once min_consecutive_successes is met.
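One possible reading of this proposal, sketched as a tiny state machine. The class name, the state strings, and the required_successes parameter are hypothetical and not taken from health_checker.py; the sketch also assumes that any failure after the initial interval is fatal.

```python
class StartupGate:
    """Hypothetical model of the proposal: results are ignored until
    initial_interval_secs has elapsed, and the task is reported RUNNING
    only after enough consecutive successful checks."""

    def __init__(self, initial_interval_secs, required_successes):
        self.initial_interval_secs = initial_interval_secs
        self.required_successes = required_successes
        self.successes = 0
        self.state = "STARTING"

    def record(self, elapsed_secs, healthy):
        # Old semantics: nothing counts during the initial interval.
        if self.state == "FAILED" or elapsed_secs < self.initial_interval_secs:
            return self.state
        if healthy:
            self.successes += 1
            if self.successes >= self.required_successes:
                self.state = "RUNNING"
        else:
            # Simplification: a single failure after the grace period is fatal.
            self.successes = 0
            self.state = "FAILED"
        return self.state


gate = StartupGate(initial_interval_secs=10, required_successes=2)
print(gate.record(1, healthy=False))   # STARTING (still inside the initial interval)
print(gate.record(10, healthy=True))   # STARTING (one success so far)
print(gate.record(15, healthy=True))   # RUNNING
```

With this shape, an early failed check cannot kill the task, and RUNNING is only reported after the success threshold is reached.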
An investigation shows initial_interval_secs was set to 5 but the task failed health checks right away:
D1011 19:52:13.295877 6 health_checker.py:107] Health checks enabled. Performing health check.
D1011 19:52:13.306816 6 health_checker.py:126] Reset consecutive failures counter.
D1011 19:52:13.307032 6 health_checker.py:132] Initial interval expired.
W1011 19:52:13.307130 6 health_checker.py:135] Failed to reach minimum consecutive successes.
Issue Links
is related to: AURORA-1793 Revert Commit ca683 which is not backwards compatible (Resolved)