[AURORA-690] Add support for external update coordination - ASF JIRA

Details

Type: Epic
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.8.0
Component/s: Client, Scheduler
Labels: None

Epic Name: Heartbeats
Sprint: Aurora Q4 Sprint 1

Description

With the introduction of scheduler-driven job update orchestration (AURORA-610), it will be somewhat harder for a user to interrupt a job update that has gone wrong (e.g. a bad binary, incorrect settings, or changed external conditions). Instead of aborting the update with CTRL-C (as with the client updater), users would have to run an abort/pause command, which risks never reaching the scheduler if the client is network-partitioned.

To compensate for this, it would be great for the scheduler to optionally support an inverted dependency model in which the updater voluntarily pauses job update progress upon reaching certain checkpoints and waits for the client or an external service to explicitly "ack" it (i.e. via a resumeJobUpdate RPC). Such checkpoints could be:

predefined number of instances reached
percentage of completion
time-based heartbeat (HB) intervals

Arguably, the time-based HB approach is the most versatile and addresses the majority of cases.
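
To make the time-based HB idea concrete, here is a minimal sketch of how an updater could gate progress on heartbeats. It is an illustration only and not the scheduler's implementation: `HeartbeatGate`, `run_update`, and the `update_instance` callback are hypothetical names, and the assumption that the start of the update counts as the first ack is mine.

```python
import time


class HeartbeatGate:
    """Tracks whether an update may proceed, based on time-based heartbeats.

    Hypothetical sketch: the names and semantics here are assumptions,
    not Aurora's actual API.
    """

    def __init__(self, interval_secs, clock=time.monotonic):
        self._interval = interval_secs
        self._clock = clock
        # Assumption: the start of the update counts as the first ack.
        self._last_ack = clock()

    def ack(self):
        """Record a heartbeat from the client or an external coordinator."""
        self._last_ack = self._clock()

    def is_blocked(self):
        """Return True when no heartbeat arrived within the interval."""
        return self._clock() - self._last_ack > self._interval


def run_update(instance_ids, gate, update_instance):
    """Update instances in order, pausing at the first missed heartbeat.

    Returns the instances that were not updated, so the caller can
    continue with them after a later ack.
    """
    remaining = list(instance_ids)
    while remaining:
        if gate.is_blocked():
            break  # pause and wait for an explicit ack
        update_instance(remaining.pop(0))
    return remaining
```

In this sketch a client that is partitioned from the scheduler simply stops acking, so the update pauses on its own instead of depending on an abort/pause command getting through.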

Generalizing further, this feature would be useful for building external update coordination services, in which Aurora service job upgrades are controlled by application-specific health tracking systems that throttle individual job updates based on internal health/traffic metrics.
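
As a rough sketch of such an external coordination service, the loop below sends a heartbeat only while an application-specific health check passes. Everything here is an assumption made for illustration: `coordinate`, `is_healthy`, and `send_heartbeat` are hypothetical callables supplied by the caller, and no real Aurora client call is implied.

```python
import time


def coordinate(job_key, is_healthy, send_heartbeat, ticks,
               interval_secs=0.0, sleep=time.sleep):
    """Send at most one heartbeat per tick, only while the job looks healthy.

    Hypothetical sketch: is_healthy and send_heartbeat are caller-supplied
    callables standing in for a health tracking system and a scheduler RPC.
    Returns the number of heartbeats sent.
    """
    sent = 0
    for _ in range(ticks):
        if is_healthy(job_key):
            send_heartbeat(job_key)
            sent += 1
        # When unhealthy, the heartbeat is withheld so that a scheduler
        # waiting on heartbeats would pause the update; the next tick
        # re-checks health and resumes sending if the job has recovered.
        sleep(interval_secs)
    return sent
```

Withholding the heartbeat rather than issuing an explicit pause keeps the failure mode safe: if the coordinator itself dies or loses connectivity, the update stalls instead of proceeding unchecked.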

Issue Links

relates to

AURORA-610 Job update orchestration in the scheduler

Resolved

People

Assignee: Maxim Khutornenko
Reporter: Maxim Khutornenko
Votes: 0
Watchers: 3

Dates

Created: 08/Sep/14 17:23
Updated: 23/Feb/15 22:58
Resolved: 23/Feb/15 22:58