Details
- Type: Umbrella
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Version: Slider 0.30
- Sprint: Slider August #1, Slider August #2, Slider September #1
Description
Slider-deployed applications should be robust against failure of:
- the App Master (AM): the application keeps running and component registration data remains; the restarted AM brings the application back into its desired state with minimal container loss or restart.
- the RM: the AM and containers continue running.
- nodes and racks hosting containers: the AM requests replacement containers.
- network partitions: rely on the NMs to kill containers, and treat the partition as a rack/node failure.
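The node-failure case above can be illustrated with a small reconciliation sketch. This is not Slider's code: the `Container` record and the `handle_node_failure` and `replacements_needed` helpers are hypothetical names chosen for illustration. It only shows the idea of dropping containers on a failed node and computing how many replacements each role needs to get back to its desired count.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    """Hypothetical record of one running container."""
    container_id: str
    role: str
    node: str


def handle_node_failure(live, failed_node):
    """Split live containers into survivors and those lost with the node."""
    survivors = [c for c in live if c.node != failed_node]
    lost = [c for c in live if c.node == failed_node]
    return survivors, lost


def replacements_needed(desired, live):
    """Return, per role, how many containers must be requested to reach `desired`."""
    running = Counter(c.role for c in live)
    return {
        role: count - running[role]
        for role, count in desired.items()
        if count > running[role]
    }
```

For example, with a desired state of three `worker` containers and one `master`, losing a node that hosted two workers leaves `replacements_needed` reporting `{"worker": 2}`; roles already at their desired count do not appear in the result.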
Issue Links
- depends upon: SLIDER-285 Slider Agents to bind and work with restarted AM (Resolved)
Sub-Tasks
1. use a window for tracking container failures | Resolved | Steve Loughran
2. Slider to work on HA NNs | Resolved | Steve Loughran
3. slider registry code to gracefully handle (transient) ZK outages | Resolved | Unassigned
4. add integral/configurable chaos monkey to slider AM | Resolved | Steve Loughran
5. Implement scalable failure threshold based on percentage of instances failing over a time period | Resolved | Unassigned
6. AM must react to NM failure events by releasing containers | Resolved | Steve Loughran
7. AM to notify providers when restarted | Resolved | Steve Loughran
8. AM to notify containers on managed container release | Resolved | Steve Loughran
9. add scheduled executor to AM and event queue for async action dispatch | Resolved | Steve Loughran
10. failure thresholds to be settable per-role | Resolved | Steve Loughran
11. factor out internal keys into class InternalKeys | Resolved | Steve Loughran
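
Several of the tasks listed above concern counting container failures over a time window and comparing them against a threshold that scales with the number of instances. A minimal sketch of that idea follows. It is an illustration under stated assumptions, not Slider's implementation: the `FailureWindow` class, its parameter names, and the rule "limit = fraction of desired instances, with a floor" are all hypothetical.

```python
import math
from collections import deque


class FailureWindow:
    """Hypothetical sliding-window failure counter for one role."""

    def __init__(self, window_seconds, max_failure_fraction, min_failures=1):
        self.window_seconds = window_seconds
        self.max_failure_fraction = max_failure_fraction
        self.min_failures = min_failures
        self._failures = deque()  # failure timestamps, oldest first

    def record_failure(self, now):
        """Record a container failure observed at time `now` (seconds)."""
        self._failures.append(now)

    def failures_in_window(self, now):
        """Drop failures older than the window and count the rest."""
        while self._failures and now - self._failures[0] >= self.window_seconds:
            self._failures.popleft()
        return len(self._failures)

    def threshold_exceeded(self, now, desired_instances):
        """True if recent failures exceed the limit scaled to the role's size."""
        limit = max(
            self.min_failures,
            math.ceil(self.max_failure_fraction * desired_instances),
        )
        return self.failures_in_window(now) > limit
```

Timestamps are passed in explicitly so the behaviour is deterministic. Per-role thresholds could be modelled by keeping one `FailureWindow` per role name in a dictionary, each with its own window length and fraction.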