[YUNIKORN-251] Post recovery, releasing a pod may cause the release of other pods within the same app even though they are running - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.9
Component/s: shim - kubernetes
Labels:
- pull-request-available

Description

I found this issue while testing recovery. It can be reproduced with the following steps:

1. Create an application that launches multiple pods, and keep them running.
2. Restart the scheduler; the scheduler recovers the allocations based on the already allocated pods.
3. The app gets recovered, and so do its pods.
4. Kill one of the pods.

Expected: only the killed pod is released and removed from this app.
Actual: all existing allocations of the app are released.
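The expected behavior can be illustrated with a minimal Python sketch. It is a hypothetical model, not YuniKorn code: `Application`, `Allocation`, `recover` and `release_allocation` are illustrative names. The issue does not state the root cause, so the empty-UUID guard below is only an assumed safeguard. The point is that a release request should be scoped to a single allocation identifier, so killing one recovered pod removes only that pod's allocation.

```python
from dataclasses import dataclass, field


@dataclass
class Allocation:
    """Hypothetical, simplified record of one recovered pod allocation."""
    uuid: str
    pod_name: str


@dataclass
class Application:
    """Hypothetical scheduler-side app that tracks its allocations by UUID."""
    app_id: str
    allocations: dict = field(default_factory=dict)

    def recover(self, alloc: Allocation) -> None:
        # On scheduler restart, each running pod is re-registered as an allocation.
        self.allocations[alloc.uuid] = alloc

    def release_allocation(self, uuid: str) -> Allocation:
        # Release exactly one allocation. An empty UUID is rejected instead of
        # being treated as "release everything" (an assumed safeguard only).
        if not uuid:
            raise ValueError(f"app {self.app_id}: empty allocation UUID")
        if uuid not in self.allocations:
            raise KeyError(f"app {self.app_id}: allocation {uuid} not found")
        return self.allocations.pop(uuid)

    def pod_names(self) -> list:
        # Sorted names of pods whose allocations are still held by the app.
        return sorted(a.pod_name for a in self.allocations.values())


# Scenario from the steps: recover three running pods, then kill one of them.
app = Application("app-1")
for i in range(1, 4):
    app.recover(Allocation(uuid=f"uuid-{i}", pod_name=f"pod-{i}"))

released = app.release_allocation("uuid-2")
print(released.pod_name, app.pod_names())
```

Running it prints `pod-2 ['pod-1', 'pod-3']`: the two pods that are still running keep their allocations, which matches the expected behavior.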

Issue Links

links to

GitHub Pull Request #140

GitHub Pull Request #179

People

Assignee: Weiwei Yang

Reporter: Weiwei Yang

Votes: 0

Watchers: 2

Dates

Created: 24/Jun/20 22:02

Updated: 21/Jan/22 21:47

Resolved: 26/Jun/20 06:31