Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.14.3
Fix Version/s: None
Component/s: None
Description
The Flink version is 1.14.3, and the job is submitted to Kubernetes using Native Kubernetes application mode. While the job is running, if a TaskManager pod crashes due to an exception, Kubernetes attempts to start a replacement TaskManager pod, but scheduling halts immediately and the entire Flink job is terminated. By contrast, if the JobManager pod crashes, Kubernetes successfully schedules a new JobManager pod and the job recovers. This behavior was observed during normal application usage. Could you please help analyze the underlying cause?
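For reference, a minimal job skeleton is sketched below (the class name and pipeline are hypothetical, not the actual job) showing the kind of restart-strategy configuration available in Flink 1.14. With a fixed-delay restart strategy and a replacement TaskManager pod available, one would expect the job to restart after a TaskManager failure rather than terminate.

```java
import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/**
 * Hypothetical job skeleton illustrating the setup described above.
 * With a fixed-delay restart strategy, a crashed TaskManager pod is
 * expected to be replaced and the job restarted, not terminated.
 */
public class ExampleJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Allow a few restart attempts after task failures instead of failing the job terminally.
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(
                3,                                  // number of restart attempts
                Time.of(10, TimeUnit.SECONDS)));    // delay between attempts

        // Placeholder pipeline standing in for the actual job logic.
        env.fromElements(1, 2, 3).print();

        env.execute("example-native-k8s-job");
    }
}
```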