Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.14.3
Fix Version/s: None
Component/s: None
Description
The Flink version is 1.14.3, and the job is submitted to Kubernetes using Native Kubernetes application mode. While the job is running, if a TaskManager pod crashes due to an exception, Kubernetes attempts to start a replacement TaskManager pod, but scheduling halts immediately and the entire Flink job is terminated. By contrast, if the JobManager pod crashes, Kubernetes successfully schedules a new JobManager pod and the job recovers. This behavior was observed during normal application usage. Could you please help analyze the underlying cause?
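For reference, a minimal job skeleton is sketched below (the class name and pipeline are hypothetical, not the actual job) showing the kind of restart-strategy configuration available in Flink 1.14. With a fixed-delay restart strategy and a replacement TaskManager pod available, one would expect the job to restart after a TaskManager failure rather than terminate.

```java
import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/**
 * Hypothetical job skeleton illustrating the setup described above.
 * With a fixed-delay restart strategy, a crashed TaskManager pod is
 * expected to be replaced and the job restarted, not terminated.
 */
public class ExampleJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Allow a few restart attempts after task failures instead of failing the job terminally.
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(
                3,                                  // number of restart attempts
                Time.of(10, TimeUnit.SECONDS)));    // delay between attempts

        // Placeholder pipeline standing in for the actual job logic.
        env.fromElements(1, 2, 3).print();

        env.execute("example-native-k8s-job");
    }
}
```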