Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
2022-04-22 10:19:18,307 o.a.f.k.o.c.FlinkDeploymentController [WARN ][default/flink-example-statemachine] Attempt count: 5, last attempt: true
2022-04-22 10:19:18,329 i.j.o.p.e.ReconciliationDispatcher [ERROR][default/flink-example-statemachine] Error during event processing ExecutionScope{ resource id: CustomResourceID{name='flink-example-statemachine', namespace='default'}, version: 4979543} failed.
org.apache.flink.kubernetes.operator.exception.ReconciliationException: java.lang.NullPointerException
    at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:110)
    at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:53)
    at io.javaoperatorsdk.operator.processing.Controller$2.execute(Controller.java:101)
    at io.javaoperatorsdk.operator.processing.Controller$2.execute(Controller.java:76)
    at io.javaoperatorsdk.operator.api.monitoring.Metrics.timeControllerExecution(Metrics.java:34)
    at io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:75)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:143)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:109)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:74)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:50)
    at io.javaoperatorsdk.operator.processing.event.EventProcessor$ControllerExecution.run(EventProcessor.java:349)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NullPointerException
    at org.apache.flink.kubernetes.operator.utils.FlinkUtils.lambda$deleteJobGraphInKubernetesHA$0(FlinkUtils.java:253)
    at java.base/java.util.ArrayList.forEach(Unknown Source)
    at org.apache.flink.kubernetes.operator.utils.FlinkUtils.deleteJobGraphInKubernetesHA(FlinkUtils.java:248)
    at org.apache.flink.kubernetes.operator.service.FlinkService.submitApplicationCluster(FlinkService.java:130)
    at org.apache.flink.kubernetes.operator.reconciler.deployment.ApplicationReconciler.deployFlinkJob(ApplicationReconciler.java:205)
    at org.apache.flink.kubernetes.operator.reconciler.deployment.ApplicationReconciler.restoreFromLastSavepoint(ApplicationReconciler.java:218)
    at org.apache.flink.kubernetes.operator.reconciler.deployment.ApplicationReconciler.reconcile(ApplicationReconciler.java:117)
    at org.apache.flink.kubernetes.operator.reconciler.deployment.ApplicationReconciler.reconcile(ApplicationReconciler.java:56)
    at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:106)
    ... 13 more
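The root cause is that the Kubernetes HA implementation changed in Flink 1.15: when a job is cancelled, the data of the leader ConfigMap is now cleared. This is consistent with the NullPointerException thrown from FlinkUtils.deleteJobGraphInKubernetesHA in the trace above.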
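To illustrate the failure mode, here is a minimal sketch of a null-safe cleanup loop. It is not the operator's actual fix: `FakeConfigMap` is a hypothetical stand-in for a Kubernetes ConfigMap object, `deleteJobGraphEntries` is an illustrative helper, and the `jobGraph-` key prefix is an assumption about how job graph entries are named. The only point it makes is that a ConfigMap whose data has been cleared must be skipped rather than dereferenced.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NullSafeJobGraphCleanup {

    /** Hypothetical stand-in for a Kubernetes ConfigMap; not a real client class. */
    static final class FakeConfigMap {
        final String name;
        final Map<String, String> data; // null models a ConfigMap whose data was cleared

        FakeConfigMap(String name, Map<String, String> data) {
            this.name = name;
            this.data = data;
        }
    }

    // Assumed key prefix for job graph entries (illustrative only).
    static final String JOB_GRAPH_KEY_PREFIX = "jobGraph-";

    /** Removes job graph entries from every ConfigMap and returns how many were removed. */
    static int deleteJobGraphEntries(List<FakeConfigMap> configMaps) {
        int removed = 0;
        for (FakeConfigMap cm : configMaps) {
            // Guard: after cancellation the data map may be null, so skip it
            // instead of calling methods on it.
            if (cm.data == null) {
                continue;
            }
            int before = cm.data.size();
            cm.data.keySet().removeIf(key -> key.startsWith(JOB_GRAPH_KEY_PREFIX));
            removed += before - cm.data.size();
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, String> leaderData = new HashMap<>();
        leaderData.put("jobGraph-0000", "serialized-graph");
        leaderData.put("address", "leader-address");

        List<FakeConfigMap> configMaps = new ArrayList<>();
        configMaps.add(new FakeConfigMap("cleared-leader", null));
        configMaps.add(new FakeConfigMap("dispatcher-leader", leaderData));

        int removed = deleteJobGraphEntries(configMaps);
        System.out.println("Removed entries: " + removed);
        System.out.println("Remaining keys: " + leaderData.keySet());
    }
}
```

Without the null check, the first ConfigMap in this example would trigger the same kind of NullPointerException inside the loop body.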
Issue Links
- duplicates: FLINK-27359 Kubernetes operator throws NPE when testing with Flink 1.15 (Closed)
- is related to: FLINK-24038 DispatcherResourceManagerComponent fails to deregister application if no leading ResourceManager (Closed)