Details
- Type: Improvement
- Status: Open
- Priority: Minor
- Resolution: Unresolved
Description
Context:
In a native Kubernetes deployment, Flink creates a headless service for the JobManager's RPC calls. The description below is only relevant for Flink deployments in Application mode.
When a livenessProbe and/or readinessProbe is defined with initialDelaySeconds, newly created TaskManager instances have to wait until the JobManager's probes are green before they are able to connect to the JobManager.
Probes configuration:
{code:yaml}
- name: flink-main-container
  livenessProbe:
    httpGet:
      path: /jobs/overview
      port: rest
      scheme: HTTP
    initialDelaySeconds: 30
    periodSeconds: 10
    failureThreshold: 6
    successThreshold: 1
    timeoutSeconds: 5
  readinessProbe:
    httpGet:
      path: /jobs/overview
      port: rest
      scheme: HTTP
    initialDelaySeconds: 30
    periodSeconds: 10
    failureThreshold: 6
    successThreshold: 1
    timeoutSeconds: 5
{code}
During this period the TaskManager logs messages like:
{code}
Failed to connect to [dev-pipeline.dev-namespace:6123] from local address [dev-pipeline-taskmanager-1-1/11.41.6.81] with timeout [200] due to: dev-pipeline.dev-namespace
{code}
Issue:
Because the initialization time of different Flink jobs (read: Flink deployments) can vary over a wide range, it would be convenient to have a common livenessProbe and/or readinessProbe configuration for all deployments that covers the worst case, instead of tuning the probes for every deployment. On the other hand, it would be nice to reduce the job's overall bootstrap time, because in our case jobs are re-deployed often, and the delay affects the response time of incoming client requests.
Solution:
To reduce the job's overall bootstrap time, one solution could be to set the publishNotReadyAddresses flag on the JobManager's RPC Kubernetes service via a configuration parameter, so that newly created TaskManager instances can connect to the JobManager immediately.
Publishing the "not ready" JobManager's RPC address should not cause any issues, because in a native Kubernetes deployment the TaskManager instances are created by the ResourceManager, which is part of the JobManager. This in turn guarantees that the JobManager is ready and the ExecutionGraph has been built successfully by the time a TaskManager starts.
Making the flag optional guarantees that the approach still works correctly when the flag is disabled and JobManager High Availability is configured, which in turn involves leader election.
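For illustration, the JobManager's internal service with the proposed flag enabled would look roughly like the sketch below. The service name, namespace, labels, and selector values are assumptions modeled on the log output above; only publishNotReadyAddresses and the headless clusterIP are the point of the example.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: dev-pipeline            # example name taken from the log message above
spec:
  clusterIP: None               # headless service, as Flink already creates it
  publishNotReadyAddresses: true  # proposed: expose RPC endpoints before probes pass
  selector:
    app: dev-pipeline           # assumed selector labels, for illustration only
    component: jobmanager
  ports:
    - name: jobmanager-rpc
      port: 6123                # default Flink JobManager RPC port
    - name: blobserver
      port: 6124                # default Flink blob server port
```

With publishNotReadyAddresses set, the headless service's DNS records include the JobManager pod's address even while its readiness probe is still within initialDelaySeconds, so TaskManagers no longer fail name resolution during that window.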
Affected Classes:
- org.apache.flink.kubernetes.kubeclient.services.HeadlessClusterIPService - by adding one line {{.withPublishNotReadyAddresses(kubernetesJobManagerParameters.isPublishNotReadyAddresses())}} in {{Service buildUpInternalService(KubernetesJobManagerParameters kubernetesJobManagerParameters)}}
- org.apache.flink.kubernetes.configuration.KubernetesConfigOptions - by adding something like a {{kubernetes.jobmanager.rpc.service.publish-not-ready-addresses}} option
- org.apache.flink.kubernetes.kubeclient.parameters.KubernetesJobManagerParameters - by adding a getter for the parameter: {{public boolean isPublishNotReadyAddresses() { return flinkConfig.getBoolean(KubernetesConfigOptions.KUBERNETES_JOBMANAGER_RPC_SERVICE_PUBLISH_NOT_READY_ADDRESSES); }}}
- Tests to cover the new parameter
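The pieces above could fit together roughly as in the following sketch. This is a proposal, not merged code: the option name, default value, and description are assumptions that follow the existing patterns in KubernetesConfigOptions, and the builder call mirrors the fabric8 service builder already used when constructing the internal service.

```java
// Sketch only; assumes Flink's ConfigOptions API and the fabric8 ServiceBuilder.

// In org.apache.flink.kubernetes.configuration.KubernetesConfigOptions:
public static final ConfigOption<Boolean>
        KUBERNETES_JOBMANAGER_RPC_SERVICE_PUBLISH_NOT_READY_ADDRESSES =
    ConfigOptions.key("kubernetes.jobmanager.rpc.service.publish-not-ready-addresses")
        .booleanType()
        .defaultValue(false) // disabled by default, so HA / leader-election setups are unaffected
        .withDescription(
            "Whether to set publishNotReadyAddresses on the JobManager's internal RPC service.");

// In org.apache.flink.kubernetes.kubeclient.parameters.KubernetesJobManagerParameters:
public boolean isPublishNotReadyAddresses() {
    return flinkConfig.getBoolean(
        KubernetesConfigOptions.KUBERNETES_JOBMANAGER_RPC_SERVICE_PUBLISH_NOT_READY_ADDRESSES);
}

// In HeadlessClusterIPService#buildUpInternalService, while building the ServiceSpec,
// add the single line:
//     .withPublishNotReadyAddresses(kubernetesJobManagerParameters.isPublishNotReadyAddresses())
```

Defaulting the option to false keeps current behavior for existing deployments and sidesteps any interaction with the leader-election flow described above.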
If there is a decision that such an improvement is worth being part of Flink, I am ready to provide a PR for it.