Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- None
Description
In the AWS EMR on EKS service, the ownerReference of the real driver pod is a ConfigMap.
The placeholder pod's ownerReference is also the driver ConfigMap.
When a user cancels an emr-containers job, the job-submitter is terminated,
but the placeholder remains in the Pending state.
https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/emr-eks.html
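Why the placeholder survives can be illustrated with how Kubernetes garbage collection treats ownerReferences: a dependent object is deleted only once none of its owners exist. The placeholder spec in this report has a single ownerReference, to the `000000031tu35ohgkc6-spark-defaults` ConfigMap, so as long as that ConfigMap exists nothing removes the Pending placeholder. The Python sketch below is a simplified model of that rule, not YuniKorn or Kubernetes code. It assumes the ConfigMap outlives the cancelled job, ignores apiVersion/kind resolution, finalizers and propagation policies, and the helper names `is_placeholder`, `collectable_by_gc` and `stranded_placeholders` are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Set

# Annotation YuniKorn sets on placeholder pods (as seen in the placeholder spec in this issue).
PLACEHOLDER_ANNOTATION = "yunikorn.apache.org/placeholder"


def is_placeholder(pod: Dict[str, Any]) -> bool:
    """Hypothetical helper: True if the pod carries the YuniKorn placeholder annotation."""
    annotations = pod.get("metadata", {}).get("annotations") or {}
    return annotations.get(PLACEHOLDER_ANNOTATION) == "true"


def collectable_by_gc(pod: Dict[str, Any], live_uids: Set[str]) -> bool:
    """Simplified model of Kubernetes GC: a dependent is deleted only when
    none of its owners (matched by UID) still exist. Pods with no owners
    are never garbage-collected."""
    owners = pod.get("metadata", {}).get("ownerReferences") or []
    if not owners:
        return False
    return all(ref["uid"] not in live_uids for ref in owners)


def stranded_placeholders(pods: Iterable[Dict[str, Any]], live_uids: Set[str]) -> List[str]:
    """Names of Pending placeholder pods that GC will not remove because at
    least one owner (here: the driver ConfigMap) is still alive."""
    return [
        pod["metadata"]["name"]
        for pod in pods
        if is_placeholder(pod)
        and pod.get("status", {}).get("phase") == "Pending"
        and not collectable_by_gc(pod, live_uids)
    ]


if __name__ == "__main__":
    # Trimmed copy of the placeholder pod from this issue.
    placeholder = {
        "metadata": {
            "name": "tg-driver-spark-000000031tu35ohgkc6-0",
            "annotations": {PLACEHOLDER_ANNOTATION: "true"},
            "ownerReferences": [
                {
                    "kind": "ConfigMap",
                    "name": "000000031tu35ohgkc6-spark-defaults",
                    "uid": "a3044750-c8b5-47b4-9efa-81bd4b064798",
                }
            ],
        },
        "status": {"phase": "Pending"},
    }
    # Assumed state after cancel: job-submitter gone, driver ConfigMap still present.
    live = {"a3044750-c8b5-47b4-9efa-81bd4b064798"}
    print(stranded_placeholders([placeholder], live))
    # If the ConfigMap were deleted too, GC would collect the placeholder.
    print(stranded_placeholders([placeholder], set()))
```

Under these assumptions the first call reports the placeholder as stranded and the second returns an empty list, which matches the symptom: cleanup of the placeholder hinges on the lifetime of the ConfigMap, not on the job-submitter.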
Environment
- EKS 1.22
- EMR 6.9 release (Spark 3.3.0)
- YuniKorn 1.2
- gang scheduling enabled
Placeholder event log
Events:
  Type     Reason              Age   From       Message
  ----     ------              ----  ----       -------
  Normal   Scheduling          19m   yunikorn   namespace/tg-driver-spark-000000031ttjn13iom3-0 is queued and waiting for allocation
  Normal   PodUnschedulable    19m   yunikorn   Task namespace/tg-driver-spark-000000031ttjn13iom3-0 is pending for the requested resources become available
  Warning  FailedProvisioning  19m   karpenter  Failed to provision new node
Placeholder pod spec
apiVersion: v1
kind: Pod
metadata:
  name: tg-driver-spark-000000031tu35ohgkc6-0
  namespace: namespace
  uid: 80601a03-565c-4d0e-88c7-8c66b590871e
  resourceVersion: '546358515'
  creationTimestamp: '2023-04-26T15:06:06Z'
  labels:
    applicationId: spark-000000031tu35ohgkc6
    placeholder: 'true'
    queue: root.beta
  annotations:
    yunikorn.apache.org/placeholder: 'true'
    yunikorn.apache.org/schedulingPolicyParameters: placeholderTimeoutSeconds=300
    yunikorn.apache.org/task-group-name: driver
    yunikorn.apache.org/task-groups: >-
      [{"name": "driver","minResource":{"cpu": "1","memory":"2Gi"},"minMember":1,"nodeSelector":{"karpenter.sh/provisioner-name":"test"}},{"name": "executor","minResource":{"cpu": "1","memory":"5Gi"},"minMember":1,"nodeSelector":{"karpenter.sh/provisioner-name":"test"}}]
  ownerReferences:
    - apiVersion: batch/v1
      kind: ConfigMap
      name: 000000031tu35ohgkc6-spark-defaults
      uid: a3044750-c8b5-47b4-9efa-81bd4b064798
      controller: false
      blockOwnerDeletion: true
  managedFields:
    - manager: k8s_yunikorn_scheduler
      operation: Update
      apiVersion: v1
      time: '2023-04-26T15:06:08Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          f:conditions:
            .: {}
            k:{"type":"PodScheduled"}:
              .: {}
              f:lastProbeTime: {}
              f:lastTransitionTime: {}
              f:message: {}
              f:reason: {}
              f:status: {}
              f:type: {}
      subresource: status
  selfLink: >-
    /api/v1/namespaces/namespace/pods/tg-driver-spark-000000031tu35ohgkc6-0
status:
  phase: Pending
  conditions:
    - type: PodScheduled
      status: 'False'
      lastProbeTime: null
      lastTransitionTime: '2023-04-26T15:06:08Z'
      reason: Unschedulable
      message: request is waiting for cluster resources become available
  qosClass: Burstable
spec:
  volumes:
    - name: kube-api-access-gvxxk
      projected:
        sources:
          - serviceAccountToken:
              expirationSeconds: 3607
              path: token
          - configMap:
              name: kube-root-ca.crt
              items:
                - key: ca.crt
                  path: ca.crt
          - downwardAPI:
              items:
                - path: namespace
                  fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
        defaultMode: 420
  containers:
    - name: pause
      image: registry.k8s.io/pause:3.7
      resources:
        requests:
          cpu: '1'
          memory: 2Gi
      volumeMounts:
        - name: kube-api-access-gvxxk
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
  restartPolicy: Never
  terminationGracePeriodSeconds: 30
  nodeSelector:
    karpenter.sh/provisioner-name: test
  serviceAccountName: default
  serviceAccount: default
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
  schedulerName: yunikorn
  priority: 0
  preemptionPolicy: PreemptLowerPriority