Details

Type: Sub-task
Status: Closed
Priority: Major
Resolution: Won't Do
Description
YuniKorn already has metrics for the resources requested and allocated per queue, but we have no visibility into how much of the allocated resources are actually being used. Take Spark as an example: an under-optimized job may request 1 TB of total executor memory while its processing logic actually uses only 100 GB. As a consequence, other jobs may not fit into the queue even though much of its capacity sits idle. A metric that exposes real utilization would help members of a queue understand their job characteristics and optimize their jobs.
The Kubernetes metrics server already reports real utilization per pod. YuniKorn may be able to aggregate those pod-level stats up to the queue level. This would be a Kubernetes-specific solution, though.
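As a rough sketch of that aggregation (not a committed design): the snippet below lists PodMetrics from the metrics API and sums container CPU/memory usage per queue. The "queue" pod label used for grouping, the kubeconfig-based client setup, and the all-namespaces scope are assumptions for illustration; the actual placement rules or shim internals may expose the queue mapping differently.

{code:go}
// Sketch: aggregate metrics-server pod usage into per-queue totals.
// Assumes pods carry their queue name in a "queue" label (assumption).
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	metricsclient "k8s.io/metrics/pkg/client/clientset/versioned"
)

type queueUsage struct {
	CPU    resource.Quantity
	Memory resource.Quantity
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	core := kubernetes.NewForConfigOrDie(cfg)
	metrics := metricsclient.NewForConfigOrDie(cfg)
	ctx := context.Background()

	// Map "namespace/pod" -> queue, taken from the (assumed) "queue" label.
	pods, err := core.CoreV1().Pods(corev1.NamespaceAll).List(ctx, metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	podQueue := map[string]string{}
	for _, p := range pods.Items {
		if q, ok := p.Labels["queue"]; ok {
			podQueue[p.Namespace+"/"+p.Name] = q
		}
	}

	// Sum the container usage reported by metrics-server per queue.
	podMetrics, err := metrics.MetricsV1beta1().PodMetricses(corev1.NamespaceAll).List(ctx, metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	usage := map[string]*queueUsage{}
	for _, pm := range podMetrics.Items {
		q, ok := podQueue[pm.Namespace+"/"+pm.Name]
		if !ok {
			continue
		}
		if usage[q] == nil {
			usage[q] = &queueUsage{}
		}
		for _, c := range pm.Containers {
			cpu := c.Usage[corev1.ResourceCPU]
			mem := c.Usage[corev1.ResourceMemory]
			usage[q].CPU.Add(cpu)
			usage[q].Memory.Add(mem)
		}
	}

	for q, u := range usage {
		fmt.Printf("queue=%s cpu=%s memory=%s\n", q, u.CPU.String(), u.Memory.String())
	}
}
{code}

The resulting per-queue totals could then be exported as gauges alongside the existing requested/allocated queue metrics.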