[YARN-11277] trigger deletion of log-dir by size for NonAggregatingLogHandler - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 3.4.0
Fix Version/s: 3.4.0
Component/s: nodemanager
Labels: pull-request-available
Target Version/s: 3.4.0
Hadoop Flags: Reviewed

Description

In our YARN cluster, some containers write log files so large that the NodeManager frequently switches to the unhealthy state. Logs that are too large could be deleted immediately, instead of waiting for the delay set by yarn.nodemanager.log.retain-seconds.

Cluster environment:

8,000+ nodes
500,000+ apps/day

Configuration:

yarn.nodemanager.log.retain-seconds=259200 (3 days)
yarn.log-aggregation-enable=false
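
The idea in the description, deleting an application's log directory as soon as it grows past a size limit rather than waiting out the retention delay, could be sketched roughly as follows. This is only an illustration and not the patch from the linked pull request: the class `LogDirSizeCheck`, its methods, and the threshold constant are hypothetical names, and no real YARN property is implied.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class LogDirSizeCheck {

    // Hypothetical size limit (10 GiB); an assumption for this sketch,
    // not a value or property taken from YARN.
    static final long THRESHOLD_BYTES = 10L * 1024 * 1024 * 1024;

    // Sums the sizes of all regular files under dir, recursively.
    static long dirSize(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        }
    }

    // True when the directory is over the limit and should be deleted
    // right away instead of after the retention delay.
    static boolean shouldDeleteNow(long sizeBytes, long thresholdBytes) {
        return sizeBytes > thresholdBytes;
    }

    public static void main(String[] args) throws IOException {
        // Build a small throwaway log directory to exercise the check.
        Path dir = Files.createTempDirectory("app-logs");
        Files.write(dir.resolve("stdout"), new byte[2048]);

        long size = dirSize(dir);
        // A 1 KiB limit is used here only so the demo directory exceeds it.
        System.out.println("size=" + size
                + " deleteNow=" + shouldDeleteNow(size, 1024));
    }
}
```

In a handler such as NonAggregatingLogHandler, a check like this would presumably run when the application finishes, choosing between immediate deletion and the usual delayed deletion.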

Issue Links

links to

GitHub Pull Request #4797

People

Assignee: Xianming Lei

Reporter: Xianming Lei

Votes: 0

Watchers: 2

Dates

Created: 24/Aug/22 03:16

Updated: 05/Jun/23 03:10

Resolved: 05/Jun/23 03:08