Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Version: 3.4.0
Hadoop Flags: Reviewed
Description
In our YARN cluster, the log files of some containers grow so large that they fill the NodeManager's log disks, which causes the NodeManager to frequently switch to the unhealthy state. For oversized logs, we could delete them immediately instead of keeping them for the full yarn.nodemanager.log.retain-seconds period.
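A minimal Python sketch of the proposed policy follows. It assumes that, as with yarn.nodemanager.log.retain-seconds when log aggregation is disabled, the deletion decision is made per application after the application finishes. The names `MAX_LOG_BYTES`, `AppLogDir`, `dir_size_bytes` and `should_delete_now`, and the 10 GiB threshold, are hypothetical and only illustrate the idea; they are not YARN configuration keys or NodeManager APIs.

```python
import os
from dataclasses import dataclass

# Hypothetical size threshold for "too large" logs; not a real YARN setting.
MAX_LOG_BYTES = 10 * 1024**3  # 10 GiB

# yarn.nodemanager.log.retain-seconds = 3 days, expressed in seconds.
RETAIN_SECONDS = 3 * 24 * 60 * 60


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


@dataclass
class AppLogDir:
    path: str
    size_bytes: int
    finished_at: float  # epoch seconds when the application finished


def should_delete_now(log_dir: AppLogDir, now: float,
                      retain_seconds: int = RETAIN_SECONDS,
                      max_bytes: int = MAX_LOG_BYTES) -> bool:
    """Decide whether a finished application's local logs can be removed."""
    if log_dir.size_bytes > max_bytes:
        # Oversized logs: delete right away instead of waiting out
        # retain-seconds, so they stop consuming the log disk.
        return True
    # Normal-sized logs keep the existing time-based retention.
    return now - log_dir.finished_at >= retain_seconds
```

With this split, ordinary logs keep the 3-day retention that users rely on for debugging, while only the outliers that threaten disk health are removed early.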
Cluster environment:
- 8,000+ nodes
- 500,000+ apps per day
Configuration:
- yarn.nodemanager.log.retain-seconds=259200 (3 days)
- yarn.log-aggregation-enable=false