Details
-
Type: Improvement
-
Status: Resolved
-
Priority: Minor
-
Resolution: Duplicate
-
Patch
Description
Before submitting a log aggregation runnable, LogAggregationService tries to create the aggregated log dir.
In some cases this may fail (e.g. the number of dirs exceeds the max limit).
When creation has failed and the runnable is submitted to LogAggregationService anyway, the runnable may run forever if the app state transitions misbehave (e.g. the application complete event is not handled correctly, thus keeping appFinishing of AppLogAggregatorImpl always true).
In our production (version 2.7.3), this caused a huge number of dangling aggregators (~400+ LogAggregationService threads alive on some nodes, where the NodeManager is configured with only 50+ vCPUs).
The patch throws the creation exception early, which avoids starting unnecessary log polling.
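The fail-fast idea can be illustrated with a minimal, self-contained sketch. It is not the actual patch or the real Hadoop code: `FailFastAggregationSketch`, `AppDirCreator`, `initApp` and `submittedCount` are hypothetical names used only for illustration, and a plain `ExecutorService` stands in for the service's thread pool. The point is only the ordering: the directory is created first, and a failure propagates to the caller before any runnable is handed to the pool.

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; names do not come from the Hadoop code base.
class FailFastAggregationSketch {

    /** Stand-in for the step that creates the aggregated log dir. */
    interface AppDirCreator {
        void createAppDir(String appId) throws IOException;
    }

    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final AtomicInteger submitted = new AtomicInteger();
    private final AppDirCreator dirCreator;

    FailFastAggregationSketch(AppDirCreator dirCreator) {
        this.dirCreator = dirCreator;
    }

    /**
     * Creates the log dir first. If creation throws, the exception
     * propagates and the aggregator is never submitted, so no thread
     * is left polling for an application that cannot be aggregated.
     */
    void initApp(String appId, Runnable aggregator) throws IOException {
        dirCreator.createAppDir(appId); // may throw; not swallowed
        submitted.incrementAndGet();
        pool.execute(aggregator);
    }

    /** Number of aggregators actually handed to the pool. */
    int submittedCount() {
        return submitted.get();
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

With a creator that throws, e.g. `new FailFastAggregationSketch(id -> { throw new IOException("dir limit exceeded"); })`, a call to `initApp` surfaces the `IOException` to the caller and `submittedCount()` stays at 0, so the caller decides how to handle the failure and no aggregator thread lingers.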
Attachments
Issue Links
- duplicates
-
YARN-4984 LogAggregationService shouldn't swallow exception in handling createAppDir() which cause thread leak.
- Resolved
- links to