Description
Suppose the mapred user has no access to the remote log folder. Pinging the JHS every few seconds to check whether it is online then produces the following entry in the log on every ping:
2020-05-19 00:17:20,331 WARN org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController: Unable to determine if the filesystem supports append operation
java.nio.file.AccessDeniedException: test-bucket: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: There is no mapped role for the group(s) associated with the authenticated user. (user: mapred)
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:204)
    [...]
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:513)
    at org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.getRollOverLogMaxSize(LogAggregationIndexedFileController.java:1157)
    at org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initInternal(LogAggregationIndexedFileController.java:149)
    at org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileController.initialize(LogAggregationFileController.java:135)
    at org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileControllerFactory.<init>(LogAggregationFileControllerFactory.java:139)
    at org.apache.hadoop.yarn.server.webapp.LogServlet.<init>(LogServlet.java:66)
    at org.apache.hadoop.mapreduce.v2.hs.webapp.HsWebServices.<init>(HsWebServices.java:99)
    at org.apache.hadoop.mapreduce.v2.hs.webapp.HsWebServices$$FastClassByGuice$$1eb8d5d6.newInstance(<generated>)
    at com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
    [...]
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
    at java.lang.Thread.run(Thread.java:748)
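As the stack trace shows, the LogAggregationFileControllerFactory is currently built in the LogServlet constructor, which runs whenever Guice instantiates HsWebServices, i.e. even for a simple liveness ping. We should create the factory instance only when it is actually needed, not every time a LogServlet object is instantiated (so definitely not in the constructor). This avoids putting unnecessary load on the S3A authentication side, which matters especially when each authentication request is costly.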
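The lazy-initialization idea can be sketched in plain Java as follows. This is a minimal, self-contained illustration, not the actual Hadoop change: `ExpensiveFactory`, `LazyLogServlet`, `getOrCreateFactory` and `readLogs` are hypothetical stand-ins for LogAggregationFileControllerFactory and LogServlet, and the construction counter merely simulates the costly `FileSystem.get()` / S3A authentication seen in the trace. The servlet stores only a way to build the factory and creates it, at most once, on the first call that needs it, using double-checked locking on a `volatile` field so concurrent Jetty threads are safe.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for LogAggregationFileControllerFactory: constructing it
// is the expensive part (in the real trace it ends up calling FileSystem.get()).
class ExpensiveFactory {
    static final AtomicInteger CREATED = new AtomicInteger();

    ExpensiveFactory() {
        CREATED.incrementAndGet(); // count constructions for demonstration
    }

    String readLogs(String appId) {
        return "logs for " + appId;
    }
}

// Hypothetical stand-in for LogServlet: the factory is NOT built in the constructor.
class LazyLogServlet {
    private final Supplier<ExpensiveFactory> creator;
    private volatile ExpensiveFactory factory; // volatile for safe publication

    LazyLogServlet(Supplier<ExpensiveFactory> creator) {
        this.creator = creator; // cheap: no filesystem or auth call happens here
    }

    // Double-checked locking: build the factory at most once, on first use.
    ExpensiveFactory getOrCreateFactory() {
        ExpensiveFactory f = factory;
        if (f == null) {
            synchronized (this) {
                f = factory;
                if (f == null) {
                    f = creator.get();
                    factory = f;
                }
            }
        }
        return f;
    }

    String getLogs(String appId) {
        return getOrCreateFactory().readLogs(appId);
    }
}

public class Main {
    public static void main(String[] args) {
        // Simulate liveness pings: each one constructs a servlet but never reads logs.
        for (int i = 0; i < 5; i++) {
            new LazyLogServlet(ExpensiveFactory::new);
        }
        System.out.println("after pings: " + ExpensiveFactory.CREATED.get()); // prints "after pings: 0"

        // Only a request that actually needs logs triggers the expensive construction.
        LazyLogServlet servlet = new LazyLogServlet(ExpensiveFactory::new);
        servlet.getLogs("application_1");
        servlet.getLogs("application_2");
        System.out.println("after log requests: " + ExpensiveFactory.CREATED.get()); // prints "after log requests: 1"
    }
}
```

With this shape, repeated pings that only instantiate the servlet never touch the remote filesystem; the cost (and any AccessDeniedException) is deferred to requests that genuinely serve aggregated logs.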