  1. Hadoop Map/Reduce
  2. MAPREDUCE-6489

Fail fast rogue tasks that write too much to local disk

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.1
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: task
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Tasks of rogue jobs can write too much to local disk, negatively affecting jobs running in collocated containers. Ideally, YARN would limit the amount of local disk used by each task (YARN-4011). Until then, a MapReduce task can fail fast if it writes more than a configured threshold to local disk.

      As we discussed, the suggested approach is for the MapReduce task to check the BYTES_WRITTEN counter for the local disk and throw an exception when it exceeds a configured value. The number of bytes written can be larger than the disk space actually in use, but detecting a rogue task does not require the exact value: a very large number of bytes written to local disk is a good indication that the task is misbehaving.
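The check described above can be sketched roughly as follows. This is a minimal illustration only, not the code from the patch: the class and exception names are hypothetical, and the counter value is passed in as a plain long rather than read from Hadoop's counters.

```java
/** Hypothetical exception type, used only in this sketch. */
final class LocalWriteLimitExceededException extends RuntimeException {
    LocalWriteLimitExceededException(String message) {
        super(message);
    }
}

/** Hypothetical helper; in a real task the value would come from the local-disk BYTES_WRITTEN counter. */
final class LocalWriteLimitCheck {
    private LocalWriteLimitCheck() {}

    /** Throws when the bytes written so far exceed the configured threshold. */
    static void check(long bytesWritten, long limitBytes) {
        if (bytesWritten > limitBytes) {
            throw new LocalWriteLimitExceededException(
                "Too much written to local disk: " + bytesWritten
                    + " bytes, limit is " + limitBytes + " bytes");
        }
    }
}
```

Because the counter tracks bytes written rather than space in use, a task that repeatedly overwrites a small file would also trip this check, which is the approximation the description accepts.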

      Attachments

        1. MAPREDUCE-6489.001.patch
          9 kB
          Maysam Yabandeh
        2. MAPREDUCE-6489.002.patch
          10 kB
          Maysam Yabandeh
        3. MAPREDUCE-6489.003.patch
          12 kB
          Maysam Yabandeh
        4. MAPREDUCE-6489-branch-2.003.patch
          12 kB
          Maysam Yabandeh

        Issue Links

          Activity

            hudson Hudson added a comment -

            FAILURE: Integrated in Hadoop-Hdfs-trunk #2458 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2458/)
            MAPREDUCE-6489. Fail fast rogue tasks that write too much to local disk. (jlowe: rev cb26cd4bee8ab75b304ebad6dc7c77523d0e9ce5)

            • hadoop-mapreduce-project/CHANGES.txt
            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestTaskProgressReporter.java
            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java
            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
            hudson Hudson added a comment -

            FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #521 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/521/)
            MAPREDUCE-6489. Fail fast rogue tasks that write too much to local disk. (jlowe: rev cb26cd4bee8ab75b304ebad6dc7c77523d0e9ce5)

            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
            • hadoop-mapreduce-project/CHANGES.txt
            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java
            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestTaskProgressReporter.java
            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
            hudson Hudson added a comment -

            FAILURE: Integrated in Hadoop-Mapreduce-trunk #2509 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2509/)
            MAPREDUCE-6489. Fail fast rogue tasks that write too much to local disk. (jlowe: rev cb26cd4bee8ab75b304ebad6dc7c77523d0e9ce5)

            • hadoop-mapreduce-project/CHANGES.txt
            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java
            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestTaskProgressReporter.java
            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
            hudson Hudson added a comment -

            FAILURE: Integrated in Hadoop-Yarn-trunk #1297 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1297/)
            MAPREDUCE-6489. Fail fast rogue tasks that write too much to local disk. (jlowe: rev cb26cd4bee8ab75b304ebad6dc7c77523d0e9ce5)

            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestTaskProgressReporter.java
            • hadoop-mapreduce-project/CHANGES.txt
            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java
            hudson Hudson added a comment -

            FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #576 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/576/)
            MAPREDUCE-6489. Fail fast rogue tasks that write too much to local disk. (jlowe: rev cb26cd4bee8ab75b304ebad6dc7c77523d0e9ce5)

            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestTaskProgressReporter.java
            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java
            • hadoop-mapreduce-project/CHANGES.txt
            hudson Hudson added a comment -

            FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #562 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/562/)
            MAPREDUCE-6489. Fail fast rogue tasks that write too much to local disk. (jlowe: rev cb26cd4bee8ab75b304ebad6dc7c77523d0e9ce5)

            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestTaskProgressReporter.java
            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java
            • hadoop-mapreduce-project/CHANGES.txt
            hudson Hudson added a comment -

            FAILURE: Integrated in Hadoop-trunk-Commit #8675 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8675/)
            MAPREDUCE-6489. Fail fast rogue tasks that write too much to local disk. (jlowe: rev cb26cd4bee8ab75b304ebad6dc7c77523d0e9ce5)

            • hadoop-mapreduce-project/CHANGES.txt
            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java
            • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestTaskProgressReporter.java

            jlowe Jason Darrell Lowe added a comment -

            Thanks, Maysam! I committed this to trunk and branch-2.

            jlowe Jason Darrell Lowe added a comment -

            +1, committing this.
            hadoopqa Hadoop QA added a comment -



            -1 overall



            Vote  Subsystem        Runtime  Comment
            0     pre-patch        17m 36s  Pre-patch branch-2 compilation is healthy.
            +1    @author          0m 0s    The patch does not contain any @author tags.
            +1    tests included   0m 0s    The patch appears to include 1 new or modified test files.
            +1    javac            6m 39s   There were no new javac warning messages.
            +1    javadoc          10m 33s  There were no new javadoc warning messages.
            +1    release audit    0m 23s   The applied patch does not increase the total number of release audit warnings.
            -1    checkstyle       0m 48s   The applied patch generated 2 new checkstyle issues (total was 726, now 724).
            +1    whitespace       0m 0s    The patch has no lines that end in whitespace.
            +1    install          1m 18s   mvn install still works.
            +1    eclipse:eclipse  0m 34s   The patch built with eclipse:eclipse.
            +1    findbugs         1m 25s   The patch does not introduce any new Findbugs (version 3.0.0) warnings.
            +1    mapreduce tests  1m 50s   Tests passed in hadoop-mapreduce-client-core.
                                   41m 11s



            Subsystem Report/Notes
            Patch URL http://issues.apache.org/jira/secure/attachment/12767701/MAPREDUCE-6489-branch-2.003.patch
            Optional Tests javadoc javac unit findbugs checkstyle
            git revision branch-2 / 4921420
            checkstyle https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6074/artifact/patchprocess/diffcheckstylehadoop-mapreduce-client-core.txt
            hadoop-mapreduce-client-core test log https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6074/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt
            Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6074/testReport/
            Java 1.7.0_55
            uname Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
            Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6074/console

            This message was automatically generated.


            maysamyabandeh Maysam Yabandeh added a comment -

            Thanks jlowe for the review. I am attaching MAPREDUCE-6489-branch-2.003.patch for branch-2.

            jlowe Jason Darrell Lowe added a comment -

            +1, the latest patch looks good to me. However, the patch does not apply cleanly to branch-2. maysamyabandeh, could you provide a branch-2 patch as well?
            hadoopqa Hadoop QA added a comment -



            -1 overall



            Vote  Subsystem        Runtime  Comment
            0     pre-patch        16m 55s  Pre-patch trunk compilation is healthy.
            +1    @author          0m 0s    The patch does not contain any @author tags.
            +1    tests included   0m 0s    The patch appears to include 1 new or modified test files.
            +1    javac            7m 47s   There were no new javac warning messages.
            +1    javadoc          10m 17s  There were no new javadoc warning messages.
            -1    release audit    0m 19s   The applied patch generated 1 release audit warnings.
            -1    checkstyle       0m 49s   The applied patch generated 2 new checkstyle issues (total was 734, now 732).
            +1    whitespace       0m 0s    The patch has no lines that end in whitespace.
            +1    install          1m 31s   mvn install still works.
            +1    eclipse:eclipse  0m 34s   The patch built with eclipse:eclipse.
            +1    findbugs         1m 26s   The patch does not introduce any new Findbugs (version 3.0.0) warnings.
            +1    mapreduce tests  1m 56s   Tests passed in hadoop-mapreduce-client-core.
                                   41m 37s



            Subsystem Report/Notes
            Patch URL http://issues.apache.org/jira/secure/attachment/12766040/MAPREDUCE-6489.003.patch
            Optional Tests javadoc javac unit findbugs checkstyle
            git revision trunk / db93047
            Release Audit https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6066/artifact/patchprocess/patchReleaseAuditProblems.txt
            checkstyle https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6066/artifact/patchprocess/diffcheckstylehadoop-mapreduce-client-core.txt
            hadoop-mapreduce-client-core test log https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6066/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt
            Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6066/testReport/
            Java 1.7.0_55
            uname Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
            Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6066/console

            This message was automatically generated.

            maysamyabandeh Maysam Yabandeh added a comment -

            Thanks jlowe for the detailed comments. The updated patch includes the following changes:

            • Interpreting negative limits as no limit
            • Updating the description of the config var to explain that it only covers writes that affect BYTES_WRITTEN
            • Using the next unused exit code, i.e., 69
            • Using FATAL level for the log message, and also calling umbilical.fatalError
            • Changing the conf var name to mapreduce.task.local-fs.write-limit.bytes. "bytes" is separated with a dot to be consistent with the existing var names in which the unit is specified at the end, separated with a dot, e.g., ".bytes", ".kb", ".mb"
            • Updating the test to actually write to the local file system and test the specified limits

            Let me add that I am not entirely sure how the combination of umbilical.fatalError and SystemUtil.exit works out when the event handler used by umbilical is async. In that case the diagnostics update event would be queued, and might never actually be handled if SystemUtil.exit is invoked first.
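The behaviour listed in this comment (negative limit disables the check, report the fatal error, then exit with code 69) might look roughly like the sketch below. All names here are hypothetical: FatalErrorSink stands in for the umbilical, and the exit action is injected so the logic can be exercised without ending the JVM. It is not the code from the patch.

```java
import java.util.function.IntConsumer;

/** Hypothetical stand-in for the task-to-AM channel; not the real umbilical interface. */
interface FatalErrorSink {
    void fatalError(String message);
}

/** Hypothetical enforcer illustrating the order: check, report, then exit. */
final class WriteLimitEnforcer {
    /** Exit code mentioned in the comment above; the constant name is made up. */
    static final int EXIT_WRITE_LIMIT_EXCEEDED = 69;

    private final long limitBytes;
    private final FatalErrorSink sink;
    private final IntConsumer exit;

    WriteLimitEnforcer(long limitBytes, FatalErrorSink sink, IntConsumer exit) {
        this.limitBytes = limitBytes;
        this.sink = sink;
        this.exit = exit;
    }

    /** Returns true if the limit was exceeded and the failure path was taken. */
    boolean enforce(long bytesWritten) {
        if (limitBytes < 0) {
            return false; // a negative limit means "no limit"
        }
        if (bytesWritten <= limitBytes) {
            return false;
        }
        // Report the diagnostic first so it has a chance to reach the AM.
        sink.fatalError("Too much written to local disk: " + bytesWritten
            + " bytes, limit is " + limitBytes + " bytes");
        exit.accept(EXIT_WRITE_LIMIT_EXCEEDED);
        return true;
    }
}
```

A real task would pass an exit action that terminates the process, such as System::exit. If the sink only queues the message asynchronously, the exit can still win the race, which is exactly the open question raised in the comment; this sketch does not resolve it.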

            jlowe Jason Darrell Lowe added a comment -

            Thanks for the patch, Maysam!

            I think it would be a little easier for users if they could configure zero or a negative number in the limit to disable it rather than a giant value. The description of the property should explain that this limit only applies to writes that go through the Hadoop filesystem APIs within the task process (i.e., writes that will update the local filesystem BYTES_WRITTEN counter). It does not cover other writes such as logging, sideband writes from subprocesses (e.g., streaming jobs), etc.

            Should we be using exit code 65? I got the impression the original code was trying to use different exit codes for different task failure reasons. Seems like this would deserve a separate code.

            The warn message should say why the task is being killed, otherwise users will have little clue if the INFO message is suppressed. Speaking of users, this doesn't fail in a very graceful way for users to determine what happened. The history will just show the task exiting with exit code 65 (or some other number), with no useful diagnostic message sent to the AM explaining what went wrong. We should use umbilical.fatalError to report the fatal error before tearing down so the UI and history have a useful diagnostic that clearly shows why the task failed.

            TASK_LOCAL_WRITE_LIMIT and DEFAULT_TASK_LOCAL_WRITE_LIMIT should be public static final.

            Nit: write.limit.bytes should be write-limit-bytes to be consistent with local-fs, otherwise one would expect local.fs.write.limit.bytes. Normally dots separate namespaces and dashes take the place of spaces for an identifier within the namespace. Granted, the existing code is very inconsistent about this.

            It would be nice to see a test that verifies writes to the local filesystem trigger the failure. As it is now, the test would pass even if we were looking at the wrong counter or for some reason the counter wasn't working properly.
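The last point, that a test should perform real local writes rather than set a counter by hand, can be illustrated with the sketch below. It uses only the JDK: CountingOutputStream and LocalWriteProbe are hypothetical stand-ins for the filesystem statistics behind BYTES_WRITTEN, not Hadoop classes, and this is not the test from the patch.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicLong;

/** Counts bytes as they pass through to the underlying stream (hypothetical helper). */
final class CountingOutputStream extends FilterOutputStream {
    private final AtomicLong counter;

    CountingOutputStream(OutputStream out, AtomicLong counter) {
        super(out);
        this.counter = counter;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        counter.incrementAndGet();
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        counter.addAndGet(len);
    }
}

/** Writes to a real temporary file so the counter reflects actual local writes. */
final class LocalWriteProbe {
    private LocalWriteProbe() {}

    /** Writes the given number of zero bytes to a temp file and returns the counted total. */
    static long writeAndCount(int bytes) throws IOException {
        AtomicLong counter = new AtomicLong();
        Path tmp = Files.createTempFile("write-limit-probe", ".bin");
        try (OutputStream out = new CountingOutputStream(Files.newOutputStream(tmp), counter)) {
            out.write(new byte[bytes]);
        } finally {
            Files.deleteIfExists(tmp);
        }
        return counter.get();
    }
}
```

A test built this way fails if the counter is not wired to the write path, which is the gap the review comment points out in a test that only manipulates the counter directly.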
            hadoopqa Hadoop QA added a comment -



            -1 overall



            Vote  Subsystem        Runtime  Comment
            0     pre-patch        42m 7s   Pre-patch trunk compilation is healthy.
            +1    @author          0m 0s    The patch does not contain any @author tags.
            +1    tests included   0m 0s    The patch appears to include 1 new or modified test files.
            +1    javac            7m 47s   There were no new javac warning messages.
            +1    javadoc          10m 16s  There were no new javadoc warning messages.
            -1    release audit    0m 16s   The applied patch generated 1 release audit warnings.
            -1    checkstyle       2m 52s   The applied patch generated 3 new checkstyle issues (total was 733, now 732).
            +1    whitespace       0m 1s    The patch has no lines that end in whitespace.
            +1    install          1m 27s   mvn install still works.
            +1    eclipse:eclipse  0m 34s   The patch built with eclipse:eclipse.
            +1    findbugs         1m 24s   The patch does not introduce any new Findbugs (version 3.0.0) warnings.
            +1    mapreduce tests  1m 53s   Tests passed in hadoop-mapreduce-client-core.
                                   68m 42s



            Subsystem Report/Notes
            Patch URL http://issues.apache.org/jira/secure/attachment/12764968/MAPREDUCE-6489.002.patch
            Optional Tests javadoc javac unit findbugs checkstyle
            git revision trunk / 30e2f83
            Release Audit https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6054/artifact/patchprocess/patchReleaseAuditProblems.txt
            checkstyle https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6054/artifact/patchprocess/diffcheckstylehadoop-mapreduce-client-core.txt
            hadoop-mapreduce-client-core test log https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6054/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt
            Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6054/testReport/
            Java 1.7.0_55
            uname Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
            Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6054/console

            This message was automatically generated.
            hadoopqa Hadoop QA added a comment -



            -1 overall



            Vote  Subsystem        Runtime  Comment
            0     pre-patch        17m 27s  Pre-patch trunk compilation is healthy.
            +1    @author          0m 0s    The patch does not contain any @author tags.
            +1    tests included   0m 0s    The patch appears to include 1 new or modified test files.
            +1    javac            8m 4s    There were no new javac warning messages.
            +1    javadoc          10m 31s  There were no new javadoc warning messages.
            -1    release audit    0m 16s   The applied patch generated 1 release audit warnings.
            -1    checkstyle       0m 46s   The applied patch generated 10 new checkstyle issues (total was 733, now 743).
            -1    whitespace       0m 1s    The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix.
            +1    install          1m 31s   mvn install still works.
            +1    eclipse:eclipse  0m 36s   The patch built with eclipse:eclipse.
            +1    findbugs         1m 31s   The patch does not introduce any new Findbugs (version 3.0.0) warnings.
            +1    mapreduce tests  2m 3s    Tests passed in hadoop-mapreduce-client-core.
                                   42m 49s



            Subsystem Report/Notes
            Patch URL http://issues.apache.org/jira/secure/attachment/12764961/MAPREDUCE-6489.001.patch
            Optional Tests javadoc javac unit findbugs checkstyle
            git revision trunk / 30e2f83
            Release Audit https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6053/artifact/patchprocess/patchReleaseAuditProblems.txt
            checkstyle https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6053/artifact/patchprocess/diffcheckstylehadoop-mapreduce-client-core.txt
            whitespace https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6053/artifact/patchprocess/whitespace.txt
            hadoop-mapreduce-client-core test log https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6053/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt
            Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6053/testReport/
            Java 1.7.0_55
            uname Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
            Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6053/console

            This message was automatically generated.

            maysamyabandeh Maysam Yabandeh added a comment -

            Uploading the first patch. The patch updates the TaskReporter thread to also check the limits on the counters each time they are updated. If the BYTES_WRITTEN counter exceeds the configured limit, the task fails fast with ExitUtil.terminate().

            Reviews are appreciated.
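The idea of a reporter thread that checks the limit on every update cycle could be sketched as below. PollingLimitReporter is a hypothetical class, not Hadoop's TaskReporter: the counter is supplied as a LongSupplier, and the violation action is a callback where the real patch terminates the process.

```java
import java.util.function.LongSupplier;

/** Hypothetical reporter loop that re-checks the write limit on each cycle. */
final class PollingLimitReporter implements Runnable {
    private final LongSupplier bytesWritten;
    private final long limitBytes;
    private final long intervalMillis;
    private final Runnable onViolation;

    PollingLimitReporter(LongSupplier bytesWritten, long limitBytes,
                         long intervalMillis, Runnable onViolation) {
        this.bytesWritten = bytesWritten;
        this.limitBytes = limitBytes;
        this.intervalMillis = intervalMillis;
        this.onViolation = onViolation;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                // A real reporter would also send progress and counter updates here.
                if (bytesWritten.getAsLong() > limitBytes) {
                    onViolation.run();
                    return;
                }
                Thread.sleep(intervalMillis);
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and stop polling.
            Thread.currentThread().interrupt();
        }
    }
}
```

Piggybacking on an existing periodic thread keeps the check off the write path, at the cost that a task can overshoot the limit by whatever it writes within one polling interval.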

            People

              Assignee: maysamyabandeh Maysam Yabandeh
              Reporter: maysamyabandeh Maysam Yabandeh
              Votes: 0
              Watchers: 9
