  1. Hadoop YARN
  2. YARN-11666

NullPointerException in TestSLSRunner.testSimulatorRunning

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • Operating System: macOS (Sonoma 14.2.1 (23C71))

      Hardware: MacBook Air 2023

      IDE: IntelliJ IDEA (2023.3.2 (Ultimate Edition))

      Java Version: OpenJDK version "1.8.0_292"

    Description

      What happened: 

      In the TestSLSRunner class of the Apache Hadoop YARN SLS (Scheduler Load Simulator) framework, a NullPointerException is thrown during the teardown of parameterized tests. The exception is thrown when the stop method is called on the ResourceManager (rm) object in RMRunner.java. It occurs when a test case pairs a trace type (RUMEN, SLS, SYNTH) with a trace file of a different type, so that the rm object may not be properly initialized before the stop method is invoked.

       

      Buggy code:

      The issue is located in the RMRunner.java file within the stop method:

      public void stop() {
        rm.stop();
      }
      

      The root cause of the NullPointerException is the lack of a null check for the rm object before calling its stop method. Under any condition where the ResourceManager fails to initialize correctly, attempting to stop the ResourceManager leads to a null pointer dereference.

       

      After the fix in RMRunner.java, TaskRunner.java needs the same fix, because its stop method calls methods on the executor object without a null check.

      TaskRunner.java

      public void stop() throws InterruptedException {
        executor.shutdownNow();
        executor.awaitTermination(20, TimeUnit.SECONDS);
      }
      

       

      How to trigger this bug:

      • Change the data method of the parameterized unit test (TestSLSRunner.java) to include one or both of the following test cases:
        • {capScheduler, "SYNTH", rumenTraceFile, nodeFile}
        • {capScheduler, "SYNTH", slsTraceFile, nodeFile}
      • Execute the TestSLSRunner test suite, particularly the testSimulatorRunning method.
      • Observe the resulting NullPointerException in the test output (triggered in RMRunner.java).

      Note: you can use the attachments (reproduce.sh, which uses add_test_cases.patch) to easily reproduce the bug.

      Example stack trace from the test output:

      [ERROR] testSimulatorRunning[Testing with: SYNTH, org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler, (nodeFile null)](org.apache.hadoop.yarn.sls.TestSLSRunner) Time elapsed: 3.027 s <<< ERROR!
      java.lang.NullPointerException
      at org.apache.hadoop.yarn.sls.RMRunner.stop(RMRunner.java:127)
      at org.apache.hadoop.yarn.sls.SLSRunner.stop(SLSRunner.java:320)
      at org.apache.hadoop.yarn.sls.BaseSLSRunnerTest.tearDown(BaseSLSRunnerTest.java:68)
      ...

       

      How To Fix

      The bug can be fixed by adding a null check for the rm object in the stop method of RMRunner.java before calling any methods on it. The same applies to the executor object in TaskRunner.java.
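
      A minimal, self-contained sketch of the proposed guard follows. It is not the actual Hadoop patch: FakeResourceManager, GuardedRmRunner, GuardedTaskRunner, and NullSafeStopDemo are hypothetical stand-ins, used here so the example runs without the YARN classes. Only the shape of the two stop methods mirrors the code quoted above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for ResourceManager; only stop() matters here. */
class FakeResourceManager {
  boolean stopped = false;

  void stop() {
    stopped = true;
  }
}

/** Sketch of RMRunner with a null-guarded stop(). */
class GuardedRmRunner {
  // Stays null when startup fails before the ResourceManager is created.
  FakeResourceManager rm;

  void stop() {
    if (rm != null) {
      rm.stop();
    }
  }
}

/** Sketch of TaskRunner with the same guard applied to the executor. */
class GuardedTaskRunner {
  // Stays null when start() was never reached.
  ExecutorService executor;

  void start() {
    executor = Executors.newSingleThreadExecutor();
  }

  void stop() throws InterruptedException {
    if (executor != null) {
      executor.shutdownNow();
      executor.awaitTermination(20, TimeUnit.SECONDS);
    }
  }
}

class NullSafeStopDemo {
  public static void main(String[] args) throws InterruptedException {
    // Teardown after a failed setup: neither field was ever initialized.
    new GuardedRmRunner().stop();
    new GuardedTaskRunner().stop();

    // Teardown after a successful setup still stops the executor.
    GuardedTaskRunner tasks = new GuardedTaskRunner();
    tasks.start();
    tasks.stop();
    System.out.println("executor shut down: " + tasks.executor.isShutdown());
  }
}
```

      With guards of this kind, tearDown can call stop() safely even when setup failed partway. Note that the guard only removes the secondary NullPointerException; the underlying trace type and trace file mismatch still needs to be reported by the setup code.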

      Attachments

        1. reproduce.sh
          0.5 kB
          Elen Chatikyan
        2. add_test_cases.patch
          0.9 kB
          Elen Chatikyan

          People

            Assignee: Unassigned
            Reporter: Elen Chatikyan (elenc2)