Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Operating System: macOS Sonoma 14.2.1 (23C71)
Hardware: MacBook Air 2023
IDE: IntelliJ IDEA (2023.3.2 (Ultimate Edition))
Java Version: OpenJDK version "1.8.0_292"
Description
What happened:
In the TestSLSRunner class of the Apache Hadoop YARN SLS (Scheduler Load Simulator) framework, a NullPointerException is thrown during teardown of the parameterized tests. The exception is raised when the stop method is called on the ResourceManager (rm) object in RMRunner.java. It occurs when a test case pairs a trace type (RUMEN, SLS, SYNTH) with a trace file of a different type, so the rm object may not be properly initialized before stop is invoked.
Buggy code:
The issue is located in the RMRunner.java file within the stop method:
public void stop() {
    rm.stop();
}
The root cause of the NullPointerException is the lack of a null check for the rm object before calling its stop method. Under any condition where the ResourceManager fails to initialize correctly, attempting to stop the ResourceManager leads to a null pointer dereference.
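The failure mode described above can be illustrated with a minimal sketch that does not depend on Hadoop. Everything in it is a hypothetical stand-in: FakeResourceManager and RunnerSketch are not the real classes, and the sketch assumes that initialization fails with an exception before rm is assigned, which may differ from what actually happens inside SLSRunner.

```java
public class TeardownNpeSketch {

    /** Hypothetical stand-in for the ResourceManager; not the Hadoop class. */
    static class FakeResourceManager {
        void stop() {
            // nothing to release in this sketch
        }
    }

    /** Hypothetical stand-in that mirrors the shape of RMRunner. */
    static class RunnerSketch {
        private FakeResourceManager rm; // stays null if start() fails

        void start(boolean traceMatchesType) {
            // Assumption: a trace type/file mismatch aborts start-up
            // before the rm field is assigned.
            if (!traceMatchesType) {
                throw new IllegalArgumentException("trace file does not match trace type");
            }
            rm = new FakeResourceManager();
        }

        void stop() {
            rm.stop(); // unguarded dereference, as in the reported code
        }
    }

    /** Returns true if teardown after a failed start throws a NullPointerException. */
    static boolean teardownThrowsNpe() {
        RunnerSketch runner = new RunnerSketch();
        try {
            runner.start(false);
        } catch (IllegalArgumentException expected) {
            // Start-up failed; a test framework would still run teardown.
        }
        try {
            runner.stop();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("teardown threw NPE: " + teardownThrowsNpe());
    }
}
```

In the sketch, the original start-up failure is swallowed and the teardown NullPointerException is what surfaces, which is similar to what the test output in this report shows.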
Once RMRunner.java is fixed, TaskRunner.java needs the same fix, because its stop method uses the executor object without a null check:
public void stop() throws InterruptedException {
    executor.shutdownNow();
    executor.awaitTermination(20, TimeUnit.SECONDS);
}
How to trigger this bug:
- Change the data method of the parameterized unit test (TestSLSRunner.java) to include one or both of the following test cases:
- {capScheduler, "SYNTH", rumenTraceFile, nodeFile }
- {capScheduler, "SYNTH", slsTraceFile, nodeFile }
- Execute the TestSLSRunner test suite, particularly the testSimulatorRunning method.
- Observe the resulting NullPointerException in the test output (triggered in RMRunner.java).
* You can use the attachments (reproduce.sh, which uses add_test_cases.patch) to easily reproduce the bug.
[ERROR] testSimulatorRunning[Testing with: SYNTH, org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler, (nodeFile null)](org.apache.hadoop.yarn.sls.TestSLSRunner) Time elapsed: 3.027 s <<< ERROR!
java.lang.NullPointerException
at org.apache.hadoop.yarn.sls.RMRunner.stop(RMRunner.java:127)
at org.apache.hadoop.yarn.sls.SLSRunner.stop(SLSRunner.java:320)
at org.apache.hadoop.yarn.sls.BaseSLSRunnerTest.tearDown(BaseSLSRunnerTest.java:68)
...
How To Fix
The bug can be fixed by adding a null check for the rm object in the stop method of RMRunner.java before calling any methods on it. The same check is needed for the executor object in the stop method of TaskRunner.java.
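A minimal sketch of the proposed guards is shown here, using stand-in classes so that it compiles without Hadoop. RMRunnerSketch, TaskRunnerSketch and FakeResourceManager are hypothetical names, and the executor field is typed as ExecutorService for illustration only; the actual patch would apply the same null checks to the real rm and executor fields.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class NullSafeStopSketch {

    /** Hypothetical stand-in for the ResourceManager; not the Hadoop class. */
    static class FakeResourceManager {
        boolean stopped;

        void stop() {
            stopped = true;
        }
    }

    /** Mirrors RMRunner.stop() with the proposed null check. */
    static class RMRunnerSketch {
        FakeResourceManager rm; // null when initialization never completed

        public void stop() {
            if (rm != null) {
                rm.stop();
            }
        }
    }

    /** Mirrors TaskRunner.stop() with the proposed null check. */
    static class TaskRunnerSketch {
        ExecutorService executor; // null when the runner was never started

        public void stop() throws InterruptedException {
            if (executor != null) {
                executor.shutdownNow();
                executor.awaitTermination(20, TimeUnit.SECONDS);
            }
        }
    }

    /** Exercises both guards; returns true if no exception escapes. */
    static boolean runDemo() {
        try {
            // Uninitialized runners: stop() must be a no-op.
            new RMRunnerSketch().stop();
            new TaskRunnerSketch().stop();

            // Initialized runners: stop() must still shut things down.
            RMRunnerSketch rmRunner = new RMRunnerSketch();
            rmRunner.rm = new FakeResourceManager();
            rmRunner.stop();

            TaskRunnerSketch taskRunner = new TaskRunnerSketch();
            taskRunner.executor = Executors.newSingleThreadExecutor();
            taskRunner.stop();

            return rmRunner.rm.stopped && taskRunner.executor.isShutdown();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("stop() is null-safe: " + runDemo());
    }
}
```

With these guards, teardown no longer throws a NullPointerException, so the error reported by the test should be the original initialization failure.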