[YARN-10460] Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.4.0, 3.3.1, 2.10.2, 3.2.3
Component/s: nodemanager, test
Labels:
None

Hadoop Flags:

Reviewed

Description

In our downstream build environment, we're using JUnit 4.13. Recently, we discovered a truly weird test failure in TestNodeStatusUpdater.

The problem is that timeout handling has changed in JUnit 4.13. See the difference between these two snippets:

4.12

    @Override
    public void evaluate() throws Throwable {
        CallableStatement callable = new CallableStatement();
        FutureTask<Throwable> task = new FutureTask<Throwable>(callable);
        threadGroup = new ThreadGroup("FailOnTimeoutGroup");
        Thread thread = new Thread(threadGroup, task, "Time-limited test");
        thread.setDaemon(true);
        thread.start();
        callable.awaitStarted();
        Throwable throwable = getResult(task, thread);
        if (throwable != null) {
            throw throwable;
        }
    }

4.13

    @Override
    public void evaluate() throws Throwable {
        CallableStatement callable = new CallableStatement();
        FutureTask<Throwable> task = new FutureTask<Throwable>(callable);
        ThreadGroup threadGroup = new ThreadGroup("FailOnTimeoutGroup");
        Thread thread = new Thread(threadGroup, task, "Time-limited test");
        try {
            thread.setDaemon(true);
            thread.start();
            callable.awaitStarted();
            Throwable throwable = getResult(task, thread);
            if (throwable != null) {
                throw throwable;
            }
        } finally {
            try {
                thread.join(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            try {
                threadGroup.destroy();  // <---- This
            } catch (IllegalThreadStateException e) {
                // If a thread from the group is still alive, the ThreadGroup cannot be destroyed.
                // Swallow the exception to keep the same behavior prior to this change.
            }
        }
    }

The change comes from https://github.com/junit-team/junit4/pull/1517.

Unfortunately, destroying the thread group causes a problem because the IPC layer caches all sorts of objects. The exception is:

java.lang.IllegalThreadStateException
	at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:867)
	at java.lang.Thread.init(Thread.java:402)
	at java.lang.Thread.init(Thread.java:349)
	at java.lang.Thread.<init>(Thread.java:675)
	at java.util.concurrent.Executors$DefaultThreadFactory.newThread(Executors.java:613)
	at com.google.common.util.concurrent.ThreadFactoryBuilder$1.newThread(ThreadFactoryBuilder.java:163)
	at java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:612)
	at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:925)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368)
	at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
	at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1136)
	at org.apache.hadoop.ipc.Client.call(Client.java:1458)
	at org.apache.hadoop.ipc.Client.call(Client.java:1405)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
	at com.sun.proxy.$Proxy81.startContainers(Unknown Source)
	at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128)
	at org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.startContainer(TestNodeManagerShutdown.java:251)
	at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown(TestNodeStatusUpdater.java:1576)

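Both the clientExecutor in org.apache.hadoop.ipc.Client and the client object in ProtobufRpcEngine/ProtobufRpcEngine2 are cached for as long as they're needed, so they outlive a single test. As the stack trace shows, the executor's default thread factory creates its worker threads in the thread group it captured when it was built, which is the "FailOnTimeoutGroup" of the test that first used the client. Since JUnit 4.13 destroys that group at the end of that test, later tests can no longer create new threads through the cached executor.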
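To illustrate the mechanism, here is a minimal, Hadoop-free sketch of how a thread factory stays bound to the thread group it was created in. The class and method names (CapturedThreadGroupDemo, createFactoryInGroup) are hypothetical; only the JDK calls are real. It does not call ThreadGroup.destroy(), so it only shows the capture, not the failure itself.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class; not part of Hadoop or JUnit.
class CapturedThreadGroupDemo {

    // Builds a default thread factory from inside a thread that belongs to
    // the given group, the way a time-limited JUnit test would.
    static ThreadFactory createFactoryInGroup(String groupName) {
        ThreadGroup group = new ThreadGroup(groupName);
        AtomicReference<ThreadFactory> holder = new AtomicReference<>();
        Thread creator = new Thread(group,
                () -> holder.set(Executors.defaultThreadFactory()),
                "Time-limited test");
        creator.start();
        try {
            creator.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return holder.get();
    }

    public static void main(String[] args) {
        // The factory is "cached" here and used later from the main thread.
        ThreadFactory cached = createFactoryInGroup("FailOnTimeoutGroup");
        Thread worker = cached.newThread(() -> { });
        // The new thread still belongs to the group captured at creation time,
        // not to the group of the calling thread.
        System.out.println(worker.getThreadGroup().getName());
    }
}
```

If the captured group had been destroyed in between, the newThread call is where an IllegalThreadStateException like the one above would surface. Note that ThreadGroup.destroy() is deprecated and no longer has any effect in recent JDK releases, so the failure is specific to older runtimes such as the Java 8 one in the stack trace.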

A quick workaround is to stop the clients and completely clear the ClientCache in ProtobufRpcEngine before each test case. I tried this and it solves the problem, but it feels hacky. I'm not sure whether there is a better approach.
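The shape of that workaround can be sketched without Hadoop as follows. This is an assumption-laden illustration, not the attached patch: ClientCacheSketch, getClient and clearCache are hypothetical names, and an ExecutorService stands in for a cached IPC client.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a cache of long-lived clients; not Hadoop's ClientCache.
class ClientCacheSketch {
    private final Map<String, ExecutorService> clients = new ConcurrentHashMap<>();

    // Returns the cached client for the key, creating it on first use.
    // The executor's thread factory is bound to the caller's thread group.
    ExecutorService getClient(String key) {
        return clients.computeIfAbsent(key, k -> Executors.newCachedThreadPool());
    }

    // Stops every cached client and empties the cache, so the next caller
    // gets a fresh client bound to its own (still usable) thread group.
    void clearCache() {
        for (ExecutorService client : clients.values()) {
            client.shutdownNow();
        }
        clients.clear();
    }

    int size() {
        return clients.size();
    }

    public static void main(String[] args) {
        ClientCacheSketch cache = new ClientCacheSketch();
        ExecutorService first = cache.getClient("nm");
        cache.clearCache();                       // what a per-test setup method would call
        ExecutorService second = cache.getClient("nm");
        System.out.println(first.isShutdown());   // old client was stopped
        System.out.println(first != second);      // a new client was created
        cache.clearCache();
    }
}
```

In a test class, the clearCache() call would sit in a setup method that runs before each test, so that no client created under one test's thread group is reused by the next.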

Attachments

- YARN-10460-branch-2.10.002.patch, 19/Apr/21 21:33, 3 kB, Eric Badger
- YARN-10460-branch-3.2.002.patch, 16/Apr/21 21:25, 3 kB, Eric Badger
- YARN-10460-002.patch, 17/Oct/20 11:25, 3 kB, Peter Bacsko
- YARN-10460-001.patch, 16/Oct/20 16:31, 3 kB, Peter Bacsko
- YARN-10460-POC.patch, 14/Oct/20 11:16, 3 kB, Peter Bacsko

Issue Links

breaks

HADOOP-17315 Use shaded guava in ClientCache.java

Resolved

is related to

HADOOP-17316 Upgrade JUnit to 4.13.1

Open

relates to

MAPREDUCE-7302 Upgrading to JUnit 4.13 causes testcase TestFetcher.testCorruptedIFile() to fail

Resolved
