Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.6.2, 2.0.0
Environment: AMD64 box with only 2 cores
Description
repeatedly failing task that crashes JVM *** FAILED ***
The code passed to failAfter did not complete within 100000 milliseconds. (DistributedSuite.scala:128)
This test started failing, and DistributedSuite started hanging, following https://github.com/apache/spark/pull/13055
It looks like the extra message sent to remove the BlockManager causes a deadlock, since there are only 2 message-processing loop threads in the dispatcher. Related to https://issues.apache.org/jira/browse/SPARK-13906
/** Thread pool used for dispatching messages. */
private val threadpool: ThreadPoolExecutor = {
  val numThreads = nettyEnv.conf.getInt("spark.rpc.netty.dispatcher.numThreads",
    math.max(2, Runtime.getRuntime.availableProcessors()))
  val pool = ThreadUtils.newDaemonFixedThreadPool(numThreads, "dispatcher-event-loop")
  for (i <- 0 until numThreads) {
    pool.execute(new MessageLoop)
  }
  pool
}
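For illustration only, here is a minimal, self-contained sketch (not Spark code; the object and helper names are made up for this example) of how a fixed pool with only two message-loop threads can starve itself: if both threads block waiting on replies that can only be produced by tasks queued on the same pool, nothing ever makes progress. Timeouts are used so the sketch terminates instead of hanging the way DistributedSuite does.

import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}

object TwoThreadStarvationSketch {
  def main(args: Array[String]): Unit = {
    // Stand-in for the two "dispatcher-event-loop" threads on a 2-core box.
    val pool = Executors.newFixedThreadPool(2)
    val done = new CountDownLatch(2)

    // Two handlers each occupy a pool thread and block waiting for a "reply"
    // that can only be produced by another task submitted to the same pool.
    // With both threads blocked, the reply tasks never get to run.
    for (_ <- 1 to 2) {
      pool.execute(new Runnable {
        override def run(): Unit = {
          val reply = new CountDownLatch(1)
          pool.execute(new Runnable {
            override def run(): Unit = reply.countDown()
          })
          // Times out: no free thread is left to run the reply task.
          if (reply.await(2, TimeUnit.SECONDS)) done.countDown()
        }
      })
    }

    // Prints false: with only 2 threads the handlers starve each other.
    println(s"completed without starvation: ${done.await(5, TimeUnit.SECONDS)}")
    pool.shutdownNow()
  }
}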
Setting a minimum of 3 threads alleviates the issue, but I'm not sure there isn't another underlying problem.
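As a sketch only (assuming the mitigation is simply raising the floor in the numThreads computation above; the actual fix may differ), the sizing logic with a minimum of 3 would look roughly like this, so even the 2-core box in the environment above gets 3 dispatcher threads:

object DispatcherThreadFloorSketch {
  // Hypothetical helper mirroring the numThreads computation above,
  // with the lower bound raised from 2 to 3.
  def numDispatcherThreads(configured: Option[Int], availableCores: Int): Int =
    configured.getOrElse(math.max(3, availableCores))

  def main(args: Array[String]): Unit = {
    // On the 2-core AMD64 box from the environment above: prints 3.
    println(numDispatcherThreads(configured = None, availableCores = 2))
  }
}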