Details
Bug
Status: Closed
Major
Resolution: Fixed
0.9.0
None
zeppelin-0.9.0-SNAPSHOT, built from master
Spark 2.4
Description
Step 1:
Set
zeppelin.interpreter.lifecyclemanager.class = org.apache.zeppelin.interpreter.lifecycle.TimeoutLifecycleManager
zeppelin.interpreter.lifecyclemanager.timeout.threshold = 300000
At this point everything works: the paragraph bound to the Spark interpreter runs normally and the progress bar shows the percentage.
Step 2:
Five minutes later, after the timeout threshold has passed, rerun the same paragraph. This time the paragraph's status stays PENDING indefinitely and the progress bar is missing.
Cause of the issue:
- When a RemoteInterpreter expires, TimeoutLifecycleManager calls RemoteInterpreterEventServer.unRegisterInterpreterProcess, which only removes the RemoteInterpreterGroup without closing it.
- When the paragraph runs again, a new RemoteInterpreterGroup is instantiated, which asks the SchedulerFactory for a RemoteScheduler to submit the paragraph.
- SchedulerFactory always finds the existing RemoteScheduler, so the previous RemoteScheduler, which still holds the old RemoteInterpreter, is returned.
- The JobStatusPoller started by that RemoteScheduler uses the old RemoteInterpreter to get the status, so an exception is thrown and the poll fails.
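The chain above can be reproduced in a small standalone model. This is only a sketch: the nested classes are hypothetical stand-ins invented for illustration, not the real Zeppelin classes, and it assumes schedulers are cached by name and that polling a dead interpreter throws.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the stale-scheduler problem.
// None of these classes are the actual Zeppelin implementations.
final class StaleSchedulerSketch {

    // Stand-in for a remote interpreter whose process may have timed out.
    static final class Interpreter {
        private boolean alive = true;

        void shutdown() {
            alive = false;
        }

        String pollStatus() {
            if (!alive) {
                throw new IllegalStateException("interpreter process is gone");
            }
            return "RUNNING";
        }
    }

    // Stand-in for a scheduler that keeps a reference to one interpreter.
    static final class Scheduler {
        private final Interpreter interpreter;

        Scheduler(Interpreter interpreter) {
            this.interpreter = interpreter;
        }

        String pollJobStatus() {
            return interpreter.pollStatus();
        }
    }

    // Stand-in for a factory that caches schedulers by name (assumed behaviour).
    static final class SchedulerFactory {
        private final Map<String, Scheduler> schedulers = new HashMap<>();

        Scheduler getOrCreate(String name, Interpreter interpreter) {
            // An existing entry wins; the new interpreter is ignored in that case.
            return schedulers.computeIfAbsent(name, n -> new Scheduler(interpreter));
        }
    }

    // Returns true when the rerun ends up polling the dead interpreter.
    static boolean rerunHitsStaleInterpreter() {
        SchedulerFactory factory = new SchedulerFactory();

        Interpreter oldInterpreter = new Interpreter();
        factory.getOrCreate("spark-session", oldInterpreter); // first run

        // Timeout: the process dies, but the cached scheduler is never removed.
        oldInterpreter.shutdown();

        // Rerun: a fresh interpreter is created, yet the old scheduler is returned.
        Interpreter freshInterpreter = new Interpreter();
        Scheduler scheduler = factory.getOrCreate("spark-session", freshInterpreter);

        try {
            scheduler.pollJobStatus();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rerun hits stale interpreter: " + rerunHitsStaleInterpreter());
    }
}
```

In this model the fresh interpreter is never consulted, which matches the symptom: the status poll fails, so the paragraph never leaves PENDING.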
How to fix:
The fix is simple: add the following code to the RemoteInterpreterEventServer.unRegisterInterpreterProcess function:
// Close the RemoteInterpreter when the RemoteInterpreterServer has already timed out.
// Otherwise the progress bar will be missing when rerunning after the RemoteInterpreterServer timeout,
// and old RemoteInterpreterGroups will stay alive after GC.
interpreterGroup.close();
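Why closing the group helps can be sketched with a toy model. It assumes that closing a group also drops its scheduler from the factory cache; that is an assumption made for illustration, and every class and method name here is hypothetical rather than taken from the Zeppelin code base.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the fix: closing the group on unregister
// evicts the cached scheduler, so the rerun gets a fresh one.
final class CloseOnTimeoutSketch {

    static final class Interpreter {
        boolean alive = true;

        String pollStatus() {
            if (!alive) {
                throw new IllegalStateException("interpreter process is gone");
            }
            return "RUNNING";
        }
    }

    static final class Scheduler {
        final Interpreter interpreter;

        Scheduler(Interpreter interpreter) {
            this.interpreter = interpreter;
        }
    }

    static final class SchedulerFactory {
        final Map<String, Scheduler> schedulers = new HashMap<>();

        Scheduler getOrCreate(String name, Interpreter interpreter) {
            return schedulers.computeIfAbsent(name, n -> new Scheduler(interpreter));
        }

        void remove(String name) {
            schedulers.remove(name);
        }
    }

    static final class InterpreterGroup {
        final String schedulerName;
        final SchedulerFactory factory;
        final Interpreter interpreter = new Interpreter();

        InterpreterGroup(String schedulerName, SchedulerFactory factory) {
            this.schedulerName = schedulerName;
            this.factory = factory;
        }

        Scheduler scheduler() {
            return factory.getOrCreate(schedulerName, interpreter);
        }

        // Assumed effect of close(): stop the interpreter and evict its scheduler.
        void close() {
            interpreter.alive = false;
            factory.remove(schedulerName);
        }
    }

    // Simulates: first run, timeout with close(), then a rerun of the paragraph.
    static String rerunAfterTimeoutWithClose() {
        SchedulerFactory factory = new SchedulerFactory();

        InterpreterGroup oldGroup = new InterpreterGroup("spark-session", factory);
        oldGroup.scheduler();  // first run registers the scheduler
        oldGroup.close();      // what the unregister step would now do

        InterpreterGroup newGroup = new InterpreterGroup("spark-session", factory);
        return newGroup.scheduler().interpreter.pollStatus();
    }

    public static void main(String[] args) {
        System.out.println(rerunAfterTimeoutWithClose());
    }
}
```

Because the old cache entry is gone, the rerun builds a scheduler around the new interpreter and the status poll succeeds.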