[SPARK-6449] Driver OOM results in reported application result SUCCESS

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 1.3.0
Fix Version/s: None
Component/s: YARN
Labels: None

Description

I ran a job yesterday that, according to both the History Server and the YARN RM, finished with status SUCCESS.

Clicking around the History Server UI, I saw that fewer stages had run than the job should have needed, and I couldn't figure out why.

Finally, inspecting the end of the driver's logs, I saw:

15/03/20 15:08:13 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
15/03/20 15:08:13 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/03/20 15:08:13 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/03/20 15:08:13 INFO spark.SparkContext: Successfully stopped SparkContext
Exception in thread "Driver" scala.MatchError: java.lang.OutOfMemoryError: GC overhead limit exceeded (of class java.lang.OutOfMemoryError)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:485)
15/03/20 15:08:13 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0, (reason: Shutdown hook called before final status was reported.)
15/03/20 15:08:13 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED (diag message: Shutdown hook called before final status was reported.)
15/03/20 15:08:13 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
15/03/20 15:08:13 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
15/03/20 15:08:13 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1426705269584_0055

The driver OOM'd, the catch block that presumably should have caught it threw a MatchError, and then SUCCESS was returned to YARN and written to the event log.

This should be logged as a failed job and reported as such to YARN.
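
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the Spark source: it assumes the handler in ApplicationMaster matches on the cause of the user-class failure with cases for Exception subtypes only, which would be consistent with the scala.MatchError in the trace. The object name, method names, and status strings are hypothetical.

```scala
object MatchErrorSketch {
  // Assumed shape of the problematic handler: only Exception subtypes are
  // matched, so an Error such as OutOfMemoryError falls through every case
  // and the match itself throws scala.MatchError.
  def statusNarrow(cause: Throwable): String = cause match {
    case _: InterruptedException => "INTERRUPTED"
    case _: Exception            => "FAILED"
  }

  // One possible remedy: add a case that also covers Errors, so the
  // failure is classified instead of escaping the handler.
  def statusWide(cause: Throwable): String = cause match {
    case _: InterruptedException => "INTERRUPTED"
    case _: Throwable            => "FAILED"
  }

  def main(args: Array[String]): Unit = {
    val oom = new OutOfMemoryError("GC overhead limit exceeded")

    // The narrow match throws before any status can be recorded.
    val narrowThrew =
      try { statusNarrow(oom); false }
      catch { case _: MatchError => true }

    println(s"narrow match threw MatchError: $narrowThrew")
    println(s"wide match status: ${statusWide(oom)}")
  }
}
```

If the match throws before any final status is set, the shutdown hook is left to report one on its own, which matches the "Shutdown hook called before final status was reported" message in the log above.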

Issue Links

duplicates: SPARK-6018 NoSuchMethodError in Spark app is swallowed by YARN AM (Closed)

links to: [Github] Pull Request #5130 (ryan-williams)

People

Assignee: Unassigned
Reporter: Ryan Williams
Votes: 0
Watchers: 5

Dates

Created: 21/Mar/15 22:57
Updated: 24/Mar/15 10:58
Resolved: 24/Mar/15 10:58