Details
- Type: Question
- Priority: Major
- Status: Resolved
- Resolution: Not A Problem
- Affects Version/s: 3.1.2
- Fix Version/s: None
- Component/s: None
Description
My submitted spark application keeps running into the following error:
Exception in thread "RemoteBlock-temp-file-clean-thread" java.lang.OutOfMemoryError: Java heap space
	at org.apache.spark.storage.BlockManager$RemoteBlockDownloadFileManager$$Lambda$751/0x0000000840662040.get$Lambda(Unknown Source)
	at java.base/java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(DirectMethodHandle$Holder)
	at java.base/java.lang.invoke.Invokers$Holder.linkToTargetMethod(Invokers$Holder)
	at org.apache.spark.storage.BlockManager$RemoteBlockDownloadFileManager.org$apache$spark$storage$BlockManager$RemoteBlockDownloadFileManager$$keepCleaning(BlockManager.scala:2036)
	at org.apache.spark.storage.BlockManager$RemoteBlockDownloadFileManager$$anon$2.run(BlockManager.scala:2002)
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65)
	at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.lang.OutOfMemoryError: Java heap space
	at scala.collection.immutable.HashSet$HashTrieSet.updated0(HashSet.scala:551)
	at scala.collection.immutable.HashSet.$plus(HashSet.scala:84)
	at scala.collection.immutable.HashSet.$plus(HashSet.scala:35)
	at scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:28)
	at scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:24)
	at scala.collection.generic.Growable.$anonfun$$plus$plus$eq$1(Growable.scala:62)
	at scala.collection.generic.Growable$$Lambda$9/0x0000000840063840.apply(Unknown Source)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
	at scala.collection.mutable.SetBuilder.$plus$plus$eq(SetBuilder.scala:24)
	at scala.collection.TraversableLike.to(TraversableLike.scala:678)
	at scala.collection.TraversableLike.to$(TraversableLike.scala:675)
	at scala.collection.AbstractTraversable.to(Traversable.scala:108)
	at scala.collection.TraversableOnce.toSet(TraversableOnce.scala:309)
	at scala.collection.TraversableOnce.toSet$(TraversableOnce.scala:309)
	at scala.collection.AbstractTraversable.toSet(Traversable.scala:108)
	at org.apache.spark.sql.catalyst.trees.TreeNode.containsChild$lzycompute(TreeNode.scala:122)
	at org.apache.spark.sql.catalyst.trees.TreeNode.containsChild(TreeNode.scala:122)
	at org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$1(TreeNode.scala:270)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$withNewChildren$4(TreeNode.scala:283)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$Lambda$2239/0x0000000840e8c040.apply(Unknown Source)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
	at scala.collection.TraversableLike$$Lambda$17/0x000000084012e840.apply(Unknown Source)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike.map(TraversableLike.scala:238)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
	at scala.collection.AbstractTraversable.map(Traversable.scala:108)
12-29-2021 12:13:28 PM ERROR Utils: uncaught error in thread Spark Context Cleaner, stopping SparkContext
java.lang.OutOfMemoryError: Java heap space
12-29-2021 12:13:28 PM ERROR Utils: throw uncaught fatal error in thread Spark Context Cleaner
java.lang.OutOfMemoryError: Java heap space
Exception in thread "Spark Context Cleaner" java.lang.OutOfMemoryError: Java heap space
A dataframe is created from a JDBC query to a Postgres database:

var dataframeVariable = sparkSession.read
  .format("jdbc")
  .option("url", urlVariable)
  .option("driver", driverVariable)
  .option("user", usernameVariable)
  .option("password", passwordVariable)
  .option("query", "select max(timestamp) as timestamp from \"" + tableNameVariable + "\"")
  .load()
The error occurs when the program tries to extract a value from the dataframe, which contains only a single row and a single column. Here are the methods I have tried; both cause the application to hang and eventually hit the OOM error.
var lastTimestamp = dataframeVariable.first().getDouble(0)
var timeStampVal = dataframeVariable.select(col("timestamp")).collect()
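For reference, this is the shape I would have expected the lookup to take on a one-row, one-column dataframe (just a sketch; whether getDouble or getTimestamp is the right getter depends on how the Postgres timestamp column maps through JDBC, and dataframeVariable is the JDBC read from above):

```scala
import org.apache.spark.sql.functions.col

// Sketch: pull the single max(timestamp) value out of the one-row dataframe.
// first() executes the JDBC query and returns the lone Row.
val row = dataframeVariable.first()

// If the column maps to a SQL timestamp rather than a numeric epoch,
// getTimestamp(0) would be the matching getter instead of getDouble(0).
val lastTimestamp = row.getTimestamp(0)

// Alternatively, collect just that column; collect() returns Array[Row],
// which is safe here only because the query is a single-row aggregate.
val timeStampVal = dataframeVariable.select(col("timestamp")).collect()
```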
After some looking around, several people suggested changing the Spark memory-management configuration to address this issue, but I am not sure where to start with that. Any guidance would be helpful.
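For context, the memory settings those suggestions pointed at are normally passed at submit time; something like the following (the 8g/2g values are placeholders I made up, not tested recommendations, and the class and jar names stand in for my actual application):

```shell
# Sketch: raising driver and executor heap at submit time.
spark-submit \
  --conf spark.driver.memory=8g \
  --conf spark.executor.memory=8g \
  --conf spark.driver.maxResultSize=2g \
  --class com.example.MyApp \
  my-app.jar
```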
Currently using: Spark 3.1.2, Scala 2.12, Java 11
Spark cluster spec: 8 workers, 48 cores, 64 GB memory
Application submission spec: 1 worker, 4 cores each for the driver and executor, 4 GB memory each for the driver and executor