[SPARK-17465] Inappropriate memory management in `org.apache.spark.storage.MemoryStore` may lead to memory leak - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.6.0, 1.6.1, 1.6.2
Fix Version/s: 1.6.3, 2.0.1, 2.1.0
Component/s: Spark Core
Labels: None

Description

After upgrading Spark from 1.5.0 to 1.6.0, my Spark Streaming application appears to leak memory.

Here is the head of the heap histogram of the application after it had been running for about 160 hours:

 num     #instances         #bytes  class name
----------------------------------------------
   1:         28094       71753976  [B
   2:       1188086       28514064  java.lang.Long
   3:       1183844       28412256  scala.collection.mutable.DefaultEntry
   4:        102242       13098768  <methodKlass>
   5:        102242       12421000  <constMethodKlass>
   6:          8184        9199032  <constantPoolKlass>
   7:            38        8391584  [Lscala.collection.mutable.HashEntry;
   8:          8184        7514288  <instanceKlassKlass>
   9:          6651        4874080  <constantPoolCacheKlass>
  10:         37197        3438040  [C
  11:          6423        2445640  <methodDataKlass>
  12:          8773        1044808  java.lang.Class
  13:         36869         884856  java.lang.String
  14:         15715         848368  [[I
  15:         13690         782808  [S
  16:         18903         604896  java.util.concurrent.ConcurrentHashMap$HashEntry
  17:            13         426192  [Lscala.concurrent.forkjoin.ForkJoinTask;

The histogram shows unexpectedly large numbers of scala.collection.mutable.DefaultEntry and java.lang.Long instances. These counts started growing when the streaming process began, and they keep growing in proportion to the total number of tasks.

After further investigation, I found that the problem is caused by inappropriate memory management in the releaseUnrollMemoryForThisTask and unrollSafely methods of the class org.apache.spark.storage.MemoryStore.

In Spark 1.6.x, releaseUnrollMemoryForThisTask only does its work (including removing the task's entry from unrollMemoryMap) when memoryToRelease > 0:
https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/storage/MemoryStore.scala#L530-L537
However, if a task successfully unrolls all of its blocks in memory via unrollSafely, the amount recorded for it in unrollMemoryMap is set to zero:
https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/storage/MemoryStore.scala#L322

As a result, the memory recorded in unrollMemoryMap is released, but the key for that task is never removed from the hash map. The hash table keeps growing as new tasks arrive. Although the growth is comparatively slow (about dozens of bytes per task), it can eventually lead to an OOM after weeks or months.
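
The interaction described above can be reproduced with a small standalone model. This is only a sketch under stated assumptions, not the actual MemoryStore code: UnrollTracker, reserve, markUnrolled, release, trackedTasks, and the removeZeroEntries flag are hypothetical names introduced for illustration. With removeZeroEntries = false, release only touches the map when memoryToRelease > 0, as this report describes for 1.6.x. With removeZeroEntries = true, it also drops entries whose recorded amount is already zero, which is one possible way to avoid the leak and is not necessarily the fix that was merged.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of per-task unroll bookkeeping.
// It only tracks numbers in a map; no real memory is reserved.
class UnrollTracker(removeZeroEntries: Boolean) {
  // task attempt ID -> bytes currently recorded as unroll memory
  private val unrollMemoryMap = mutable.HashMap[Long, Long]()

  // Record that a task reserved some unroll memory.
  def reserve(taskId: Long, bytes: Long): Unit = synchronized {
    unrollMemoryMap(taskId) = unrollMemoryMap.getOrElse(taskId, 0L) + bytes
  }

  // Model a successful unroll: the recorded amount drops to zero,
  // but the key stays in the map.
  def markUnrolled(taskId: Long): Unit = synchronized {
    if (unrollMemoryMap.contains(taskId)) unrollMemoryMap(taskId) = 0L
  }

  // Model the release step at the end of a task.
  def release(taskId: Long, memory: Long = Long.MaxValue): Unit = synchronized {
    unrollMemoryMap.get(taskId).foreach { reserved =>
      val memoryToRelease = math.min(memory, reserved)
      if (memoryToRelease > 0) {
        val remaining = reserved - memoryToRelease
        if (remaining == 0) unrollMemoryMap.remove(taskId)
        else unrollMemoryMap(taskId) = remaining
      } else if (removeZeroEntries && reserved == 0) {
        // Assumed cleanup path: drop entries that are already at zero.
        unrollMemoryMap.remove(taskId)
      }
      // Otherwise nothing happens, and a zero-valued entry stays forever.
    }
  }

  def trackedTasks: Int = synchronized { unrollMemoryMap.size }
}

object UnrollLeakDemo {
  // Run `tasks` tasks that each reserve, fully unroll, and then release.
  // Returns how many task entries are still in the map afterwards.
  def simulate(tasks: Int, removeZeroEntries: Boolean): Int = {
    val tracker = new UnrollTracker(removeZeroEntries)
    for (taskId <- 0L until tasks.toLong) {
      tracker.reserve(taskId, 1024L)
      tracker.markUnrolled(taskId)
      tracker.release(taskId)
    }
    tracker.trackedTasks
  }

  def main(args: Array[String]): Unit = {
    println(simulate(1000, removeZeroEntries = false)) // one stale entry per task
    println(simulate(1000, removeZeroEntries = true))  // map is empty again
  }
}
```

In this model every finished task leaves one zero-valued entry behind when removeZeroEntries is false, which matches the steady growth of hash map entries and boxed Long keys seen in the histogram.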

Issue Links

links to

[Github] Pull Request #15022 (saturday-shi)

People

Assignee: Xing Shi

Reporter: Xing Shi

Votes: 1

Watchers: 7

Dates

Created: 09/Sep/16 06:35

Updated: 15/Mar/17 01:41

Resolved: 14/Sep/16 20:47