[SPARK-13980] Incrementally serialize blocks while unrolling them in MemoryStore


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: Block Manager, Spark Core
    • Labels: None

Description

When a block is persisted in the MemoryStore at a serialized storage level, the current MemoryStore.putIterator() code unrolls the entire iterator as Java objects in memory, then serializes an iterator obtained from the unrolled array. This is inefficient and roughly doubles our peak memory requirement, since the deserialized objects and the serialized bytes coexist in memory at the peak. Instead, I think that we should incrementally serialize blocks while unrolling them. A downside of incremental serialization is that we will need to deserialize the partially-unrolled data if there is not enough space to unroll the block and the block cannot be dropped to disk. However, I'm hoping that the memory-efficiency improvements will outweigh the performance cost of the extra serialization in that hopefully-rare case.
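To make the idea concrete, here is a minimal Scala sketch of serialize-while-unrolling, assuming plain Java serialization (ObjectOutputStream) and a fixed byte budget. The names putIteratorAsBytes, memoryLimit, and IncrementalUnrollSketch are illustrative placeholders, not Spark's actual MemoryStore API; a real implementation would also need to cooperate with the memory manager when reserving unroll memory.

{code:scala}
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object IncrementalUnrollSketch {
  /** Serialize `values` one record at a time, checking the accumulated
    * serialized size against `memoryLimit` as we go. Returns the serialized
    * bytes on success, or the size reached when the budget was exceeded. */
  def putIteratorAsBytes[T](values: Iterator[T], memoryLimit: Long): Either[Long, Array[Byte]] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    var count = 0L
    while (values.hasNext) {
      out.writeObject(values.next().asInstanceOf[AnyRef])
      count += 1
      if (count % 16 == 0) { // amortize the size check over several records
        out.flush()
        if (bytes.size() > memoryLimit) {
          // Over budget: the caller must either drop the block to disk or, in
          // the worst case, deserialize these bytes to recover an iterator.
          out.close()
          return Left(bytes.size().toLong)
        }
      }
    }
    out.close()
    if (bytes.size() > memoryLimit) Left(bytes.size().toLong)
    else Right(bytes.toByteArray)
  }

  def main(args: Array[String]): Unit = {
    putIteratorAsBytes((1 to 100000).iterator, memoryLimit = 64 * 1024) match {
      case Right(data) => println(s"stored ${data.length} serialized bytes")
      case Left(size)  => println(s"exceeded the budget at ~$size bytes")
    }
  }
}
{code}

Checking the serialized size only every few records amortizes the cost of the check; on failure, the caller is left with exactly the partially-serialized bytes described above, which must either be dropped to disk or deserialized back into an iterator.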


People

Assignee: Josh Rosen (joshrosen)
Reporter: Josh Rosen (joshrosen)
Votes: 0
Watchers: 3

Dates

Created:
Updated:
Resolved: