SPARK-13921: Store serialized blocks as multiple chunks in MemoryStore

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: Block Manager, Spark Core
    • Labels: None

    Description

      Instead of storing each serialized block in a single ByteBuffer, the BlockManager should be capable of storing a serialized block as multiple chunks, each occupying a separate ByteBuffer.
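
      As a rough illustration, a chunked block wrapper might look like the following Scala sketch (the names here are hypothetical, for illustration only, and are not Spark's actual API):

{code:scala}
import java.nio.ByteBuffer

// Hypothetical sketch: a serialized block held as several read-only
// ByteBuffer chunks instead of one contiguous buffer.
class ChunkedBlock(val chunks: Array[ByteBuffer]) {
  require(chunks.forall(_.position() == 0), "chunks must be rewound before use")

  // The total size is a Long, so the block as a whole is not bound by
  // the ~2 GB capacity limit of any single ByteBuffer.
  def size: Long = chunks.map(_.limit().toLong).sum

  // Copy the chunks into one contiguous array; only possible while the
  // total size still fits in an Int-indexed array.
  def toArray: Array[Byte] = {
    require(size <= Int.MaxValue, s"cannot materialize $size bytes contiguously")
    val out = new Array[Byte](size.toInt)
    var offset = 0
    for (chunk <- chunks) {
      val dup = chunk.duplicate() // leave the shared chunk's position untouched
      val n = dup.remaining()
      dup.get(out, offset, n)
      offset += n
    }
    out
  }
}
{code}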

      This change will improve both the efficiency of memory allocation and the accuracy of memory accounting when serializing blocks. Our current serialization code uses a ByteBufferOutputStream, which doubles and re-allocates its backing byte array whenever it fills up; this raises the peak memory requirement during serialization, since both the old and the new array must be held in memory while the data is copied over. In addition, we currently don't account for the unused space at the end of the ByteBuffer's backing array, so a 129-megabyte serialized block may actually consume 256 megabytes of memory (the next power-of-two capacity). After switching to storing blocks in multiple chunks, we'll be able to efficiently trim the backing buffers so that no space is wasted.
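
      To make the allocation strategy concrete, here is a minimal sketch of a chunk-based output stream (hypothetical names, not Spark's implementation): it appends fixed-size chunks instead of doubling a single array, so there is no copy-on-grow spike, and the only slack is the tail of the final chunk.

{code:scala}
import java.io.OutputStream
import java.nio.ByteBuffer
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: serialize into fixed-size chunks rather than one
// growing array, so no old-plus-new copy is ever held during a resize.
class ChunkedOutputStream(chunkSize: Int = 4 * 1024 * 1024) extends OutputStream {
  private val chunks = ArrayBuffer.empty[ByteBuffer]
  private var current: ByteBuffer = _

  private def ensureSpace(): Unit = {
    if (current == null || !current.hasRemaining) {
      current = ByteBuffer.allocate(chunkSize) // fixed-size, never re-allocated
      chunks += current
    }
  }

  override def write(b: Int): Unit = {
    ensureSpace()
    current.put(b.toByte)
  }

  override def write(bytes: Array[Byte], off: Int, len: Int): Unit = {
    var written = 0
    while (written < len) {
      ensureSpace()
      val n = math.min(len - written, current.remaining())
      current.put(bytes, off + written, n)
      written += n
    }
  }

  // Expose the chunks for reading. Flipping sets each chunk's limit to the
  // bytes actually written, so the final chunk's unused tail is excluded
  // from the readable view (at most one chunk's worth of allocation can be
  // wasted, versus up to half of a doubled array).
  def toChunks: Array[ByteBuffer] = chunks.map { c =>
    val dup = c.duplicate()
    dup.flip()
    dup
  }.toArray
}
{code}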

      This change is also a prerequisite for caching blocks which are larger than 2 GB (although full support for that depends on several other changes which have not been implemented yet).
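
      (For context, the 2 GB figure comes from java.nio rather than from anything Spark-specific: a single ByteBuffer is indexed by Int, so its capacity cannot exceed Int.MaxValue bytes. A chunked block sidesteps that per-buffer cap because its total size is a Long sum over its chunks, as in this sketch:)

{code:scala}
import java.nio.ByteBuffer

object ChunkSizing {
  // ByteBuffer.allocate takes an Int, so a single buffer caps out at
  // Int.MaxValue = 2^31 - 1 bytes, i.e. just under 2 GiB.
  val maxSingleBufferBytes: Long = Int.MaxValue.toLong

  // A chunked block's total size is a Long, so it can exceed that cap
  // even though each individual chunk cannot.
  def totalSize(chunks: Seq[ByteBuffer]): Long =
    chunks.map(_.capacity().toLong).sum
}
{code}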

People

    Assignee: Josh Rosen (joshrosen)
    Reporter: Josh Rosen (joshrosen)
