Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Won't Fix
Description
Summary
Provide direct off-heap memory access to an external non-JVM program, such as a C++ library, running inside the Spark JVM/executor. As Spark moves toward storing all data in off-heap memory, it makes sense to provide access points to that memory for non-JVM programs.
Assumptions
- No copies will be made during the call into the non-JVM library
- Access to non-JVM libraries will go through JNI
- A generic JNI interface will be created so that developers do not need to deal with raw JNI calls
- C++ will be the initial non-JVM target
- Memory management will remain on the JVM/Spark side
- The C++ API will resemble DataFrames as much as feasible and will NOT require expert knowledge of JNI
- Data organization and layout will support complex types (multi-type, nested, etc.)
Design
- Initially, only calls from the Spark JVM into non-JVM code will be supported
- Embedding a JVM running Spark inside a non-JVM program is initially out of scope
Technical
- GetDirectBufferAddress is the JNI call used to access a direct byte buffer without copying
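
As a minimal sketch of the JVM side of this, the Java snippet below allocates a direct (off-heap) buffer and fills it in native byte order, which is the kind of buffer for which GetDirectBufferAddress can return a usable pointer (the JNI call returns NULL for a non-direct buffer). The class name `DirectBufferSketch` and the helpers `newIntColumn` and `fill` are hypothetical names used only for illustration; they are not part of Spark or of this proposal, and the native C++ side is deliberately left out.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class DirectBufferSketch {

    /** Allocates an off-heap buffer sized for `count` 32-bit ints, in native byte order. */
    public static ByteBuffer newIntColumn(int count) {
        // allocateDirect places the storage outside the Java heap, so native code
        // can obtain its address through JNI GetDirectBufferAddress.
        ByteBuffer buf = ByteBuffer.allocateDirect(count * Integer.BYTES);
        // Byte buffers are big-endian by default; switch to the platform order so a
        // native reader could treat the memory as a plain array of 32-bit ints.
        buf.order(ByteOrder.nativeOrder());
        return buf;
    }

    /** Writes values with absolute puts, leaving the buffer's position and limit untouched. */
    public static void fill(ByteBuffer buf, int[] values) {
        for (int i = 0; i < values.length; i++) {
            buf.putInt(i * Integer.BYTES, values[i]);
        }
    }

    public static void main(String[] args) {
        ByteBuffer column = newIntColumn(4);
        fill(column, new int[] {10, 20, 30, 40});
        // A JNI wrapper (not shown, hypothetical) would receive `column` at this
        // point and read the same memory in place, without copying it.
        System.out.println("direct=" + column.isDirect() + " capacity=" + column.capacity());
    }
}
```

Because the buffer is allocated and owned by the JVM, this also matches the assumption that memory management stays on the JVM/Spark side: native code would only borrow the address for the duration of a call.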
Issue Links
- is part of: SPARK-9697 Project Tungsten (Phase 2) (Resolved)
- relates to: REEF-580 Add a Block Management Service to REEF (Open)
- links to (3 links)
1. UnsafeRow, UnsafeArrayData, and UnsafeMapData use MemoryBlock | Resolved | Unassigned