Details
-
New Feature
-
Status: Resolved
-
Major
-
Resolution: Later
-
3.0.0
-
None
-
None
Description
SPARK-13809 introduced a framework for state management for computing Streaming Aggregates. The default implementation was in-memory hashmap which was backed up in HDFS complaint file system at the end of every micro-batch.
Current implementation suffers from Performance and Latency Issues. It uses Executor JVM memory to store the states. State store size is limited by the size of the executor memory. Also
Executor JVM memory is shared by state storage and other tasks operations. State storage size will impact the performance of task execution
Moreover, GC pauses, executor failures, OOM issues are common when the size of state storage increases which increases overall latency of a micro-batch
RocksDb is an embedded DB which can provide major performance improvements. Other major streaming frameworks have rocksdb as default state storage.
Attachments
Issue Links
- is related to
-
SPARK-34198 Add RocksDB StateStore implementation
- Resolved
- links to