Details
Type: Improvement
Status: Resolved
Priority: Critical
Resolution: Incomplete
Affects Version/s: 1.3.1
None
Description
Spark Streaming has trouble dealing with situations where the batch processing time exceeds the batch interval (batch processing time > batch interval).
This happens when input data arrives at a higher throughput than Spark can remove it from the queue.
If this throughput is sustained for long enough, it leads to an unstable situation in which the memory of the Receiver's Executor overflows.
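As a rough illustration of this instability, the sketch below uses a deliberately simplified model rather than Spark's actual scheduler. It assumes that batches are created every `batch_interval` seconds and processed one at a time, each taking a fixed `processing_time`. The function name `scheduling_delays` is hypothetical. When processing is faster than the interval, no batch waits. When it is slower, each batch waits longer than the previous one.

```python
def scheduling_delays(batch_interval, processing_time, num_batches):
    """Return how long each batch waits before it starts processing.

    Simplified, hypothetical model (not Spark's scheduler): a batch is
    created every `batch_interval` seconds, batches run one at a time,
    and each one takes a fixed `processing_time` seconds.
    """
    delays = []
    free_at = 0.0  # time at which the processor finishes its current batch
    for i in range(num_batches):
        created_at = i * batch_interval
        # A batch cannot start before it exists or before the previous one ends.
        start = max(created_at, free_at)
        delays.append(start - created_at)
        free_at = start + processing_time
    return delays


# Stable: processing (0.8 s) fits inside the interval (1 s), so nothing waits.
print(scheduling_delays(1.0, 0.8, 5))  # [0.0, 0.0, 0.0, 0.0, 0.0]
# Unstable: processing (1.5 s) exceeds the interval, so the delay keeps growing.
print(scheduling_delays(1.0, 1.5, 5))  # [0.0, 0.5, 1.0, 1.5, 2.0]
```

In the unstable case, the delay grows by `processing_time - batch_interval` per batch without bound. Meanwhile, the receiver keeps buffering incoming records for the waiting batches, so the memory they occupy grows in the same way.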
This issue aims to transmit a back-pressure signal to data ingestion, in a backwards-compatible way, to help it cope with that high throughput.
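One simple way to picture such a back-pressure signal is the hypothetical sketch below. The class and method names are illustrative only and are not Spark's API. After each batch, the observed processing rate is fed back as an upper bound on how many records ingestion should accept per batch interval, so that a batch's worth of data can be processed within one interval.

```python
class FeedbackRateLimiter:
    """Hypothetical back-pressure controller (illustrative, not Spark's API).

    After each completed batch, it records how fast records were actually
    processed and uses that rate to cap ingestion for the next batch.
    """

    def __init__(self, batch_interval, initial_rate):
        self.batch_interval = batch_interval  # seconds per batch
        self.rate = initial_rate              # max records/s ingestion may accept

    def on_batch_completed(self, num_records, processing_time):
        # Update the bound to the observed processing throughput (records/s).
        # Empty or zero-duration batches carry no information, so keep the old rate.
        if num_records > 0 and processing_time > 0:
            self.rate = num_records / processing_time
        return self.rate

    def max_records_per_batch(self):
        # Upper bound the receiver should accept during the next interval.
        return int(self.rate * self.batch_interval)


limiter = FeedbackRateLimiter(batch_interval=1.0, initial_rate=10000)
# A 10000-record batch took 2 s: processing only sustains 5000 records/s.
limiter.on_batch_completed(10000, 2.0)
print(limiter.max_records_per_batch())  # 5000
```

Capping the next batch at 5000 records brings its processing time back to about one batch interval. A practical controller would likely smooth this estimate, for example with a PID loop, instead of jumping straight to the last observed rate, so that noisy batches do not cause oscillation.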
The original design doc can be found here:
https://docs.google.com/document/d/1ZhiP_yBHcbjifz8nJEyPJpHqxB1FT6s8-Zk7sAfayQw/edit?usp=sharing
The second design doc, which focuses on the first sub-task (without all the background info, and more centered on the implementation), can be found here:
https://docs.google.com/document/d/1ls_g5fFmfbbSTIfQQpUxH56d0f3OksF567zwA00zK9E/edit?usp=sharing
Issue Links
- relates to
  - SPARK-10420 Implementing Reactive Streams based Spark Streaming Receiver (Resolved)
- supersedes
  - SPARK-6691 Abstract and add a dynamic RateLimiter for Spark Streaming (Resolved)