Details
New Feature
Status: Open
P3
Resolution: Unresolved
None
None
Important
Description
When creating a Pipeline that reads from Kafka through a KafkaIO object, I want to be able to specify the offset at which consumption starts. Later, while processing each message, I also want to be able to get the offset of the current message so that I can store it in a relational database or a NoSQL store.
This feature is needed to implement exactly-once consumption semantics, as can be done with Spark Streaming.
The "Your own data store" section of the following page describes how to achieve exactly-once semantics with Spark Streaming:
http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html
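To illustrate why both parts of the request matter (choosing the starting offset and reading each message's offset), here is a minimal sketch of the "own data store" pattern. It uses neither Kafka nor KafkaIO: a plain Python list stands in for one partition, and an in-memory SQLite database stands in for the relational store. All names (`PARTITION_LOG`, `open_store`, `starting_offset`, `consume`, the table layout) are hypothetical and chosen only for this example.

```python
import sqlite3

# Hypothetical stand-in for one Kafka partition: list index == message offset.
PARTITION_LOG = ["a", "b", "c", "d", "e"]
TOPIC, PARTITION = "events", 0


def open_store():
    """Create an in-memory relational store holding results and offsets."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE results (msg_offset INTEGER PRIMARY KEY, value TEXT)"
    )
    conn.execute(
        "CREATE TABLE offsets (topic TEXT, part INTEGER, next_offset INTEGER, "
        "PRIMARY KEY (topic, part))"
    )
    conn.commit()
    return conn


def starting_offset(conn):
    """Return the offset to resume from, or 0 if nothing was stored yet."""
    row = conn.execute(
        "SELECT next_offset FROM offsets WHERE topic = ? AND part = ?",
        (TOPIC, PARTITION),
    ).fetchone()
    return row[0] if row else 0


def consume(conn, fail_at=None):
    """Process messages from the stored offset; optionally simulate a crash."""
    for offset in range(starting_offset(conn), len(PARTITION_LOG)):
        try:
            # One transaction per message: the result and the next offset are
            # committed together, or rolled back together on error.
            with conn:
                conn.execute(
                    "INSERT INTO results VALUES (?, ?)",
                    (offset, PARTITION_LOG[offset].upper()),
                )
                if offset == fail_at:
                    raise RuntimeError("simulated crash before commit")
                conn.execute(
                    "INSERT OR REPLACE INTO offsets VALUES (?, ?, ?)",
                    (TOPIC, PARTITION, offset + 1),
                )
        except RuntimeError:
            return False  # the half-written result was rolled back
    return True


conn = open_store()
first_run = consume(conn, fail_at=2)   # fails while handling offset 2
resume_from = starting_offset(conn)    # offset the next run will start at
second_run = consume(conn)             # resumes and finishes the partition
rows = conn.execute(
    "SELECT msg_offset, value FROM results ORDER BY msg_offset"
).fetchall()
```

Because the result row and the next offset are written in the same transaction, a failure leaves neither behind, and the restarted consumer picks up at the first unprocessed message without writing duplicates. In a real pipeline this only works if the source lets the caller set the starting offset and exposes the offset of each record, which is what this issue asks for.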