Details
-
Sub-task
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
Description
Right now you can only run a Streaming Query starting from either the earliest or latests offsets available at the moment the query is started. Sometimes this is a lot of data. It would be nice to be able to do the following:
- seek to user specified offsets for manually specified topicpartitions
currently agreed on plan:
Mutually exclusive subscription options (only assign is new to this ticket)
.option("subscribe","topicFoo,topicBar") .option("subscribePattern","topic.*") .option("assign","""{"topicfoo": [0, 1],"topicbar": [0, 1]}""")
where assign can only be specified that way, no inline offsets
Single starting position option with three mutually exclusive types of value
.option("startingOffsets", "earliest" | "latest" | """{"topicFoo": {"0": 1234, "1": -2}, "topicBar":{"0": -1}}""")
startingOffsets with json fails if any topicpartition in the assignments doesn't have an offset.