Details
- Type: Bug
- Status: Open
- Priority: Not a Priority
- Resolution: Unresolved
- Affects Version/s: 1.5.3, 1.6.0, 1.7.0, 1.11.3, 1.12.0
- Fix Version/s: None
Description
While going over the ZooKeeper-based stores (ZooKeeperSubmittedJobGraphStore, ZooKeeperMesosWorkerStore, ZooKeeperCompletedCheckpointStore) and the underlying ZooKeeperStateHandleStore, I noticed several inconsistencies that were introduced by past incremental changes.
- Depending on whether ZooKeeperStateHandleStore#getAllSortedByNameAndLock or ZooKeeperStateHandleStore#getAllAndLock is called, a deserialization problem either removes the Znode or leaves it in place
- ZooKeeperStateHandleStore leaves inconsistent state in case of exceptions (e.g. #getAllAndLock won't release the acquired locks in case of a failure)
- ZooKeeperStateHandleStore has too many responsibilities. It would be better to move RetrievableStateStorageHelper out of it for a better separation of concerns
- ZooKeeperSubmittedJobGraphStore overwrites a stored JobGraph even if it is locked. This should not happen since it could leave another system in an inconsistent state (imagine a changed JobGraph which restores from an old checkpoint)
- Redundant but also somewhat inconsistent put logic in the different stores
- ZooKeeperStateHandleStore shadows ZooKeeper-specific exceptions which ZooKeeperSubmittedJobGraphStore expects to catch
- Getting rid of the SubmittedJobGraphListener would be helpful
These problems made me question how reliably these components actually work. Since these components are very important, I propose to refactor them.
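Two of the points above (locks that are not released when #getAllAndLock fails, and overwriting an entry that is still locked) can be illustrated with a minimal sketch. It is not Flink code: LockingStoreSketch, its in-memory map, the single lock set, and the "corrupt" payload convention are all hypothetical stand-ins for ZooKeeper nodes, lock nodes, and deserialization failures.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for a lock-based state handle store (not Flink's API). */
public class LockingStoreSketch {

    // Node name -> serialized payload; stands in for Znodes.
    private final Map<String, String> nodes = new LinkedHashMap<>();
    // Names of currently locked nodes; stands in for ZooKeeper lock nodes.
    private final Set<String> locks = new HashSet<>();

    /** Adds a node, or replaces it only if no lock is held on it. */
    public void put(String name, String payload) {
        if (locks.contains(name)) {
            throw new IllegalStateException("Node " + name + " is locked; refusing to overwrite");
        }
        nodes.put(name, payload);
    }

    public String get(String name) {
        return nodes.get(name);
    }

    /** Returns a copy of the set of locked node names. */
    public Set<String> lockedNodes() {
        return new HashSet<>(locks);
    }

    /** Stand-in for deserialization: payloads starting with "corrupt" fail. */
    private static String deserialize(String payload) throws Exception {
        if (payload.startsWith("corrupt")) {
            throw new Exception("Cannot deserialize: " + payload);
        }
        return payload;
    }

    /** Locks and reads every node; on failure, releases the locks taken by this call. */
    public List<String> getAllAndLock() throws Exception {
        List<String> acquired = new ArrayList<>();
        List<String> result = new ArrayList<>();
        try {
            for (Map.Entry<String, String> entry : nodes.entrySet()) {
                // Track only locks newly taken here, so rollback leaves older locks alone.
                if (locks.add(entry.getKey())) {
                    acquired.add(entry.getKey());
                }
                result.add(deserialize(entry.getValue()));
            }
            return result;
        } catch (Exception e) {
            // Roll back so a failed call does not leave the store in an inconsistent state.
            locks.removeAll(acquired);
            throw e;
        }
    }
}
```

In this sketch a failed getAllAndLock leaves no locks behind, and put rejects a write to a locked node instead of silently replacing it. A real implementation would additionally have to handle session loss and concurrent clients, which the sketch ignores.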
Attachments
Issue Links
- relates to
  - FLINK-10011 Old job resurrected during HA failover (Resolved)
  - FLINK-10694 ZooKeeperHaServices Cleanup (Closed)
  - FLINK-21979 Job can be restarted from the beginning after it reached a terminal state (Closed)
  - FLINK-11225 Error state of addedJobGraphs when Dispatcher with concurrent revoking and granting leadership (Closed)
  - FLINK-4233 Simplify leader election / leader session ID assignment (Open)
- links to
1. Replace ZooKeeperStateHandleStore#getAllSortedByNameAndLock by getAllAndLock | Closed | Till Rohrmann
2. Move RetrievableStateStorageHelper out of ZooKeeperStateHandleStore | Open | Unassigned
3. Create common ZooKeeperStateStore based on ZooKeeperStateHandleStore and RetrievableStateStorageHelper | Open | Unassigned
4. Use ZooKeeperStateStore in ZooKeeperSubmittedJobGraphStore | Open | Unassigned
5. Use ZooKeeperStateStore in ZooKeeperCompletedCheckpointStore | Open | Unassigned
6. Rethink SubmittedJobGraphListener | Open | Unassigned
7. Use ZooKeeperStateStore in ZooKeeperMesosWorkerStore | Closed | Unassigned
8. Introduce ZooKeeperLeaderElectionServiceNG | Closed | Unassigned