Description
At present, SimpleInMemoryKeyValueStorage already has compaction functionality, but it is an open question who should invoke the compact method, and when.
UPD:
- It's still an open question who should invoke the compact method, and when.
- Besides that, storage compaction itself needs to be fixed (IGNITE-16444). It seems that we might reuse the inner MetaStorage cursor metadata to avoid compacting data under cursors over which we are currently iterating.
- It is still possible, however, that revision-based get(), range(), watch(), invoke(), etc. will throw CompactionException on their initial calls.
UPD 2:
For this ticket, we decided to implement time-based compaction by creating a timestamp (watermark) -> revision mapping. The watermark provider will be implemented in a separate ticket: https://issues.apache.org/jira/browse/IGNITE-19417
UPD 3:
Time to Revision mapping mechanism
1. We can leverage the current implementation of the MetaStorage, which has a hackish feature: the leader of the MetaStorage RAFT group handles the HybridTimestamp that a write command carries from the node that initiated the write operation. The leader adjusts its clock and sets the current time of the adjusted clock on the command, so that the new adjusted time is replicated to the followers and learners. We can then use this time to update the time to revision mapping.
The mapping itself can be stored in a RocksDB column family.
2. The time to revision mapping can be dropped in favor of replacing revisions with timestamps. A timestamp is also an 8-byte long that increases monotonically, so it seems that it can be used instead of auto-incremented revisions.
3. As soon as we have MetaStorage based on the ReplicaService (indirect usage of RAFT groups via the primary replica), we will be able to generate timestamps for commands on the lease holder (and we should get rid of the hack mentioned in the first point).
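The time to revision mapping from point 1 could look roughly like the sketch below. It is only an illustration: an in-memory TreeMap stands in for the RocksDB column family, timestamps are treated as plain 8-byte longs, and the names TimeToRevisionIndex, record and revisionAtOrBefore are hypothetical, not part of the Ignite API.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for the timestamp -> revision mapping. */
final class TimeToRevisionIndex {
    /** Sorted by timestamp so that a watermark can be resolved with a floor lookup. */
    private final TreeMap<Long, Long> revisionByTimestamp = new TreeMap<>();

    /** Records that the given revision was applied at the given (leader-adjusted) timestamp. */
    void record(long timestamp, long revision) {
        revisionByTimestamp.put(timestamp, revision);
    }

    /** Returns the revision recorded at the latest timestamp that is <= watermark, if any. */
    OptionalLong revisionAtOrBefore(long watermark) {
        Map.Entry<Long, Long> entry = revisionByTimestamp.floorEntry(watermark);
        return entry == null ? OptionalLong.empty() : OptionalLong.of(entry.getValue());
    }
}
```

With entries (100 -> 1), (200 -> 2), (300 -> 3), a watermark of 250 resolves to revision 2, and a watermark of 50 resolves to nothing, meaning there is nothing to compact yet.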
We can tie the compaction of the MetaStorage to the Garbage Collection of partitions and use the low watermark (LWM) value in the compaction process. All the rules that apply to garbage collection should also apply to compaction, i.e. we don’t remove an entry with a timestamp below the MetaStorage’s LWM if it’s the only entry for a given key.
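A minimal sketch of that rule for a single key follows. It assumes GC-style semantics in which the newest version at or below the LWM stays readable and only versions older than it are removed; this interpretation, the String values, and the names LwmCompactionSketch and compact are assumptions made for illustration.

```java
import java.util.NavigableMap;

/** Hypothetical illustration of LWM-based compaction of one key's version chain. */
final class LwmCompactionSketch {
    /**
     * Removes versions that are superseded by a newer version at or below the LWM.
     * The newest version at or below the LWM is kept, so a key never loses its only entry.
     *
     * @return the number of removed versions
     */
    static int compact(NavigableMap<Long, String> versionsByTimestamp, long lwm) {
        Long newestAtOrBelowLwm = versionsByTimestamp.floorKey(lwm);
        if (newestAtOrBelowLwm == null) {
            return 0; // every version is above the LWM, nothing can be removed
        }
        // Everything strictly older than the kept version is obsolete.
        NavigableMap<Long, String> obsolete = versionsByTimestamp.headMap(newestAtOrBelowLwm, false);
        int removed = obsolete.size();
        obsolete.clear(); // headMap is a view, so this removes from the backing map
        return removed;
    }
}
```

For versions at timestamps 10, 20 and 30 with an LWM of 25, only the version at 10 is removed. A key whose single version is at timestamp 10 keeps it even with an LWM of 100.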
A scheduled background task should trigger the compaction. We can also trigger the compaction out of schedule every time the LWM changes.
In addition, it may be beneficial to perform a compaction on every insertion, just like we do with garbage collection; however, this is an optimisation and should be benchmarked.
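The two triggers (periodic and on LWM change) could be wired together along the following lines. This is a rough sketch on top of the JDK's ScheduledExecutorService, not the actual Ignite scheduling code; CompactionTrigger, start and onLwmChanged are hypothetical names, and the compaction itself is passed in as a callback that receives the LWM.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;

/** Hypothetical trigger that runs compaction periodically and whenever the LWM changes. */
final class CompactionTrigger implements AutoCloseable {
    /** Single daemon thread, so compaction runs never overlap and never block JVM exit. */
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread thread = new Thread(r, "ms-compaction");
        thread.setDaemon(true);
        return thread;
    });

    private final LongConsumer compactor;
    private volatile long lwm;

    CompactionTrigger(LongConsumer compactor, long initialLwm) {
        this.compactor = compactor;
        this.lwm = initialLwm;
    }

    /** Starts the scheduled background compaction using the latest known LWM. */
    void start(long period, TimeUnit unit) {
        executor.scheduleWithFixedDelay(() -> compactor.accept(lwm), period, period, unit);
    }

    /** Remembers the new LWM and requests an out-of-schedule compaction. */
    void onLwmChanged(long newLwm) {
        lwm = newLwm;
        executor.execute(() -> compactor.accept(newLwm));
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

Running both triggers on one single-threaded executor is a deliberate simplification: it serializes compaction runs without extra locking.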
MetaStorage’s low watermark:
1. MS LWM cannot be higher than the Partition LWM.
2. MS LWM cannot be higher than any cursor's timestamp.
3. MS LWM cannot be higher than the timestamp of any schema version for which a tuple exists that is serialized using this schema.
4. MS LWM is increased as soon as the new LWM does not contradict any of the 3 points above.
See schema synchronization IEP: https://cwiki.apache.org/confluence/display/IGNITE/IEP-98%3A+Schema+Synchronization
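The four MS LWM rules above can be read as a single guard on advancing the watermark. The sketch below is one possible reading, assuming all timestamps are comparable longs; MetaStorageLwmSketch, tryAdvance and the parameter names are hypothetical, and how the cursor and schema timestamps are collected is left out.

```java
import java.util.Collection;

/** Hypothetical holder of the MetaStorage low watermark (MS LWM). */
final class MetaStorageLwmSketch {
    private long lwm;

    MetaStorageLwmSketch(long initialLwm) {
        this.lwm = initialLwm;
    }

    long current() {
        return lwm;
    }

    /**
     * Raises the MS LWM to the candidate if none of the three constraints is violated.
     *
     * @param candidate proposed new MS LWM
     * @param partitionLwm current partition LWM (rule 1)
     * @param cursorTimestamps timestamps of open cursors (rule 2)
     * @param schemaTimestampsInUse timestamps of schema versions that still have tuples (rule 3)
     * @return true if the MS LWM was increased (rule 4)
     */
    boolean tryAdvance(long candidate, long partitionLwm,
            Collection<Long> cursorTimestamps, Collection<Long> schemaTimestampsInUse) {
        if (candidate <= lwm || candidate > partitionLwm) {
            return false; // the watermark only moves forward and never passes the partition LWM
        }
        for (long ts : cursorTimestamps) {
            if (candidate > ts) {
                return false;
            }
        }
        for (long ts : schemaTimestampsInUse) {
            if (candidate > ts) {
                return false;
            }
        }
        lwm = candidate;
        return true;
    }
}
```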
Issue Links
- causes IGNITE-19417 Provide low watermark for metastorage compaction (Resolved)
- links to