[IGNITE-14734] Implement compaction functionality management for meta storage. - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.0
Component/s: None
Labels:
- iep-61
- ignite-3

Ignite Flags:

Docs Required, Release Notes Required

Description

At present SimpleInMemoryKeyValueStorage already has compaction functionality, but it is an open question who should invoke the compact method, and when.

Upd:

It is still an open question who should invoke the compact method, and when.
Besides that, storage compaction itself needs to be fixed: ~~IGNITE-16444~~
It seems we might reuse the MetaStorage's internal cursor metadata to prevent compaction of the revisions that open cursors are currently iterating over.
However, it is still possible that revision-based get(), range(), watch(), invoke(), etc. will throw CompactionException on their initial calls.
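
The interaction between open cursors and compaction could look roughly like the sketch below. Everything in it is an assumption made for illustration: `CompactionGuard` is a hypothetical class, the `CompactionException` here is a local stand-in rather than the real Ignite class, and the rule that a read fails when its revision is at or below the compacted revision is one possible convention.

```java
import java.util.TreeMap;

/** Local stand-in for illustration; the real exception's API is not described in the ticket. */
final class CompactionException extends RuntimeException {
    CompactionException(String message) {
        super(message);
    }
}

/** Hypothetical guard that keeps compaction from passing the oldest open cursor. */
final class CompactionGuard {
    // Revision -> number of open cursors reading at that revision.
    private final TreeMap<Long, Integer> openCursors = new TreeMap<>();

    // Highest revision whose history has been removed; -1 means nothing was compacted.
    private long compactedRevision = -1;

    /** Registers a cursor; fails right away if the revision is already compacted. */
    void openCursor(long revision) {
        checkReadable(revision);
        openCursors.merge(revision, 1, Integer::sum);
    }

    /** Unregisters a cursor previously opened at the given revision. */
    void closeCursor(long revision) {
        openCursors.computeIfPresent(revision, (r, n) -> n == 1 ? null : n - 1);
    }

    /** Compacts up to the target, but never reaches a revision an open cursor reads at. */
    long compact(long targetRevision) {
        long limit = openCursors.isEmpty()
                ? targetRevision
                : Math.min(targetRevision, openCursors.firstKey() - 1);
        compactedRevision = Math.max(compactedRevision, limit);
        return compactedRevision;
    }

    /** Initial check for revision-based reads such as get() or range(). */
    void checkReadable(long revision) {
        if (revision <= compactedRevision) {
            throw new CompactionException("Revision " + revision + " has been compacted");
        }
    }
}
```

With a cursor open at revision 5, a request to compact up to revision 10 stops at revision 4. A new read at revision 3 then fails on its initial call, which is the case the last sentence above describes.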

UPD 2:

For this ticket we decided to implement time-based compaction by creating a timestamp (watermark)->revision mapping. Watermark provider will be implemented in a separate ticket: https://issues.apache.org/jira/browse/IGNITE-19417
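
A minimal sketch of such a mapping, assuming timestamps and revisions are both plain `long` values. `TimeToRevisionIndex` and its method names are hypothetical, and an in-memory `TreeMap` stands in for whatever persistent store the real implementation uses.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory index from write timestamp to MetaStorage revision. */
final class TimeToRevisionIndex {
    // Sorted by timestamp, so resolving a watermark is a floor lookup.
    private final TreeMap<Long, Long> revisionByTimestamp = new TreeMap<>();

    /** Records that the given revision was applied at the given timestamp. */
    void record(long timestamp, long revision) {
        revisionByTimestamp.put(timestamp, revision);
    }

    /**
     * Returns the newest revision whose timestamp is not after the watermark,
     * or -1 if every recorded revision is newer than the watermark.
     */
    long revisionAtOrBefore(long watermark) {
        Map.Entry<Long, Long> entry = revisionByTimestamp.floorEntry(watermark);
        return entry == null ? -1 : entry.getValue();
    }
}
```

Given a watermark, `revisionAtOrBefore` yields the revision up to which compaction may run; a result of -1 means there is nothing old enough to compact.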

UPD 3:

Time to Revision mapping mechanism
1. We can leverage the current MetaStorage implementation, which contains a hack: the leader of the MetaStorage RAFT group receives the write command's HybridTimestamp from the node that initiated the write operation. The leader adjusts its clock and sets the adjusted clock's current time on the command, so that the adjusted time is replicated to the followers and learners. We can then use this time to update the time-to-revision mapping.
The mapping itself can be stored in a RocksDB column family.

2. The time-to-revision mapping can be dropped in favor of replacing revisions with timestamps. A timestamp is also an 8-byte long that increases monotonically, so it seems it can be used instead of auto-incremented revisions.

3. As soon as we have MetaStorage based on the ReplicaService (indirect usage of RAFT groups via the primary replica), we will be able to generate timestamps for commands on the lease holder (and we should get rid of the hack mentioned in the first point).
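
The clock adjustment described in point 1 can be sketched as follows. This is a simplified illustration and not Ignite's HybridTimestamp or clock API: `AdjustableClock` is a hypothetical class, time is modelled as a single `long` counter, and no wall clock is consulted.

```java
/** Hypothetical simplified clock; time is a single long counter for illustration. */
final class AdjustableClock {
    private long latest;

    AdjustableClock(long initial) {
        this.latest = initial;
    }

    /**
     * Moves the clock past the timestamp received from the initiating node
     * and returns the new time, which the leader would set on the command
     * before it is replicated.
     */
    synchronized long update(long initiatorTimestamp) {
        latest = Math.max(latest, initiatorTimestamp) + 1;
        return latest;
    }
}
```

Because every returned value is strictly greater than the previous one, the times stamped on successive commands are ordered the same way as the commands, which is what lets them key a time-to-revision mapping or, as in point 2, replace revisions.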

We can tie the compaction of the MetaStorage to the Garbage Collection of partitions and use the low watermark (LWM) value in the compaction process. All the rules that apply to garbage collection should also apply to compaction, i.e. we don't remove an entry with a timestamp below the MetaStorage's LWM if it's the only entry for a given key.
A scheduled background task should trigger the compaction. We can also trigger the compaction off-schedule every time the LWM changes.
In addition, it may be beneficial to perform a compaction on every insertion, just like we do with garbage collection; however, this is an optimisation and should be benchmarked.
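
One way to read the "only entry" rule is sketched below for the versions of a single key. The interpretation is an assumption: the newest version below the LWM is always kept, since it is what a read at the LWM would still observe, and this covers the case where that version is the key's only entry. `VersionCompactor` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that decides which versions of one key survive compaction. */
final class VersionCompactor {
    /**
     * @param timestamps version timestamps of a single key, in ascending order
     * @param lowWatermark the MetaStorage LWM
     * @return the timestamps that remain after compaction
     */
    static List<Long> compact(List<Long> timestamps, long lowWatermark) {
        // Index of the newest version strictly below the watermark, or -1 if none.
        int newestBelow = -1;
        for (int i = 0; i < timestamps.size(); i++) {
            if (timestamps.get(i) < lowWatermark) {
                newestBelow = i;
            }
        }

        // Keep that version and everything newer; drop only what it supersedes.
        List<Long> kept = new ArrayList<>();
        for (int i = 0; i < timestamps.size(); i++) {
            if (i >= newestBelow) {
                kept.add(timestamps.get(i));
            }
        }
        return kept;
    }
}
```

For versions at timestamps 1, 5 and 20 with an LWM of 10, only the version at 1 is removed. A key whose single version is at timestamp 5 keeps it.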

MetaStorage’s low watermark:
1. MS LWM cannot be higher than the Partition LWM.
2. MS LWM cannot be higher than any cursor's timestamp.
3. MS LWM cannot be higher than the timestamp of any schema version for which a tuple exists that is serialized using this schema.
4. MS LWM is increased as soon as the new LWM does not violate any of the three rules above.
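
The four rules above can be sketched as an upper bound plus a guarded advance. The class and method names are hypothetical, and timestamps are modelled as plain `long` values rather than Ignite's HybridTimestamp.

```java
import java.util.Collection;

/** Hypothetical holder of the MetaStorage low watermark. */
final class MetaStorageLowWatermark {
    private long value;

    MetaStorageLowWatermark(long initial) {
        this.value = initial;
    }

    /** Highest LWM allowed by rules 1-3: the minimum of all three constraints. */
    static long upperBound(
            long partitionLwm,
            Collection<Long> cursorTimestamps,
            Collection<Long> schemaTimestampsInUse) {
        long bound = partitionLwm;
        for (long ts : cursorTimestamps) {
            bound = Math.min(bound, ts);
        }
        for (long ts : schemaTimestampsInUse) {
            bound = Math.min(bound, ts);
        }
        return bound;
    }

    /**
     * Rule 4: accepts the candidate only if it stays within the bound.
     * The watermark never moves backwards. Returns true if it changed.
     */
    boolean tryAdvance(long candidate, long bound) {
        if (candidate <= value || candidate > bound) {
            return false;
        }
        value = candidate;
        return true;
    }

    long value() {
        return value;
    }
}
```

A rejected candidate can simply be retried later, for example once the blocking cursor is closed or no tuples remain on the old schema version.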

See schema synchronization IEP: https://cwiki.apache.org/confluence/display/IGNITE/IEP-98%3A+Schema+Synchronization

Issue Links

causes: IGNITE-19417 Provide low watermark for metastorage compaction (Resolved)

links to: GitHub Pull Request #2019

People

Assignee: Semyon Danilov

Reporter: Andrey N. Gura

Votes: 0

Watchers: 3

Dates

Created: 18/May/21 09:13

Updated: 04/Sep/24 16:19

Resolved: 08/May/23 11:22

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

3h 50m