Details

    Description

      On big repositories compaction can take quite a while to run, as it needs to create a full deep copy of the current root node state. For such cases it could be beneficial to compact the repository partially, splitting a full compaction over multiple cycles.
      Partial compaction would run compaction on a sub-tree just like we now run it on the full tree. Afterwards it would create a new root node state by referencing the previous root node state, replacing said sub-tree with the compacted one.

      Todo: Assess feasibility and impact; implement a prototype.
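A minimal sketch of the idea above, using a hypothetical dict-based tree (not the actual Oak NodeState API): compacting a sub-tree deep-copies only that sub-tree, while the new root keeps referencing the previous root's other children unchanged.

```python
# Hypothetical dict-based node model; not the real Oak NodeState API.

def compact(node):
    """Deep copy of a (sub-)tree, standing in for full compaction."""
    return {name: compact(child) for name, child in node.items()}

def compact_subtree(root, name):
    """Partial compaction: create a new root that references the previous
    root's children, replacing only the named sub-tree with its compacted
    copy."""
    new_root = dict(root)                # shallow copy: shares all children
    new_root[name] = compact(root[name]) # except the compacted sub-tree
    return new_root
```

Only the compacted sub-tree is rewritten; every other child of the new root is shared with the previous root by reference, which is what makes splitting full compaction over multiple cycles attractive.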

      Attachments

        1. compaction-time.png
          37 kB
          Michael Dürig
        2. cycle-count.png
          36 kB
          Michael Dürig
        3. post-gc-size.png
          33 kB
          Michael Dürig

        Issue Links

          Activity

            mduerig Michael Dürig added a comment - See my comment on OAK-4122 on how such an approach could look.
            mduerig Michael Dürig added a comment - edited

            An alternative approach (let's call it tail compaction) would be to rebase some revisions on a previous, already compacted revision instead of rebasing all revisions on the empty revision (as we currently do): Let r0 be an initial, compact revision and r1, r2, r3 be subsequent revisions in that order created by normal repository operation. The current compaction approach rewrites r3, which also rewrites everything in r0 (which is already compact). Instead we could rebase the difference between r0 and r3 onto r0, effectively only rewriting the changes that came in with r1, r2 and r3. The latter approach should be much lighter as it only rewrites recent changes, leaving alone everything that was already compacted previously.

            stillalex Alex Deparvu added a comment -

            I think a prerequisite is to add the revision to the gc.log, so that the 'last compacted state' is easily available.


            Initial POC for tail compaction at https://github.com/mduerig/jackrabbit-oak/commits/OAK-3349-POC.

            The following graphs show the effect of this approach in SegmentCompactionIT through 20-something hourly gc runs.

            Post compaction size is comparable up until the 12th run. At this point tail compaction probably starts to accumulate too much "previous compacted state". A full compaction might be beneficial here.

            Looking at compaction time, tail compaction is actually much faster! Only with the 20th run does it seem to start to degenerate. (Later on it seems to recover... I still have the test running and will post further graphs once I have the data.)

            The graph of the number of cycles per gc run confirms the above picture: tail compaction only went into a force compact (6th cycle) once before the 20th run, whereas the baseline did so much earlier.

            mduerig Michael Dürig added a comment

            Turns out that the patch has a bug causing still referenced external binaries to be collected by the DSGC. Working on a test/fix.

            mduerig Michael Dürig added a comment

            Fixed at: https://github.com/mduerig/jackrabbit-oak/commit/aad097d4c241651abefda501455970438625999b

            The problem was that the collection of blob references did not take the new even/odd scheme of the segments under tail compaction into account. The fix passes an explicit predicate encoding this to the actual collector instead of having an ad-hoc implementation there.

            mduerig Michael Dürig added a comment
            mduerig Michael Dürig added a comment - edited
            Implementation note on tail compaction

            In contrast to the existing compaction approach (full compaction), tail compaction rebases all changes since the last compaction on top of the result of that last compaction. Cleanup subsequently cleans up the uncompacted changes. Each tail compaction cycle creates a new generation, incrementing the generation number. Cleanup removes all non-compacted segments whose generation is no greater than the current generation minus a certain number of retained generations (2 by default).

            To make this work we need to be able to determine the age of a segment (in number of generations) and whether a segment has been written by the compactor or by a regular writer (and is thus uncompacted). The POC implemented this by assigning even generation numbers to regular segments and odd ones to segments written by tail compaction, while at the same time completely removing support for full compaction.

            To combine tail compaction with full compaction I suggest introducing a young generation field in the segment header, which is used by tail compaction as described. The existing generation field will thus keep being used for full compaction without changing its semantics.

            The proposed approach has the advantage that tail and full compaction are completely orthogonal. You can run either of them, or both, without one affecting or influencing the other.
            Both compaction and cleanup solely rely on the information in the segment headers. A predicate for determining which segments to retain can be inferred from the segment containing the head revision. There is no need to rely on auxiliary information, with the small exception of tail compaction using the gc.log file to determine the base revision to compact onto. This is not problematic with regard to resilience though, as we can always fall back to full compaction should the base revision be invalid. (A base revision can be invalid in two ways: either it is not found or it was not written by the compactor. Both cases can only occur after manual tampering with the journal.log.)
            Finally the approach plays well with upgrading: while the additional young generation field requires us to bump the segment version, we can easily maintain backwards compatibility and do a rolling upgrade segment by segment. Segments of the previous version will just not be eligible for cleanup under tail compaction.

            mduerig Michael Dürig added a comment - edited

            adulceanu, frm do the above notes make sense to you? Is there anything fundamental I forgot? Especially re. tooling, upgrade, standby etc.


            Following is a short description of tail / full compaction and the proposed data structure in pseudo code:

            For each segment S we encode in its header: a young generation number y, a full generation number f and a flag c indicating whether the segment was written by tail compaction.

            Full compaction:

            • determine the segment S of the current head state h
            • clone h and increase the full generation number for all written segments T to T.f = S.f + 1

            Tail compaction:

            • determine the segment S of the current head state h
            • determine the head state created by the last compaction (full or tail) h'
            • rebase h on top of h' and increase the young generation number for all written segments T to T.y = S.y + 1

            Cleanup:

            • let r be the number of retained generations
            • determine the segment S of the current head state
            • For each segment T
              • if T.f <= S.f - r reclaim T.
              • if T.y <= S.y - r && T.c == false reclaim T.
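The cleanup rules above can be sketched as a reclamation predicate (a hedged sketch in Python; the Segment fields y, f, c mirror the pseudo code, not actual Oak classes):

```python
from dataclasses import dataclass

@dataclass
class Segment:
    y: int   # young generation (incremented by tail compaction)
    f: int   # full generation (incremented by full compaction)
    c: bool  # written by tail compaction?

def reclaimable(t, head, r=2):
    """True if segment t may be reclaimed, given the segment `head`
    containing the current head state and r retained generations."""
    if t.f <= head.f - r:
        return True                      # too many full generations old
    if t.y <= head.y - r and not t.c:
        return True                      # old, and not tail-compacted
    return False
```

Note that segments written by tail compaction are only ever reclaimed via the full generation rule, which is what keeps a tail-compacted base revision alive across subsequent tail compactions.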

            Below I tried to visualise the evolution of a repository through a couple of tail / full compaction cycles. Segment retention is 2 generations. A sequence of segments is symbolised by a triple in square brackets. The first component indicates the operations that created the segments (w: a regular write operation, tc: a tail compaction, fc: a full compaction). The second component is the young generation pertaining to tail compaction and the third component is the full generation pertaining to full compaction.

            Write content:
            [w,0,0] 
            
            Tail compact:
            [w,0,0] [tc,1,0]
            
            Clean up (no effect since segment retention is 2 young generations):
            [w,0,0] [tc,1,0]
            
            Write content:
            [w,0,0] [tc,1,0] [w,1,0] 
            
            Tail compact:
            [w,0,0] [tc,1,0] [w,1,0] [tc,2,0]
            
            Clean up (remove w segments that are at least 2 young generations old):
            [tc,1,0] [w,1,0] [tc,2,0]
            
            Write content:
            [tc,1,0] [w,1,0] [tc,2,0] [w,2,0]
            
            Tail compact:
            [tc,1,0] [w,1,0] [tc,2,0] [w,2,0] [tc,3,0]
            
            Clean up (remove w segments that are at least 2 young generations old):
            [tc,1,0] [tc,2,0] [w,2,0] [tc,3,0]
            
            ...
            
            After a couple of more write / tail compaction cycles:
            [tc,1,0] [tc,2,0] [tc,3,0] [w,3,0] [tc,4,0] [w,4,0] 
            
            Full compact:
            [tc,1,0] [tc,2,0] [tc,3,0] [w,3,0] [tc,4,0] [w,4,0] [fc,4,1]
            
            Clean up (no effect since segment retention is 2 full generations):
            [tc,1,0] [tc,2,0] [tc,3,0] [w,3,0] [tc,4,0] [w,4,0] [fc,4,1]
            
            Write content:
            [tc,1,0] [tc,2,0] [tc,3,0] [w,3,0] [tc,4,0] [w,4,0] [fc,4,1] [w,4,1]
            
            Tail compact:
            [tc,1,0] [tc,2,0] [tc,3,0] [w,3,0] [tc,4,0] [w,4,0] [fc,4,1] [w,4,1] [tc,5,1]
            
            Clean up (remove w segments that are at least 2 young generations old):
            [tc,1,0] [tc,2,0] [tc,3,0] [tc,4,0] [w,4,0] [fc,4,1] [w,4,1] [tc,5,1]
            
            ...
            
            After a couple of more write / tail compaction cycles:
            [tc,1,0] [tc,2,0] [tc,3,0] [tc,4,0] [fc,4,1] [tc,5,1] [tc,6,1] [w,6,1] [tc,7,1] [w,7,1]
            
            Full compact:
            [tc,1,0] [tc,2,0] [tc,3,0] [tc,4,0] [fc,4,1] [tc,5,1] [tc,6,1] [w,6,1] [tc,7,1] [w,7,1] [fc,7,2]
            
            Clean up (remove all segments that are at least 2 full generations old):
            [fc,4,1] [tc,5,1] [tc,6,1] [w,6,1] [tc,7,1] [w,7,1] [fc,7,2]
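The first few cycles of the evolution above can be replayed with a toy simulation (an illustrative sketch, not Oak code; segments are (op, young, full) triples and retention is hard-coded to 2 generations):

```python
R = 2  # retained generations

def head(segs):
    """Young and full generation of the segment holding the head state."""
    _, y, f = segs[-1]
    return y, f

def write(segs):
    y, f = head(segs)
    return segs + [("w", y, f)]          # regular write: same generations

def tail_compact(segs):
    y, f = head(segs)
    return segs + [("tc", y + 1, f)]     # bump young generation

def full_compact(segs):
    y, f = head(segs)
    return segs + [("fc", y, f + 1)]     # bump full generation

def cleanup(segs):
    y, f = head(segs)
    keep = []
    for op, ty, tf in segs:
        if tf <= f - R:
            continue                     # too many full generations old
        if ty <= y - R and op == "w":    # only uncompacted (w) segments
            continue                     # are reclaimed by the young rule
        keep.append((op, ty, tf))
    return keep
```

Running write / tail compact / clean up cycles on an initial [w,0,0] segment reproduces the triples listed above.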
            
            mduerig Michael Dürig added a comment
            frm Francesco Mari added a comment -

            mduerig, thanks for the example, it helped me a lot to clarify how cleanup interacts with partial compaction. While we discussed offline about the young and full generation, how are we going to encode T.c in the segments? Are we going to use some of the unused space in the header?


            how are we going to encode T.c in the segments?

            I would not use an extra slot in the segment header just for this flag but rather encode it within the generation number. The POC uses an even/odd scheme to encode this within the young generation number (i.e. an odd number indicates a segment written by tail compaction). We could either stick with this or, preferably, use a sign scheme, with a negative sign indicating segments written by tail compaction.
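Both encodings could be sketched as follows (illustrative only; neither function is taken from the Oak code base). One caveat of the sign scheme: generation 0 cannot carry the flag, since -0 == 0.

```python
# Even/odd scheme (as in the POC): odd values mark tail-compacted segments.
def encode_even_odd(generation, tail_compacted):
    return 2 * generation + (1 if tail_compacted else 0)

def decode_even_odd(value):
    return value // 2, value % 2 == 1

# Sign scheme: a negative value marks tail-compacted segments.
# Caveat: generation 0 cannot carry the flag (-0 == 0).
def encode_sign(generation, tail_compacted):
    return -generation if tail_compacted else generation

def decode_sign(value):
    return abs(value), value < 0
```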

            mduerig Michael Dürig added a comment
            frm Francesco Mari added a comment -

            Both the solutions sound reasonable to me. Another question: the algorithm uses the same r to compare both full and young generation numbers. Would it make sense to have two different r for the two different generation numbers?


            Would it make sense to have two different r for the two different generation numbers?

            So far I don't think so, as the intended semantics is that revisions no older than r generations are retained at any point in time. This is currently the case no matter how we interleave tail and full compaction runs.

            mduerig Michael Dürig added a comment

            A further optimisation we could do (possibly as an afterthought once the above is implemented) is to clean up tail compacted generations once no w segments reference them any more. This is the case after a full compaction and at least n further tail compactions, where n is the number of retained generations. One question here is how to best detect that the above criterion is fulfilled.

            mduerig Michael Dürig added a comment

            I rebased my earlier OAK-3349-POC branch on top of the current Oak trunk. I used an overnight run of this SCIT configuration to confirm cleanup is as efficient as before the rebase.

            On the OAK-3349 branch I started integrating tail compaction and full compaction. In a first step I replaced the integer representing the gc generation with an ADT GCGeneration capturing the same semantics (passes ITs).
            In a next step I started on the integration proper. There are still many loose ends:

            • The tail generation is not going into the tar index, which makes cleanup awkward and inefficient, and precise blob reference collection impossible.
            • Deduplication cache generations are not properly selected as they only take the full generation into account.
            • There is no way to schedule/trigger full vs. tail compaction.
            • Graceful and correct degradation to full compaction in the case where a base version cannot be determined is not implemented.
            • The gc.log does not reflect tail compactions.
            • No UT and IT coverage for tail compaction.

            All respective locations in the code are marked with FIXME OAK-3349.

            mduerig Michael Dürig added a comment

            Added a missing bit in the predicate for cleaning up old segments: segments generated by tail compaction must not be cleaned up unless the full generation is far enough in the past. See https://github.com/mreutegg/jackrabbit-oak/commit/9a4301e8ba450bf1a41e721f1c773e803f3d9ca8

            mduerig Michael Dürig added a comment
            frm Francesco Mari added a comment -

            Fixed at r1803146.

            frm Francesco Mari added a comment -

            Minor cleanup made at r1803152.


            Bulk close for 1.7.5

            edivad Davide Giannella added a comment

            People

              frm Francesco Mari
              mduerig Michael Dürig
              Votes: 0
              Watchers: 6
