Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
When using the MarkSweepGarbageCollector (for example, with a file data store), if the blob id file (from the BlobIdTracker) contains records that do not exist in the data store, a warning is logged when the collector tries to remove the (unreferenced) file:
*WARN* org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Error occurred while deleting blob with id [...]
org.apache.jackrabbit.core.data.DataStoreException: Record ... does not exist
    at org.apache.jackrabbit.core.data.AbstractDataStore.getRecord(AbstractDataStore.java:59) [org.apache.jackrabbit.jackrabbit-data:2.16.3]
    at org.apache.jackrabbit.oak.plugins.blob.datastore.OakFileDataStore.getRecordForId(OakFileDataStore.java:259) [org.apache.jackrabbit.oak-blob-plugins:1.8.9]
    at org.apache.jackrabbit.oak.plugins.blob.datastore.DataStoreBlobStore.getRecordForId(DataStoreBlobStore.java:520) [org.apache.jackrabbit.oak-blob-plugins:1.8.9]
    at org.apache.jackrabbit.oak.plugins.blob.datastore.DataStoreBlobStore.countDeleteChunks(DataStoreBlobStore.java:426) [org.apache.jackrabbit.oak-blob-plugins:1.8.9]
    at org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector$BlobCollectionType.sweepInternal(MarkSweepGarbageCollector.java:859) [org.apache.jackrabbit.oak-blob-plugins:1.8.9]
    at org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector.sweep(MarkSweepGarbageCollector.java:423) [org.apache.jackrabbit.oak-blob-plugins:1.8.9]
    at org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector.markAndSweep(MarkSweepGarbageCollector.java:287) [org.apache.jackrabbit.oak-blob-plugins:1.8.9]
    at org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector.collectGarbage(MarkSweepGarbageCollector.java:194) [org.apache.jackrabbit.oak-blob-plugins:1.8.9]
That is, the collector tried to remove a file that does not exist.
This indicates a problem in the process; for example, the blob id tracker file(s) may have been restored from an older backup. (There may be other ways this can occur.)
The next time garbage collection runs, it will try to remove the same files, and fail again.
It would be better to remove the entries for files that don't exist from the blob id tracker file, so that the collector does not try to delete them again on every run.
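To illustrate the idea of dropping stale entries, here is a minimal sketch in plain Java. It is not Oak code: the class name `BlobIdFilePruner`, the `exists` predicate (a stand-in for a lookup against the data store), and the assumption that the id file holds one blob id per line are all hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper, not part of Oak.
class BlobIdFilePruner {

    /**
     * Rewrites idFile so that it keeps only the ids for which
     * exists.test(id) returns true. Assumes one blob id per line.
     *
     * @return the ids that were dropped from the file
     */
    static List<String> prune(Path idFile, Predicate<String> exists) throws IOException {
        List<String> kept = new ArrayList<>();
        List<String> dropped = new ArrayList<>();
        for (String line : Files.readAllLines(idFile, StandardCharsets.UTF_8)) {
            String id = line.trim();
            if (id.isEmpty()) {
                continue; // skip blank lines
            }
            if (exists.test(id)) {
                kept.add(id);
            } else {
                dropped.add(id);
            }
        }
        if (!dropped.isEmpty()) {
            // Write the surviving ids to a temp file in the same directory,
            // then replace the original, so readers never see a half-written file.
            Path dir = idFile.toAbsolutePath().getParent();
            Path tmp = Files.createTempFile(dir, "blobids", ".tmp");
            Files.write(tmp, kept, StandardCharsets.UTF_8);
            Files.move(tmp, idFile, StandardCopyOption.REPLACE_EXISTING);
        }
        return dropped;
    }
}
```

In a real fix, the predicate would be backed by whatever existence check the data store offers, and the dropped ids could be reported in a single warning rather than one stack trace per record.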
If the blob id tracker files are incorrect, I think it would be better to delete and rebuild them; otherwise some of the unreferenced binaries will never be removed. Possibly a warning should be logged, with instructions on how to rebuild these files.