Description
Cassandra is used within the distributed James product to hold message and mailbox metadata.
Cassandra holds the following tables:
- mailboxPathV2 + mailbox allow retrieving mailbox information
- acl + UserMailboxACL hold denormalized ACL information
- messageIdTable & imapUidTable allow retrieving mailbox context information
- messageV2 holds message metadata
- attachmentV2 holds message attachments
- References to these attachments are contained within the attachmentOwner and attachmentMessageId tables
Currently, deletion only removes the first level of metadata. Lower-level metadata is left in place but becomes
unreachable. The data looks deleted, but references to it are actually still present.
Concretely:
- Upon mailbox deletion, only mailboxPathV2 & mailbox content is deleted. messageIdTable, imapUidTable, messageV2,
attachmentV2 & attachmentMessageId metadata is left undeleted.
- Upon mailbox deletion, acl + UserMailboxACL is not deleted.
- Upon message deletion, only messageIdTable & imapUidTable content is deleted. messageV2, attachmentV2 &
attachmentMessageId metadata is left undeleted.
This jeopardizes efforts to reclaim disk space and to preserve privacy, for example through blobStore garbage collection.
We need to clean up Cassandra metadata. The metadata to clean up can be reached through the dangling metadata left
behind once the delete operation has been carried out. We need to delete the lower levels first so that, upon failure,
undeleted metadata can still be reached.
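The ordering argument can be illustrated with a minimal sketch. It is written in Python for brevity and assumes that in-memory dicts stand in for the messageV2, attachmentV2 and attachmentMessageId tables, with a simplified, hypothetical schema in which each attachment belongs to a single message. `Store`, `cleanup_message` and `SimulatedFailure` are illustrative names, not James APIs. The sketch deletes attachment rows before the messageV2 row, so a crash part-way through still leaves the remaining attachment ids reachable for a retry.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical in-memory stand-in for three Cassandra tables (simplified schema)."""
    message_v2: dict = field(default_factory=dict)             # message id -> attachment ids
    attachment_v2: dict = field(default_factory=dict)          # attachment id -> content
    attachment_message_id: dict = field(default_factory=dict)  # attachment id -> message id


class SimulatedFailure(Exception):
    """Raised to emulate a crash part-way through the cleanup."""


def cleanup_message(store, message_id, fail_before_step=None):
    """Delete lower-level rows first and the messageV2 row last.

    As long as the messageV2 row survives, the attachment ids it lists stay
    reachable, so a retry after a crash can finish the job.
    """
    attachment_ids = store.message_v2.get(message_id)
    if attachment_ids is None:
        return  # top level already gone: nothing reachable is left to clean
    steps = []
    for attachment_id in attachment_ids:
        steps.append(lambda a=attachment_id: store.attachment_message_id.pop(a, None))
        steps.append(lambda a=attachment_id: store.attachment_v2.pop(a, None))
    steps.append(lambda: store.message_v2.pop(message_id, None))  # highest level last
    for index, step in enumerate(steps):
        if index == fail_before_step:
            raise SimulatedFailure(f"crash before step {index}")
        step()


store = Store(
    message_v2={"m1": ["a1", "a2"]},
    attachment_v2={"a1": b"...", "a2": b"..."},
    attachment_message_id={"a1": "m1", "a2": "m1"},
)
try:
    cleanup_message(store, "m1", fail_before_step=2)  # crash once a1 is fully deleted
except SimulatedFailure:
    pass
print(sorted(store.attachment_v2), "m1" in store.message_v2)  # ['a2'] True
cleanup_message(store, "m1")  # retry reaches a2 through the surviving m1 row
print(store)  # Store(message_v2={}, attachment_v2={}, attachment_message_id={})
```

Had the messageV2 row been deleted first, the same crash would have left a2 stored but no longer referenced from anywhere the cleanup could start from.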
This cleanup is not needed for strict correctness from a MailboxManager point of view, thus it could be carried out
asynchronously via mailbox listeners, so that it can be retried.
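The split between a synchronous first-level deletion and an asynchronous, retried cleanup can be pictured with a toy model. This is a sketch under loud assumptions: `MailboxDeleted`, `RetryingEventBus`, `FlakyCleanupListener` and `delete_mailbox` are hypothetical names, not the James eventBus or MailboxListener APIs. The bus here is synchronous for simplicity, and the retry policy (a fixed number of re-deliveries, then a dead letter) is an illustrative choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MailboxDeleted:
    """Hypothetical event carrying what the cleanup needs to reach lower-level metadata."""
    mailbox_id: str
    message_ids: tuple


class RetryingEventBus:
    """Toy, synchronous stand-in for an event bus: re-delivers an event to a failing
    listener up to max_retries extra times, then records it as a dead letter."""

    def __init__(self, max_retries=3):
        self.max_retries = max_retries
        self.listeners = []
        self.dead_letters = []

    def register(self, listener):
        self.listeners.append(listener)

    def dispatch(self, event):
        for listener in self.listeners:
            for attempt in range(self.max_retries + 1):
                try:
                    listener(event)
                    break
                except Exception:
                    if attempt == self.max_retries:
                        self.dead_letters.append((listener, event))


class FlakyCleanupListener:
    """Deletes per-message metadata; fails on its first call to simulate a transient error."""

    def __init__(self, message_v2):
        self.message_v2 = message_v2
        self.calls = 0

    def __call__(self, event):
        self.calls += 1
        if self.calls == 1:
            raise RuntimeError("transient Cassandra timeout (simulated)")
        for message_id in event.message_ids:
            self.message_v2.pop(message_id, None)


def delete_mailbox(mailboxes, bus, mailbox_id, message_ids):
    # First level, synchronous: all the MailboxManager needs for correctness.
    mailboxes.pop(mailbox_id, None)
    # Lower levels: handed to listeners and retried by the bus on failure.
    bus.dispatch(MailboxDeleted(mailbox_id, tuple(message_ids)))


mailboxes = {"mbx-1": "INBOX"}
message_v2 = {"m1": "...", "m2": "...", "m3": "..."}
bus = RetryingEventBus()
listener = FlakyCleanupListener(message_v2)
bus.register(listener)
delete_mailbox(mailboxes, bus, "mbx-1", ["m1", "m2"])
print(mailboxes, message_v2, listener.calls)  # {} {'m3': '...'} 2
```

Because a retried listener re-runs its whole cleanup, every deletion step it performs has to tolerate being applied more than once; the sketch gets this from `dict.pop(key, None)`.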
Mailbox listener failures lead the eventBus to retry their execution, so we need to ensure that the result of the
deletion is idempotent. This might have consequences on the blobStore garbage collection design.
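One way retries could interact with blob garbage collection is sketched below, assuming (hypothetically) that garbage collection decides whether a blob is collectable from tracked references. A counter decremented once per delivery is not idempotent: a re-delivered deletion makes a blob still shared by another message look collectable. A set of owner ids tolerates re-delivery. `CountingReferences`, `SetReferences` and `replay_deletion` are illustrative names and do not describe the actual James blobStore garbage collection design.

```python
from collections import Counter, defaultdict


class CountingReferences:
    """NOT idempotent: each delivery of the same deletion decrements again."""

    def __init__(self):
        self.counts = Counter()

    def add(self, blob_id, message_id):
        self.counts[blob_id] += 1

    def remove(self, blob_id, message_id):
        self.counts[blob_id] -= 1

    def is_garbage(self, blob_id):
        return self.counts[blob_id] <= 0


class SetReferences:
    """Idempotent: removing the same (blob, message) reference twice changes nothing the second time."""

    def __init__(self):
        self.owners = defaultdict(set)

    def add(self, blob_id, message_id):
        self.owners[blob_id].add(message_id)

    def remove(self, blob_id, message_id):
        self.owners[blob_id].discard(message_id)

    def is_garbage(self, blob_id):
        return not self.owners[blob_id]


def replay_deletion(refs):
    # Blob "b1" is shared by messages m1 and m2; the deletion of m1 is delivered twice.
    refs.add("b1", "m1")
    refs.add("b1", "m2")
    refs.remove("b1", "m1")
    refs.remove("b1", "m1")  # retried delivery of the same deletion
    return refs.is_garbage("b1")


print(replay_deletion(CountingReferences()))  # True: b1 wrongly collectable while m2 uses it
print(replay_deletion(SetReferences()))       # False: b1 correctly kept
```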