  Apache Ozone / HDDS-8544 Snapshot feature Phase 2 : Further enhancements for Ozone Snapshots / HDDS-11452

OmSnapshotPurgeRequest is not atomic and can lead to SnapshotChain Corruption


Details

    • Type: Sub-task
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved

    Description

      OmSnapshotPurgeRequest updates the snapshot chain and also updates the snapshot info table cache, but if anything fails partway through, none of these changes are rolled back. If a checked exception is thrown (this could be anything from a protobuf exception to an arbitrary IOException), the request swallows the exception and returns an error response. The result is a partially updated snapshot info table cache that is no longer coherent with the in-memory snapshot chain, and none of these changes will be flushed to disk. On restart, this can lead to all sorts of snapshot chain and snapshot info corruption.
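To make the failure mode concrete, here is a minimal Java sketch under a deliberately simplified, hypothetical model: the snapshot chain is an ordered list of snapshot IDs, and the snapshot info table cache maps each ID to its previous snapshot. The names `NonAtomicPurgeDemo`, `purge`, and `failOn` are illustrative only, not Ozone APIs, and the injected `IOException` stands in for whatever checked exception the real request might hit.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified model of the current (non-atomic) purge.
 * chain: ordered snapshot IDs; infoCache: snapshot ID -> previous snapshot ID.
 * These names are illustrative and are not Ozone classes.
 */
public class NonAtomicPurgeDemo {
  final LinkedList<String> chain = new LinkedList<>();
  final Map<String, String> infoCache = new HashMap<>();

  NonAtomicPurgeDemo(List<String> snapshots) {
    String prev = null;
    for (String s : snapshots) {
      chain.add(s);
      infoCache.put(s, prev); // first snapshot has no previous (null)
      prev = s;
    }
  }

  /** Purges snapshots in order; failOn injects a checked exception at that ID. */
  String purge(List<String> toPurge, String failOn) {
    try {
      for (String id : toPurge) {
        if (id.equals(failOn)) {
          throw new IOException("simulated failure while purging " + id);
        }
        int idx = chain.indexOf(id);
        if (idx < 0) {
          throw new IOException("unknown snapshot " + id);
        }
        String previous = infoCache.remove(id); // cache mutated immediately
        chain.remove(idx);                      // chain mutated immediately
        if (idx < chain.size()) {
          infoCache.put(chain.get(idx), previous); // relink the next snapshot
        }
      }
      return "OK";
    } catch (IOException e) {
      // The exception is swallowed and nothing is rolled back.
      return "ERROR: " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    NonAtomicPurgeDemo om = new NonAtomicPurgeDemo(List.of("s1", "s2", "s3", "s4"));
    System.out.println(om.purge(List.of("s2", "s3"), "s3"));
    System.out.println(om.chain);               // [s1, s3, s4]
    System.out.println(om.infoCache.get("s3")); // s1
  }
}
```

The request reports an error, yet `s2` is already gone from the in-memory chain and `s3` has been relinked to `s1` in the cache, while nothing from this request is flushed to disk.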

      The proposal here is to make the entire request atomic:

      1) Update the snapshot chain and keep the updated snapshot infos in a local, uncommitted space (not yet in the snapshot info table cache).

      2) In case of an exception, roll back all deleted snapshots by adding them back to the snapshot chain (note: this must be done in the reverse order of removal) and return an error response.

      3) If no exception is thrown, update the snapshot info table cache.

      4) Send the response to the double buffer.
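The four proposed steps can be sketched in Java with the same simplified, hypothetical model (an ordered list of snapshot IDs plus a cache mapping each ID to its previous snapshot). `AtomicPurgeSketch`, the `undo` stack, the `pending` map, and the `doubleBuffer` list are illustrative stand-ins, not Ozone's actual classes; the sketch only shows the ordering: stage changes locally, roll back on failure, commit to the cache, then hand off.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch of the proposed atomic purge; not Ozone's real API. */
public class AtomicPurgeSketch {
  final LinkedList<String> chain = new LinkedList<>();
  // Committed cache: snapshot ID -> previous snapshot ID (null for the first).
  final Map<String, String> infoCache = new HashMap<>();
  // Stand-in for the double buffer: responses queued for flushing to disk.
  final List<String> doubleBuffer = new ArrayList<>();

  AtomicPurgeSketch(List<String> snapshots) {
    String prev = null;
    for (String s : snapshots) {
      chain.add(s);
      infoCache.put(s, prev);
      prev = s;
    }
  }

  String purge(List<String> toPurge, String failOn) {
    // (snapshot ID, index it was removed from); most recent removal on top.
    Deque<Map.Entry<String, Integer>> undo = new ArrayDeque<>();
    Map<String, String> pending = new HashMap<>(); // uncommitted snapshot infos
    try {
      // Step 1: update the chain; stage new infos locally, not in infoCache.
      for (String id : toPurge) {
        if (id.equals(failOn)) {
          throw new IOException("simulated failure while purging " + id);
        }
        int idx = chain.indexOf(id);
        if (idx < 0) {
          throw new IOException("unknown snapshot " + id);
        }
        chain.remove(idx);
        undo.push(Map.entry(id, idx));
      }
      String prev = null;
      for (String s : chain) {
        if (!Objects.equals(infoCache.get(s), prev)) {
          pending.put(s, prev); // snapshot whose previous link changed
        }
        prev = s;
      }
    } catch (IOException e) {
      // Step 2: re-insert in reverse order of removal, then report the error.
      while (!undo.isEmpty()) {
        Map.Entry<String, Integer> r = undo.pop();
        chain.add(r.getValue(), r.getKey());
      }
      return "ERROR: " + e.getMessage(); // infoCache was never touched
    }
    // Step 3: no exception, so commit the staged infos to the cache.
    toPurge.forEach(infoCache::remove);
    infoCache.putAll(pending);
    // Step 4: hand the response to the (stand-in) double buffer.
    doubleBuffer.add("purged " + toPurge);
    return "OK";
  }

  public static void main(String[] args) {
    AtomicPurgeSketch failing = new AtomicPurgeSketch(List.of("s1", "s2", "s3", "s4"));
    System.out.println(failing.purge(List.of("s2", "s3", "s4"), "s4"));
    System.out.println(failing.chain); // [s1, s2, s3, s4]
    AtomicPurgeSketch ok = new AtomicPurgeSketch(List.of("s1", "s2", "s3", "s4"));
    System.out.println(ok.purge(List.of("s2", "s3"), null)); // OK
    System.out.println(ok.chain + " " + ok.infoCache.get("s4")); // [s1, s4] s1
  }
}
```

Re-inserting in reverse order matters because each recorded index is only valid against the chain as it looked when that removal happened: in the failing run, restoring `s2` at index 1 before `s3` would yield `[s1, s3, s2, s4]` instead of the original order.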

      cc: hemantk ppogde 


            People

              Assignee: swamirishi Swaminathan Balachandran
              Reporter: swamirishi Swaminathan Balachandran