Description
I saw a test failure which seems to be due to the following sequence:
1) Log: REPLICATE 1.8 ALTER_SCHEMA
2) Log: REPLICATE 1.9 WRITE
3) Log: COMMIT 1.9 WRITE
4) TabletMetadata::Flush()
5) crash (before COMMIT 1.8 ALTER_SCHEMA)
During bootstrap, this causes a problem: because we haven't seen a COMMIT message for 1.8, we consider operation 1.9 to be still pending. We rely on the tablet peer's FlushInFlightsToLogCallback to ensure that we don't flush metadata until the COMMIT message is in the log, but that guarantee isn't strong enough. We need to wait until COMMIT messages are in the log for all prior operations, not just all prior writes. The implementation currently uses MvccManager::WaitForAllInFlightToCommit, but since AlterSchema doesn't use MvccManager, we aren't waiting for it.
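The gap can be illustrated with a small standalone model of the log sequence above. This is only a sketch, not Kudu code: InFlightTracker, LogEntry, and both SafeToFlush* methods are hypothetical names invented for illustration. The writes-only check stands in for waiting on MVCC alone, and the all-operations check is the stronger condition described above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical, simplified model of log entries (not Kudu's real types).
enum class EntryType { kReplicate, kCommit };
enum class OpType { kWrite, kAlterSchema };

struct LogEntry {
  EntryType entry_type;
  int64_t index;  // e.g. 8 for operation 1.8
  OpType op_type;
};

// Tracks operations that have a REPLICATE entry but no COMMIT entry yet.
class InFlightTracker {
 public:
  void Append(const LogEntry& e) {
    if (e.entry_type == EntryType::kReplicate) {
      pending_[e.index] = e.op_type;
    } else {
      // A COMMIT must refer to an operation that was replicated earlier.
      const auto erased = pending_.erase(e.index);
      assert(erased == 1);
      (void)erased;  // avoid an unused-variable warning under NDEBUG
    }
  }

  // Weak check: only in-flight writes block the flush. This mirrors waiting
  // on a mechanism that AlterSchema does not participate in.
  bool SafeToFlushWritesOnly() const {
    for (const auto& kv : pending_) {
      if (kv.second == OpType::kWrite) return false;
    }
    return true;
  }

  // Stronger check: any in-flight operation blocks the flush.
  bool SafeToFlushAllOps() const { return pending_.empty(); }

 private:
  std::map<int64_t, OpType> pending_;
};
```

Replaying steps 1 to 3 through this model, the writes-only check permits the flush at step 4 while 1.8 is still uncommitted, whereas the all-operations check blocks the flush until COMMIT 1.8 has been appended.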
Attachments
Issue Links
- relates to KUDU-304 Support ALTER_TABLE transactions in consensus replicas (Resolved)