Details
- Type: Improvement
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
Description
The rootCA certificate currently expires slightly more than 5 years after it is created.
When it expires, every certificate in the trust chain rooted at the rootCA certificate becomes invalid. Rotating and renewing the rootCA certificate is therefore time consuming if done all at once, because it requires renewing every other certificate as well.
To renew the rootCA certificate without a full security re-bootstrap, we would like to use the following procedure:
- before the rootCA certificate expires, we create a new rootCA certificate
- with the new rootCA certificate, we rotate the sub-CA certificates of all 3 SCMs
- once that is done, we make the new rootCA certificate available to other services via an SCM API
- other services start polling for the new rootCA certificate at a time when it has most likely already been generated and is available via the SCM API
- once the new rootCA certificate is present, services update their TrustStores. After a random delay that leaves room for most, if not all, of the other services to refresh their TrustStores, every service renews its own certificate regardless of its expiration and gets a new certificate signed by the new sub-CA certificate of the leader.
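The service-side part of this procedure can be sketched roughly as follows. This is only an illustration of the ordering (trust the new rootCA first, wait, then renew); the class, enum, and method names are hypothetical and are not Ozone APIs. The SCM lookup is modeled as a plain `Supplier`, and keeping the old rootCA in the TrustStore during the transition is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical sketch of the service-side steps; none of these names are Ozone APIs. */
final class RootCaRotationSketch {

  /** Outcomes of one polling attempt, in the order the description lists them. */
  enum Step {
    POLLED_NO_NEW_ROOT_CA,
    TRUST_STORE_UPDATED,
    WAITED_RANDOM_DELAY,
    CERTIFICATE_RENEWED
  }

  /**
   * Runs one polling attempt against a stand-in for the SCM API.
   * Certificates are modeled as plain strings to keep the sketch self-contained.
   */
  static List<Step> pollOnce(Supplier<Optional<String>> scmRootCaLookup,
                             List<String> trustStore) {
    List<Step> steps = new ArrayList<>();
    Optional<String> newRootCa = scmRootCaLookup.get();

    // Nothing to do if SCM has no new rootCA yet, or we already trust it.
    if (!newRootCa.isPresent() || trustStore.contains(newRootCa.get())) {
      steps.add(Step.POLLED_NO_NEW_ROOT_CA);
      return steps;
    }

    // 1. Trust the new rootCA. Assumption: the old rootCA stays trusted for now,
    //    so peers that have not renewed yet can still be verified.
    trustStore.add(newRootCa.get());
    steps.add(Step.TRUST_STORE_UPDATED);

    // 2. Placeholder for the random delay that lets other services catch up.
    steps.add(Step.WAITED_RANDOM_DELAY);

    // 3. Placeholder for renewing the service's own certificate, regardless of
    //    its expiration, via the leader's new sub-CA certificate.
    steps.add(Step.CERTIFICATE_RENEWED);
    return steps;
  }
}
```

The point of the ordering is that a service only presents a certificate from the new chain after it, and most likely its peers, already trust the new rootCA.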
During this process, all services start polling for the rootCA certificate at around the same time. This is a short request and the response payload is only the rootCA certificate, so SCM might experience a brief load peak; we might want to introduce jitter here if necessary.
Issuing new certificates, by contrast, is a resource-intensive task on the leader SCM, so we definitely want to introduce jitter there. It should be configurable, so that this period can be shortened for testing.
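A configurable jitter of this kind could look roughly like the sketch below. It picks a uniformly random delay between zero and a configured maximum; the class and method names are hypothetical, and the uniform distribution is an assumption of this sketch rather than something the issue specifies.

```java
import java.time.Duration;
import java.util.Random;

/** Hypothetical sketch of a configurable renewal jitter; not an Ozone API. */
final class RenewalJitterSketch {

  /**
   * Returns a random delay in the range [0, maxJitter], in whole milliseconds.
   * A small maxJitter (even zero) shortens the renewal window, e.g. for tests.
   */
  static Duration jitteredDelay(Duration maxJitter, Random random) {
    if (maxJitter.isNegative()) {
      throw new IllegalArgumentException("maxJitter must not be negative");
    }
    long maxMillis = maxJitter.toMillis();
    if (maxMillis == 0) {
      return Duration.ZERO;
    }
    // nextDouble() is in [0, 1), so the product is in [0, maxMillis + 1);
    // truncating and clamping keeps the result within [0, maxMillis].
    long millis = (long) (random.nextDouble() * (maxMillis + 1));
    return Duration.ofMillis(Math.min(millis, maxMillis));
  }
}
```

Each service would compute its own delay before asking the leader SCM for a new certificate, which spreads the signing load over the configured window instead of concentrating it at one moment.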
More information on the failure scenarios and the whole process can be found in the attached pdf document.
Attachments
Issue Links
- blocks
  - HDDS-7331 Ozone PKI improvements (Open)
- contains
  - HDDS-752 Functionality to handle key rotation in SCM (Resolved)
- is blocked by
  - HDDS-9420 [Compatibility]Enabling GRPC encryption causes SCM startup failure. (Resolved)
  - HDDS-9442 Token verification from OMs at DT renew happens in the wrong login context. (Resolved)
- links to