Apache Ozone / HDDS-7391

Automated live rotation of CA certificates in a cluster with established trust

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: Security

    Description

      The current rootCA certificate expires a little over 5 years after it is created.
      When it expires, every certificate in the trust chain rooted at the rootCA certificate becomes invalid, so rotating and renewing this certificate is time consuming: it requires renewing all certificates at once.

      To renew the rootCA certificate without a full security re-bootstrap, we would like to use the following procedure:

      • before the rootCA certificate expires, we create a new rootCA certificate
      • with the new rootCA certificate we rotate the sub-CA certificate of all 3 SCMs
      • once that is done, we make the new rootCA certificate available for other services via an SCM API
      • other services start polling for the new rootCA certificate at a time when it is most likely already generated and available via the SCM API
      • once the new rootCA certificate is present, services update their TrustStores. After a random delay that leaves room for most, if not all, of the other services to refresh their TrustStores, every service renews its own certificate regardless of its expiration, and gets a new certificate signed by the new sub-CA certificate of the leader SCM.
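
      The service-side part of the procedure above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Ozone implementation: the class, the Step enum, the ServiceState holder and the Supplier standing in for the SCM API are all hypothetical names, certificates are represented by plain string IDs, and keeping the old rootCA trusted next to the new one is an assumption made here so that services that have not rotated yet can still be verified.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical sketch of the service-side rotation flow; not Ozone code. */
final class RootCaRotationFlowSketch {

  /** The steps a service goes through, in the order they are recorded. */
  enum Step { POLLED, TRUST_STORE_UPDATED, DELAYED, CERT_RENEWED }

  /** Stand-in for a service's security state; certificates are modelled as IDs. */
  static final class ServiceState {
    final Set<String> trustedRootCaIds = new LinkedHashSet<>();
    String ownCertIssuerRootId;

    ServiceState(String currentRootId) {
      trustedRootCaIds.add(currentRootId);
      ownCertIssuerRootId = currentRootId;
    }
  }

  /**
   * Polls the (hypothetical) SCM API up to maxPolls times. When a rootCA that
   * is not yet trusted shows up, the trust store is updated, the supplied
   * delay runs, and the service's own certificate is renewed.
   */
  static List<Step> rotate(ServiceState state,
                           Supplier<Optional<String>> rootCaApi,
                           int maxPolls,
                           Runnable delay) {
    List<Step> steps = new ArrayList<>();
    for (int i = 0; i < maxPolls; i++) {
      steps.add(Step.POLLED);
      Optional<String> newRoot = rootCaApi.get();
      if (newRoot.isPresent() && !state.trustedRootCaIds.contains(newRoot.get())) {
        // Assumption: the old rootCA stays trusted alongside the new one.
        state.trustedRootCaIds.add(newRoot.get());
        steps.add(Step.TRUST_STORE_UPDATED);

        // Random delay that gives other services time to refresh their TrustStores.
        delay.run();
        steps.add(Step.DELAYED);

        // Renew regardless of expiration; the new certificate chains to the new rootCA.
        state.ownCertIssuerRootId = newRoot.get();
        steps.add(Step.CERT_RENEWED);
        return steps;
      }
    }
    return steps; // no new rootCA was seen within maxPolls attempts
  }
}
```

      Passing the delay in as a Runnable keeps the ordering of the steps separate from how long the wait actually is, so a test can use a no-op delay.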

      All services start polling for the rootCA certificate at around the same time. The request is short and the response payload is only the rootCA certificate, but SCM might still experience a short load peak here, so we might want to introduce jitter for the polling if necessary.

      Issuing new certificates is a resource-intensive task on the leader SCM, so we definitely want to introduce jitter there. The jitter should be configurable so that this period can be shortened for testing.
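
      One possible shape for such a configurable jitter is a uniformly random delay between zero and a configured maximum. The sketch below assumes that form; the class name, the method name and the idea of passing the maximum as a Duration are illustrative only, and the issue does not specify the actual configuration key or distribution.

```java
import java.time.Duration;
import java.util.Random;

/** Hypothetical sketch of a configurable renewal jitter; not Ozone code. */
final class RenewalJitterSketch {

  /**
   * Returns a uniformly random delay in [0, maxJitter], at millisecond
   * granularity. A small maxJitter (for example in tests) shortens the
   * window in which services renew their certificates.
   */
  static Duration renewalDelay(Duration maxJitter, Random random) {
    if (maxJitter.isNegative()) {
      throw new IllegalArgumentException("maxJitter must not be negative");
    }
    long boundMillis = maxJitter.toMillis();
    if (boundMillis == 0) {
      return Duration.ZERO;
    }
    // nextDouble() is in [0, 1), so the product is in [0, boundMillis + 1).
    long delayMillis = (long) (random.nextDouble() * (boundMillis + 1));
    // Guard against floating-point rounding pushing the value past the bound.
    return Duration.ofMillis(Math.min(delayMillis, boundMillis));
  }
}
```

      A service would compute the delay once after updating its TrustStore and schedule its certificate renewal that far in the future, so that renewal requests reach the leader SCM spread over the configured window instead of all at once.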

      More information on the failure scenarios and the whole process can be found in the attached PDF document.

      Attachments

        1. CA_cert_rotation_design.pdf
          56 kB
          István Fajth

            People

              Assignee: pifta István Fajth
              Reporter: pifta István Fajth