Apache Ozone / HDDS-5317

Bootstrapped SCM fails to bootstrap if it connects to another bootstrapped SCM first.


Details

    Description

      GetSCMCertificate can only be served by the primary SCM, because the root CA runs only there; a non-primary SCM rejects it.
      So when a new SCM is bootstrapped and happens to connect first to another bootstrapped (non-primary) SCM, the call fails with an SCMSecurityResponse whose status is NOT_A_PRIMARY_SCM. Because the server returns a normal response instead of throwing an exception, the client never fails over to another SCM.
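      To make the failure mode concrete, here is a minimal Java sketch under stated assumptions: Scm, Response, primary, nonPrimary and callWithFailover are hypothetical stand-ins, not Ozone classes, and the loop is a simplified model of a failover proxy that moves to the next SCM only when a call throws. Because the non-primary SCM answers with a NOT_A_PRIMARY_SCM status rather than an exception, the loop returns that answer and the primary SCM is never contacted.

```java
import java.util.List;

// Sketch of the current behaviour. Every type here is a hypothetical
// stand-in, not an Ozone class.
public class CurrentBehaviourSketch {

  enum Status { OK, NOT_A_PRIMARY_SCM }

  // Stand-in for SCMSecurityResponse: which SCM answered, and with what status.
  static final class Response {
    final String scmId;
    final Status status;

    Response(String scmId, Status status) {
      this.scmId = scmId;
      this.status = status;
    }
  }

  // Stand-in for one SCM endpoint serving GetSCMCertificate.
  interface Scm {
    Response getScmCertificate();
  }

  static Scm primary(String id) {
    return () -> new Response(id, Status.OK);
  }

  // A non-primary SCM answers with an error status; it does not throw.
  static Scm nonPrimary(String id) {
    return () -> new Response(id, Status.NOT_A_PRIMARY_SCM);
  }

  // Simplified failover loop: it moves on to the next SCM only when a call
  // throws, so any returned response, even an error one, ends the loop.
  static Response callWithFailover(List<Scm> scms) {
    RuntimeException last = null;
    for (Scm scm : scms) {
      try {
        return scm.getScmCertificate();
      } catch (RuntimeException e) {
        last = e;
      }
    }
    throw new IllegalStateException("all SCMs failed", last);
  }

  public static void main(String[] args) {
    // The bootstrapping SCM happens to contact another bootstrapped SCM first.
    Response r = callWithFailover(List.of(nonPrimary("scm2"), primary("scm1")));
    System.out.println(r.scmId + " " + r.status); // scm2 NOT_A_PRIMARY_SCM
  }
}
```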

      SCMSecurityProtocolClientSideTranslatorPB

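        // Converts any non-OK status into an SCMSecurityException. The ErrorCode
        // is looked up by ordinal, so Status and ErrorCode must list their
        // constants in the same order.
        private SCMSecurityResponse handleError(SCMSecurityResponse resp)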
            throws SCMSecurityException {
          if (resp.getStatus() != SCMSecurityProtocolProtos.Status.OK) {
            throw new SCMSecurityException(resp.getMessage(),
                SCMSecurityException.ErrorCode.values()[resp.getStatus().ordinal()]);
          }
          return resp;
        }
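      handleError picks the ErrorCode by position: the i-th Status constant becomes the i-th ErrorCode constant. The sketch below reproduces that lookup with hypothetical, trimmed-down enums (Status, ErrorCode and ScmSecurityException are stand-ins, not the real Ozone types). Note that, like the original, it only runs on a response that has already been delivered, so the exception it raises goes straight to the caller.

```java
// Sketch of the ordinal-based lookup used by handleError. Status, ErrorCode
// and ScmSecurityException are hypothetical stand-ins for the Ozone types.
public class StatusMappingSketch {

  // Both enums declare matching constants in the same order.
  enum Status { OK, INTERNAL_ERROR, NOT_A_PRIMARY_SCM }

  enum ErrorCode { OK, INTERNAL_ERROR, NOT_A_PRIMARY_SCM }

  static class ScmSecurityException extends Exception {
    final ErrorCode code;

    ScmSecurityException(String message, ErrorCode code) {
      super(message);
      this.code = code;
    }
  }

  // Same shape as handleError: a non-OK status becomes an exception whose
  // code is chosen by position, not by name.
  static Status handleError(Status status, String message)
      throws ScmSecurityException {
    if (status != Status.OK) {
      throw new ScmSecurityException(message,
          ErrorCode.values()[status.ordinal()]);
    }
    return status;
  }

  public static void main(String[] args) {
    try {
      handleError(Status.NOT_A_PRIMARY_SCM, "not the primary SCM");
    } catch (ScmSecurityException e) {
      System.out.println(e.code); // NOT_A_PRIMARY_SCM
    }
  }
}
```

      Looking the code up by name instead, e.g. ErrorCode.valueOf(status.name()), would remove the ordering dependency, at the cost of an IllegalArgumentException when a status has no same-named error code.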
      

      One possible fix: on the server side, when the SCMSecurityException has error code NOT_A_PRIMARY_SCM, throw a RetriableWithFailOverException instead of returning an error response. The FailOverProxyProvider then fails over and retries the request on the next SCM.
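      A minimal sketch of the proposed fix under a simplified model: RetriableWithFailOverException here is a locally defined stand-in for the exception named above, and Scm, scm and getCertificate are hypothetical. A non-primary SCM throws instead of returning an error status, and the client-side loop treats that exception as the signal to fail over to the next SCM. It omits the RPC layer and the limits and backoff a real retry policy would apply.

```java
import java.util.ArrayList;
import java.util.List;

public class FailoverFixSketch {

  // Hypothetical stand-in for the RetriableWithFailOverException in the proposal.
  static class RetriableWithFailOverException extends RuntimeException {
    RetriableWithFailOverException(String message) {
      super(message);
    }
  }

  interface Scm {
    String id();
    String getScmCertificate();
  }

  // Server side: a non-primary SCM throws the failover exception instead of
  // answering with status NOT_A_PRIMARY_SCM.
  static Scm scm(String id, boolean isPrimary) {
    return new Scm() {
      public String id() {
        return id;
      }

      public String getScmCertificate() {
        if (!isPrimary) {
          throw new RetriableWithFailOverException(id + " is not the primary SCM");
        }
        return "certificate-from-" + id;
      }
    };
  }

  // Client side: a simplified failover proxy that moves to the next SCM on the
  // failover exception; any other exception propagates to the caller.
  static String getCertificate(List<Scm> scms, List<String> tried) {
    for (Scm scm : scms) {
      tried.add(scm.id());
      try {
        return scm.getScmCertificate();
      } catch (RetriableWithFailOverException e) {
        // Fail over: try the next SCM in the list.
      }
    }
    throw new IllegalStateException("no primary SCM reachable");
  }

  public static void main(String[] args) {
    List<String> tried = new ArrayList<>();
    String cert = getCertificate(
        List.of(scm("scm2", false), scm("scm3", false), scm("scm1", true)), tried);
    // certificate-from-scm1 after trying [scm2, scm3, scm1]
    System.out.println(cert + " after trying " + tried);
  }
}
```

      Only the failover exception is caught, so other failures still reach the caller instead of being silently retried on every SCM.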

      The exception message is available in comments.


            People

              Assignee: Bharat Viswanadham (bharat)
              Reporter: Bharat Viswanadham (bharat)
              Votes: 0
              Watchers: 1
