Apache Ozone / HDDS-7985

[SCM HA] SCM disk failure recovery causes Datanode failure on startup


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed

    Description

      Recovering from an SCM disk failure when no backup is available requires:

      • Cleaning the ozone.scm.db.dirs and ozone.metadata.dirs locations

      and then bootstrapping the SCM. Whether or not the recovered SCM is the primordial node, starting a datanode after this recovery fails with an error.

       

      Datanodes brought up after the SCM disk failure recovery fail to start with a CA certificate error, which reports that the number of certificates received from the SCM is greater than the number expected:

      ozonesecure-ha-datanode1-1  | 2023-02-17 00:46:40 INFO  HAUtils:457 - Expected CA list size 4, where as received CA List size 5.

      In this case, listing the certificates stored by the SCM shows a total of 5 SCM certificates after SCM2 recovers from the disk failure:

       

      CN=scm@scm1.org
      CN=scm-sub@scm1.org
      CN=scm-sub@scm2.org
      CN=scm-sub@scm3.org
      CN=scm-sub@scm2.org
       
      

      The list appears to contain 2 entries for SCM2, the node that was recovered from the disk failure.
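The mismatch can be reproduced on paper with a minimal sketch. It assumes that the expected CA list consists of one root CA plus one sub-CA per SCM node, which is consistent with the "Expected CA list size 4" message for this 3-SCM cluster but is not taken from the HAUtils source. The names `scm_node_count` and `find_duplicate_subjects` are hypothetical and exist only for this illustration.

```python
from collections import Counter

# CA subjects reported after SCM2 recovered, copied from the list above.
received_ca_subjects = [
    "CN=scm@scm1.org",
    "CN=scm-sub@scm1.org",
    "CN=scm-sub@scm2.org",
    "CN=scm-sub@scm3.org",
    "CN=scm-sub@scm2.org",
]

# Assumption: one root CA plus one sub-CA per SCM node (3 SCMs here).
scm_node_count = 3
expected_ca_count = scm_node_count + 1


def find_duplicate_subjects(subjects):
    """Return each subject that appears more than once, with its count."""
    counts = Counter(subjects)
    return {subject: n for subject, n in counts.items() if n > 1}


duplicates = find_duplicate_subjects(received_ca_subjects)
print(f"expected {expected_ca_count}, received {len(received_ca_subjects)}")
print(f"duplicate subjects: {duplicates}")
```

Under that assumption, the single extra entry for scm-sub@scm2.org accounts for the difference between the expected and received list sizes.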

       

      bash-4.2$ ozone admin cert list

      Total 12 valid certificates: 
      SerialNumber      Valid From                     Expiry                         Subject                                                                                                       
      1                 Fri Feb 17 00:00:00 UTC 2023   Mon Mar 27 00:00:00 UTC 2028   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, CN=scm@scm1.org          
      10760186198072    Fri Feb 17 00:00:00 UTC 2023   Mon Mar 27 00:00:00 UTC 2028   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, CN=scm-sub@scm1.org      
      10779888473070    Fri Feb 17 00:00:00 UTC 2023   Sat Feb 17 00:00:00 UTC 2024   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, CN=recon@recon           
      10780166036417    Fri Feb 17 00:00:00 UTC 2023   Mon Mar 27 00:00:00 UTC 2028   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=f99f1a81-7cce-44c9-a09b-9f7bbc48b6ac, CN=scm-sub@scm2.org      
      10788394717480    Fri Feb 17 00:00:00 UTC 2023   Mon Mar 27 00:00:00 UTC 2028   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=598be6bc-7d86-4cab-84dc-668a162a7ec2, CN=scm-sub@scm3.org      
      10800769855768    Fri Feb 17 00:00:00 UTC 2023   Sat Feb 17 00:00:00 UTC 2024   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, CN=dn@bd3138308a3f       
      10801305457014    Fri Feb 17 00:00:00 UTC 2023   Sat Feb 17 00:00:00 UTC 2024   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, CN=dn@e4795cc77124       
      10801871334038    Fri Feb 17 00:00:00 UTC 2023   Sat Feb 17 00:00:00 UTC 2024   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, CN=dn@3eb28ff965a1       
      10803980992569    Fri Feb 17 00:00:00 UTC 2023   Sat Feb 17 00:00:00 UTC 2024   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, CN=om2                   
      10804543987939    Fri Feb 17 00:00:00 UTC 2023   Sat Feb 17 00:00:00 UTC 2024   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, CN=om3                   
      10806118720884    Fri Feb 17 00:00:00 UTC 2023   Sat Feb 17 00:00:00 UTC 2024   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, CN=om1                   
      10932809284268    Fri Feb 17 00:00:00 UTC 2023   Mon Mar 27 00:00:00 UTC 2028   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=b4a175f3-c6a4-47fd-bcc5-c081b03de8c7, CN=scm-sub@scm2.org      
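To pick out the duplicated SCM entry from a listing like the one above, the rows can be grouped by their CN. The following is a rough sketch, not part of Ozone: it assumes the column layout shown above (columns separated by two or more spaces, subject fields separated by ", "), and the helper names `parse_row` and `duplicate_cns` are hypothetical. Only the five SCM rows are reproduced as sample input.

```python
import re
from collections import defaultdict

# SCM rows copied from the listing above (other rows omitted for brevity).
SAMPLE = """\
Total 12 valid certificates:
SerialNumber      Valid From                     Expiry                         Subject
1                 Fri Feb 17 00:00:00 UTC 2023   Mon Mar 27 00:00:00 UTC 2028   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, CN=scm@scm1.org
10760186198072    Fri Feb 17 00:00:00 UTC 2023   Mon Mar 27 00:00:00 UTC 2028   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, CN=scm-sub@scm1.org
10780166036417    Fri Feb 17 00:00:00 UTC 2023   Mon Mar 27 00:00:00 UTC 2028   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=f99f1a81-7cce-44c9-a09b-9f7bbc48b6ac, CN=scm-sub@scm2.org
10788394717480    Fri Feb 17 00:00:00 UTC 2023   Mon Mar 27 00:00:00 UTC 2028   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=598be6bc-7d86-4cab-84dc-668a162a7ec2, CN=scm-sub@scm3.org
10932809284268    Fri Feb 17 00:00:00 UTC 2023   Mon Mar 27 00:00:00 UTC 2028   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=b4a175f3-c6a4-47fd-bcc5-c081b03de8c7, CN=scm-sub@scm2.org
"""


def parse_row(line):
    """Parse one certificate row; return None for headers and other lines."""
    parts = re.split(r"\s{2,}", line.strip())
    if len(parts) != 4 or not parts[0].isdigit():
        return None
    serial, _valid_from, _expiry, subject = parts
    fields = dict(item.split("=", 1) for item in subject.split(", "))
    return {"serial": int(serial), "ou": fields.get("OU"), "cn": fields.get("CN")}


def duplicate_cns(text):
    """Group parsed rows by CN and keep only CNs with more than one row."""
    by_cn = defaultdict(list)
    for line in text.splitlines():
        row = parse_row(line)
        if row is not None:
            by_cn[row["cn"]].append(row)
    return {cn: rows for cn, rows in by_cn.items() if len(rows) > 1}


for cn, rows in duplicate_cns(SAMPLE).items():
    for row in rows:
        print(cn, row["serial"], row["ou"])
```

On the sample rows, this reports scm-sub@scm2.org twice, with different serial numbers and different OU values. The listing alone does not confirm which of the two entries was issued after the recovery.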


            People

              NeilJoshi Neil Joshi