CASSANDRA-15878

Ec2Snitch fails on upgrade in legacy mode

Details

    Description

      CASSANDRA-7839 changed the way the EC2 DC/Rack naming was handled in the Ec2Snitch to match AWS conventions.

      The "legacy" mode was introduced to allow upgrades from Cassandra 3.0/3.x and keep the same naming as before (while the "standard" mode uses the new naming convention).
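For readers unfamiliar with the two schemes, here is a minimal sketch of how an availability zone name could map to a DC/rack pair in each mode. The class and method names are hypothetical, and the special handling of regions ending in "1" reflects my understanding of the pre-4.0 behaviour rather than anything stated in this ticket. The us-west-2 result does match the log output quoted below (region us-west-2, availability zone 2a).

```java
// Illustrative sketch only: not the actual Ec2Snitch code.
final class Ec2NamingSketch {

    /** Returns {datacenter, rack} for an availability zone such as "us-west-2a". */
    static String[] dcAndRack(String az, boolean legacy) {
        // The region is the AZ name without its trailing zone letter.
        String region = az.substring(0, az.length() - 1);
        if (!legacy) {
            // Standard mode: DC is the full region name, rack is the full AZ name.
            return new String[] { region, az };
        }
        // Legacy mode: the rack is the last dash-separated token, e.g. "2a".
        String[] parts = az.split("-");
        String rack = parts[parts.length - 1];
        // Assumption: legacy mode drops a trailing "-1" from the region ("us-east-1" becomes "us-east").
        if (region.endsWith("1")) {
            region = az.substring(0, az.length() - 3);
        }
        return new String[] { region, rack };
    }

    public static void main(String[] args) {
        for (String az : new String[] { "us-east-1a", "us-west-2a" }) {
            String[] legacy = dcAndRack(az, true);
            String[] standard = dcAndRack(az, false);
            System.out.println(az + ": legacy=" + legacy[0] + "/" + legacy[1]
                               + " standard=" + standard[0] + "/" + standard[1]);
        }
    }
}
```

Under these assumptions the DC name for us-west-2a is us-west-2 in both modes, and only the rack name differs (2a versus us-west-2a).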

      When performing an upgrade in the us-west-2 region, the second node failed to start with the following exception:

       

      ERROR [main] 2020-06-16 09:14:42,218 Ec2Snitch.java:210 - This ec2-enabled snitch appears to be using the legacy naming scheme for regions, but existing nodes in cluster are using the opposite: region(s) = [us-west-2], availability zone(s) = [2a]. Please check the ec2_naming_scheme property in the cassandra-rackdc.properties configuration file for more details.
      ERROR [main] 2020-06-16 09:14:42,219 CassandraDaemon.java:789 - Exception encountered during startup
      java.lang.IllegalStateException: null
      	at org.apache.cassandra.service.StorageService.validateEndpointSnitch(StorageService.java:573)
      	at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:530)
      	at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:800)
      	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:659)
      	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:610)
      	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:373)
      	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:650)
      	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:767)
      

       

      The exception leads back to this piece of code.

      After adding some logging, it turned out the DC name of the first upgraded node was considered invalid as a legacy one:

      INFO  [main] 2020-06-16 09:14:42,216 Ec2Snitch.java:183 - Detected DC us-west-2
      INFO  [main] 2020-06-16 09:14:42,217 Ec2Snitch.java:185 - dcUsesLegacyFormat=false / usingLegacyNaming=true
      ERROR [main] 2020-06-16 09:14:42,217 Ec2Snitch.java:188 - Invalid DC name us-west-2
      

       

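      The problem is that the regex used to identify legacy DC names matches both old and new names, so a DC such as us-west-2 is never recognised as legacy (dcUsesLegacyFormat=false) even though the node is running with usingLegacyNaming=true: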

      boolean dcUsesLegacyFormat = !dc.matches("[a-z]+-[a-z].+-[\\d].*");
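
The behaviour of that expression can be checked in isolation. The sketch below wraps the quoted line in a hypothetical helper class (the class name and the sample DC names other than us-west-2 are mine, chosen for illustration) and evaluates it against a few names.

```java
// Standalone check of the regex quoted in this ticket; the wrapper class is hypothetical.
final class LegacyDcRegexDemo {

    // Same expression as the quoted Ec2Snitch line.
    static boolean dcUsesLegacyFormat(String dc) {
        return !dc.matches("[a-z]+-[a-z].+-[\\d].*");
    }

    public static void main(String[] args) {
        // "us-east" only exists as a legacy name, "us-east-1" only as a standard name,
        // and "us-west-2" is the same in both schemes.
        for (String dc : new String[] { "us-east", "us-east-1", "us-west-2" }) {
            System.out.println(dc + " -> dcUsesLegacyFormat=" + dcUsesLegacyFormat(dc));
        }
    }
}
```

Only us-east is reported as legacy. us-west-2 yields false, which matches the dcUsesLegacyFormat=false line in the log above and explains why a legacy-mode node rejects it.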
      

      Since some DC names are identical in both modes (us-west-2, for example), I don't see how DC names can be used to detect whether other nodes in the cluster are using legacy mode.
       
      The rack names, on the other hand, are totally different in the legacy and standard modes and can be used to detect mismatched settings.
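
A rack-only check could look roughly like the sketch below. This is not a patch: the class and method names are hypothetical, and the pattern for legacy racks (a single digit followed by a single letter) is an assumption based on the "2a" availability zone shown in the log above.

```java
import java.util.Set;

// Hypothetical sketch of validating the naming scheme from rack names only.
final class RackNamingCheckSketch {

    // Assumption: legacy racks look like "2a", standard racks like "us-west-2a".
    static boolean rackUsesLegacyFormat(String rack) {
        return rack.matches("\\d[a-z]");
    }

    /** Returns true when every known rack agrees with this node's naming scheme. */
    static boolean racksMatchScheme(Set<String> racks, boolean usingLegacyNaming) {
        for (String rack : racks) {
            if (rackUsesLegacyFormat(rack) != usingLegacyNaming) {
                return false;
            }
        }
        return true;
    }
}
```

With this approach a legacy-mode node joining a cluster whose racks are named like 2a passes validation regardless of whether the DC name changed between schemes, while a node in the opposite mode is still rejected.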
       
      My go-to fix would be to drop the check on datacenters by removing the following lines: https://github.com/apache/cassandra/blob/cassandra-4.0-alpha4/src/java/org/apache/cassandra/locator/Ec2Snitch.java#L172-L186

            People

              adejanovski Alexander Dejanovski
              Michael Semb Wever