Cassandra / CASSANDRA-19729

Cassandra node fails to start up in dtests when all nodes in the cluster are shut down and then restarted

Details

    • Bug
    • Status: Triage Needed
    • Normal
    • Resolution: Unresolved
    • None
    • Test/dtest/java
    • None
    • All
    • None

    Description

      What happened

      During an upgrade test in the Cassandra in-JVM distributed tests (dtests), when all nodes in the cluster are shut down and then restarted, the node being restarted fails to start up with an `IllegalStateException`.

      How to reproduce

      Save the following test as `JVMDTestUpgradeTest.java` under `cassandra/test/distributed/org/apache/cassandra/distributed/upgrade`.

      package org.apache.cassandra.distributed.upgrade;

      import java.util.concurrent.TimeUnit;

      import org.junit.Test;

      import static org.junit.Assert.assertFalse;
      import static org.junit.Assert.assertTrue;

      public class JVMDTestUpgradeTest extends UpgradeTestBase
      {
          @Test
          public void shutdownAllAndRestart() throws Throwable
          {
              new TestCase()
                      .nodes(2)
                      .nodesToUpgrade(1)
                      .upgradesToCurrentFrom(v3X)
                      .setup((cluster) -> {
                          cluster.schemaChangeIgnoringStoppedInstances("CREATE TABLE " + KEYSPACE + ".tbl1 (id int primary key, i int)");
                      })
                      .runAfterNodeUpgrade((cluster, node) -> {
                          // Shut down every node in the cluster.
                          cluster.get(2).shutdown(true).get(1, TimeUnit.MINUTES);
                          cluster.get(1).shutdown(true).get(1, TimeUnit.MINUTES);
                          assertTrue(cluster.get(1).isShutdown());
                          assertTrue(cluster.get(2).isShutdown());

                          // Restart both nodes; node 1 is started while node 2 is still down.
                          cluster.get(1).startup();
                          cluster.get(2).startup();
                          assertFalse(cluster.get(1).isShutdown());
                          assertFalse(cluster.get(2).isShutdown());
                      })
                      .run();
          }
      }

      Build and run the test above using dtest jars for versions earlier than 4.1. In my case, I used `dtest-4.0.1.jar` and `dtest-4.0.2.jar`.
      Run it with the following command:
      ```bash
      $ ant test-jvm-dtest-some -Duse.jdk11=true -Dtest.name=org.apache.cassandra.distributed.upgrade.JVMDTestUpgradeTest
      ```
      You will see the following error message:

      [junit-timeout] Caused by: java.lang.IllegalStateException: Can't use shutdown instances, delegate is null
      [junit-timeout]         at org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegate(AbstractCluster.java:283)
      [junit-timeout]         at org.apache.cassandra.distributed.impl.DelegatingInvokableInstance.getMessagingVersion(DelegatingInvokableInstance.java:90)
      [junit-timeout]         at org.apache.cassandra.distributed.action.GossipHelper.unsafeStatusToNormal(GossipHelper.java:89)
      [junit-timeout]         at org.apache.cassandra.distributed.impl.Instance.lambda$startup$9(Instance.java:555)
      [junit-timeout]         at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
      [junit-timeout]         at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658)
      [junit-timeout]         at org.apache.cassandra.distributed.impl.Instance.lambda$startup$10(Instance.java:551)
      [junit-timeout]         at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
      [junit-timeout]         at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
      [junit-timeout]         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
      [junit-timeout]         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
      [junit-timeout]         at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
      [junit-timeout]         at java.base/java.lang.Thread.run(Thread.java:829)

      However, the bug occurs only when all nodes in the cluster are shut down and then restarted. Shutting down only some of the nodes works: for example, the following test method, added to the same class, shuts down one of the two nodes and restarts it without any problem.

        @Test
        public void shutdownOneAndRestart() throws Throwable
        {
            new TestCase()
                    .nodes(2)
                    .nodesToUpgrade(1)
                    .upgradesToCurrentFrom(v3X)
                    .setup((cluster) -> {
                        cluster.schemaChangeIgnoringStoppedInstances("CREATE TABLE "+KEYSPACE+".tbl1 (id int primary key, i int)");
                    })
                    .runAfterNodeUpgrade((cluster, node) -> {
                        cluster.get(2).shutdown(true).get(1, TimeUnit.MINUTES);
                        assertTrue(cluster.get(2).isShutdown());
      
                        cluster.get(2).startup();
                        assertFalse(cluster.get(1).isShutdown());
                        assertFalse(cluster.get(2).isShutdown());
                    }).run();
        } 
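
      To illustrate why only the full shutdown fails, here is a simplified, self-contained model of what the stack trace above suggests: while node 1 starts, `Instance.startup` iterates over the other instances and `GossipHelper.unsafeStatusToNormal` asks each of them for its messaging version, which goes through a wrapper that throws when its delegate is null. `ShutdownAllRestartModel`, `ToyWrapper`, `ToyInstance`, and the scenario methods are hypothetical stand-ins, not the real `org.apache.cassandra.distributed` classes, and the order of steps inside `ToyWrapper.startup` is an assumption read off the trace rather than taken from the dtest source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the failure in the stack trace.
// ToyInstance and ToyWrapper are illustrative stand-ins, NOT the real
// org.apache.cassandra.distributed classes.
public class ShutdownAllRestartModel
{
    // Stands in for a running node (the delegate held by a wrapper).
    static final class ToyInstance
    {
        final int messagingVersion; // arbitrary value; only its presence matters

        ToyInstance(int messagingVersion)
        {
            this.messagingVersion = messagingVersion;
        }
    }

    // Loosely models AbstractCluster$Wrapper: the delegate is null while the node is shut down.
    static final class ToyWrapper
    {
        private ToyInstance delegate = new ToyInstance(1); // nodes start out running

        ToyInstance delegate()
        {
            if (delegate == null)
                throw new IllegalStateException("Can't use shutdown instances, delegate is null");
            return delegate;
        }

        int getMessagingVersion()
        {
            return delegate().messagingVersion;
        }

        boolean isShutdown()
        {
            return delegate == null;
        }

        void shutdown()
        {
            delegate = null;
        }

        // Assumed startup path: the starting node asks every peer for its messaging
        // version, as GossipHelper.unsafeStatusToNormal appears to do in the trace.
        void startup(List<ToyWrapper> cluster)
        {
            delegate = new ToyInstance(1);
            for (ToyWrapper peer : cluster)
                if (peer != this)
                    peer.getMessagingVersion(); // throws if the peer is still shut down
        }
    }

    static List<ToyWrapper> newCluster(int nodes)
    {
        List<ToyWrapper> cluster = new ArrayList<>();
        for (int i = 0; i < nodes; i++)
            cluster.add(new ToyWrapper());
        return cluster;
    }

    // Like shutdownOneAndRestart: node 2 (index 1) restarts while node 1 is up.
    static boolean restartOneSucceeds()
    {
        List<ToyWrapper> cluster = newCluster(2);
        cluster.get(1).shutdown();
        cluster.get(1).startup(cluster);
        return !cluster.get(0).isShutdown() && !cluster.get(1).isShutdown();
    }

    // Like shutdownAllAndRestart: node 1 (index 0) restarts while node 2 is down.
    // Returns the exception message, or null if startup unexpectedly succeeds.
    static String restartAllFailure()
    {
        List<ToyWrapper> cluster = newCluster(2);
        cluster.get(1).shutdown();
        cluster.get(0).shutdown();
        try
        {
            cluster.get(0).startup(cluster);
            return null;
        }
        catch (IllegalStateException e)
        {
            return e.getMessage();
        }
    }

    public static void main(String[] args)
    {
        System.out.println("restart one node: " + (restartOneSucceeds() ? "ok" : "failed"));
        System.out.println("restart all nodes: " + restartAllFailure());
    }
}
```

      Running `main` prints `ok` for the single-node restart and the `delegate is null` message for the full restart, mirroring the behaviour of the two tests above: in the model, startup only fails when some peer is still shut down at the moment a node starts.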

          People

            Assignee: Unassigned
            Reporter: FuzzingTeam ConfX