Solr / SOLR-6923

AutoAddReplicas should consult live nodes also to see if a state has changed

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0, 6.0
    • Component/s: SolrCloud
    • Labels: None

    Description

      • I did the following:
        ./solr start -e cloud -noprompt
        
        kill -9 <pid-of-node2>  # not the node that is running ZK
        
      • /live_nodes reflects that the node is gone.
      • This is the only message logged on the node1 server after killing node2:
      45812 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9983] WARN  org.apache.zookeeper.server.NIOServerCnxn  – caught end of stream exception
      EndOfStreamException: Unable to read additional data from client sessionid 0x14ac40f26660001, likely client has closed socket
          at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
          at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
          at java.lang.Thread.run(Thread.java:745)
      
      • The graph shows node2 in the 'Gone' state.
      • clusterstate.json keeps showing the replica as 'active':
      {"collection1":{
          "shards":{"shard1":{
              "range":"80000000-7fffffff",
              "state":"active",
              "replicas":{
                "core_node1":{
                  "state":"active",
                  "core":"collection1",
                  "node_name":"169.254.113.194:8983_solr",
                  "base_url":"http://169.254.113.194:8983/solr",
                  "leader":"true"},
                "core_node2":{
                  "state":"active",
                  "core":"collection1",
                  "node_name":"169.254.113.194:8984_solr",
                  "base_url":"http://169.254.113.194:8984/solr"}}}},
          "maxShardsPerNode":"1",
          "router":{"name":"compositeId"},
          "replicationFactor":"1",
          "autoAddReplicas":"false",
          "autoCreated":"true"}}
      

      One immediate problem I can see is that AutoAddReplicas doesn't work since the clusterstate.json never changes. There might be more features which are affected by this.
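
The mismatch described above can be checked mechanically. The following is a minimal sketch, not Solr code: it assumes the cluster state has already been read into a dict shaped like the clusterstate.json in this report and that the children of /live_nodes are available as a set of node names. The function names `effective_state` and `stale_active_replicas` are hypothetical.

```python
import json

# Abridged copy of the clusterstate.json from this report.
CLUSTER_STATE = json.loads("""
{"collection1": {"shards": {"shard1": {"state": "active", "replicas": {
  "core_node1": {"state": "active", "node_name": "169.254.113.194:8983_solr"},
  "core_node2": {"state": "active", "node_name": "169.254.113.194:8984_solr"}}}}}}
""")


def effective_state(replica, live_nodes):
    """Return the recorded state, unless the replica's node is not live."""
    if replica["node_name"] not in live_nodes:
        return "down"
    return replica["state"]


def stale_active_replicas(cluster_state, live_nodes):
    """List (collection, shard, replica) entries recorded as 'active'
    whose node is missing from live_nodes."""
    stale = []
    for coll_name, coll in cluster_state.items():
        for shard_name, shard in coll["shards"].items():
            for replica_name, replica in shard["replicas"].items():
                recorded = replica["state"]
                if recorded == "active" and effective_state(replica, live_nodes) == "down":
                    stale.append((coll_name, shard_name, replica_name))
    return stale


# After killing node2, only node1 remains under /live_nodes.
live = {"169.254.113.194:8983_solr"}
print(stale_active_replicas(CLUSTER_STATE, live))
# [('collection1', 'shard1', 'core_node2')]
```

A consumer such as AutoAddReplicas could apply this kind of check on the read side, treating a replica as down whenever its node is absent from /live_nodes, regardless of what clusterstate.json records.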

      My first thought on how to handle this: the shard leader could listen for changes on /live_nodes and, if it has replicas on a node that has disappeared, mark those replicas as 'down' in clusterstate.json?
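
The idea of reacting to a /live_nodes change can be sketched as follows. This is only an illustration under stated assumptions: it works on plain dicts and sets rather than a ZooKeeper client, it does not model the leader election or the watch itself, and it is not taken from the attached patch. The names `lost_nodes` and `mark_replicas_down` are hypothetical.

```python
import copy


def lost_nodes(previous_live, current_live):
    """Nodes that were under /live_nodes before the change but are not now."""
    return set(previous_live) - set(current_live)


def mark_replicas_down(cluster_state, gone):
    """Return a copy of cluster_state in which every replica hosted on a
    node in `gone` has its state set to 'down'. The input is not modified."""
    updated = copy.deepcopy(cluster_state)
    for coll in updated.values():
        for shard in coll["shards"].values():
            for replica in shard["replicas"].values():
                if replica["node_name"] in gone:
                    replica["state"] = "down"
    return updated


# Abridged state from this report, as a Python dict.
state = {"collection1": {"shards": {"shard1": {"replicas": {
    "core_node1": {"state": "active", "node_name": "169.254.113.194:8983_solr"},
    "core_node2": {"state": "active", "node_name": "169.254.113.194:8984_solr"},
}}}}}

before = {"169.254.113.194:8983_solr", "169.254.113.194:8984_solr"}
after = {"169.254.113.194:8983_solr"}

new_state = mark_replicas_down(state, lost_nodes(before, after))
replicas = new_state["collection1"]["shards"]["shard1"]["replicas"]
print(replicas["core_node1"]["state"], replicas["core_node2"]["state"])
# active down
```

Returning a modified copy rather than mutating in place keeps the sketch easy to test; how and by whom the updated state would actually be written back is left open here.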

      Attachments

        1. SOLR-6923.patch (2 kB, Varun Thacker)

          People

            Assignee: Mark Miller (markrmiller@gmail.com)
            Reporter: Varun Thacker (varun)