Details

    Description

      Not sure if this is the place to ask, please close if it's not.

      I am seeing some behavior that I can't explain since upgrading to 3.5:

      In a 5 member quorum, when server 3 is the leader and each server has this in their configuration: 

      server.1=100.71.255.254:2888:3888:participant;2181
      server.2=100.71.255.253:2888:3888:participant;2181
      server.3=100.71.255.252:2888:3888:participant;2181
      server.4=100.71.255.251:2888:3888:participant;2181
      server.5=100.71.255.250:2888:3888:participant;2181

      If servers 1 or 2 are restarted, they fail to rejoin the quorum with this in the logs:

      2020-03-11 20:23:35,720 [myid:2] - INFO  [QuorumPeer[myid=2](plain=0.0.0.0:2181)(secure=disabled):QuorumPeer@1175] - LOOKING
      2020-03-11 20:23:35,721 [myid:2] - INFO  [QuorumPeer[myid=2](plain=0.0.0.0:2181)(secure=disabled):FastLeaderElection@885] - New election. My id =  2, proposed zxid=0x1b8005f4bba
      2020-03-11 20:23:35,733 [myid:2] - INFO  [WorkerSender[myid=2]:QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (3, 2)
      2020-03-11 20:23:35,734 [myid:2] - INFO  [0.0.0.0/0.0.0.0:3888:QuorumCnxManager$Listener@924] - Received connection request 100.126.116.201:36140
      2020-03-11 20:23:35,735 [myid:2] - INFO  [WorkerSender[myid=2]:QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (4, 2)
      2020-03-11 20:23:35,740 [myid:2] - INFO  [WorkerSender[myid=2]:QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (5, 2)
      2020-03-11 20:23:35,740 [myid:2] - INFO  [0.0.0.0/0.0.0.0:3888:QuorumCnxManager$Listener@924] - Received connection request 100.126.116.201:36142
      2020-03-11 20:23:35,740 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@679] - Notification: 2 (message format version), 2 (n.leader), 0x1b8005f4bba (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x1b8 (n.peerEPoch), LOOKING (my state)0 (n.config version)
      2020-03-11 20:23:35,742 [myid:2] - WARN  [SendWorker:3:QuorumCnxManager$SendWorker@1143] - Interrupted while waiting for message on queue
      java.lang.InterruptedException
              at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
              at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
              at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1294)
              at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:82)
              at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1131)
      2020-03-11 20:23:35,744 [myid:2] - WARN  [SendWorker:3:QuorumCnxManager$SendWorker@1153] - Send worker leaving thread  id 3 my id = 2
      2020-03-11 20:23:35,745 [myid:2] - WARN  [RecvWorker:3:QuorumCnxManager$RecvWorker@1230] - Interrupting SendWorker

      The only way I can seem to get them to rejoin the quorum is to restart the leader.

      However, if I remove server 4 and 5 from the configuration of server 1 or 2 (so only servers 1, 2, and 3 remain in the configuration file), then they can rejoin the quorum fine. Is this expected and am I doing something wrong? Any help or explanation would be greatly appreciated. Thank you.

      Attachments

        1. zoo-0.log
          211 kB
          Dai Shi
        2. zoo-1.log
          121 kB
          Dai Shi
        3. zoo-2.log
          39 kB
          Dai Shi
        4. zoo-service.yaml
          3 kB
          Dai Shi
        5. configmap.yaml
          2 kB
          Dai Shi
        6. zookeeper.yaml
          2 kB
          Dai Shi
        7. jmx.yaml
          0.7 kB
          Dai Shi
        8. Dockerfile
          2 kB
          Dai Shi
        9. docker-entrypoint.sh
          1 kB
          Dai Shi

        Issue Links

          Activity

            githubbot ASF GitHub Bot logged work - 19/Mar/20 14:12
            • Time Spent:
              10m
               
              symat commented on pull request #1289: ZOOKEEPER-3756: Members slow to rejoin quorum using Kubernetes
              URL: https://github.com/apache/zookeeper/pull/1289
               
               
                 Whenever we close the current master ZooKeeper server, a new leader election
                 is triggered. During the new election, a connection will be established between
                 all the servers, by calling the synchronized 'connectOne' method in
                 QuorumCnxManager. The method will open the socket and send a single small
                 initial message to the other server, usually very quickly. If the destination
                 host is unreachable, it should fail immediately.
                 
                 However, when we use Kubernetes, the destination host is always reachable
                 because it points to a Kubernetes service. If the actual container / pod is not
                 available, then the 'socket.connect' method will time out (by default after 5 sec)
                 instead of failing immediately with NoRouteToHostException. As the 'connectOne'
                 method is synchronized, this timeout will block the creation of other
                 connections, so a single unreachable host can cause a timeout in the leader
                 election protocol.
                 
                 One workaround is to decrease the socket connection timeout with the
                 '-Dzookeeper.cnxTimeout' system property, but the proper fix would be to
                 make the connection initiation fully asynchronous, as using a very low timeout can
                 have its own side effects. Fortunately, most of the initial message sending
                 is already async: the SASL authentication can take more time, so the
                 second (authentication + initial message sending) part of the initiation protocol
                 is already called in a separate thread when Quorum SASL authentication is enabled.
                 
                 In the following patch I made the whole connection initiation async, by
                 always using the async executor (not only when Quorum SASL is enabled) and
                 also moving the socket.connect call into the async thread.
                 
                 I also created a unit test to verify my fix. I added a static socket factory that can be
                 changed by the tests using a package-private setter method. Before I applied my changes,
                 the test failed (producing the same error logs as we see in the original Jira ticket)
                 and timed out, as no leader election had succeeded after 15 seconds.
                 After the changes the test runs very quickly, in 1-2 seconds.
                 
                 Note: due to the multiAddress changes, we will need different PRs for branch 3.5
                 and for the 3.6+ branches. I will submit the other PR once this one has been reviewed.
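                 
                 The blocking effect described above can be illustrated with a minimal sketch.
                 This is not the ZooKeeper code: `AsyncConnectSketch`, `Connector`, and the
                 method names are hypothetical, and a 500 ms sleep stands in for the connect
                 timeout. It only shows why doing the connect inside a synchronized method
                 stalls the caller, while handing it to an executor does not.
                 
                 ```java
                 import java.net.SocketTimeoutException;
                 import java.util.List;
                 import java.util.concurrent.CopyOnWriteArrayList;
                 import java.util.concurrent.ExecutorService;
                 import java.util.concurrent.Executors;
                 import java.util.concurrent.TimeUnit;

                 class AsyncConnectSketch {

                     /** Hypothetical stand-in for the blocking socket.connect(address, cnxTO) call. */
                     interface Connector {
                         void connect(long sid) throws Exception;
                     }

                     private final Connector connector;
                     private final ExecutorService initiator = Executors.newCachedThreadPool();
                     final List<Long> connected = new CopyOnWriteArrayList<>();

                     AsyncConnectSketch(Connector connector) {
                         this.connector = connector;
                     }

                     /** Old shape: the connect runs while the monitor is held, so one slow peer delays the rest. */
                     synchronized void connectOneBlocking(long sid) {
                         doConnect(sid);
                     }

                     /** New shape: the monitor is held only while the task is submitted. */
                     synchronized void connectOneAsync(long sid) {
                         initiator.execute(() -> doConnect(sid));
                     }

                     private void doConnect(long sid) {
                         try {
                             connector.connect(sid);
                             connected.add(sid);
                         } catch (Exception e) {
                             // Peer not reachable: give up on this attempt.
                         }
                     }

                     /** Waits for outstanding async attempts to finish. */
                     void shutdown() {
                         initiator.shutdown();
                         try {
                             initiator.awaitTermination(5, TimeUnit.SECONDS);
                         } catch (InterruptedException e) {
                             Thread.currentThread().interrupt();
                         }
                     }

                     /** Returns how long (ms) the caller is stalled while initiating connections to sids 3, 4, 5. */
                     static long callerBlockedMillis(boolean async) {
                         // sid 3 simulates a pod that is down behind a Kubernetes service:
                         // the connect hangs and then times out instead of failing at once.
                         AsyncConnectSketch mgr = new AsyncConnectSketch(sid -> {
                             if (sid == 3) {
                                 Thread.sleep(500);
                                 throw new SocketTimeoutException("simulated connect timeout");
                             }
                         });
                         long start = System.nanoTime();
                         for (long sid = 3; sid <= 5; sid++) {
                             if (async) {
                                 mgr.connectOneAsync(sid);
                             } else {
                                 mgr.connectOneBlocking(sid);
                             }
                         }
                         long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
                         mgr.shutdown();
                         return elapsed;
                     }

                     public static void main(String[] args) {
                         System.out.println("blocking: caller stalled " + callerBlockedMillis(false) + " ms");
                         System.out.println("async:    caller stalled " + callerBlockedMillis(true) + " ms");
                     }
                 }
                 ```
                 
                 In the blocking variant the attempts to servers 4 and 5 cannot start until the
                 attempt to server 3 has timed out; in the async variant they start right away.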
               
              ----------------------------------------------------------------
              This is an automated message from the Apache Git Service.
              To respond to the message, please log on to GitHub and use the
              URL above to go to the specific comment.
               
              For queries about this service, please contact Infrastructure at:
              users@infra.apache.org
            githubbot ASF GitHub Bot logged work - 19/Mar/20 14:37
            • Time Spent:
              10m
               
              eolivelli commented on pull request #1289: ZOOKEEPER-3756: Members slow to rejoin quorum using Kubernetes
              URL: https://github.com/apache/zookeeper/pull/1289#discussion_r395073912
               
               

               ##########
               File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumCnxManager.java
               ##########
               @@ -359,20 +359,49 @@ public Thread newThread(Runnable r) {
                    *
                    * @param sid
                    */
              - public void testInitiateConnection(long sid) throws Exception {
              + public void testInitiateConnection(long sid) {
                       LOG.debug("Opening channel to server {}", sid);
              - Socket sock = new Socket();
              - setSockOpts(sock);
              - InetSocketAddress address = self.getVotingView().get(sid).electionAddr.getReachableOrOne();
              - sock.connect(address, cnxTO);
              - initiateConnection(sock, sid);
              + initiateConnection(self.getVotingView().get(sid).electionAddr, sid);
                   }
               
                   /**
              + * First we create the socket, perform SSL handshake and authentication if needed.
              + * Then we perform the initiaion protocol.
                    * If this server has initiated the connection, then it gives up on the
                    * connection if it loses challenge. Otherwise, it keeps the connection.
                    */
              - public void initiateConnection(final Socket sock, final Long sid) {
              + public void initiateConnection(final MultipleAddresses electionAddr, final Long sid) {
              + Socket sock = null;
              + try {
              + LOG.debug("Opening channel to server {}", sid);
              + if (self.isSslQuorum()) {
               
               Review comment:
                 A better approach would be to add a
                 Socket makeSocket()
                 method
                 and override it with PowerMock
                 But we don't have powermock :(
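                 
                 For illustration, a minimal sketch of the `makeSocket()` idea, using plain
                 subclassing instead of PowerMock. All class names here (`MakeSocketSketch`,
                 `Connector`, `TimingOutConnector`) and the peer host name are hypothetical;
                 this is not the approach taken in the PR, which uses a static socket factory
                 with a package-private setter.
                 
                 ```java
                 import java.io.IOException;
                 import java.net.InetSocketAddress;
                 import java.net.Socket;
                 import java.net.SocketAddress;
                 import java.net.SocketTimeoutException;

                 class MakeSocketSketch {

                     /** Hypothetical connection opener with an overridable factory method. */
                     static class Connector {
                         /** Tests can override this to inject a socket with custom behavior. */
                         protected Socket makeSocket() {
                             return new Socket();
                         }

                         /** Returns true if the connect succeeded within the timeout. */
                         boolean tryConnect(InetSocketAddress address, int timeoutMillis) {
                             try (Socket sock = makeSocket()) {
                                 sock.connect(address, timeoutMillis);
                                 return true;
                             } catch (IOException e) {
                                 return false;
                             }
                         }
                     }

                     /** Test double: every connect attempt fails as if the peer never answered. */
                     static class TimingOutConnector extends Connector {
                         @Override
                         protected Socket makeSocket() {
                             return new Socket() {
                                 @Override
                                 public void connect(SocketAddress endpoint, int timeout) throws IOException {
                                     throw new SocketTimeoutException("simulated connect timeout");
                                 }
                             };
                         }
                     }

                     /** Returns true if the simulated timeout is reported as a failed connect. */
                     static boolean simulatedTimeoutIsReported() {
                         // The address is never contacted because the test double throws first.
                         InetSocketAddress peer = InetSocketAddress.createUnresolved("zk-peer.invalid", 3888);
                         return !new TimingOutConnector().tryConnect(peer, 5000);
                     }
                 }
                 ```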
               
            githubbot ASF GitHub Bot logged work - 19/Mar/20 14:37
            • Time Spent:
              10m
               
              eolivelli commented on issue #1289: ZOOKEEPER-3756: Members slow to rejoin quorum using Kubernetes
              URL: https://github.com/apache/zookeeper/pull/1289#issuecomment-601215372
               
               
                 LGTM
               
            githubbot ASF GitHub Bot logged work - 19/Mar/20 14:47
            • Time Spent:
              10m
               
              symat commented on pull request #1289: ZOOKEEPER-3756: Members slow to rejoin quorum using Kubernetes
              URL: https://github.com/apache/zookeeper/pull/1289#discussion_r395082580
               
               

               ##########
               File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumCnxManager.java
               ##########
               @@ -359,20 +359,49 @@ public Thread newThread(Runnable r) {
                    *
                    * @param sid
                    */
              - public void testInitiateConnection(long sid) throws Exception {
              + public void testInitiateConnection(long sid) {
                       LOG.debug("Opening channel to server {}", sid);
              - Socket sock = new Socket();
              - setSockOpts(sock);
              - InetSocketAddress address = self.getVotingView().get(sid).electionAddr.getReachableOrOne();
              - sock.connect(address, cnxTO);
              - initiateConnection(sock, sid);
              + initiateConnection(self.getVotingView().get(sid).electionAddr, sid);
                   }
               
                   /**
              + * First we create the socket, perform SSL handshake and authentication if needed.
              + * Then we perform the initiaion protocol.
                    * If this server has initiated the connection, then it gives up on the
                    * connection if it loses challenge. Otherwise, it keeps the connection.
                    */
              - public void initiateConnection(final Socket sock, final Long sid) {
              + public void initiateConnection(final MultipleAddresses electionAddr, final Long sid) {
              + Socket sock = null;
              + try {
              + LOG.debug("Opening channel to server {}", sid);
              + if (self.isSslQuorum()) {
               
               Review comment:
                 The makeSocket() is also a nice idea :)
                 
                 I have already thought once or twice about using PowerMock. On the other hand, the new JDK versions seem to be stricter with reflection... so I am not sure how 'future-proof' it would be.
                 
                 I was also wondering whether we should use the `zookeeper.serverCnxnFactory` system property in the test, but we don't use the `ServerCnxnFactory` approach for the leader election AFAICS. And implementing that would have been a longer story. Maybe it would be good for 4.0, if we touch the Leader Election part.
               
            githubbot ASF GitHub Bot logged work - 19/Mar/20 14:59
            • Time Spent:
              10m
               
              eolivelli commented on issue #1289: ZOOKEEPER-3756: Members slow to rejoin quorum using Kubernetes
              URL: https://github.com/apache/zookeeper/pull/1289#issuecomment-601227377
               
               
                 With the new release schedule of JDK the future is now ;)
                 Don't bother too much
               
            githubbot ASF GitHub Bot logged work - 19/Mar/20 15:22
            • Time Spent:
              10m
               
              symat commented on issue #1289: ZOOKEEPER-3756: Members slow to rejoin quorum using Kubernetes
              URL: https://github.com/apache/zookeeper/pull/1289#issuecomment-601240666
               
               
                 retest maven build
               
            githubbot ASF GitHub Bot logged work - 19/Mar/20 16:58
            • Time Spent:
              10m
               
              symat commented on issue #1289: ZOOKEEPER-3756: Members slow to rejoin quorum using Kubernetes
              URL: https://github.com/apache/zookeeper/pull/1289#issuecomment-601298352
               
               
                 restart maven build
               
            githubbot ASF GitHub Bot logged work - 19/Mar/20 17:42
            • Time Spent:
              10m
               
              eolivelli commented on issue #1289: ZOOKEEPER-3756: Members slow to rejoin quorum using Kubernetes
              URL: https://github.com/apache/zookeeper/pull/1289#issuecomment-601320396
               
               
                 What was the error on CI?
                 I hope we are not introducing some kind of instability.
                 In theory this change should make the system more stable
               
            githubbot ASF GitHub Bot logged work - 20/Mar/20 07:31
            githubbot ASF GitHub Bot logged work - 20/Mar/20 08:56
            • Time Spent:
              10m
               
              eolivelli commented on issue #1289: ZOOKEEPER-3756: Members slow to rejoin quorum using Kubernetes
              URL: https://github.com/apache/zookeeper/pull/1289#issuecomment-601594104
               
               
                 ok
                 ready to ship it as soon as we have another approval
               
            githubbot ASF GitHub Bot logged work - 20/Mar/20 19:54
            • Time Spent:
              10m
               
              eolivelli commented on issue #1289: ZOOKEEPER-3756: Members slow to rejoin quorum using Kubernetes
              URL: https://github.com/apache/zookeeper/pull/1289#issuecomment-601884191
               
               
                 @anmolnar @nkalmar PTAL
               
            githubbot ASF GitHub Bot logged work - 23/Mar/20 11:44
            • Time Spent:
              10m
               
              symat commented on pull request #1293: ZOOKEEPER-3756: Members slow to rejoin quorum using Kubernetes
              URL: https://github.com/apache/zookeeper/pull/1293
               
               
                 Whenever we close the current master ZooKeeper server, a new leader election
                 is triggered. During the new election, a connection will be established between
                 all the servers, by calling the synchronized 'connectOne' method in
                 QuorumCnxManager. The method will open the socket and send a single small
                 initial message to the other server, usually very quickly. If the destination
                 host is unreachable, it should fail immediately.
                 
                 However, when we use Kubernetes, the destination host is always reachable
                 because it points to a Kubernetes service. If the actual container / pod is not
                 available, then the 'socket.connect' method will time out (by default after 5 sec)
                 instead of failing immediately with NoRouteToHostException. As the 'connectOne'
                 method is synchronized, this timeout will block the creation of other
                 connections, so a single unreachable host can cause a timeout in the leader
                 election protocol.
                 
                 One workaround is to decrease the socket connection timeout with the
                 '-Dzookeeper.cnxTimeout' system property, but the proper fix would be to
                 make the connection initiation fully asynchronous, as using a very low timeout can
                 have its own side effects. Fortunately, most of the initial message sending
                 is already async: the SASL authentication can take more time, so the
                 second (authentication + initial message sending) part of the initiation protocol
                 is already called in a separate thread when Quorum SASL authentication is enabled.
                 
                 In the following patch I made the whole connection initiation async, by
                 always using the async executor (not only when Quorum SASL is enabled) and
                 also moving the socket.connect call into the async thread.
                 
                 I also created a unit test to verify my fix. I added a static socket factory that can be
                 changed by the tests using a package-private setter method. Before I applied my changes,
                 the test failed (producing the same error logs as we see in the original Jira ticket)
                 and timed out, as no leader election had succeeded after 15 seconds.
                 After the changes the test runs very quickly, in 1-2 seconds.
                 
                 Note: due to the multiAddress changes, we will need different PRs for branch 3.5
                 and for the 3.6+ branches. I will submit the other PR once this one has been reviewed.
               
            githubbot ASF GitHub Bot logged work - 23/Mar/20 11:45
            githubbot ASF GitHub Bot logged work - 23/Mar/20 11:45
            githubbot ASF GitHub Bot logged work - 23/Mar/20 15:20
            • Time Spent:
              10m
               
              asfgit commented on pull request #1289: ZOOKEEPER-3756: Members slow to rejoin quorum using Kubernetes
              URL: https://github.com/apache/zookeeper/pull/1289
               
               
                 
               
            githubbot ASF GitHub Bot logged work - 23/Mar/20 15:21
            • Time Spent:
              10m
               
              nkalmar commented on issue #1289: ZOOKEEPER-3756: Members slow to rejoin quorum using Kubernetes
              URL: https://github.com/apache/zookeeper/pull/1289#issuecomment-602670239
               
               
                 Thanks @symat , merged to master and branch-3.6. I'll check the PR for 3.5.
               
            githubbot ASF GitHub Bot logged work - 23/Mar/20 15:48
            • Time Spent:
              10m
               
              symat commented on issue #1289: ZOOKEEPER-3756: Members slow to rejoin quorum using Kubernetes
              URL: https://github.com/apache/zookeeper/pull/1289#issuecomment-602686312
               
               
                 thanks @eolivelli and @nkalmar for the quick reviews!
               
            githubbot ASF GitHub Bot logged work - 26/Mar/20 07:40
            • Time Spent:
              10m
               
              symat commented on issue #1293: ZOOKEEPER-3756: Members slow to rejoin quorum using Kubernetes
              URL: https://github.com/apache/zookeeper/pull/1293#issuecomment-604277486
               
               
                 @eolivelli can you please take a look?
                 this is the same as #1289 just for branch 3.5...
               
            githubbot ASF GitHub Bot logged work - 26/Mar/20 16:32
            • Time Spent:
              10m
               
              nkalmar commented on issue #1293: ZOOKEEPER-3756: Members slow to rejoin quorum using Kubernetes
              URL: https://github.com/apache/zookeeper/pull/1293#issuecomment-604532914
               
               
                 Merged to branch-3.5, thanks @symat
               
            githubbot ASF GitHub Bot logged work - 26/Mar/20 16:32
            • Time Spent:
              10m
               
              nkalmar commented on pull request #1293: ZOOKEEPER-3756: Members slow to rejoin quorum using Kubernetes
              URL: https://github.com/apache/zookeeper/pull/1293
               
               
                 
               

            People

              symat Mate Szalay-Beko
              dshi Dai Shi
              Votes:
              0
              Watchers:
              6

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  0h
                  Logged:
                  3.5h