Description
Very often we have a timeout when we call http://server2:8080/solr/admin/collections?action=CLUSTERSTATUS&wt=json
{"responseHeader": {"status": 500, "QTime": 180100}, "error": {"msg": "CLUSTERSTATUS the collection time out:180s", "trace": "org.apache.solr.common.SolrException: CLUSTERSTATUS the collection time out:180s\n\tat org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:368)\n\tat org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:320)\n\tat org.apache.solr.handler.admin.CollectionsHandler.handleClusterStatus(CollectionsHandler.java:640)\n\tat org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:220)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\n\tat org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:729)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:267)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1338)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:350)\n\tat org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)\n\tat org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890)\n\tat org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944)\n\tat org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:630)\n\tat org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)\n\tat org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:77)\n\tat org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:606)\n\tat org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:46)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:603)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:538)\n\tat java.lang.Thread.run(Thread.java:745)\n", "code": 500}}
The cluster has 3 SolR nodes with 6 small collections replicated on all nodes.
We were using this api to monitor cluster state but it was failing every 10 minutes. We switched by using ZkStateReader in CloudSolrServer and it has been working for a day without problems.
Is there a kind of deadlock as this call was been made on the three nodes concurrently?