Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
3.0.1-incubating
-
None
-
"Cluster" of a single gremlin-server/Titan node. Single client. minConnectionPoolSize = 2; maxConnectionPoolSize = 10.
Description
Client gets a GryoMessageSerializerV1d0 decoding error on a couple of requests (different unfiled issue). ConnectionPool gets into the situation where there are 2 dead connections being maintained in the pool, but no new connections are being issued.
A sample of the log output...
2015-08-27 09:57:18,122 [Grizzly-worker(15)] DEBUG org.apache.tinkerpop.gremlin.driver.ConnectionPool - Borrowing connection from pool on Host{address=localhost/127.0.0.1:18182, hostUri=ws://localhost:18182/gremlin} - timeout in 3000 MILLISECONDS 2015-08-27 09:57:18,122 [Grizzly-worker(15)] DEBUG org.apache.tinkerpop.gremlin.driver.ConnectionPool - Return least used Connection{host=Host{address=localhost/127.0.0.1:18182, hostUri=ws://localhost:18182/gremlin}, isDead=false, borrowed=1, pending=0} on Host{address=localhost/127.0.0.1:18182, hostUri=ws://localhost:18182/gremlin} 2015-08-27 09:57:18,122 [Grizzly-worker(15)] DEBUG org.apache.tinkerpop.gremlin.driver.Client - Submitted RequestMessage{, requestId=0349c1ce-19dc-4d54-9ca9-30965c9314ae, op='eval', processor='', args={<redacted>}} to - Connection{host=Host{address=localhost/127.0.0.1:18182, hostUri=ws://localhost:18182/gremlin}} 2015-08-27 09:57:18,125 [gremlin-driver-worker-6] DEBUG org.apache.tinkerpop.gremlin.driver.ConnectionPool - Attempting to return Connection{host=Host{address=localhost/127.0.0.1:18182, hostUri=ws://localhost:18182/gremlin}} on Host{address=localhost/127.0.0.1:18182, hostUri=ws://localhost:18182/gremlin} 2015-08-27 09:57:18,126 [gremlin-driver-worker-6] DEBUG org.apache.tinkerpop.gremlin.driver.ConnectionPool - On Host{address=localhost/127.0.0.1:18182, hostUri=ws://localhost:18182/gremlin} pool size of 3 > minPoolSize 2 and borrowed of 0 <= minSimultaneousUsagePerConnection 8 so destroy Connection{host=Host{address=localhost/127.0.0.1:18182, hostUri=ws://localhost:18182/gremlin}, isDead=false, borrowed=0, pending=0} 2015-08-27 09:57:18,126 [gremlin-driver-worker-6] DEBUG org.apache.tinkerpop.gremlin.driver.ConnectionPool - Connection{host=Host{address=localhost/127.0.0.1:18182, hostUri=ws://localhost:18182/gremlin}, isDead=false, borrowed=0, pending=0} destroyed 2015-08-27 09:57:18,126 [Grizzly-worker(15)] DEBUG org.apache.tinkerpop.gremlin.driver.ConnectionPool - Borrowing connection from pool on Host{address=localhost/127.0.0.1:18182, hostUri=ws://localhost:18182/gremlin} - timeout in 3000 MILLISECONDS 2015-08-27 09:57:18,126 [Grizzly-worker(15)] DEBUG org.apache.tinkerpop.gremlin.driver.ConnectionPool - Pool was initialized but a connection could not be selected earlier - waiting for connection on Host{address=localhost/127.0.0.1:18182, hostUri=ws://localhost:18182/gremlin} 2015-08-27 09:57:18,126 [Grizzly-worker(15)] DEBUG org.apache.tinkerpop.gremlin.driver.ConnectionPool - Wait 3000 MILLISECONDS for an available connection on Host{address=localhost/127.0.0.1:18182, hostUri=ws://localhost:18182/gremlin} with Thread[Grizzly-worker(15),5,main] 2015-08-27 09:57:21,127 [Grizzly-worker(15)] DEBUG org.apache.tinkerpop.gremlin.driver.ConnectionPool - Continue to wait for connection on -1 if {} > 0 2015-08-27 09:57:21,127 [Grizzly-worker(15)] DEBUG org.apache.tinkerpop.gremlin.driver.ConnectionPool - Timed-out waiting for connection on Host{address=localhost/127.0.0.1:18182, hostUri=ws://localhost:18182/gremlin} - possibly unavailable 2015-08-27 09:57:21,127 [gremlin-driver-worker-3] DEBUG org.apache.tinkerpop.gremlin.driver.Host - Trying to reconnect to dead host at Host{address=localhost/127.0.0.1:18182, hostUri=ws://localhost:18182/gremlin} 2015-08-27 09:57:21,128 [Grizzly-worker(15)] DEBUG org.apache.tinkerpop.gremlin.driver.Client - Submitted RequestMessage{, requestId=67761973-7ae7-4a6a-8e62-443fdb55142d, op='eval', processor='', args=<redacted>} to - connection not initialized 2015-08-27 09:57:21,128 [gremlin-driver-worker-3] DEBUG org.apache.tinkerpop.gremlin.driver.ConnectionPool - Trying to re-establish connection on Host{address=localhost/127.0.0.1:18182, hostUri=ws://localhost:18182/gremlin} 2015-08-27 09:57:21,128 [Grizzly-worker(15)] ERROR EdnaMetaDataServiceGremlinImpl - Failed man lookupMerchantAccountName java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.TimeoutException
This behavior continues with each subsequent request. The gremlin-server is available at all times.
As a workaround I have applied the following patch, but I am pretty sure it is not the correct solution
diff --git a/gremlin-driver/src/main/java/org/apache/tinkerpop/gremlin/driver/ConnectionPool.java b/gremlin-driver/src/main/java/org/apache/tinkerpop/gremlin/driver/ConnectionPool.java index 30cd71f..cb8ebfa 100644 --- a/gremlin-driver/src/main/java/org/apache/tinkerpop/gremlin/driver/ConnectionPool.java +++ b/gremlin-driver/src/main/java/org/apache/tinkerpop/gremlin/driver/ConnectionPool.java @@ -352,7 +352,7 @@ final class ConnectionPool { } remaining = to - TimeUtil.timeSince(start, unit); - logger.debug("Continue to wait for connection on {} if {} > 0", remaining); + logger.debug("Continue to wait for connection on {} if {} > 0", host, remaining); } while (remaining > 0); logger.debug("Timed-out waiting for connection on {} - possibly unavailable", host); @@ -371,13 +371,16 @@ final class ConnectionPool { // let the load-balancer know that the host is acting poorly this.cluster.loadBalancingStrategy().onUnavailable(host); - } private boolean tryReconnect(final Host h) { logger.debug("Trying to re-establish connection on {}", host); try { + // remove dead connections + for (Connection c : connections) + if (c.isDead()) + definitelyDestroyConnection(c); connections.add(new Connection(host.getHostUri(), this, cluster, settings().maxInProcessPerConnection)); this.open.set(connections.size());
Attachments
Issue Links
- relates to
-
TINKERPOP-816 Gryo deserialization of error response with null message causes NPE and protocol desync
- Closed
-
TINKERPOP-817 Gryo serialization of large responses fails and causes protocol desync
- Closed
Thanks for reporting this. Is there an "easy" way to get the driver into this state? You mentioned the "gryo" serialization failure...anything I can easily/readily reproduce?