Description
As reported here on SO: https://stackoverflow.com/questions/67586427/how-to-recover-with-a-retry-from-gremlin-nohostavailableexception
If the host is unavailable when the `Client` is initialized, the host is never put into a state where reconnection is possible. Essentially, this test for GremlinServerIntegrateTest should pass:

    @Test
    public void shouldFailOnInitiallyDeadHost() throws Exception {
        // start test with no server
        this.stopServer();

        final Cluster cluster = TestClientFactory.build().create();
        final Client client = cluster.connect();

        try {
            // try to re-issue a request now that the server is down
            client.submit("g").all().get(3000, TimeUnit.MILLISECONDS);
            fail("Should throw an exception.");
        } catch (RuntimeException re) {
            // Client would have no active connections to the host, hence it would encounter a timeout
            // trying to find an alive connection to the host.
            assertThat(re.getCause(), instanceOf(NoHostAvailableException.class));

            //
            // should recover when the server comes back
            //

            // restart server
            this.startServer();

            // try a bunch of times to reconnect. on slower systems this may simply take longer...looking at you travis
            for (int ix = 1; ix < 11; ix++) {
                // the retry interval is 1 second, wait a bit longer
                TimeUnit.SECONDS.sleep(5);

                try {
                    final List<Result> results = client.submit("1+1").all().get(3000, TimeUnit.MILLISECONDS);
                    assertEquals(1, results.size());
                    assertEquals(2, results.get(0).getInt());
                } catch (Exception ex) {
                    if (ix == 10)
                        fail("Should have eventually succeeded");
                }
            }
        } finally {
            cluster.close();
        }
    }

Note that a similar test, shouldFailOnDeadHost(), first lets the client connect to a host, then kills the host and restarts it. That test demonstrates that reconnection works in that situation.
I thought it might be an easy fix to simply call considerHostUnavailable() in the ConnectionPool constructor in the event of a CompletionException, which should kickstart the reconnect process. The reconnects started firing, but they all failed for some reason. I didn't have time to investigate further than that.
Currently the only workaround is to recreate the `Client` if this sort of situation occurs.
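A minimal sketch of that workaround follows. It is deliberately generic and does not depend on gremlin-driver: `RecreatingClient`, `factory`, and `closer` are hypothetical names introduced only for illustration. The assumption is that in a real application `factory` would wrap something like `cluster.connect()` and `closer` would wrap `Client::close`, and that the catch would be narrowed to failures whose cause is `NoHostAvailableException`.

```java
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of gremlin-driver): discards and recreates a
 * client-like object whenever a request fails, then retries the request.
 */
final class RecreatingClient<C> {
    private final Supplier<C> factory; // assumed to build a fresh client
    private final Consumer<C> closer;  // assumed to release a broken client
    private C current;

    RecreatingClient(Supplier<C> factory, Consumer<C> closer) {
        this.factory = factory;
        this.closer = closer;
    }

    /** Runs the request, rebuilding the client after each failure, up to maxAttempts times. */
    <R> R call(Function<C, R> request, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (current == null) {
                    current = factory.get();
                }
                return request.apply(current);
            } catch (RuntimeException e) {
                // A real application would inspect e.getCause() here and only
                // recreate the client for connectivity failures.
                last = e;
                discard();
                if (attempt < maxAttempts) {
                    sleep(backoffMillis);
                }
            }
        }
        throw last;
    }

    /** Closes the current client (ignoring close failures) and forgets it. */
    private void discard() {
        if (current != null) {
            try {
                closer.accept(current);
            } catch (RuntimeException ignored) {
                // the client is already considered broken
            }
            current = null;
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", ie);
        }
    }
}
```

The helper keeps a working client between calls and only pays the cost of recreating it after a failure, which is roughly what the driver would be expected to do internally once the reconnect issue is fixed.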