Details
Description
This caused a YCSB job to fail:
- a server fell behind for some reason (we haven't root-caused why yet; it may just have been a bit slow)
- the leader GCed the log entries needed to catch it up, and therefore stopped sending it any heartbeats or other messages
- the lagging server had one write pending
- the Java client apparently just kept retrying over and over against that same server
The server with the pending transaction may actually have been the leader at the time the write was issued; otherwise it's unclear why the Java client keeps retrying against it. Alternatively, the Java client may have received an error from the leader, failed over to the follower, and its RPCs to that follower are now timing out.
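To make the leader-side failure mode concrete, here is a minimal Python sketch. It assumes a simplified Raft-style model and is not the server's actual code. `LeaderLog`, `Peer`, and `messages_for_peer` are hypothetical names. The sketch only encodes the behavior reported above: once the entries a peer needs have been GCed, the leader sends that peer nothing, not even a heartbeat.

```python
from dataclasses import dataclass


@dataclass
class LeaderLog:
    # Hypothetical model: the leader still retains entries [first_index, last_index];
    # everything below first_index has already been garbage-collected.
    first_index: int
    last_index: int


@dataclass
class Peer:
    name: str
    next_index: int  # next log index the leader would need to send to this peer


def messages_for_peer(log: LeaderLog, peer: Peer) -> list:
    """Messages the leader sends to `peer` in one round, modeling the reported bug."""
    if peer.next_index < log.first_index:
        # The entries this peer needs were GCed: the leader sends nothing at all,
        # so the peer also stops receiving heartbeats (the state described above).
        return []
    if peer.next_index > log.last_index:
        # Peer is fully caught up, so only a heartbeat is needed.
        return ["heartbeat"]
    # Peer is behind but recoverable: send the missing entries in order.
    return [f"append {i}" for i in range(peer.next_index, log.last_index + 1)]


log = LeaderLog(first_index=100, last_index=120)
for p in (Peer("caught-up", 121), Peer("slightly-behind", 119), Peer("lagging", 50)):
    print(p.name, messages_for_peer(log, p))
```

With the leader retaining indexes 100 through 120, a peer at next index 121 gets a heartbeat, a peer at 119 gets entries 119 and 120, and a peer at 50 gets an empty list, which corresponds to the silent state the lagging server was left in.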