Description
I'm doing the following:
- Start a cluster of 3 nodes
- Create a table with 3 replicas and 1 partition
- Stop one node
- Take a snapshot on the partition leader twice in a row (I'm not sure this step is required to reproduce the issue)
- Start the stopped node
Then, in this code:
private <R> CompletableFuture<R> enlistWithRetry(
        InternalTransaction tx,
        int partId,
        BiFunction<TablePartitionId, Long, ReplicaRequest> requestFunction,
        int attempts
) {
    CompletableFuture<R> result = new CompletableFuture<>();
    enlist(partId, tx).<R>thenCompose(
            primaryReplicaAndTerm -> {
                try {
                    return replicaSvc.invoke(
                            primaryReplicaAndTerm.get1(),
                            requestFunction.apply((TablePartitionId) tx.commitPartition(), primaryReplicaAndTerm.get2())
                    );
primaryReplicaAndTerm turns out to contain null as its first element and 1 as its second element, which later causes an NPE. Probably clusterNodeResolver returned null.
This does not reproduce often, but I have seen it a few times.
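To illustrate the failure mode, here is a minimal standalone sketch of a guard that fails the future with a descriptive error when the resolved node is null, instead of letting an NPE surface later. It does not use the Ignite API: NullPrimaryGuard, NodeAndTerm, PrimaryReplicaUnresolvedException and invokeOnPrimary are hypothetical names made up for this example, and the node is modelled as a plain String. Whether failing fast, retrying, or fixing the resolver is the right remedy is not decided here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

class NullPrimaryGuard {
    /** Hypothetical stand-in for the (node, term) pair held in primaryReplicaAndTerm. */
    static final class NodeAndTerm {
        final String node; // null models a node that could not be resolved
        final long term;

        NodeAndTerm(String node, long term) {
            this.node = node;
            this.term = term;
        }
    }

    /** Hypothetical exception type, used only in this sketch. */
    static final class PrimaryReplicaUnresolvedException extends RuntimeException {
        PrimaryReplicaUnresolvedException(String message) {
            super(message);
        }
    }

    /**
     * Composes the enlist result with the invocation, but checks the node first.
     * If the node is null, the returned future completes exceptionally with a
     * descriptive error and the invocation is never attempted.
     */
    static <R> CompletableFuture<R> invokeOnPrimary(
            CompletableFuture<NodeAndTerm> enlistFuture,
            Function<NodeAndTerm, CompletableFuture<R>> invoke
    ) {
        return enlistFuture.thenCompose(primary -> {
            if (primary.node == null) {
                return CompletableFuture.<R>failedFuture(new PrimaryReplicaUnresolvedException(
                        "Primary replica node is not resolved, term=" + primary.term));
            }
            return invoke.apply(primary);
        });
    }

    public static void main(String[] args) {
        // Same shape as the observed state: null node, term 1.
        CompletableFuture<String> failed = invokeOnPrimary(
                CompletableFuture.completedFuture(new NodeAndTerm(null, 1L)),
                p -> CompletableFuture.completedFuture("invoked on " + p.node));

        failed.whenComplete((res, err) -> System.out.println("result=" + res + ", error=" + err));
    }
}
```

With a guard of this kind, the caller sees the unresolved node at the point where it is first used, which should make the rare reproduction easier to diagnose from logs.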
Issue Links
- is related to IGNITE-18605 Account for inherent unreliability of messaging (Open)