Results

I'm putting the results right here, because comments are easy to miss.

The main reason for the performance dips is that the raft log and table data are located on the same storage device. We should test a configuration where they are separated.
The rocksdb-based log storage adds minor issues during its flush and compaction, which might cause 10-20% dips. This is not too critical, but it once again shows the downsides of the current implementation.
Reducing the number of threads that write and compact SST files doesn't seem to have any effect, although it's hard to say precisely. This part is not configurable, but I would investigate separately whether it makes sense to set those values to 1.
Nothing really changes when you disable fsync.
Table data checkpoints and compaction have the most impact. For some reason, the first checkpoint hurts performance the most, maybe due to some kind of warmup.
Making checkpoints more frequent helps smooth out the graph a little.
Reducing the number of checkpoint threads and compaction threads also helps smooth out the graph, and the effect is more visible. Checkpoints become longer, obviously, but still don't overlap in single-put KV tests, even under high load.
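
To make "smoothing out the graph" comparable between runs, one option is the coefficient of variation (standard deviation divided by mean) of the per-second throughput. This is only a sketch of one possible metric, not what was used for the results above. The function name and the sample numbers are hypothetical and are not measured data.

```python
from statistics import mean, pstdev


def coefficient_of_variation(ops_per_sec):
    """Return stddev / mean of a throughput series (lower means smoother)."""
    if not ops_per_sec:
        raise ValueError("empty throughput series")
    avg = mean(ops_per_sec)
    if avg == 0:
        raise ValueError("mean throughput is zero")
    return pstdev(ops_per_sec) / avg


# Hypothetical per-second throughput samples, NOT taken from the attached files.
baseline = [1000, 1000, 400, 1000, 1000, 400]  # deep periodic dips
tuned = [900, 900, 700, 900, 900, 700]         # shallower dips

if __name__ == "__main__":
    print("baseline CV:", round(coefficient_of_variation(baseline), 3))
    print("tuned CV:   ", round(coefficient_of_variation(tuned), 3))
```

A lower value for one configuration than for another would support the claim that its graph is smoother, independent of the absolute throughput level.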

What's implemented in the current JIRA issue:

Basic logging of rocksdb compaction.
Basic logging of aipersist compaction, which should be expanded in https://issues.apache.org/jira/browse/IGNITE-23056.

Description

Build under test: Ignite 3, rev. 1e8959c0a000f0901085eb0b11b37db4299fa72a

6 AWS VMs of type c5d.4xlarge:

Start 3 Ignite nodes (one node per host). Configuration:

Start 3 YCSB clients (one client per host). Each YCSB client spawns 32 load threads and works with its own key range. Parameters:

Client 1: -db site.ycsb.db.ignite3.IgniteClient -load -P /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=15300000 -p warmupops=100000 -p dataintegrity=true -p measurementtype=timeseries -p status.interval=1 -p partitions=16 -p insertstart=5100000 -p insertcount=5000000 -s
Client 2: -db site.ycsb.db.ignite3.IgniteClient -load -P /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=15300000 -p warmupops=100000 -p dataintegrity=true -p measurementtype=timeseries -p status.interval=1 -p partitions=16 -p insertstart=0 -p insertcount=5000000 -s
Client 3: -db site.ycsb.db.ignite3.IgniteClient -load -P /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=15300000 -p warmupops=100000 -p dataintegrity=true -p measurementtype=timeseries -p status.interval=1 -p partitions=16 -p insertstart=10200000 -p insertcount=5000000 -s
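
The per-client key ranges can be sanity-checked with a little arithmetic. The sketch below assumes that each client inserts the half-open range [insertstart, insertstart + insertcount) and that every key must stay below recordcount. The numbers are copied from the parameters above, and the helper names are hypothetical.

```python
def key_range(insert_start, insert_count):
    """Half-open key range [insert_start, insert_start + insert_count)."""
    return range(insert_start, insert_start + insert_count)


# Values copied from the client parameters above.
RECORD_COUNT = 15_300_000
clients = {
    "client1": key_range(5_100_000, 5_000_000),
    "client2": key_range(0, 5_000_000),
    "client3": key_range(10_200_000, 5_000_000),
}


def ranges_are_valid(ranges, record_count):
    """True if all ranges fit below record_count and no two of them overlap."""
    ordered = sorted(ranges.values(), key=lambda r: r.start)
    fits = all(r.stop <= record_count for r in ordered)
    # After sorting by start, checking neighbours is enough to detect overlap.
    disjoint = all(a.stop <= b.start for a, b in zip(ordered, ordered[1:]))
    return fits and disjoint


if __name__ == "__main__":
    for name, r in sorted(clients.items()):
        print(f"{name}: [{r.start}, {r.stop})")
    print("valid:", ranges_are_valid(clients, RECORD_COUNT))
```

Under these assumptions the three ranges are [0, 5000000), [5100000, 10100000) and [10200000, 15200000): they do not overlap, and consecutive ranges are separated by 100000 unused keys.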

Results from each client are in separate files (attached).

From these files we can draw transactions-per-second graphs:

Take a look at these dips. We need to investigate their cause.
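
A rough way to locate the dips programmatically is to flag every second whose throughput falls below some fraction of the median. The sketch below assumes the result files contain YCSB status lines of the form "N sec: M operations; X current ops/sec"; the exact format of the attached files is not confirmed here. The 0.8 threshold, the function names and the sample lines are all hypothetical.

```python
import re
from statistics import median

# Assumed status-line shape: "<N> sec: <M> operations; <X> current ops/sec"
OPS_RE = re.compile(r"(\d+) sec: \d+ operations; ([\d.]+) current ops/sec")


def parse_throughput(lines):
    """Extract (elapsed_seconds, ops_per_sec) pairs from status lines."""
    samples = []
    for line in lines:
        match = OPS_RE.search(line)
        if match:
            samples.append((int(match.group(1)), float(match.group(2))))
    return samples


def find_dips(samples, threshold=0.8):
    """Return samples whose throughput is below threshold * median throughput."""
    if not samples:
        return []
    baseline = median(ops for _, ops in samples)
    return [(t, ops) for t, ops in samples if ops < threshold * baseline]


# Hypothetical lines for illustration only, NOT copied from the attachments.
sample_lines = [
    "2024-08-01 11:36:03:000 1 sec: 1000 operations; 1000.0 current ops/sec;",
    "2024-08-01 11:36:04:000 2 sec: 2000 operations; 1000.0 current ops/sec;",
    "2024-08-01 11:36:05:000 3 sec: 2500 operations; 500.0 current ops/sec;",
    "2024-08-01 11:36:06:000 4 sec: 3500 operations; 1000.0 current ops/sec;",
    "2024-08-01 11:36:07:000 5 sec: 4500 operations; 1000.0 current ops/sec;",
]

if __name__ == "__main__":
    for second, ops in find_dips(parse_throughput(sample_lines)):
        print(f"dip at {second} s: {ops} ops/sec")
```

The timestamps of the flagged seconds could then be matched against the checkpoint and compaction log messages on the server nodes.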

Attachments

2024-08-01-11-36-02_192.168.208.148_kv_load.txt (01/Aug/24 13:04, 167 kB, Ivan Artiukhov)
2024-08-01-11-36-02_192.168.209.141_kv_load.txt (01/Aug/24 13:04, 166 kB, Ivan Artiukhov)
2024-08-01-11-36-02_192.168.209.191_kv_load.txt (01/Aug/24 13:04, 167 kB, Ivan Artiukhov)
cl1.png (01/Aug/24 13:05, 77 kB, Ivan Artiukhov)
cl2.png (01/Aug/24 13:05, 77 kB, Ivan Artiukhov)
cl3.png (01/Aug/24 13:05, 77 kB, Ivan Artiukhov)

Dependent

IGNITE-23056 Verbose logging of delta-files compaction

links to

GitHub Pull Request #4224
