Description
Results
I put the results first, because comments are easily missed.
- The main cause of the performance dips is that the Raft log and the table data are located on the same storage device. We should test a configuration where they are separated.
- The RocksDB-based log storage adds minor issues during its flushes and compactions, which can cause 10-20% dips. This is not critical, but it once again shows the downsides of the current implementation.
- Reducing the number of threads that write and compact SST files doesn't seem to change anything, although it's hard to say precisely. This part is not configurable; whether it would make sense to set those values to 1 should be investigated separately.
- Nothing really changes when fsync is disabled.
- Table data checkpoints and compaction have the biggest impact. For some reason, the first checkpoint hurts performance the most, possibly due to some kind of warmup.
- Making checkpoints more frequent helps smooth out the graph a little.
- Reducing the number of checkpoint threads and compaction threads also helps smooth out the graph, and the effect is more visible. Checkpoints obviously become longer, but they still don't overlap in single-put KV tests, even under high load.
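To illustrate why more frequent checkpoints smooth out the graph, here is a toy model (not Ignite code): the total amount of dirty-page flush work over a test run is roughly fixed, so splitting it across more checkpoints shrinks each individual stall while leaving the total work unchanged.

```python
# Toy model: total flush work per test run is fixed; splitting it
# across more checkpoints makes each stall shorter, which is what
# smooths the throughput graph. All numbers are illustrative.
def stall_per_checkpoint(total_flush_seconds: float, checkpoints: int) -> float:
    """Idealized stall caused by one checkpoint, assuming work splits evenly."""
    return total_flush_seconds / checkpoints

baseline = stall_per_checkpoint(60.0, 4)   # few, large checkpoints
frequent = stall_per_checkpoint(60.0, 12)  # 3x more frequent checkpoints
print(baseline, frequent)  # 15.0 5.0
```

The same reasoning applies to reducing checkpoint threads: each checkpoint takes longer but consumes a smaller share of disk bandwidth at any moment, so foreground writes suffer less.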
What's implemented in the current JIRA:
- Basic logging of RocksDB compaction.
- Basic logging of aipersist compaction, to be expanded in https://issues.apache.org/jira/browse/IGNITE-23056.
Description
Build under test: Ignite 3, rev. 1e8959c0a000f0901085eb0b11b37db4299fa72a
Test environment
6 AWS VMs of type c5d.4xlarge:
- vCPU: 16
- Memory: 32 GiB
- Storage: 400 GB NVMe SSD
- Network: up to 10 Gbps
Test
Start 3 Ignite nodes (one node per host). Configuration:
- raft.fsync=false
- partitions=16
- replicas=1
Start 3 YCSB clients (one client per host). Each YCSB client spawns 32 load threads and works with its own key range. Parameters:
- Client 1: -db site.ycsb.db.ignite3.IgniteClient -load -P /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=15300000 -p warmupops=100000 -p dataintegrity=true -p measurementtype=timeseries -p status.interval=1 -p partitions=16 -p insertstart=5100000 -p insertcount=5000000 -s
- Client 2: -db site.ycsb.db.ignite3.IgniteClient -load -P /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=15300000 -p warmupops=100000 -p dataintegrity=true -p measurementtype=timeseries -p status.interval=1 -p partitions=16 -p insertstart=0 -p insertcount=5000000 -s
- Client 3: -db site.ycsb.db.ignite3.IgniteClient -load -P /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=15300000 -p warmupops=100000 -p dataintegrity=true -p measurementtype=timeseries -p status.interval=1 -p partitions=16 -p insertstart=10200000 -p insertcount=5000000 -s
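The three clients are given disjoint key ranges via insertstart/insertcount. A quick sanity check (plain Python, values taken from the commands above) confirms the ranges do not overlap and fit under recordcount=15,300,000:

```python
# Sanity check for the (insertstart, insertcount) pairs used by the
# three YCSB clients: ranges must be disjoint and fit under recordcount.
ranges = {
    "client1": (5_100_000, 5_000_000),
    "client2": (0, 5_000_000),
    "client3": (10_200_000, 5_000_000),
}
record_count = 15_300_000

intervals = sorted((start, start + count) for start, count in ranges.values())
for (lo1, hi1), (lo2, hi2) in zip(intervals, intervals[1:]):
    assert hi1 <= lo2, "key ranges overlap"
assert intervals[-1][1] <= record_count
print(intervals)  # [(0, 5000000), (5100000, 10100000), (10200000, 15200000)]
```

Note the 100,000-key gaps between the ranges; they are left as configured in the commands above.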
Results
Results from each client are in separate files (attached).
From these files we can draw transactions-per-second graphs:
Take a look at the dips on these graphs. We need to investigate their cause.
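The per-second throughput series behind such graphs can be extracted from the YCSB status output with a sketch like the following. The status-line format is assumed from typical YCSB `-s` output and may vary between versions, so the regex may need adjusting:

```python
import re

# Hedged sketch: parse YCSB '-s' status output into (second, ops/sec)
# points suitable for plotting a transactions-per-second graph.
# The line format below is assumed and may differ across YCSB versions.
STATUS_RE = re.compile(r"(\d+) sec: \d+ operations; ([\d.,]+) current ops/sec")

def tps_series(lines):
    """Extract (elapsed_seconds, current_ops_per_sec) points from status lines."""
    points = []
    for line in lines:
        m = STATUS_RE.search(line)
        if m:
            points.append((int(m.group(1)), float(m.group(2).replace(",", ""))))
    return points

sample = [
    "2024-01-01 00:00:10:000 10 sec: 120000 operations; 12000.5 current ops/sec",
    "2024-01-01 00:00:11:000 11 sec: 131000 operations; 11000 current ops/sec",
]
print(tps_series(sample))  # [(10, 12000.5), (11, 11000.0)]
```

With status.interval=1 (as in the client commands above), this yields one point per second, which is enough resolution to see the checkpoint and compaction dips.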
Attachments
Issue Links
- Dependent: IGNITE-23056 Verbose logging of delta-files compaction (Resolved)