IGNITE-22878

Periodic latency sinks on key-value KeyValueView#put

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • 3.0
    • 3.0
    • cache
    • Docs Required, Release Notes Required

    Description

      Results

      I'm putting the results right here, because comments are easy to miss.

      • The main reason for the performance dips is that the raft log and table data are located on the same storage device. We should test a configuration where they are separated.
      • RocksDB-based log storage adds minor issues during its flush and compaction, which might cause 10-20% dips. This is not too critical, but it once again shows the downsides of the current implementation.
        Reducing the number of threads that write and compact SST files doesn't seem to have any effect, although it's hard to say precisely. This part is not configurable, but I would investigate separately whether it makes sense to set those values to 1.
      • Disabling fsync doesn't really change anything.
      • Table data checkpoints and compaction have the most impact. For some reason, the first checkpoint hurts performance the most, maybe due to some kind of warmup.
        Making checkpoints more frequent helps smooth out the graph a little.
        Reducing the number of checkpoint threads and compaction threads also helps smooth out the graph, and the effect is more visible. Checkpoints become longer, obviously, but they still don't overlap in single-put KV tests, even under high load.

      What's implemented in current JIRA:

      Description

      Build under test: Ignite 3, rev. 1e8959c0a000f0901085eb0b11b37db4299fa72a

      Benchmark: https://github.com/gridgain/YCSB/blob/ycsb-2024.14/ignite3/src/main/java/site/ycsb/db/ignite3/IgniteClient.java 

      Test environment

      6 AWS VMs of type c5d.4xlarge:

      • vCPU: 16
      • Memory: 32 GiB
      • Storage: 400 GB NVMe SSD
      • Network: up to 10 Gbps

      Test

      Start 3 Ignite nodes (one node per host). Configuration:

      • raft.fsync=false
      • partitions=16
      • replicas=1

      Start 3 YCSB clients (one client per host). Each YCSB client spawns 32 load threads and works with its own key range. Parameters:

      • Client 1: -db site.ycsb.db.ignite3.IgniteClient -load -P /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=15300000 -p warmupops=100000 -p dataintegrity=true -p measurementtype=timeseries -p status.interval=1 -p partitions=16 -p insertstart=5100000 -p insertcount=5000000 -s
      • Client 2: -db site.ycsb.db.ignite3.IgniteClient -load -P /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=15300000 -p warmupops=100000 -p dataintegrity=true -p measurementtype=timeseries -p status.interval=1 -p partitions=16 -p insertstart=0 -p insertcount=5000000 -s
      • Client 3: -db site.ycsb.db.ignite3.IgniteClient -load -P /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=15300000 -p warmupops=100000 -p dataintegrity=true -p measurementtype=timeseries -p status.interval=1 -p partitions=16 -p insertstart=10200000 -p insertcount=5000000 -s
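
      The key ranges implied by the insertstart and insertcount parameters above can be checked with a short sketch. It assumes the usual YCSB load-phase behaviour, where each client inserts key sequence numbers in the half-open range [insertstart, insertstart + insertcount); the helper names are illustrative only and are not part of the benchmark.

```python
# (insertstart, insertcount) per client, copied from the YCSB parameters above.
CLIENTS = {
    "Client 1": (5_100_000, 5_000_000),
    "Client 2": (0, 5_000_000),
    "Client 3": (10_200_000, 5_000_000),
}


def key_ranges(clients):
    """Return (name, start, end) tuples for half-open ranges [start, end), sorted by start."""
    return sorted(
        ((name, start, start + count) for name, (start, count) in clients.items()),
        key=lambda r: r[1],
    )


def overlaps(ranges):
    """Return pairs of neighbouring clients whose key ranges intersect."""
    return [
        (a[0], b[0])
        for a, b in zip(ranges, ranges[1:])
        if b[1] < a[2]  # next range starts before the previous one ends
    ]


ranges = key_ranges(CLIENTS)
for name, start, end in ranges:
    print(f"{name}: [{start}, {end})")
print("overlaps:", overlaps(ranges))
```

      Under that assumption the three ranges do not overlap, and neighbouring ranges are separated by 100,000 key numbers that no client inserts.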

      Results

      Results from each client are in separate attached files.

      From these files we can draw transactions-per-second graphs (attached as cl1.png, cl2.png, and cl3.png).

      Take a look at the sinks (throughput dips) in these graphs. We need to investigate their cause.
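
      A minimal sketch of how such sinks could be located in a per-client result file follows. It assumes the files contain standard YCSB status lines of the form "<N> sec: <total> operations; <X> current ops/sec"; the attached files may use a different layout, in which case the regular expression needs adjusting. The 0.8 threshold and the function names are arbitrary illustrative choices.

```python
import re
from statistics import median

# Assumed line shape (standard YCSB status output); the attached files may differ.
STATUS_RE = re.compile(r"(\d+) sec: \d+ operations; ([\d.]+) current ops/sec")


def parse_throughput(lines):
    """Extract (elapsed_seconds, ops_per_sec) samples from status lines."""
    samples = []
    for line in lines:
        m = STATUS_RE.search(line)
        if m:
            samples.append((int(m.group(1)), float(m.group(2))))
    return samples


def find_dips(samples, threshold=0.8):
    """Return samples whose throughput is below threshold * median throughput."""
    if not samples:
        return []
    baseline = median(v for _, v in samples)
    return [(t, v) for t, v in samples if v < threshold * baseline]


# Synthetic example lines, not taken from the attached files.
example = [
    "2024-08-01 11:36:03:000 1 sec: 1000 operations; 1000 current ops/sec;",
    "2024-08-01 11:36:04:000 2 sec: 2000 operations; 1000 current ops/sec;",
    "2024-08-01 11:36:05:000 3 sec: 2500 operations; 500 current ops/sec;",
    "2024-08-01 11:36:06:000 4 sec: 3500 operations; 1000 current ops/sec;",
]
dips = find_dips(parse_throughput(example))
print(dips)
```

      Comparing the timestamps of the reported dips with checkpoint and compaction events in the node logs is one way to check the correlation described in the results above.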

      Attachments

        1. 2024-08-01-11-36-02_192.168.208.148_kv_load.txt
          167 kB
          Ivan Artiukhov
        2. 2024-08-01-11-36-02_192.168.209.141_kv_load.txt
          166 kB
          Ivan Artiukhov
        3. 2024-08-01-11-36-02_192.168.209.191_kv_load.txt
          167 kB
          Ivan Artiukhov
        4. cl1.png
          77 kB
          Ivan Artiukhov
        5. cl2.png
          77 kB
          Ivan Artiukhov
        6. cl3.png
          77 kB
          Ivan Artiukhov

            People

              ibessonov Ivan Bessonov
              Artukhov Ivan Artiukhov
              Aleksandr Polovtsev Aleksandr Polovtsev

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h