  1. Ignite
  2. IGNITE-16102

Store all RocksDB partitions in a single column family.


Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.0.0-alpha3
    • 3.0.0-alpha5
    • None

    Description

      The current storage implementation puts each partition in its own column family. This effectively means that every partition lives in its own database, sharing only the WAL and some in-memory resources. Given that each column family has multiple files for its LSM tree, the number of open file descriptors is larger than it needs to be.

      The idea is to have a single column family for all partitions within a table. We should also consider storing several tables in the same RocksDB instance, for similar reasons. You can think of this as an analogue of cache groups in Ignite 2.x.

      There is also an "optimization" that is missing from the current code and should be implemented: using key hashes as prefixes.

      What should be implemented:

      First of all, the code will be heavily refactored, which should lead to simplifications in many places.

      Otherwise, I see the following list of goals to achieve:

      • the current implementation allows deriving the list of partitions from the list of column families. This will no longer be possible, so I suggest storing this list explicitly in the "meta" CF, in whatever format is convenient during implementation
      • there should be a compact "tableId" representation. IgniteUUID or even UUID is too much, I think, but it might work as a basis. This problem should be discussed
      • the binary representation of keys should now include the following information:
        • tableId - fixed-length set of bytes to be used as a prefix
        • partitionId - 2 bytes that will follow the tableId. This layout will allow making range queries for specific partitions of specific tables
        • key hash - 4 bytes. This one is required to optimize comparison time for keys. Generally speaking, it is safe to assume that hashes will mostly differ for different keys, meaning that comparing hashes will usually be enough to determine key inequality
        • actual key payload goes after all these prefixes
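
      The key layout above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation: the class and method names are hypothetical, the width of tableId is whatever the caller passes in (the ticket leaves it open), and Arrays.hashCode stands in for whichever key hash is eventually chosen. The sketch also assumes keys are ordered by unsigned byte-wise comparison, as with RocksDB's default comparator, so big-endian encoding keeps all keys of one partition of one table contiguous.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical helper illustrating the proposed key layout; all names are assumptions. */
final class PartitionKeyLayout {
    static final int PARTITION_ID_BYTES = 2;
    static final int KEY_HASH_BYTES = 4;

    private PartitionKeyLayout() {
    }

    /** Builds: tableId | partitionId (2 bytes) | key hash (4 bytes) | key payload. */
    static byte[] encodeKey(byte[] tableId, int partitionId, byte[] keyPayload) {
        checkPartitionId(partitionId);

        // ByteBuffer is big-endian by default, so byte-wise order follows numeric partition order.
        return ByteBuffer.allocate(tableId.length + PARTITION_ID_BYTES + KEY_HASH_BYTES + keyPayload.length)
                .put(tableId)
                .putShort((short) partitionId)
                .putInt(Arrays.hashCode(keyPayload)) // Assumption: placeholder hash function.
                .put(keyPayload)
                .array();
    }

    /** Prefix shared by every key of the given partition; usable as an inclusive lower bound. */
    static byte[] partitionPrefix(byte[] tableId, int partitionId) {
        checkPartitionId(partitionId);

        return ByteBuffer.allocate(tableId.length + PARTITION_ID_BYTES)
                .put(tableId)
                .putShort((short) partitionId)
                .array();
    }

    /**
     * Exclusive upper bound for a prefix scan: the smallest byte string that is greater than
     * every key starting with the prefix. Returns null if the prefix consists only of 0xFF bytes,
     * in which case the scan has no upper bound.
     */
    static byte[] prefixUpperBound(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] bound = Arrays.copyOf(prefix, i + 1);
                bound[i]++;
                return bound;
            }
        }

        return null;
    }

    private static void checkPartitionId(int partitionId) {
        if (partitionId < 0 || partitionId > 0xFFFF) {
            throw new IllegalArgumentException("partitionId must fit in 2 bytes: " + partitionId);
        }
    }
}
```

      With this layout, a scan over one partition would use partitionPrefix(...) as the lower bound and prefixUpperBound(...) of that prefix as the exclusive upper bound. Within a partition, two different keys usually differ somewhere in the 4 hash bytes, so a byte-wise comparison can usually stop before reaching the payload.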

            People

              apolovtcev Aleksandr Polovtsev
              ibessonov Ivan Bessonov
              Ivan Bessonov Ivan Bessonov


                Time Tracking

                  Original Estimate - Not Specified
                  Remaining Estimate - 0h
                  Time Spent - 1h 10m