Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Won't Do
-
None
-
None
-
None
-
New
Description
We introduced a bitset optimization for extremly low cardinality fields in LUCENE-10233, but medium cardinality fields (like 32/128) can rarely trigger this optimization, This issue is trying to find out a way to speed up them.
In https://github.com/apache/lucene-solr/pull/1538, we made some effort to use readLELongs to speed up BKD id blocks, but did not get a obvious gain on this approach. I think the reason could probably be that we were trying to optimize the unsorted situation (typically happens for high cardinality fields) and the bottleneck of queries on high cardinality fields is visitDocValues but not readDocIds.
However, medium cardinality fields may be tempted for this optimization because they need to read lots of ids for each term. The basic idea is that we can compute the delta of the sorted ids and encode/decode them like what we do in StoredFieldsInts. I benchmarked the optimization by mocking some random longPoint and querying them with PointInSetQuery. As expected, the medium cardinality fields got spped up and high cardinality fields get even results.
Benchmark Result
doc count | field cardinality | query point | baseline(ms) | candidate(ms) | diff percentage | baseline(QPS) | candidate(QPS) | diff percentage |
100000000 | 32 | 1 | 19 | 16 | -15.79% | 52.63 | 62.50 | 18.75% |
100000000 | 32 | 2 | 34 | 14 | -58.82% | 29.41 | 71.43 | 142.86% |
100000000 | 32 | 4 | 76 | 22 | -71.05% | 13.16 | 45.45 | 245.45% |
100000000 | 32 | 8 | 139 | 42 | -69.78% | 7.19 | 23.81 | 230.95% |
100000000 | 32 | 16 | 279 | 82 | -70.61% | 3.58 | 12.20 | 240.24% |
100000000 | 128 | 1 | 17 | 11 | -35.29% | 58.82 | 90.91 | 54.55% |
100000000 | 128 | 8 | 75 | 23 | -69.33% | 13.33 | 43.48 | 226.09% |
100000000 | 128 | 16 | 126 | 25 | -80.16% | 7.94 | 40.00 | 404.00% |
100000000 | 128 | 32 | 245 | 50 | -79.59% | 4.08 | 20.00 | 390.00% |
100000000 | 128 | 64 | 528 | 97 | -81.63% | 1.89 | 10.31 | 444.33% |
100000000 | 1024 | 1 | 3 | 2 | -33.33% | 333.33 | 500.00 | 50.00% |
100000000 | 1024 | 8 | 13 | 8 | -38.46% | 76.92 | 125.00 | 62.50% |
100000000 | 1024 | 32 | 31 | 19 | -38.71% | 32.26 | 52.63 | 63.16% |
100000000 | 1024 | 128 | 120 | 67 | -44.17% | 8.33 | 14.93 | 79.10% |
100000000 | 1024 | 512 | 480 | 133 | -72.29% | 2.08 | 7.52 | 260.90% |
100000000 | 8192 | 1 | 3 | 3 | 0.00% | 333.33 | 333.33 | 0.00% |
100000000 | 8192 | 16 | 18 | 15 | -16.67% | 55.56 | 66.67 | 20.00% |
100000000 | 8192 | 64 | 19 | 14 | -26.32% | 52.63 | 71.43 | 35.71% |
100000000 | 8192 | 512 | 69 | 43 | -37.68% | 14.49 | 23.26 | 60.47% |
100000000 | 8192 | 2048 | 236 | 134 | -43.22% | 4.24 | 7.46 | 76.12% |
100000000 | 1048576 | 1 | 3 | 2 | -33.33% | 333.33 | 500.00 | 50.00% |
100000000 | 1048576 | 16 | 18 | 19 | 5.56% | 55.56 | 52.63 | -5.26% |
100000000 | 1048576 | 64 | 17 | 17 | 0.00% | 58.82 | 58.82 | 0.00% |
100000000 | 1048576 | 512 | 34 | 32 | -5.88% | 29.41 | 31.25 | 6.25% |
100000000 | 1048576 | 2048 | 89 | 93 | 4.49% | 11.24 | 10.75 | -4.30% |
Attachments
Issue Links
- links to