Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Lucene Fields: New
Description
In Lucene90DocValuesProducer, BinaryDocValues (as well as SortedNumericDocValues in the non-singleton case) has code patterns like this:
long startOffset = addresses.get(doc);
bytes.length = (int) (addresses.get(doc + 1L) - startOffset);
This means we need to read 2 longs that are stored next to each other. We could probably push this information down to LongValues and read the 2 values together in one call. I think this makes sense because this code could be rather hot.
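To make the idea concrete, here is a minimal Java sketch of what a paired read could look like. It is not Lucene's actual LongValues API: `Addresses`, `getPair`, `BlockAddresses` and `slice` are hypothetical names, and the block-wise base-plus-delta layout is only a simplified stand-in for the real packed address encoding. The point it tries to show is that an implementation which knows two adjacent values are wanted can resolve shared per-block state once instead of twice.

```java
/** Illustrative sketch only; every name here is hypothetical, not Lucene code. */
final class AdjacentReadSketch {

  /** Stand-in for a LongValues-like abstraction over document addresses. */
  abstract static class Addresses {
    abstract long get(long index);

    /** Fills out[0] = get(index) and out[1] = get(index + 1). Default: two separate lookups. */
    void getPair(long index, long[] out) {
      out[0] = get(index);
      out[1] = get(index + 1);
    }
  }

  /** Simplified block encoding: value = bases[block] + deltas[index]. Assumes monotonic input. */
  static final class BlockAddresses extends Addresses {
    private static final int BLOCK_SHIFT = 2; // 4 values per block, tiny for illustration
    private static final int BLOCK_MASK = (1 << BLOCK_SHIFT) - 1;
    private final long[] bases;
    private final int[] deltas;

    BlockAddresses(long[] values) {
      bases = new long[(values.length + BLOCK_MASK) >>> BLOCK_SHIFT];
      deltas = new int[values.length];
      for (int i = 0; i < values.length; i++) {
        int block = i >>> BLOCK_SHIFT;
        if ((i & BLOCK_MASK) == 0) {
          bases[block] = values[i]; // first value of each block is its base
        }
        deltas[i] = Math.toIntExact(values[i] - bases[block]);
      }
    }

    @Override
    long get(long index) {
      int i = Math.toIntExact(index);
      return bases[i >>> BLOCK_SHIFT] + deltas[i];
    }

    @Override
    void getPair(long index, long[] out) {
      int i = Math.toIntExact(index);
      long base = bases[i >>> BLOCK_SHIFT];
      out[0] = base + deltas[i];
      if ((i & BLOCK_MASK) != BLOCK_MASK) {
        // Common case: the neighbour lives in the same block, so the base is reused.
        out[1] = base + deltas[i + 1];
      } else {
        // Block boundary: fall back to a regular lookup for the neighbour.
        out[1] = get(index + 1);
      }
    }
  }

  /** Returns {startOffset, length} for a document, using one paired read. */
  static long[] slice(Addresses addresses, int doc, long[] scratch) {
    addresses.getPair(doc, scratch);
    return new long[] {scratch[0], scratch[1] - scratch[0]};
  }
}
```

A caller would pass a reusable two-element scratch array, for example `AdjacentReadSketch.slice(addresses, doc, scratch)`, in place of the two `addresses.get(...)` calls in the pattern above. Whether this helps in practice depends on how much per-lookup work the real encoding can share between the two reads.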
Benchmark
In today's luceneutil benchmark, all results look even. I suspect this is because the tasks no longer use BinaryDocValues. So I rolled both the baseline and the candidate back to a stale code version (before https://issues.apache.org/jira/browse/LUCENE-10062), in which BinaryDocValues was still used to store taxonomy ordinals, and a QPS increase can be seen there. (This is tricky; I wonder if there is a more official way to benchmark BinaryDocValues, by changing some params or adding some tasks?) Anyway, I believe it is still worth optimizing BinaryDocValues even though facets no longer use it.
Benchmark result on the stale code version, where taxonomy ordinals are stored in BinaryDocValues (to demonstrate the speedup in BinaryDocValues):
Task                          QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                   p-value
BrowseMonthSSDVFacets                17.25   (8.6%)                    16.78  (17.8%)     -2.7%  ( -26% -   25%)    0.536
LowTerm                            1458.66   (3.6%)                  1438.15   (4.4%)     -1.4%  (  -9% -    6%)    0.268
HighTermDayOfYearSort               108.55  (10.0%)                   108.04   (9.1%)     -0.5%  ( -17% -   20%)    0.874
HighPhrase                          168.65   (1.9%)                   168.06   (2.3%)     -0.3%  (  -4% -    3%)    0.602
OrNotHighLow                       1201.79   (3.4%)                  1197.93   (4.6%)     -0.3%  (  -8% -    7%)    0.801
HighSpanNear                         15.26   (1.6%)                    15.21   (1.4%)     -0.3%  (  -3% -    2%)    0.499
Respell                              62.61   (1.8%)                    62.45   (1.9%)     -0.3%  (  -3% -    3%)    0.649
MedPhrase                            57.57   (1.4%)                    57.44   (1.8%)     -0.2%  (  -3% -    2%)    0.648
OrHighMed                           129.10   (3.0%)                   128.83   (3.1%)     -0.2%  (  -6% -    6%)    0.830
MedSpanNear                          19.45   (2.3%)                    19.41   (2.2%)     -0.2%  (  -4% -    4%)    0.784
OrHighHigh                           34.85   (1.5%)                    34.79   (1.4%)     -0.2%  (  -3% -    2%)    0.722
HighIntervalsOrdered                 26.92   (4.7%)                    26.89   (4.9%)     -0.1%  (  -9% -    9%)    0.929
IntNRQ                              343.52   (1.6%)                   343.16   (2.0%)     -0.1%  (  -3% -    3%)    0.855
OrHighNotHigh                       595.61   (3.2%)                   595.10   (4.3%)     -0.1%  (  -7% -    7%)    0.944
MedIntervalsOrdered                  17.66   (3.6%)                    17.65   (3.8%)     -0.1%  (  -7% -    7%)    0.961
LowIntervalsOrdered                 109.23   (3.3%)                   109.18   (3.5%)     -0.0%  (  -6% -    7%)    0.969
AndHighHigh                          81.09   (1.5%)                    81.10   (2.0%)      0.0%  (  -3% -    3%)    0.967
LowSpanNear                         203.33   (2.1%)                   203.41   (1.8%)      0.0%  (  -3% -    3%)    0.948
MedSloppyPhrase                      27.15   (1.5%)                    27.17   (1.2%)      0.1%  (  -2% -    2%)    0.907
LowPhrase                            75.76   (1.8%)                    75.81   (2.0%)      0.1%  (  -3% -    3%)    0.904
AndHighMedDayTaxoFacets              97.27   (1.9%)                    97.35   (1.9%)      0.1%  (  -3% -    4%)    0.888
HighSloppyPhrase                     14.32   (2.7%)                    14.34   (1.8%)      0.1%  (  -4% -    4%)    0.870
Fuzzy2                               76.00   (3.9%)                    76.12   (3.4%)      0.2%  (  -6% -    7%)    0.894
Wildcard                            123.51   (1.8%)                   123.71   (2.1%)      0.2%  (  -3% -    4%)    0.796
OrHighNotLow                        722.64   (4.4%)                   724.15   (5.4%)      0.2%  (  -9% -   10%)    0.894
AndHighLow                          929.73   (4.0%)                   931.75   (3.8%)      0.2%  (  -7% -    8%)    0.859
Prefix3                             240.13   (1.5%)                   240.69   (1.9%)      0.2%  (  -3% -    3%)    0.675
AndHighMed                          210.17   (1.7%)                   210.84   (1.6%)      0.3%  (  -2% -    3%)    0.532
LowSloppyPhrase                     142.83   (1.8%)                   143.54   (2.0%)      0.5%  (  -3% -    4%)    0.410
OrNotHighMed                        709.24   (4.4%)                   712.78   (4.3%)      0.5%  (  -7% -    9%)    0.715
Fuzzy1                               85.33   (5.7%)                    85.77   (6.3%)      0.5%  ( -10% -   13%)    0.786
MedTerm                            1466.50   (3.5%)                  1474.85   (3.9%)      0.6%  (  -6% -    8%)    0.629
TermDTSort                          105.51   (7.7%)                   106.33   (7.3%)      0.8%  ( -13% -   17%)    0.746
PKLookup                            206.18   (2.9%)                   208.68   (2.9%)      1.2%  (  -4% -    7%)    0.179
OrHighNotMed                        876.71   (3.0%)                   887.84   (3.9%)      1.3%  (  -5% -    8%)    0.251
OrNotHighHigh                       774.25   (4.7%)                   785.03   (6.0%)      1.4%  (  -8% -   12%)    0.411
HighTermMonthSort                    74.33   (9.4%)                    75.47  (16.3%)      1.5%  ( -22% -   30%)    0.716
OrHighLow                           518.73   (5.2%)                   528.27   (5.4%)      1.8%  (  -8% -   13%)    0.272
HighTerm                           1892.16   (3.4%)                  1934.63   (5.5%)      2.2%  (  -6% -   11%)    0.120
AndHighHighDayTaxoFacets             16.46   (2.7%)                    16.84   (2.3%)      2.3%  (  -2% -    7%)    0.004
HighTermTitleBDVSort                141.39  (14.6%)                   145.33  (15.1%)      2.8%  ( -23% -   38%)    0.554
MedTermDayTaxoFacets                 27.81   (2.1%)                    29.54   (2.3%)      6.2%  (   1% -   10%)    0.000
OrHighMedDayTaxoFacets                3.05   (1.9%)                     3.30   (2.2%)      8.3%  (   4% -   12%)    0.000
BrowseDayOfYearSSDVFacets            17.36  (13.0%)                    18.97  (15.8%)      9.3%  ( -17% -   43%)    0.042
BrowseDayOfYearTaxoFacets             3.02   (3.6%)                     3.79   (2.5%)     25.4%  (  18% -   32%)    0.000
BrowseDateTaxoFacets                  3.01   (3.6%)                     3.79   (2.5%)     25.6%  (  18% -   32%)    0.000
BrowseMonthTaxoFacets                 3.14   (2.1%)                     3.99   (2.5%)     27.0%  (  21% -   32%)    0.000
Benchmark result on the newest code version:
Task                          QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                   p-value
TermDTSort                          129.74  (10.9%)                   127.83  (11.3%)     -1.5%  ( -21% -   23%)    0.675
HighTerm                           1182.13   (5.1%)                  1172.76   (6.5%)     -0.8%  ( -11% -   11%)    0.668
HighSpanNear                          7.99   (4.2%)                     7.96   (4.2%)     -0.3%  (  -8% -    8%)    0.816
HighIntervalsOrdered                 17.86   (2.1%)                    17.85   (2.3%)     -0.1%  (  -4% -    4%)    0.927
BrowseDateTaxoFacets                 19.61  (17.2%)                    19.61  (17.4%)     -0.0%  ( -29% -   41%)    0.995
OrNotHighHigh                       619.85   (4.3%)                   619.72   (8.6%)     -0.0%  ( -12% -   13%)    0.992
PKLookup                            202.14   (5.6%)                   202.11   (4.4%)     -0.0%  (  -9% -   10%)    0.994
LowIntervalsOrdered                  25.53   (1.5%)                    25.53   (1.6%)      0.0%  (  -3% -    3%)    1.000
BrowseDayOfYearSSDVFacets            14.27   (2.7%)                    14.28   (2.7%)      0.0%  (  -5% -    5%)    0.965
MedIntervalsOrdered                  47.33   (1.9%)                    47.34   (2.0%)      0.0%  (  -3% -    3%)    0.947
BrowseRandomLabelSSDVFacets          10.25   (2.4%)                    10.26   (2.4%)      0.1%  (  -4% -    4%)    0.935
BrowseMonthSSDVFacets                15.66   (3.0%)                    15.67   (3.0%)      0.1%  (  -5% -    6%)    0.945
MedSloppyPhrase                      11.97   (1.7%)                    11.98   (1.9%)      0.1%  (  -3% -    3%)    0.840
Wildcard                             25.71   (2.6%)                    25.75   (2.4%)      0.1%  (  -4% -    5%)    0.875
MedPhrase                            33.62   (2.5%)                    33.68   (2.6%)      0.2%  (  -4% -    5%)    0.802
HighTermDayOfYearSort                80.58  (11.0%)                    80.76  (10.6%)      0.2%  ( -19% -   24%)    0.949
HighTermTitleBDVSort                130.43  (11.7%)                   130.73  (10.7%)      0.2%  ( -19% -   25%)    0.947
AndHighHighDayTaxoFacets             32.25   (3.0%)                    32.33   (2.9%)      0.2%  (  -5% -    6%)    0.796
LowSloppyPhrase                      39.50   (1.7%)                    39.61   (1.4%)      0.3%  (  -2% -    3%)    0.586
Prefix3                             127.42   (3.8%)                   127.77   (3.4%)      0.3%  (  -6% -    7%)    0.812
HighTermMonthSort                   117.65   (8.4%)                   117.98   (8.1%)      0.3%  ( -14% -   18%)    0.915
HighSloppyPhrase                     14.47   (1.8%)                    14.51   (2.2%)      0.3%  (  -3% -    4%)    0.647
MedSpanNear                          48.78   (2.2%)                    48.93   (2.0%)      0.3%  (  -3% -    4%)    0.640
OrHighMedDayTaxoFacets               13.42   (3.7%)                    13.48   (3.6%)      0.4%  (  -6% -    7%)    0.730
AndHighMedDayTaxoFacets              37.90   (3.0%)                    38.05   (3.4%)      0.4%  (  -5% -    7%)    0.694
Fuzzy1                               83.31   (3.9%)                    83.70   (4.9%)      0.5%  (  -7% -    9%)    0.738
Respell                              49.74   (1.3%)                    50.00   (1.5%)      0.5%  (  -2% -    3%)    0.254
OrHighLow                           531.57   (8.0%)                   534.83   (6.7%)      0.6%  ( -13% -   16%)    0.792
AndHighHigh                          71.99   (2.6%)                    72.44   (3.4%)      0.6%  (  -5% -    6%)    0.520
LowSpanNear                         191.64   (3.5%)                   192.85   (3.7%)      0.6%  (  -6% -    8%)    0.580
MedTermDayTaxoFacets                 55.51   (3.1%)                    55.86   (3.9%)      0.6%  (  -6% -    7%)    0.567
BrowseRandomLabelTaxoFacets       11492.93   (5.0%)                 11570.83   (4.8%)      0.7%  (  -8% -   11%)    0.663
IntNRQ                               93.40   (2.1%)                    94.05   (2.4%)      0.7%  (  -3% -    5%)    0.319
AndHighMed                          175.02   (2.6%)                   176.42   (3.9%)      0.8%  (  -5% -    7%)    0.445
Fuzzy2                               45.25   (7.2%)                    45.64   (6.2%)      0.9%  ( -11% -   15%)    0.682
AndHighLow                          825.32   (6.8%)                   833.43   (8.0%)      1.0%  ( -12% -   16%)    0.677
MedTerm                            1408.91   (6.2%)                  1423.27  (10.2%)      1.0%  ( -14% -   18%)    0.703
OrHighMed                           136.68   (3.8%)                   138.15   (3.6%)      1.1%  (  -6% -    8%)    0.356
OrHighHigh                           16.31   (3.4%)                    16.49   (1.9%)      1.1%  (  -4% -    6%)    0.205
BrowseDayOfYearTaxoFacets         11349.30   (4.4%)                 11494.17   (4.6%)      1.3%  (  -7% -   10%)    0.366
HighPhrase                           83.13   (2.9%)                    84.24   (3.4%)      1.3%  (  -4% -    7%)    0.184
OrHighNotMed                        630.30   (5.6%)                   639.65   (6.4%)      1.5%  (  -9% -   14%)    0.436
LowPhrase                           310.17   (4.2%)                   315.08   (5.4%)      1.6%  (  -7% -   11%)    0.297
OrHighNotHigh                       723.22   (5.0%)                   734.71   (8.4%)      1.6%  ( -11% -   15%)    0.468
BrowseMonthTaxoFacets             11665.05   (7.6%)                 11892.66   (5.1%)      2.0%  (  -9% -   15%)    0.339
OrHighNotLow                        851.60   (6.5%)                   869.16   (7.6%)      2.1%  ( -11% -   17%)    0.355
OrNotHighMed                        699.29   (5.2%)                   717.74   (7.7%)      2.6%  (  -9% -   16%)    0.205
OrNotHighLow                        954.65   (6.4%)                   982.93   (9.6%)      3.0%  ( -12% -   20%)    0.252
LowTerm                            2158.23   (9.1%)                  2227.33  (13.4%)      3.2%  ( -17% -   28%)    0.377