Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Description
When merging segments, the HNSW writer creates a VectorValues instance that presents a merged view of all the segments' VectorValues. This merged instance is used to construct the new HNSW graph. Graph building needs random access, which the merged VectorValues supports by mapping each merged ordinal to a segment and an ordinal within that segment.
This mapping seems to add overhead. The nightly indexing benchmarks sometimes show substantial time spent in Arrays.binarySearch (used to map an ordinal to a segment): https://blunders.io/jfr-demo/indexing-1kb-vectors-2022.01.09.18.03.19/top_down_cpu_samples.
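To make the cost concrete, here is a minimal sketch of the kind of ordinal mapping described above. It is not Lucene's actual code: the class name MergedOrdinalMap and its methods are hypothetical, and it assumes every segment holds at least one vector and ignores deleted documents. Each random-access read of a vector pays for one binary search over the segment start offsets.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not a Lucene class. */
final class MergedOrdinalMap {
  // starts[i] is the first merged ordinal that belongs to segment i.
  private final int[] starts;

  /** Assumes every entry of segmentSizes is at least 1. */
  MergedOrdinalMap(int[] segmentSizes) {
    starts = new int[segmentSizes.length];
    int next = 0;
    for (int i = 0; i < segmentSizes.length; i++) {
      starts[i] = next;
      next += segmentSizes[i];
    }
  }

  /** Returns the index of the segment that owns the given merged ordinal. */
  int segmentOf(int mergedOrd) {
    int idx = Arrays.binarySearch(starts, mergedOrd);
    // On a miss, binarySearch returns -(insertionPoint) - 1, and the owning
    // segment is the one just before the insertion point.
    return idx >= 0 ? idx : -idx - 2;
  }

  /** Returns the ordinal of the vector within its own segment. */
  int segmentOrdOf(int mergedOrd) {
    return mergedOrd - starts[segmentOf(mergedOrd)];
  }
}
```

With segment sizes 3, 2 and 4, merged ordinal 4 resolves to segment 1, local ordinal 1. Graph construction performs many such lookups per inserted vector, which would be consistent with the binary search showing up in the profile.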
Instead of using a merged VectorValues to create the graph, maybe we could first write all the segment vectors to a file, and use that file to build the graph.
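A rough sketch of that idea, under stated assumptions: the class FlatVectorFile and its methods are hypothetical and use plain java.nio rather than Lucene's Directory and IndexInput APIs, all vectors share one dimension, and vectors are held in memory here only to keep the example short. Once the vectors are concatenated into one file, a merged ordinal maps to a byte offset by simple multiplication, so no per-read binary search is needed.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration only; not a Lucene class. */
final class FlatVectorFile {

  /** Writes every vector of every segment back to back, in merged-ordinal order. */
  static void write(Path file, float[][][] segments, int dim) throws IOException {
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
        StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
      ByteBuffer buf = ByteBuffer.allocate(dim * Float.BYTES);
      for (float[][] segment : segments) {
        for (float[] vector : segment) {
          buf.clear();
          // The float view writes into buf without moving buf's own position.
          buf.asFloatBuffer().put(vector);
          while (buf.hasRemaining()) {
            ch.write(buf);
          }
        }
      }
    }
  }

  /** Reads the vector at the given merged ordinal with a single offset computation. */
  static float[] read(FileChannel ch, int dim, int mergedOrd) throws IOException {
    ByteBuffer buf = ByteBuffer.allocate(dim * Float.BYTES);
    long start = (long) mergedOrd * dim * Float.BYTES;
    while (buf.hasRemaining()) {
      if (ch.read(buf, start + buf.position()) < 0) {
        throw new EOFException("ordinal " + mergedOrd + " is past the end of the file");
      }
    }
    buf.flip();
    float[] vector = new float[dim];
    buf.asFloatBuffer().get(vector);
    return vector;
  }
}
```

The trade-off is an extra sequential pass that copies all vectors before graph building starts, in exchange for constant-time ordinal lookups during construction.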