Details
-
Improvement
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
-
New
Description
Currently when merging segments, the HNSW vectors format rebuilds the entire graph from scratch. In general, building these graphs is very expensive, and it'd be nice to optimize it in any way we can. I was wondering if during merge, we could choose the largest segment with no deletes, and load its HNSW graph into heap. Then we'd add vectors from the other segments to this graph, through the normal build process. This could cut down on the number of operations we need to perform when building the graph.
This is just an early idea, I haven't run experiments to see if it would help. I'd guess that whether it helps would also depend on details of the MergePolicy.