[SOLR-13013] Change export to extract DocValues in docID order - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 7.5, 8.0
Fix Version/s: None
Component/s: Export Writer
Labels: None

Description

The streaming export writer uses a sliding window of 30,000 documents to page through the result set in the requested sort order. Each time a window has been calculated, the values for the export fields are retrieved from the underlying DocValues structures in the window's sort order (not docID order) and delivered.

The iterative DocValues API introduced in Lucene/Solr 7 does not support random access. The current export implementation works around this by creating a new DocValues iterator for each individual value it retrieves. This slows down export, as the iterator has to seek to the given docID from the start for each value. The slowdown scales with shard size (see LUCENE-8374 for details). An alternative is to extract the DocValues in docID order, re-using the DocValues iterators. The idea is as follows:

1. Change the FieldWriters for export to re-use the DocValues iterators if subsequent requests are for docIDs higher than the previous ones
2. Calculate the sliding window of SortDocs as usual
3. Take a note of the order of the SortDocs in the sliding window
4. Re-sort the SortDocs in docID order
5. Extract the DocValues to a temporary on-heap structure
6. Re-sort the extracted values to the original sliding window order
7. Deliver the values
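
The re-sorting steps above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from the attached patches: `ForwardOnlyValues` is a hypothetical stand-in for a forward-only DocValues iterator (it is not a Lucene class), the window is reduced to an array of docIDs, and each document has a single `long` value.

```java
import java.util.Arrays;
import java.util.Comparator;

class DocIdOrderExport {

    /** Hypothetical stand-in for a forward-only DocValues iterator (not a Lucene class). */
    static final class ForwardOnlyValues {
        private final long[] valuesByDocId;
        private int lastDocId = -1;

        ForwardOnlyValues(long[] valuesByDocId) {
            this.valuesByDocId = valuesByDocId;
        }

        /** Returns the value for docId; refuses to move backwards, like an iterator would. */
        long valueFor(int docId) {
            if (docId < lastDocId) {
                throw new IllegalStateException("cannot move backwards: " + docId + " < " + lastDocId);
            }
            lastDocId = docId;
            return valuesByDocId[docId];
        }
    }

    /**
     * Extracts one value per window entry using a single forward pass over the values,
     * and returns them in the original window (sort) order.
     */
    static long[] extractInWindowOrder(int[] windowDocIds, ForwardOnlyValues values) {
        int n = windowDocIds.length;

        // Note the original window positions, then order those positions by docID.
        Integer[] positions = new Integer[n];
        for (int i = 0; i < n; i++) {
            positions[i] = i;
        }
        Arrays.sort(positions, Comparator.comparingInt(p -> windowDocIds[p]));

        // Visit documents in ascending docID order; storing each value at its original
        // window position restores the sort order without a second sort.
        long[] out = new long[n]; // temporary on-heap structure
        for (int p : positions) {
            out[p] = values.valueFor(windowDocIds[p]);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] valuesByDocId = {10, 11, 12, 13, 14, 15};
        int[] window = {4, 1, 5, 0}; // docIDs in sort order
        long[] out = extractInWindowOrder(window, new ForwardOnlyValues(valuesByDocId));
        System.out.println(Arrays.toString(out)); // prints [14, 11, 15, 10]
    }
}
```

Writing each value directly to its original window position means the "re-sort to the original order" step costs no extra sort, but the `out` array still has to hold values for the whole window, which is the memory concern discussed below.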

One big difference from the current export code is, of course, the need to hold the extracted values for the whole sliding window in memory. This might well be a showstopper, as there is no real limit to how large this partial result set can be. Maybe such an optimization could be requested explicitly if the user knows that there is enough memory?

Attachments

- SOLR-13013.patch, 26/Nov/19 14:44, 53 kB, Jason Gerlowski
- SOLR-13013.patch, 19/Nov/19 21:17, 43 kB, Jason Gerlowski
- SOLR-13013_proof_of_concept.patch, 26/Nov/18 14:26, 42 kB, Toke Eskildsen
- SOLR-13013_proof_of_concept.patch, 24/Nov/18 15:04, 40 kB, Toke Eskildsen

Issue Links

is related to: SOLR-13024 ValueSourceAugmenter - avoid creating new FunctionValues per doc (Open)

People

Assignee: Unassigned

Reporter: Toke Eskildsen

Votes: 0

Watchers: 8

Dates

Created: 24/Nov/18 15:02

Updated: 09/Dec/19 17:36