Details
- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Won't Fix
Description
Currently in Oak, the time taken to fetch the result size of a query is proportional to the result size. This does not perform well when the result size is large: a complete traversal of the result is required to apply access control (ACL) checks so that the reported count is accurate.
JR2 used to support a resultFetchSize setting (defaulting to Integer.MAX_VALUE). This was used to get an estimate of the possible result count, whereby the count might not be accurate.
Per mreutegg, this feature worked as follows:

If resultFetchSize is set to 50, the QueryEngine initially collects up to 50 nodes that the current session is allowed to read from the raw Lucene result set. While doing so, it counts the number of nodes denied by access control checks. The reported result size is then calculated as: raw-lucene-result-size - number-of-nodes-denied. If a client iterates past the currently available nodes, resultFetchSize is doubled and the query is executed again. If an exact result size is required, resultFetchSize can be configured to a much higher value. However, this has a severe performance impact for large result sets, because the query then has to apply access control checks to the complete result set.
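The estimation step described above can be sketched roughly as below. This is a minimal illustration, not the actual JR2/Oak implementation: the class name `EstimatedResultSize`, the `canRead` predicate, and the use of a plain `List` in place of the raw Lucene result set are all assumptions made for the example.

```java
import java.util.List;
import java.util.function.Predicate;

public class EstimatedResultSize {

    /**
     * Sketch of the described estimation: collect up to fetchSize nodes the
     * session may read, counting nodes denied by access control along the
     * way, and report raw-result-size minus denied-so-far as the estimate.
     *
     * @param rawResult the raw (unfiltered) result set, e.g. from Lucene
     * @param canRead   stand-in for the session's access control check
     * @param fetchSize the configured resultFetchSize
     * @return the estimated result size
     */
    static long estimateSize(List<String> rawResult,
                             Predicate<String> canRead,
                             int fetchSize) {
        int readable = 0;
        int denied = 0;
        for (String node : rawResult) {
            if (readable >= fetchSize) {
                break; // stop once fetchSize readable nodes are collected
            }
            if (canRead.test(node)) {
                readable++;
            } else {
                denied++; // denied by the access control check
            }
        }
        // Estimate: raw result size minus nodes known to be denied so far.
        // Denied nodes beyond the scan window are not accounted for, which
        // is why the estimate may overshoot the true accessible count.
        return rawResult.size() - denied;
    }
}
```

Note that the estimate only subtracts denials seen within the scan window, which is exactly why a small resultFetchSize is cheap but potentially inaccurate, while a very large value forces ACL checks over the whole result set.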