[SOLR-16942] Improve knn explain output - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels:
- knn
- vector-based-search

Description

The following is explain output for a query involving both reRank and a {!knn} query:
1.4137135 = combined unscaled first and scaled second pass score
  0.9137135 = first pass score
    0.9137135 = sum of:
      0.0039847707 = sum of:
        0.0039847707 = max of:
          0.0014896907 = weight(description_t:miles in 113) [SchemaSimilarity], result of:
            0.0014896907 = score(freq=2.0), computed as boost * idf * tf from:
              0.001 = boost
              2.0111222 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
                26 = n, number of documents containing term
                197 = N, total number of documents with field
              0.740726 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
                2.0 = freq, occurrences of term within document
                1.2 = k1, term saturation parameter
                0.75 = b, length normalization parameter
                21.0 = dl, length of field
                47.243656 = avgdl, average length of field
          0.0039847707 = weight(title_t:miles in 113) [SchemaSimilarity], result of:
            0.0039847707 = score(freq=2.0), computed as boost * idf * tf from:
              0.002 = boost
              2.84592 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
                11 = n, number of documents containing term
                197 = N, total number of documents with field
              0.7000848 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
                2.0 = freq, occurrences of term within document
                1.2 = k1, term saturation parameter
                0.75 = b, length normalization parameter
                7.0 = dl, length of field
                11.314721 = avgdl, average length of field
      0.90972877 = within top 100
  1.0 = second pass score scaled between:0-1
    3.9847708 = second pass score
      3.9847708 = sum of:
        3.9847708 = max of:
          1.4896905 = weight(description_t:miles in 113) [SchemaSimilarity], result of:
            1.4896905 = score(freq=2.0), computed as boost * idf * tf from:
              2.0111222 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
                26 = n, number of documents containing term
                197 = N, total number of documents with field
              0.740726 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
                2.0 = freq, occurrences of term within document
                1.2 = k1, term saturation parameter
                0.75 = b, length normalization parameter
                21.0 = dl, length of field
                47.243656 = avgdl, average length of field
          3.9847708 = weight(title_t:miles in 113) [SchemaSimilarity], result of:
            3.9847708 = score(freq=2.0), computed as boost * idf * tf from:
              2.0 = boost
              2.84592 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
                11 = n, number of documents containing term
                197 = N, total number of documents with field
              0.7000848 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
                2.0 = freq, occurrences of term within document
                1.2 = k1, term saturation parameter
                0.75 = b, length normalization parameter
                7.0 = dl, length of field
                11.314721 = avgdl, average length of field
    0.8636209 = min second pass score
    3.9847708 = max sceond pass score
  0.5 = rerank weight

Note the level of detail in the reRank explain compared to the knn part, which has a single entry:
0.90972877 = within top 100

(And we only know that this entry comes from the knn query as a result of running a knn-only query.)
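
To illustrate how much can be re-derived from the detailed part, here is a minimal Python sketch that recomputes the BM25 term scores and the combined score from the numbers in the explain output. The idf and tf formulas are the ones printed in the explain. The rest is an assumption inferred from the numbers, not taken from the Solr source: a boost of 1.0 for description_t in the second pass (no boost line is printed there), min-max scaling of the second pass score into 0-1, and a final score of first pass + rerank weight * scaled second pass. The helper names are made up for this example.

```python
import math

def bm25_idf(n, N):
    # idf formula as printed in the explain output
    return math.log(1 + (N - n + 0.5) / (n + 0.5))

def bm25_tf(freq, k1, b, dl, avgdl):
    # tf formula as printed in the explain output
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))

def term_score(boost, n, N, freq, dl, avgdl, k1=1.2, b=0.75):
    # score = boost * idf * tf
    return boost * bm25_idf(n, N) * bm25_tf(freq, k1, b, dl, avgdl)

# Second pass: "max of" the two field scores.
# Assumption: description_t has boost 1.0 (no boost line in the explain).
second_desc = term_score(1.0, 26, 197, 2.0, 21.0, 47.243656)
second_title = term_score(2.0, 11, 197, 2.0, 7.0, 11.314721)
second_pass = max(second_desc, second_title)

# First pass: the same terms with much smaller boosts, plus the knn entry.
first_desc = term_score(0.001, 26, 197, 2.0, 21.0, 47.243656)
first_title = term_score(0.002, 11, 197, 2.0, 7.0, 11.314721)
knn_score = 0.90972877  # the opaque "within top 100" entry, copied as-is
first_pass = max(first_desc, first_title) + knn_score

# Assumption: min-max scaling of the second pass score into the range 0-1.
min_second, max_second = 0.8636209, 3.9847708
scaled_second = (second_pass - min_second) / (max_second - min_second)

# Assumption: combined = first pass + rerank weight * scaled second pass.
rerank_weight = 0.5
combined = first_pass + rerank_weight * scaled_second

print(second_desc, second_title, first_pass, scaled_second, combined)
```

Every value except the knn score can be reproduced this way, up to float rounding. The knn score can only be copied, because the explain gives no inputs for it.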

It perhaps doesn't need to be (and can't be) as detailed as the reRank part, but it should at least include:

- topK
- dimensions
- scoring method (dot product, cosine similarity, etc.)
- maybe some insight into the HNSW graph traversal?
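
As a rough illustration of the items listed above, the following Python sketch builds a nested explain entry for a knn match and renders it in the indented "value = description" style. Everything here is hypothetical: the `Explanation` class is a stand-in for illustration only, the child descriptions are invented wording, and the mapping of cosine similarity to a score as (1 + cosine) / 2 is an assumption about how the similarity is turned into a score, to be checked against the actual implementation.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Explanation:
    # Hypothetical stand-in for an explain node: value, description, children.
    value: float
    description: str
    details: list = field(default_factory=list)

    def render(self, depth=0):
        # One line per node, two spaces of indentation per nesting level.
        lines = ["  " * depth + f"{self.value} = {self.description}"]
        for child in self.details:
            lines.extend(child.render(depth + 1))
        return lines

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn_explain(query_vec, doc_vec, top_k):
    if len(query_vec) != len(doc_vec):
        raise ValueError("query and document vectors differ in dimension")
    cos = cosine(query_vec, doc_vec)
    # Assumption: cosine in [-1, 1] is mapped to a score in [0, 1].
    score = (1 + cos) / 2
    return Explanation(score, f"within top {top_k}", [
        Explanation(top_k, "topK, number of nearest neighbours requested"),
        Explanation(len(query_vec), "dimensions, length of query and field vectors"),
        Explanation(cos, "cosine similarity, mapped to score as (1 + cosine) / 2"),
    ])

if __name__ == "__main__":
    explain = knn_explain([1.0, 0.0], [1.0, 0.0], top_k=100)
    print("\n".join(explain.render()))
```

Even this small amount of structure would tell the reader that the score came from a knn query, and which topK and similarity function produced it. Insight into the HNSW traversal is left out of the sketch because it depends on what the search exposes.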

People

Assignee: Unassigned

Reporter: Marc Byrd

Votes: 0

Watchers: 1

Dates

Created: 19/Aug/23 00:56

Updated: 15/Oct/23 20:13