[LUCENE-5317] Concordance/Key Word In Context (KWIC) capability - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 4.5
Fix Version/s: None
Component/s: core/search
Labels:
- patch

Lucene Fields:

New, Patch Available

Description

This patch enables a Lucene-powered concordance search capability.

Concordances are extremely useful for linguists, lawyers, and other analysts whose work is analytic search rather than traditional snippeting or document retrieval. By "analytic search," I mean that the user wants to browse every occurrence of a term (or at least the top n) in a subset of documents and see the words before and after it.

Concordance technology is far simpler and less interesting than IR relevance models/methods, but it can be extremely useful for some use cases.

Traditional concordance sort orders are available: words before the target, words after the target, target then words before, and target then words after.
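To make the windows and the four sort orders concrete, here is a minimal plain-Java sketch. It is not the API of the attached patch: the class `KwicSketch`, the `Window` type, and the comparator names are all hypothetical, the tokenizer is a naive regex split rather than a Lucene analyzer, and sorting the preceding words nearest-word-first is an assumption about what "sort on words before the target" should mean.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Toy keyword-in-context builder (hypothetical names; not the patch's API). */
final class KwicSketch {

    /** One hit: the words before the target, the target itself, and the words after. */
    static final class Window {
        final List<String> pre;
        final String target;
        final List<String> post;

        Window(List<String> pre, String target, List<String> post) {
            this.pre = pre;
            this.target = target;
            this.post = post;
        }

        /** Preceding words joined nearest-first, so comparison works outward from the target. */
        String preKey() {
            StringBuilder sb = new StringBuilder();
            for (int i = pre.size() - 1; i >= 0; i--) {
                sb.append(pre.get(i)).append(' ');
            }
            return sb.toString();
        }

        /** Following words joined in reading order. */
        String postKey() {
            return String.join(" ", post);
        }

        @Override
        public String toString() {
            return String.join(" ", pre) + " [" + target + "] " + String.join(" ", post);
        }
    }

    // The four sort orders named in the description.
    static final Comparator<Window> PRE = Comparator.comparing(Window::preKey);
    static final Comparator<Window> POST = Comparator.comparing(Window::postKey);
    static final Comparator<Window> TARGET_PRE =
            Comparator.comparing((Window w) -> w.target).thenComparing(PRE);
    static final Comparator<Window> TARGET_POST =
            Comparator.comparing((Window w) -> w.target).thenComparing(POST);

    /** Collects up to maxHits windows with `width` words on each side of every occurrence of target. */
    static List<Window> concordance(String text, String target, int width, int maxHits) {
        List<String> tokens = Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\W+"));
        String wanted = target.toLowerCase(Locale.ROOT);
        List<Window> hits = new ArrayList<>();
        for (int i = 0; i < tokens.size() && hits.size() < maxHits; i++) {
            if (!tokens.get(i).equals(wanted)) {
                continue;
            }
            int from = Math.max(0, i - width);
            int to = Math.min(tokens.size(), i + 1 + width);
            hits.add(new Window(tokens.subList(from, i), tokens.get(i), tokens.subList(i + 1, to)));
        }
        return hits;
    }
}
```

A caller would build the hits with `KwicSketch.concordance(text, "sat", 2, 10)` and then apply one of the comparators, for example `hits.sort(KwicSketch.PRE)`. The target-first orders only matter when a query matches more than one surface form, which this single-term sketch does not exercise.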

Under the hood, this runs SpanQuery's getSpans() and reanalyzes the text to obtain character offsets. There is plenty of room for optimization and refactoring.
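The reason for the reanalysis step is that a span identifies a match by token position, while displaying the context requires character offsets into the original text. The sketch below illustrates that mapping only; it does not reproduce the patch. A regex tokenizer stands in for the Lucene analyzer, the span is assumed to be a half-open range of token positions, and `OffsetSketch`, `tokenOffsets`, and `window` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Maps token positions back to character offsets (hypothetical names; not the patch's API). */
final class OffsetSketch {
    private static final Pattern WORD = Pattern.compile("\\w+");

    /** "Reanalyzes" the text: returns {startChar, endChar} per token, indexed by token position. */
    static List<int[]> tokenOffsets(String text) {
        List<int[]> offsets = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            offsets.add(new int[] {m.start(), m.end()});
        }
        return offsets;
    }

    /**
     * Renders the token span [spanStart, spanEnd) with up to `width` tokens of context on each side,
     * slicing the original text so that punctuation and casing are preserved.
     */
    static String window(String text, int spanStart, int spanEnd, int width) {
        List<int[]> offsets = tokenOffsets(text);
        int first = Math.max(0, spanStart - width);
        int last = Math.min(offsets.size(), spanEnd + width) - 1;
        int targetStart = offsets.get(spanStart)[0];
        int targetEnd = offsets.get(spanEnd - 1)[1];
        String pre = text.substring(offsets.get(first)[0], targetStart);
        String target = text.substring(targetStart, targetEnd);
        String post = text.substring(targetEnd, offsets.get(last)[1]);
        return pre + "[" + target + "]" + post;
    }
}
```

For example, `OffsetSketch.window("Concordances are useful, for linguists and lawyers.", 2, 3, 2)` slices around the third token and keeps the comma that a token-only view would lose. Re-tokenizing every matching document is the obvious cost of this approach, which is presumably part of the room for optimization mentioned above.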

Many thanks to my colleague, Jason Robinson, for input on the design of this patch.

Attachments

- concordance_v1.patch.gz (19 kB), Tim Allison, 30/Oct/13 19:08
- LUCENE-5317.patch (176 kB), Steven Rowe, 19/Nov/14 19:38
- LUCENE-5317.patch (135 kB), Steven Rowe, 17/Oct/14 02:11
- lucene5317v1.patch (175 kB), Tim Allison, 19/Nov/14 03:09
- lucene5317v2.patch (175 kB), Tim Allison, 19/Nov/14 19:29

Issue Links

is depended upon by

- LUCENE-5318 Co-occurrence counts from Concordance (Open)
- SOLR-5411 Keyword in Context Search / Concordance Search: Solr wrapper for the code in LUCENE-5317 (Open)
- SOLR-5412 TermVariants from fuzzy and/or span search (Open)

links to

GitHub Pull Request #82

give a try


People

Assignee: Tommaso Teofili
Reporter: Tim Allison
Votes: 4
Watchers: 14

Dates

Created: 30/Oct/13 19:08
Updated: 28/Aug/22 13:56

Time Tracking

Estimated: Not Specified
Remaining:
Logged: 0.5h