Details
- New Feature
- Status: Open
- Major
- Resolution: Unresolved
- 4.5
- None
- New, Patch Available
Description
This patch enables a Lucene-powered concordance search capability.
Concordances are extremely useful for linguists, lawyers and other analysts who perform analytic search rather than traditional snippeting/document retrieval tasks. By "analytic search," I mean that the user wants to browse every occurrence of a term (or at least the top n occurrences) in a subset of documents and see the words before and after each one.
Concordance technology is far simpler and less interesting than IR relevance models/methods, but it can be extremely useful for some use cases.
Traditional concordance sort orders are available: words before the target, words after the target, target then words before, and target then words after.
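To make the four sort orders concrete, here is a minimal plain-Java sketch. It is not the patch's code or API: `ConcordanceSketch`, `Window`, `windows` and the comparator names are hypothetical, the input is assumed to be already tokenized, and sorting the words before the target from the nearest word outward is an assumption about the conventional behavior.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: hypothetical names, not the classes in the patch. */
public class ConcordanceSketch {

    /** One keyword-in-context hit: words before, the target, words after. */
    public static final class Window {
        public final List<String> pre;
        public final String target;
        public final List<String> post;

        Window(List<String> pre, String target, List<String> post) {
            this.pre = pre;
            this.target = target;
            this.post = post;
        }

        /** Words before the target, nearest word first (assumed convention). */
        public String preKey() {
            List<String> reversed = new ArrayList<>(pre);
            Collections.reverse(reversed);
            return String.join(" ", reversed);
        }

        /** Words after the target in reading order. */
        public String postKey() {
            return String.join(" ", post);
        }
    }

    /** Collects up to `width` tokens on each side of every match of `target`. */
    public static List<Window> windows(List<String> tokens, String target, int width) {
        List<Window> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equalsIgnoreCase(target)) {
                continue;
            }
            int from = Math.max(0, i - width);
            int to = Math.min(tokens.size(), i + 1 + width);
            out.add(new Window(
                new ArrayList<>(tokens.subList(from, i)),
                tokens.get(i),
                new ArrayList<>(tokens.subList(i + 1, to))));
        }
        return out;
    }

    // The four sort orders described above.
    public static final Comparator<Window> PRE = Comparator.comparing(Window::preKey);
    public static final Comparator<Window> POST = Comparator.comparing(Window::postKey);
    public static final Comparator<Window> TARGET_PRE =
        Comparator.comparing((Window w) -> w.target).thenComparing(PRE);
    public static final Comparator<Window> TARGET_POST =
        Comparator.comparing((Window w) -> w.target).thenComparing(POST);
}
```

Calling `windows(tokens, "sat", 2)` and then `hits.sort(ConcordanceSketch.POST)` would order the hits by the words that follow each occurrence of "sat". The target-first orders only differ from the plain ones when a query matches several distinct target strings.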
Under the hood, this runs SpanQuery's getSpans() and reanalyzes the text to obtain character offsets. There is plenty of room for optimization and refactoring.
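The reanalysis step can be pictured with the following plain-Java sketch. It does not use Lucene: a simple `\w+` regex stands in for the real analyzer, `OffsetSketch`, `tokenOffsets` and `spanText` are hypothetical names, and the span is assumed to be a pair of token positions with an exclusive end. The idea is that a span match only gives token positions, so the text is tokenized again to recover where those tokens start and end in the original string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: a regex tokenizer stands in for a Lucene analyzer. */
public class OffsetSketch {

    // Assumption: tokens are runs of word characters.
    private static final Pattern TOKEN = Pattern.compile("\\w+");

    /** Re-tokenizes the text and returns {startChar, endChar} per token position. */
    public static List<int[]> tokenOffsets(String text) {
        List<int[]> offsets = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            offsets.add(new int[] {m.start(), m.end()});
        }
        return offsets;
    }

    /** Maps the token span [spanStart, spanEnd) back to the original characters it covers. */
    public static String spanText(String text, int spanStart, int spanEnd) {
        List<int[]> offsets = tokenOffsets(text);
        int startChar = offsets.get(spanStart)[0];
        int endChar = offsets.get(spanEnd - 1)[1];
        return text.substring(startChar, endChar);
    }
}
```

For example, `OffsetSketch.spanText("The quick, brown fox jumps.", 1, 3)` covers token positions 1 and 2 and returns the original characters between them, punctuation included. Re-tokenizing every matching document is the kind of cost that stored offsets could avoid, which fits the remark about room for optimization.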
Many thanks to my colleague, Jason Robinson, for input on the design of this patch.
Attachments
Issue Links
- is depended upon by
  - LUCENE-5318 Co-occurrence counts from Concordance (Open)
  - SOLR-5411 Keyword in Context Search / Concordance Search: Solr wrapper for the code in LUCENE-5317 (Open)
  - SOLR-5412 TermVariants from fuzzy and/or span search (Open)
- links to