[LUCENE-644] Contrib: another highlighter approach - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.9
Component/s: modules/highlighter
Labels: None

Description

Mark Harwood's highlighter package is a great contribution to Lucene, and I've used it a lot. However, with large documents (fields), highlighting can be quite time-consuming if you increase the number of bytes to analyze with setMaxDocBytesToAnalyze(int). The default value of 50k is often too low for indexed PDFs and similar documents, which results in empty highlight strings.

This is an alternative approach that uses only term position vectors to build fragment info objects. A StringReader can then read the relevant fragments and skip() between them, which is a lot faster. This method also uses the entire field to find the best fragments, so you are guaranteed to get a highlight snippet.
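The read-and-skip step described above can be illustrated with a minimal sketch. It is not the attached FulltextHighlighter code: the class name FragmentReaderSketch, the Fragment holder, and readFragments are hypothetical names, and the sketch assumes the fragment offsets (which the real code would derive from term position vectors) are already known, sorted, and non-overlapping.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FragmentReaderSketch {

    /** Hypothetical holder for one fragment's character offsets: [start, end). */
    static final class Fragment {
        final int start;
        final int end;

        Fragment(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Reads only the given fragments from the text, calling skip() to jump
     * over the characters between them. Assumes fragments are sorted by
     * start offset and do not overlap.
     */
    static List<String> readFragments(String text, List<Fragment> fragments) {
        List<String> out = new ArrayList<>();
        try (Reader reader = new StringReader(text)) {
            int pos = 0; // current offset of the reader within the text
            for (Fragment f : fragments) {
                if (f.start < pos || f.end < f.start) {
                    throw new IllegalArgumentException("fragments must be sorted and non-overlapping");
                }
                // Skip the gap between the previous fragment and this one.
                long toSkip = f.start - pos;
                while (toSkip > 0) {
                    long skipped = reader.skip(toSkip);
                    if (skipped <= 0) {
                        return out; // the text ended before this fragment starts
                    }
                    toSkip -= skipped;
                }
                pos = f.start;
                // Read the fragment itself; it may be cut short at end of text.
                char[] buf = new char[f.end - f.start];
                int read = 0;
                while (read < buf.length) {
                    int n = reader.read(buf, read, buf.length - read);
                    if (n < 0) {
                        break;
                    }
                    read += n;
                }
                out.add(new String(buf, 0, read));
                pos += read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Lucene stores term vectors. Highlighting large fields can be slow.";
        List<Fragment> fragments = Arrays.asList(new Fragment(7, 13), new Fragment(28, 40));
        System.out.println(readFragments(text, fragments)); // prints [stores, Highlighting]
    }
}
```

Only the characters inside each fragment are copied; everything between fragments is passed over with skip(), which is the property the description relies on for speed on large fields.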

Because this method only works with fields that have term positions stored, you can check whether it works for a particular field using the following code (taken from TokenSources.java):

TermFreqVector tfv = (TermFreqVector) reader.getTermFreqVector(docId, field);
if (tfv != null && tfv instanceof TermPositionVector) {
    // use FulltextHighlighter
} else {
    // use standard Highlighter
}

Someone else might find this useful, so I'm posting the code here.

Attachments

- FulltextHighlighter.java, 02/Aug/06 07:53, 13 kB, Ronnie Kolehmainen
- FulltextHighlighterTest.java, 02/Aug/06 07:54, 15 kB, Ronnie Kolehmainen
- svn-diff.patch, 02/Aug/06 07:55, 30 kB, Ronnie Kolehmainen
- TokenSources.java, 03/Aug/06 17:43, 17 kB, Ronnie Kolehmainen
- TokenSources.java.diff, 03/Aug/06 17:43, 12 kB, Ronnie Kolehmainen
- FulltextHighlighter.java, 25/Aug/06 10:59, 13 kB, Ronnie Kolehmainen
- FulltextHighlighterTest.java, 25/Aug/06 10:59, 16 kB, Ronnie Kolehmainen
- svn-diff.patch, 25/Aug/06 10:59, 31 kB, Ronnie Kolehmainen

People

Assignee: Unassigned

Reporter: Ronnie Kolehmainen

Votes: 0

Watchers: 1

Dates

Created: 02/Aug/06 07:52

Updated: 28/Aug/22 11:29

Resolved: 27/Jan/11 10:51