[LUCENE-5029] factor out a generic 'TermState' for better sharing in FST-based term dict - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 4.4
Component/s: None
Labels: None

Lucene Fields:

New

Description

Currently, the two FST-based term dictionaries (the memory codec and blocktree) both use FST<BytesRef> as their base data structure. This may not share much data in parent arcs, because the encoded BytesRef does not guarantee that 'Outputs.common()' always produces a long prefix.
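
To illustrate why byte-encoded outputs may share little, here is a minimal sketch. It assumes a vLong-style encoding of two file pointers per term, chosen purely for illustration (the exact byte layout used by the real codecs is not shown in this issue), and the class and method names are hypothetical. Two terms with nearby file pointers can still differ in their very first encoded byte, so a longest-common-prefix style 'common()' has nothing to share.

```java
import java.io.ByteArrayOutputStream;

public class BytePrefixSharing {
    // Hypothetical helper: variable-length encoding of a non-negative long,
    // 7 bits per byte, low-order bits first, high bit set on continuation bytes.
    static void writeVLong(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    // Encodes a term's file pointers back to back into one byte array.
    static byte[] encode(long... filePointers) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (long fp : filePointers) {
            writeVLong(out, fp);
        }
        return out.toByteArray();
    }

    // Length of the longest common byte prefix, which is all that a
    // byte-sequence output can share between two arcs.
    static int commonPrefixLength(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        byte[] smaller = encode(1000, 5000); // illustrative file pointers
        byte[] larger = encode(1010, 5100);
        System.out.println("shared bytes: " + commonPrefixLength(smaller, larger));
    }
}
```

With this encoding the low-order bits come first, so even a small change in a file pointer changes the first byte and the shared prefix is empty.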

For the current postings format, however, each FP (file pointer into .doc, .pos, etc.) is guaranteed to increase monotonically with 'larger' terms. That means that, between two Outputs, the Outputs from the smaller term can safely be pushed toward the root. However, we always have some tricky TermState to deal with (like the singletonDocID for the pulsing trick), so, as Mike suggested, we can simply cut the whole TermState into two parts: one for comparison and intersection, the other for restoring generic data. The data structure then becomes clear: this generic 'TermState' will consist of a fixed-length LongsRef and a variable-length BytesRef.
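
The split described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from the attached patches: the class `SplitTermStateSketch`, its nested `TermMeta` type, and the element-wise `common`/`subtract`/`add` operations are hypothetical names, and plain `long[]`/`byte[]` arrays stand in for LongsRef and BytesRef. The idea is that the monotonic long part supports algebra that lets the smaller term's values move toward the root, while the byte part stays opaque.

```java
import java.util.Arrays;

public class SplitTermStateSketch {
    // Hypothetical container: a fixed-length monotonic part plus an opaque part.
    static final class TermMeta {
        final long[] longs; // e.g. file pointers into .doc, .pos (assumed layout)
        final byte[] bytes; // e.g. data such as a singleton doc ID (assumed)

        TermMeta(long[] longs, byte[] bytes) {
            this.longs = longs;
            this.bytes = bytes;
        }
    }

    // Shared part of two long outputs: the element-wise minimum.
    // With monotonic file pointers this equals the smaller term's values.
    static long[] common(long[] a, long[] b) {
        long[] out = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = Math.min(a[i], b[i]);
        }
        return out;
    }

    // What remains on the child arc after the shared part moves up.
    static long[] subtract(long[] output, long[] prefix) {
        long[] out = new long[output.length];
        for (int i = 0; i < output.length; i++) {
            out[i] = output[i] - prefix[i];
        }
        return out;
    }

    // Reassembles the full output while walking from the root to a term.
    static long[] add(long[] prefix, long[] rest) {
        long[] out = new long[prefix.length];
        for (int i = 0; i < prefix.length; i++) {
            out[i] = prefix[i] + rest[i];
        }
        return out;
    }

    public static void main(String[] args) {
        TermMeta smaller = new TermMeta(new long[] {1000, 5000}, new byte[0]);
        TermMeta larger = new TermMeta(new long[] {1010, 5100}, new byte[] {7});

        long[] shared = common(smaller.longs, larger.longs);
        long[] restOfLarger = subtract(larger.longs, shared);

        System.out.println("shared:   " + Arrays.toString(shared));
        System.out.println("residual: " + Arrays.toString(restOfLarger));
        System.out.println("restored: " + Arrays.toString(add(shared, restOfLarger)));
    }
}
```

In this sketch the byte part takes no part in the arithmetic, which matches the description's intent that one part is used for comparison and intersection and the other only for restoring generic data.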

Attachments

LUCENE-5029.patch, 01/Jun/13 17:54, 8 kB, Han Jiang
LUCENE-5029.patch, 13/Jun/13 12:18, 120 kB, Han Jiang
LUCENE-5029.algebra.patch, 14/Jun/13 20:04, 40 kB, Han Jiang
LUCENE-5029.patch, 14/Jun/13 20:04, 283 kB, Han Jiang
LUCENE-5029.patch, 15/Jun/13 09:52, 284 kB, Han Jiang
LUCENE-5029.algebra.patch, 15/Jun/13 09:57, 56 kB, Han Jiang
LUCENE-5029.branch-init.patch, 15/Jun/13 10:35, 281 kB, Han Jiang
LUCENE-5029.patch, 16/Jun/13 10:01, 23 kB, Han Jiang

People

Assignee: Han Jiang

Reporter: Han Jiang

Votes: 0

Watchers: 2

Dates

Created: 01/Jun/13 17:50

Updated: 28/Aug/22 13:47

Resolved: 16/Jun/13 14:48