OAK-1614: Oak Analyzer can't tokenize Chinese phrases

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.19
    • Fix Version/s: 0.20
    • Component/s: None
    • Labels: None

    Description

      It looks like the WhitespaceTokenizer cannot properly split Chinese phrases, for example '美女衬衫' (roughly 'women's blouse'), because they contain no whitespace between words.
      I could not find a reference to this issue other than LUCENE-5096.

      The fix is to switch to the ClassicTokenizer, which seems better equipped for this kind of task; the sketch below illustrates the difference.
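
      As a rough illustration (not the actual Oak patch), the snippet below compares the two tokenizers on the phrase from above. It assumes a recent lucene-analyzers-common API where the Reader is supplied via setReader(); the Lucene version bundled with Oak at the time passed the Reader to the tokenizer constructor instead.

      {code:java}
      import java.io.IOException;
      import java.io.StringReader;

      import org.apache.lucene.analysis.Tokenizer;
      import org.apache.lucene.analysis.core.WhitespaceTokenizer;
      import org.apache.lucene.analysis.standard.ClassicTokenizer;
      import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

      public class TokenizerComparison {

          // Drains a tokenizer and prints every token it emits for the given text.
          static void printTokens(Tokenizer tokenizer, String text) throws IOException {
              tokenizer.setReader(new StringReader(text));
              CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
              tokenizer.reset();
              while (tokenizer.incrementToken()) {
                  System.out.println("  token: " + term);
              }
              tokenizer.end();
              tokenizer.close();
          }

          public static void main(String[] args) throws IOException {
              String phrase = "美女衬衫";

              // No whitespace in the phrase, so the whole string comes back as one token.
              System.out.println("WhitespaceTokenizer:");
              printTokens(new WhitespaceTokenizer(), phrase);

              // ClassicTokenizer's grammar emits each CJK character as its own token,
              // so the phrase becomes four tokens: 美, 女, 衬, 衫.
              System.out.println("ClassicTokenizer:");
              printTokens(new ClassicTokenizer(), phrase);
          }
      }
      {code}

      Per-character tokens are crude compared to a dedicated CJK analyzer, but they at least let term and phrase queries over Chinese text match, which the single whitespace-delimited token never could.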


          People

            stillalex Alex Deparvu
            stillalex Alex Deparvu
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue
