Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.19
Fix Version/s: None
Component/s: None
Description
It looks like the WhitespaceTokenizer cannot properly split Chinese phrases such as '美女衬衫': it only breaks input on whitespace, and written Chinese does not separate words with spaces, so the whole phrase comes back as a single token.
I could not find a reference to this issue other than LUCENE-5096.
The fix is to switch to the ClassicTokenizer, which seems better equipped for this kind of task.
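A minimal sketch (not part of the original report) comparing the two tokenizers on the phrase from the description; it assumes a recent Lucene analyzers module where the no-argument tokenizer constructors are available, and the exact package of ClassicTokenizer can vary between Lucene releases:

{code:java}
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.standard.ClassicTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

import java.io.IOException;
import java.io.StringReader;

public class TokenizerComparison {

    // Runs the given tokenizer over the text and prints every emitted token.
    static void printTokens(Tokenizer tokenizer, String text) throws IOException {
        tokenizer.setReader(new StringReader(text));
        CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
        tokenizer.reset();
        while (tokenizer.incrementToken()) {
            System.out.println("  token: " + term.toString());
        }
        tokenizer.end();
        tokenizer.close();
    }

    public static void main(String[] args) throws IOException {
        String phrase = "美女衬衫";

        // Splits only on whitespace, so the whole Chinese phrase
        // is emitted as a single token.
        System.out.println("WhitespaceTokenizer:");
        printTokens(new WhitespaceTokenizer(), phrase);

        // Recognizes CJK characters and emits them as individual
        // single-character tokens.
        System.out.println("ClassicTokenizer:");
        printTokens(new ClassicTokenizer(), phrase);
    }
}
{code}

With the whitespace output the phrase is indexed as one opaque token, so a query for only part of it cannot match; per-character tokens at least make the individual characters searchable.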