[LUCENE-6111] Add Chinese Word Segmentation Analyzer with Ansj implementation - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 4.6
Fix Version/s: 4.6
Component/s: modules/analysis
Labels:
- Analyzer
- Ansj

Lucene Fields: New, Patch Available

Description

When I run the K-means clustering algorithm in mahout-0.9, which depends on lucene-4.6, I find that the default analyzer class, 'org.apache.lucene.analysis.standard.StandardAnalyzer', handles Chinese text poorly: it can only split the text into single characters rather than words. The Ansj Chinese word segmentation tool is widely used for tokenizing Chinese documents, and I would like to contribute an Ansj-based analyzer to Lucene.
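To illustrate the difference described above, here is a minimal, dependency-free sketch that contrasts per-character splitting (roughly what happens to Han text under StandardAnalyzer) with dictionary-based word segmentation. It does not use Lucene or Ansj. The class `SegmentationSketch`, the tiny dictionary, and the greedy forward maximum matching strategy are all assumptions chosen for illustration; Ansj's actual segmentation algorithm is more sophisticated and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not Lucene or Ansj code.
class SegmentationSketch {

    // Per-character split: emits one token per Unicode code point.
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            tokens.add(new String(Character.toChars(cp)));
            i += Character.charCount(cp);
        }
        return tokens;
    }

    // Greedy forward maximum matching against a toy dictionary.
    // Works on char indices, so it assumes text in the Basic Multilingual Plane.
    static List<String> maxMatch(String text, Set<String> dict, int maxLen) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxLen);
            // Shrink the window until it is a dictionary word or a single character.
            while (end > i + 1 && !dict.contains(text.substring(i, end))) {
                end--;
            }
            tokens.add(text.substring(i, end));
            i = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Toy dictionary, assumed for this example.
        Set<String> dict = new HashSet<>(Arrays.asList("中文", "分词", "工具"));
        String text = "中文分词工具";
        System.out.println(unigrams(text));          // [中, 文, 分, 词, 工, 具]
        System.out.println(maxMatch(text, dict, 2)); // [中文, 分词, 工具]
    }
}
```

For clustering, the second form is usually preferable because the resulting terms carry word-level meaning, which is the motivation for an Ansj-backed analyzer.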

People

Assignee: Unassigned

Reporter: deyinchen

Votes: 0

Watchers: 2

Dates

Created: 12/Dec/14 09:11

Updated: 28/Aug/22 14:21

Time Tracking

Estimated: 24h

Remaining: 24h

Logged: Not Specified