Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 4.6
Lucene Fields: New, Patch Available
Description
When I use mahout-0.9 with lucene-4.6 to run the k-means clustering algorithm, I find that the default analyzer class, 'org.apache.lucene.analysis.standard.StandardAnalyzer', handles Chinese text poorly: it splits the text into single characters instead of words. The ansj Chinese word segmentation tool is widely used for tokenizing Chinese documents, and I would like to contribute an analyzer based on it to Lucene.
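To illustrate the difference, here is a minimal, self-contained sketch that does not use Lucene or ansj. It contrasts per-character splitting (what the default analyzer effectively produces for Chinese text) with a simple dictionary-based segmenter. The class name `SegmentationSketch`, its methods, and the three-word dictionary are hypothetical, and the forward maximum matching used here is only a stand-in technique, not ansj's actual algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: neither method is Lucene or ansj code.
public class SegmentationSketch {

    // One token per character (code point), similar in effect to what the
    // default analyzer yields for Chinese text.
    static List<String> splitPerCharacter(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints().forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    // Forward maximum matching: at each position take the longest dictionary
    // word (up to maxWordLength chars), falling back to a single character.
    // Assumes the text contains only BMP characters.
    static List<String> segmentWithDictionary(String text, Set<String> dictionary, int maxWordLength) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = Math.min(text.length(), pos + maxWordLength);
            // Shrink the candidate until it is a known word or one character long.
            while (end > pos + 1 && !dictionary.contains(text.substring(pos, end))) {
                end--;
            }
            tokens.add(text.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Toy dictionary, assumed for this example.
        Set<String> dictionary = new HashSet<>(Arrays.asList("中文", "分词", "工具"));
        String text = "中文分词工具";

        System.out.println(splitPerCharacter(text));                    // [中, 文, 分, 词, 工, 具]
        System.out.println(segmentWithDictionary(text, dictionary, 2)); // [中文, 分词, 工具]
    }
}
```

With per-character tokens, k-means sees features such as "中" and "文" that carry little meaning on their own, whereas word-level tokens such as "中文" give the clustering more useful terms to work with.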