Details
- Documentation
- Status: Closed
- Minor
- Resolution: Fixed
- 2.1.0
- None
Description
In OPENNLP-1257, a change was proposed to split the lemmatizer input by spaces instead of by tabs. I believe tab is the desired delimiter, but we need to make sure the documentation is consistent with the code.
Refer to https://opennlp.apache.org/docs/1.9.4/manual/opennlp.html#tools.lemmatizer , in particular the following sentences:
"The training data consist of three columns separated by spaces. Each word has been put on a separate line and there is an empty line after each sentence. The first column contains the current word, the second its part-of-speech tag and the third its lemma. Here is an example of the file format:"
Determine whether the first sentence of that passage should read "separated by tabs" instead.
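To make the difference concrete, here is a minimal sketch of a reader for the format described in the quoted passage: three columns per line (word, part-of-speech tag, lemma) and an empty line after each sentence. It is written in Python for brevity and is not OpenNLP's actual implementation. The function name `parse_lemmatizer_data`, the `delimiter` parameter, and the sample rows are all hypothetical, and the tab default reflects the assumption in this issue that tab is the intended delimiter.

```python
from typing import List, Tuple

Token = Tuple[str, str, str]  # (word, POS tag, lemma)


def parse_lemmatizer_data(text: str, delimiter: str = "\t") -> List[List[Token]]:
    """Parse word/POS/lemma lines into sentences.

    Hypothetical helper: assumes exactly three columns per non-empty line
    and treats an empty line as the end of a sentence.
    """
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            # An empty line closes the current sentence, if there is one.
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split(delimiter)
        if len(columns) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 columns, got {len(columns)}"
            )
        word, tag, lemma = columns
        current.append((word, tag, lemma))
    # Keep a final sentence that is not followed by an empty line.
    if current:
        sentences.append(current)
    return sentences


# Made-up sample rows, joined with tabs; a blank line separates sentences.
ROWS = [("He", "PRP", "he"), ("reckons", "VBZ", "reckon")]
SAMPLE = "\n".join("\t".join(row) for row in ROWS) + "\n\n" + "The\tDT\tthe\n"

sentences = parse_lemmatizer_data(SAMPLE)
```

With a tab delimiter, a line whose columns are separated by spaces does not split into three columns and is rejected, which is the kind of mismatch between documentation and code that this issue is about.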
Issue Links
- relates to OPENNLP-1257 Splitting in Lemmatizer via tabs (Closed)