[OPENNLP-1363] Verify the documentation of the lemmatizer input format - ASF JIRA

Details

Type: Documentation
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.1.0
Fix Version/s: 2.1.1
Component/s: Documentation
Labels: None

Description

In OPENNLP-1257, a change was proposed to update the code to split the lemmatizer input by spaces instead of by tabs. I believe tab is the desired delimiter, but we need to make sure the documentation is consistent with that.

Refer to https://opennlp.apache.org/docs/1.9.4/manual/opennlp.html#tools.lemmatizer, in particular the following sentences:

"The training data consist of three columns separated by spaces. Each word has been put on a separate line and there is an empty line after each sentence. The first column contains the current word, the second its part-of-speech tag and the third its lemma. Here is an example of the file format:"

Determine if that first line should read "separated by tabs" instead.
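
To make the distinction concrete, here is a minimal sketch of a reader for the format described above, assuming tab-separated columns. It is written in Python purely for illustration and is not OpenNLP's actual (Java) implementation; the function names `parse_line` and `read_sentences` and the sample tokens are hypothetical.

```python
def parse_line(line, delimiter="\t"):
    """Split one training line into (word, pos_tag, lemma)."""
    columns = line.rstrip("\n").split(delimiter)
    if len(columns) != 3:
        raise ValueError(f"expected 3 columns, got {len(columns)}: {line!r}")
    return tuple(columns)


def read_sentences(lines, delimiter="\t"):
    """Group parsed lines into sentences; an empty line ends a sentence."""
    sentences, current = [], []
    for line in lines:
        if not line.strip():
            # Empty line: close the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(parse_line(line, delimiter))
    if current:
        # Flush a final sentence that has no trailing empty line.
        sentences.append(current)
    return sentences
```

If a token or lemma itself contained a space (a hypothetical case such as `New York`), splitting on tabs would still yield three clean columns, whereas splitting on spaces would cut the columns in the wrong places. That is one reason the documentation and the code need to agree on the delimiter.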

Issue Links

relates to: OPENNLP-1257 Splitting in Lemmatizer via tabs (Closed)

People

Assignee: Atita Arora

Reporter: Jeff Zemerick

Votes: 0

Watchers: 2

Dates

Created: 29/Mar/22 13:17

Updated: 09/Jan/23 18:57

Resolved: 20/Dec/22 07:19