OpenNLP / OPENNLP-1363

Verify the documentation of the lemmatizer input format

Details

    • Type: Documentation
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.1.1
    • Component/s: Documentation
    • Labels: None

    Description

      OPENNLP-1257 proposed changing the code to split the lemmatizer input on spaces instead of tabs. I believe tab is the intended delimiter, but we need to make sure the documentation is consistent with the code.

      Refer to https://opennlp.apache.org/docs/1.9.4/manual/opennlp.html#tools.lemmatizer, in particular the following sentences:

      "The training data consist of three columns separated by spaces. Each word has been put on a separate line and there is an empty line after each sentence. The first column contains the current word, the second its part-of-speech tag and the third its lemma. Here is an example of the file format:"

      Determine whether the first sentence of that passage should read "separated by tabs" instead.
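
      To make the difference concrete, here is a minimal Python sketch of a reader for the three-column format quoted above (word, part-of-speech tag, lemma, with an empty line after each sentence). It is not OpenNLP code: the function name `parse_lemmatizer_data`, the `delimiter` parameter, and the sample rows are all hypothetical and exist only to show how the choice of delimiter changes the result.

```python
from typing import List, Tuple

Row = Tuple[str, str, str]  # (word, POS tag, lemma)


def parse_lemmatizer_data(text: str, delimiter: str = "\t") -> List[List[Row]]:
    """Parse three-column lemmatizer training text into sentences.

    Hypothetical helper, not part of OpenNLP. The delimiter is the
    open question in this issue: pass "\t" or " " to compare.
    """
    sentences: List[List[Row]] = []
    current: List[Row] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            # An empty line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split(delimiter)
        if len(columns) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 columns, got {len(columns)}"
            )
        word, pos, lemma = columns
        current.append((word, pos, lemma))
    if current:
        # Keep the last sentence even if the trailing empty line is missing.
        sentences.append(current)
    return sentences


# Illustrative, made-up sample: two sentences, tab-separated columns.
SAMPLE = (
    "He\tPRP\the\n"
    "reckons\tVBZ\treckon\n"
    "\n"
    "The\tDT\tthe\n"
    "deficit\tNN\tdeficit\n"
)

if __name__ == "__main__":
    for sentence in parse_lemmatizer_data(SAMPLE, delimiter="\t"):
        print(sentence)
```

      In this sketch, reading the tab-separated sample with `delimiter=" "` raises a `ValueError` on the first line, because no line splits into three columns. A file written to match the documentation but read with the other delimiter would fail in a similar way, which is why the wording matters.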

       


            People

              Assignee: aarora Atita Arora
              Reporter: jzemerick Jeff Zemerick
