OpenNLP / OPENNLP-743

The chunker training data format is incorrectly/insufficiently described.

Details

    Description

      The chunker training data format is described as follows: "The train data consist of three columns separated by spaces. Each word has been put on a separate line and there is an empty line after each sentence." However, in the example, several spaces appear between the tokens and the tags. First, the spacing looks like tabs, which are not allowed. Second, runs of several spaces are not allowed either (apparently each line String is split with split(" ")). Suggestion: emphasize that the columns are separated by exactly one space and that tabs are not allowed.
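The behavior suspected in the description can be reproduced with plain Java. The following is a minimal sketch, not OpenNLP's actual parsing code: the class name `ChunkLineCheck`, its helper methods, and the sample line are hypothetical, and the only assumption taken from the report is that each line is split on a single space.

```java
import java.util.Arrays;

public class ChunkLineCheck {

    // Splits on exactly one space character, as the report suspects the parser does.
    static String[] columns(String line) {
        return line.split(" ");
    }

    // Treats a line as well formed only if it yields exactly three non-empty columns.
    static boolean isWellFormed(String line) {
        String[] cols = columns(line);
        if (cols.length != 3) {
            return false;
        }
        for (String col : cols) {
            if (col.isEmpty()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative word / POS tag / chunk tag lines (not taken from the report).
        String[] samples = {
            "Rockwell NNP B-NP",      // single spaces
            "Rockwell  NNP  B-NP",    // two spaces between columns
            "Rockwell\tNNP\tB-NP"     // tabs between columns
        };
        for (String sample : samples) {
            System.out.println(Arrays.toString(columns(sample))
                    + " -> well formed: " + isWellFormed(sample));
        }
    }
}
```

With a single-space split, the double-space line produces five fields (two of them empty strings) and the tab-separated line produces one field, so neither matches the three-column layout. This is consistent with the suggestion to state explicitly that exactly one space separates the columns.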

            People

              colen William Colen
              popelucha Zuzana Neverilova

                Time Tracking

                  Estimated: 10m
                  Remaining: 10m
                  Logged: Not Specified