Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
tools-1.5.2-incubating
All
Description
The input and output encodings are not handled correctly. A good example is the CoNLL 2002 data: even when it is correctly encoded in UTF-8, it does not work for training unless -Dfile.encoding=UTF-8 is passed to the java command.
We already specify the input and expected output encoding on the command-line interface with the -encoding parameter. For some reason this setting is not being honored.
I'll work on fixing this for the next major release...
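A minimal sketch of the kind of mismatch described above, not the actual tool code: when a reader is built with an explicit charset, the result does not depend on -Dfile.encoding, whereas decoding the same UTF-8 bytes with a different charset corrupts non-ASCII characters. The class name EncodingDemo, the helper readFirstLine, and the sample line are hypothetical and used only for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Hypothetical helper: reads the first line of the stream, decoding the
    // bytes with the charset passed in rather than the JVM default.
    static String readFirstLine(InputStream in, Charset charset) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Illustrative sample line containing one non-ASCII character (n with tilde),
        // encoded as UTF-8. The character takes two bytes in UTF-8.
        byte[] utf8 = "Espa\u00f1a B-LOC".getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the data was written in preserves the text.
        String explicit = readFirstLine(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);

        // Decoding with a different charset (here ISO-8859-1, standing in for a
        // mismatched platform default) turns the two-byte character into two characters.
        String mismatched = readFirstLine(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1);

        System.out.println(explicit.length());    // 12
        System.out.println(mismatched.length());  // 13
        System.out.println(explicit.equals(mismatched)); // false
    }
}
```

If the charset named by -encoding were passed through to every reader and writer in this way (for example via Charset.forName on the option value), the -Dfile.encoding workaround should no longer be needed; whether that matches the eventual fix is an assumption.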