Details
Type: Task
Status: Open
Priority: Minor
Resolution: Unresolved
future enhancement
Description
Currently cTAKES handles characters above ASCII 127 differently depending on whether the pipeline processes CDA (Clinical Document Architecture XML) or expects plain text.
The CDA pipelines, as an early step, create a plaintext view in which each character above ASCII 127 is replaced by a blank.
The plaintext pipelines do not do anything special for characters above ASCII 127.
Example input text for plaintext, to show this behavior:
His name is Gërman. Temp is 98 °C taken on the forehead
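For comparison, here is a minimal sketch of what the CDA-style replacement would do to that input. It is not the code in ClinicalNotePreProcessor.java; the class name UpperAsciiSketch and the method blankNonAscii are hypothetical names used only for illustration, and the sketch assumes "replaced by a blank" means a one-for-one substitution with a space, so character offsets are preserved.

```java
public class UpperAsciiSketch {

    // Hypothetical helper: replaces every UTF-16 code unit above 127 with a
    // single space, so the output has the same length as the input.
    static String blankNonAscii(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            sb.append(c > 127 ? ' ' : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u00EB is "e with diaeresis" and \u00B0 is the degree sign; escapes
        // are used so the source compiles regardless of file encoding.
        String input = "His name is G\u00EBrman. Temp is 98 \u00B0C taken on the forehead";

        // Plaintext-style handling: the text is passed through unchanged.
        System.out.println(input);

        // CDA-style handling: characters above 127 become blanks.
        System.out.println(blankNonAscii(input));
        // prints "His name is G rman. Temp is 98  C taken on the forehead"
    }
}
```

With this kind of substitution the name and the temperature unit are both altered, while annotation offsets still line up with the original text.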
We need to decide whether this inconsistent behavior is acceptable, or whether we should change one pipeline or the other to make them consistent.
See ClinicalNotePreProcessor.java