[CTAKES-145] inconsistent handling of upper ascii - ASF JIRA

Details

Type: Task
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: future enhancement
Fix Version/s: None
Component/s: ctakes-preprocessor
Labels: None

Description

Currently cTAKES handles characters above ASCII 127 differently depending on whether the pipeline processes CDA (Clinical Document Architecture XML) or expects plain text.

The CDA pipelines, as an early step, create a plaintext view in which each upper-ASCII character is replaced by a blank.

The plaintext pipelines do not do anything special for upper-ASCII characters.

Example input text for plaintext, to show this behavior:
His name is Gërman. Temp is 98 °C taken on the forehead
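
To make the difference concrete, here is a minimal sketch of the blank-replacement step applied to the example input. It is not the code from ClinicalNotePreProcessor.java: the class name UpperAsciiDemo and the method blankUpperAscii are hypothetical, and the sketch assumes "upper ascii" means any char value above 127 and that the replacement is one blank per character.

```java
public class UpperAsciiDemo {

    // Hypothetical helper (an assumption, not the actual cTAKES code):
    // replaces every char above 127 with a single blank, one for one.
    static String blankUpperAscii(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            sb.append(c > 127 ? ' ' : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u00eb is e-with-diaeresis, \u00b0 is the degree sign; escapes are
        // used so the example does not depend on the source file encoding.
        String input = "His name is G\u00ebrman. Temp is 98 \u00b0C taken on the forehead";

        // Plaintext-style handling: the text is passed through unchanged.
        System.out.println("plaintext: " + input);

        // CDA-style handling as described above: upper-ASCII becomes a blank.
        System.out.println("CDA view:  " + blankUpperAscii(input));
    }
}
```

Under these assumptions the CDA-style view reads "His name is G rman. Temp is 98  C taken on the forehead". Because the replacement in this sketch is one for one, the text keeps its length, but the name and the degree sign are lost.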

Need to decide whether this inconsistent behavior is acceptable, or whether we should change one or the other to make them consistent.

See ClinicalNotePreProcessor.java

People

Assignee: Unassigned

Reporter: James Joseph Masanz

Votes: 0

Watchers: 3

Dates

Created: 05/Feb/13 16:35

Updated: 05/Feb/13 20:17