cTAKES / CTAKES-145

inconsistent handling of upper ascii


Details

    • Type: Task
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • future enhancement
    • None
    • Component/s: ctakes-preprocessor
    • None

    Description

      Currently, cTAKES handles characters above ASCII 127 differently depending on whether it is running a pipeline that processes CDA (Clinical Document Architecture) XML or a pipeline that expects plain text.

      The CDA pipelines, as an early step, create a plaintext view in which each upper-ASCII character is replaced by a blank.
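      A minimal sketch of the blanking step described above, assuming it works on Java UTF-16 `char` values. The class and method names (`UpperAsciiBlanker`, `blankUpperAscii`) are hypothetical; this is not the code in ClinicalNotePreProcessor.java. Replacing each character with a single blank, rather than deleting it, keeps the view the same length as the input, so character offsets computed on the view still line up with the original text.

```java
/**
 * Hypothetical sketch, not the actual cTAKES implementation: builds a plain-text
 * view in which every char above 127 is replaced by a blank.
 */
public final class UpperAsciiBlanker {

    /**
     * Replaces each UTF-16 char whose value is above 127 with a space.
     * A character outside the Basic Multilingual Plane is stored as two chars
     * (a surrogate pair), so it becomes two blanks; the length never changes.
     */
    public static String blankUpperAscii(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            sb.append(c > 127 ? ' ' : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Example sentence from this issue, written with Unicode escapes (U+00EB, U+00B0).
        String input = "His name is G\u00ebrman. Temp is 98 \u00b0C taken on the forehead";
        String view = blankUpperAscii(input);
        System.out.println(view);
        // The view has the same length as the input, so offsets stay aligned.
        System.out.println(view.length() == input.length());
    }
}
```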

      The plaintext pipelines do not treat upper-ASCII characters specially.

      Example plain-text input that shows this behavior:
      His name is Gërman. Temp is 98 °C taken on the forehead
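      To pinpoint which characters of this example are affected, the hypothetical helper below (`UpperAsciiOffsets.find` is an illustrative name, not a cTAKES API) lists the UTF-16 offsets of every `char` above 127. Going by the description, these are the positions the CDA pipelines' plaintext view would turn into blanks, while the plaintext pipelines would pass them through unchanged.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reports where chars above 127 occur in a string. */
public final class UpperAsciiOffsets {

    /** Returns the UTF-16 offsets of every char whose value is above 127. */
    public static List<Integer> find(String text) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) > 127) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Example sentence from this issue, written with Unicode escapes (U+00EB, U+00B0).
        String input = "His name is G\u00ebrman. Temp is 98 \u00b0C taken on the forehead";
        for (int offset : find(input)) {
            System.out.printf("offset %d: U+%04X%n", offset, (int) input.charAt(offset));
        }
        // prints:
        // offset 13: U+00EB
        // offset 31: U+00B0
    }
}
```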

      We need to decide whether this inconsistent behavior is acceptable or whether one of the two approaches should be changed to make them consistent.

      See ClinicalNotePreProcessor.java


          People

            Assignee: Unassigned
            Reporter: James Joseph Masanz (james-masanz)
            Votes: 0
            Watchers: 3
