  Tika / TIKA-330

Better HWP (Hangul Word Processor) detection pattern


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.6
    • Component/s: mime
    • Labels: None

    Description

      The current magic byte pattern we have for the HWP (Hangul Word Processor, application/x-hwp) file format also matches the test-outlook.msg test file we have. I looked for a better detection pattern and found one in the OpenOffice.org source code.

      The hwpfilter/source/hwpfile.cpp file suggests that all HWP files start with the signature string "HWP Document File V", so I'll change the detection pattern accordingly.
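      A minimal sketch of the proposed check, assuming detection only needs to test whether the first bytes of a file equal the signature string from hwpfile.cpp. The names HWP_SIGNATURE, looks_like_hwp and detect_file are hypothetical illustrations, not Tika APIs; in Tika itself the pattern lives in the MIME magic configuration.

```python
# Signature string taken from OpenOffice.org's hwpfilter/source/hwpfile.cpp.
# HWP_SIGNATURE, looks_like_hwp and detect_file are hypothetical names.
HWP_SIGNATURE = b"HWP Document File V"


def looks_like_hwp(header: bytes) -> bool:
    """Return True if the leading bytes of a file start with the HWP signature."""
    return header.startswith(HWP_SIGNATURE)


def detect_file(path: str) -> bool:
    """Read only as many bytes as the signature needs and test them."""
    with open(path, "rb") as f:
        return looks_like_hwp(f.read(len(HWP_SIGNATURE)))
```

      Because the match is anchored at offset 0 on a 19-byte ASCII string, an OLE2 compound-file header (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", which is how Outlook .msg files begin) is not reported as HWP, while a header such as b"HWP Document File V3.00" is.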



          People

            Assignee: Jukka Zitting (jukkaz)
            Reporter: Jukka Zitting (jukkaz)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:
              Resolved: