Daffodil / DAFFODIL-1808

JPEG schema accepts too many non-JPEG data files

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: DFDL Schemas
    • Labels: None

    Description

      The JPEG DFDL schema is much too permissive: arbitrary blobs of binary data are often accepted. To date, the schema only checks that the file is some sequence of JPEG segments. Unfortunately, one segment type is effectively just a data blob, so many data blobs are accepted.

      To overcome this, additional constraint checking is needed. It can be expressed with dfdl:assert statements in the DFDL schema. Two such assertions already exist: they require the first segment to be an SOI (start of image) segment and the last to be an EOI (end of image) segment. However, a blob of bytes between SOI and EOI is still accepted even when it is clearly not a JPEG image.
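
The gap can be illustrated outside DFDL. The Python sketch below is not part of the JPEG schema. It mimics the two existing checks (SOI first, EOI last) and contrasts them with a hypothetical stricter check. The function names and the extra rule (every marker up to the first scan parses as a length-prefixed segment, and a frame header precedes the first scan) are assumptions chosen for illustration. The sketch also ignores fill bytes and standalone markers for brevity.

```python
import struct

SOI = b"\xff\xd8"  # start of image
EOI = b"\xff\xd9"  # end of image
SOS = 0xDA         # start-of-scan marker code
# Frame header codes SOF0..SOF15, excluding DHT (C4), JPG (C8), DAC (CC).
SOF_CODES = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}


def loose_check(data: bytes) -> bool:
    """Mimics only the SOI-first / EOI-last checks."""
    return len(data) >= 4 and data.startswith(SOI) and data.endswith(EOI)


def stricter_check(data: bytes) -> bool:
    """Hypothetical stricter rule: walk the segments after SOI and require
    a frame header (SOFn) before the first start-of-scan (SOS)."""
    if not loose_check(data):
        return False
    pos, seen_sof = 2, False
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return False  # not positioned at a marker: just a blob
        code = data[pos + 1]
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2 or pos + 2 + length > len(data):
            return False  # implausible segment length
        if code in SOF_CODES:
            seen_sof = True
        if code == SOS:
            return seen_sof  # entropy-coded data follows; stop walking
        pos += 2 + length
    return False  # reached the end without finding a scan


blob = SOI + b"arbitrary bytes" + EOI
print(loose_check(blob), stricter_check(blob))
```

For the blob above, the loose check passes while the stricter one rejects it at the first byte after SOI, which is the behaviour the additional assertions are meant to achieve.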

      In some cases the constraint rules will require more expressive power than these assertions offer, where true XPath query capability is required.

      The Schematron rule language could be used for such rules. See also DFDL-1807 (Schematron), in case it proves to be needed.
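
As a rough picture of what a query-style rule looks like, the Python sketch below runs path queries over a parsed infoset using the standard library's limited XPath support. It is not Schematron and not taken from the JPEG schema: the element names (`JPEG`, `Segment`, `Marker`), the function name, and the rule itself (exactly one SOF0 segment and at least one SOS segment) are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical infoset; the element names are invented for this example.
INFOSET = """
<JPEG>
  <Segment><Marker>SOI</Marker></Segment>
  <Segment><Marker>SOF0</Marker></Segment>
  <Segment><Marker>SOS</Marker></Segment>
  <Segment><Marker>EOI</Marker></Segment>
</JPEG>
"""


def frame_and_scan_rule_holds(infoset_xml: str) -> bool:
    """Illustrative rule: exactly one SOF0 segment and at least one SOS."""
    root = ET.fromstring(infoset_xml)
    # Each query selects Segment children whose Marker child has the given text.
    frames = root.findall("./Segment[Marker='SOF0']")
    scans = root.findall("./Segment[Marker='SOS']")
    return len(frames) == 1 and len(scans) >= 1


print(frame_and_scan_rule_holds(INFOSET))
```

The rule is a property of the whole infoset rather than of any single segment, which is the kind of check that motivates a query language.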

      Note that this is not "validation" of the data. It uses what we normally think of as a validation language, but for checking whether the data is well-formed.

          People

            Assignee: Unassigned
            Reporter: Mike Beckerle (mbeckerle)