JPEG schema accepts too many non-JPEG data files #2

Open
mbeckerle opened this issue Mar 9, 2018 · 1 comment

Comments

@mbeckerle
Member

The JPEG DFDL schema is much too permissive: arbitrary blobs of binary data are often accepted. To date, the schema only checks whether the file is some sequence of JPEG segments. Unfortunately, one segment type is effectively just a data blob, so many such blobs are accepted.

To overcome this, additional constraint checking is needed, which can be expressed with dfdl:assert statements in the DFDL schema. Two asserts already exist: they require the first segment to be an SOI (start of image) segment and the last to be an EOI (end of image) segment. However, a blob of bytes between SOI and EOI is still accepted even when it is clearly NOT a JPEG image.

In some cases the constraint rules will require more expressive power than dfdl:assert offers, namely where true XPath query capability is required.

The Schematron rule language could be used for those cases.
See also https://issues.apache.org/jira/browse/DAFFODIL-1807 (for Schematron), in case it proves to be needed.

Note that this is not "validation" of the data. It uses what we normally think of as a validation language, but for checking whether the data is well-formed.
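
To make the kind of extra well-formedness constraint discussed above more concrete, here is a rough sketch in Python (not DFDL or Schematron) that works on a list of segment marker codes, as might be extracted from a parsed infoset. The function name `check_well_formed` and the "a frame header must precede the first scan" rule are assumptions chosen for illustration; they are not taken from the schema. Only the SOI-first and EOI-last checks mirror the two asserts that already exist.

```python
# Marker codes (the byte following 0xFF) as defined by the JPEG standard.
SOI, EOI, SOS = 0xD8, 0xD9, 0xDA
# SOF0..SOF15, excluding DHT (0xC4), JPG (0xC8), and DAC (0xCC).
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}


def check_well_formed(markers):
    """Return a list of violated constraints for a sequence of segment markers."""
    problems = []
    # These two checks mirror the asserts already in the schema.
    if not markers or markers[0] != SOI:
        problems.append("first segment is not SOI")
    if not markers or markers[-1] != EOI:
        problems.append("last segment is not EOI")
    # Hypothetical extra rule: a frame header (SOF) must precede the first scan (SOS).
    first_sos = markers.index(SOS) if SOS in markers else None
    if first_sos is None:
        problems.append("no SOS segment")
    elif not any(m in SOF_MARKERS for m in markers[:first_sos]):
        problems.append("no SOF segment before first SOS")
    return problems


# SOI, APP0, DQT, SOF0, DHT, SOS, EOI: passes every check.
print(check_well_formed([0xD8, 0xE0, 0xDB, 0xC0, 0xC4, 0xDA, 0xD9]))
# SOI, COM, EOI: passes the two existing checks but fails the extra rule.
print(check_well_formed([0xD8, 0xFE, 0xD9]))
```

The second call is the case this issue is about: a comment-style blob wrapped in SOI and EOI satisfies the current asserts, and only a rule that relates several segments to each other rejects it.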

@mbeckerle
Member Author

It is quite possible that users will want to run Schematron rules but NOT run Xerces validation.

The Schematron rules might not be validation at all; they might instead be enforcing "well-formedness" constraints.
