You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{ and } are not PDF delimiter tokens except within Type 4 PostScript functions (i.e. they are PS delimiters only) so using them elsewhere is incorrect. This was a long-standing error in PDF specifications.
PDF-hul header check is for %PDF-1 but spec says it is %PDF- followed by any digit (0-9), . and another `digit so PDF 2.0 files should report as a PDF file, but with an unsupported PDF version until such time as you support PDF 2.0. JHOVE currently reports PDF 2.0 files as a bytestream which is incorrect. See here
PDF-hul crashes if a PDF hex-string contains EOL characters - this is permitted by the PDF spec as whitespace can occur in hex-strings and the EOLs are considered whitespace. (For what it is worth, hex-strings and literal strings are the only 2 types of PDF tokens or keywords that can span multiple lines).
there seem to be assumptions with PDF-hul-xx error codes that a key with an explicit null value is invalid whereas the PDF spec states that such keys should be ignored (same as not present). An easy test is to set /Annots null on any page and compare behaviour to not having an /Annots entry present.
Java exception gets thrown if cross-reference sub-section marker lines (of 2 integers) start with a negative number (i.e. for the object number).
FileSpecification.java does not account for the UF entry added with PDF 1.7. This was noticed from a code review.
there is something strange going on when encountering empty names (i.e. just a '/' followed by nothing, which is a valid PDF name). PDump correctly lists as a Name object with empty string "", but if 2 empty names are appended to a trailer dictionary (i.e. a valid key/value dictionary entry) then JHOVE doesn't work properly...
please consider adding support for UTF-8 text strings introduced with PDF 2.0. This was noted from a code review. Also note that UTF-8 strings do occur in some pre-PDF 2.0 files...
The text was updated successfully, but these errors were encountered:
Some issues noted about parsing PDFs:
{
and}
are not PDF delimiter tokens except within Type 4 PostScript functions (i.e. they are PS delimiters only) so using them elsewhere is incorrect. This was a long-standing error in PDF specifications.PDF-hul header check is for
%PDF-1
but spec says it is%PDF-
followed by any digit (0
-9
),.
and another `digit so PDF 2.0 files should report as a PDF file, but with an unsupported PDF version until such time as you support PDF 2.0. JHOVE currently reports PDF 2.0 files as a bytestream which is incorrect. See herePDF-hul crashes if a PDF hex-string contains EOL characters - this is permitted by the PDF spec as whitespace can occur in hex-strings and the EOLs are considered whitespace. (For what it is worth, hex-strings and literal strings are the only 2 types of PDF tokens or keywords that can span multiple lines).
there seem to be assumptions with PDF-hul-xx error codes that a key with an explicit null value is invalid whereas the PDF spec states that such keys should be ignored (same as not present). An easy test is to set
/Annots null
on any page and compare behaviour to not having an/Annots
entry present.Java exception gets thrown if cross-reference sub-section marker lines (of 2 integers) start with a negative number (i.e. for the object number).
FileSpecification.java does not account for the UF entry added with PDF 1.7. This was noticed from a code review.
there is something strange going on when encountering empty names (i.e. just a '/' followed by nothing, which is a valid PDF name). PDump correctly lists as a Name object with empty string
""
, but if 2 empty names are appended to a trailer dictionary (i.e. a valid key/value dictionary entry) then JHOVE doesn't work properly...please consider adding support for UTF-8 text strings introduced with PDF 2.0. This was noted from a code review. Also note that UTF-8 strings do occur in some pre-PDF 2.0 files...
The text was updated successfully, but these errors were encountered: