Location of checks is always empty #140

Open
BennyAlex opened this issue Apr 25, 2023 · 4 comments

Comments

@BennyAlex

Hello,

When receiving the check information, the location property is always null.

It would be nice to have it available again.

@bdoubrov
Contributor

The error contains two properties that are used to identify the object causing the problem:

  • location: bbox or list of bboxes, if it was computed during validation (not always available)
  • context: the path to the PDF object in the model tree of the document (always available)

Typically location is null for Machine checks of PDF/UA-1, and is not null for additional human WCAG checks.

It is assumed that the PDF viewer would be able to compute the location bbox from the context, if the location property is null. A sample implementation of such logic is available in the PDF viewer module based on pdf.js: https://github.com/veraPDF/verapdf-js-viewer
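
The fallback described above can be sketched in TypeScript as follows. Only the `location` and `context` property names come from this thread; the `CheckLike` type, the `resolveLocation` helper, and the `fromContext` callback are hypothetical, and neither the bbox format nor the way a viewer derives a bbox from a context path is specified here.

```typescript
// Hypothetical shape of one reported check. L stands for whatever the bbox
// representation is; the thread does not define it.
interface CheckLike<L> {
  location: L | null; // may be null, e.g. for PDF/UA-1 Machine checks
  context: string;    // path to the object in the model tree
}

// Prefer the location supplied in the report. When it is null, defer to a
// caller-supplied lookup that tries to compute one from the context path.
function resolveLocation<L>(
  check: CheckLike<L>,
  fromContext: (context: string) => L | null
): L | null {
  if (check.location !== null) {
    return check.location;
  }
  return fromContext(check.context);
}
```

The lookup is passed in as a callback so that the viewer-specific part (walking the structure tree or the page content) stays outside this helper.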

@BennyAlex
Author

@bdoubrov Yes, thank you.

Unfortunately, the js-viewer is causing many problems, since it relies on a beta version of react-pdf. That version is not compatible with Vite, which we use for our project.
I tried upgrading react-pdf, but the js-viewer uses an internal prop, _pdfInfo.structureTree, which is no longer available in newer versions of react-pdf.
So I am stuck: there is currently no way for me to use the js-viewer.

That is why I thought that, with the bbox information available, it would be easy to display the locations myself.

@BennyAlex
Author

BennyAlex commented Apr 27, 2023

@bdoubrov
Another problem I am facing is the inconsistent context paths.
Sometimes it is something like this:

"root/doc[0]/StructTreeRoot[0]/children[0](173 0 obj Sect Sect)/children[11](1331 0 obj P P)"

Other times it is like this:
"root/document[0]/pages[0](74 0 obj PDPage)/contentStream[0]/content[0]/contentItem[0]"

With the second example, it is easy to get the page number and the index of an item.
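
As a rough sketch, both path styles quoted above can be split with a single pattern, after which the page index falls out of the `pages[...]` segment when one is present. The `ContextSegment` type and the `parseContext` and `pageIndexOf` helpers are hypothetical names, and the grammar is inferred only from the two sample paths, so other paths may need a more careful parser.

```typescript
// One step of a context path, e.g. `pages[0](74 0 obj PDPage)`.
interface ContextSegment {
  name: string;          // link name, e.g. "pages"
  index: number;         // zero-based index from the square brackets
  detail: string | null; // raw text from the parentheses, if any
}

// Split a context path into its indexed segments. The bare leading "root"
// carries no index, so it does not match the pattern and is skipped.
function parseContext(context: string): ContextSegment[] {
  const pattern = /([A-Za-z_]\w*)\[(\d+)\](?:\(([^)]*)\))?/g;
  const segments: ContextSegment[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(context)) !== null) {
    segments.push({
      name: String(match[1]),
      index: Number(match[2]),
      detail: match[3] === undefined ? null : match[3],
    });
  }
  return segments;
}

// Zero-based page index taken from the first `pages[...]` segment,
// or null when the path does not go through a page.
function pageIndexOf(context: string): number | null {
  const segments = parseContext(context);
  for (let i = 0; i < segments.length; i++) {
    if (segments[i].name === "pages") {
      return segments[i].index;
    }
  }
  return null;
}
```

For the second sample path, `pageIndexOf` returns 0 and the last parsed segment is `contentItem` with index 0. For the first sample path there is no `pages` segment, so it returns null, and the page would have to be found some other way.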

In my opinion, the report should always output the same unified format.

@bdoubrov
Contributor

If this helps, we plan to upgrade the viewer to the latest stable version of react-js (6.2).

The context info is consistent within the veraPDF validation model; that is, it shows the path to the object in question in the graph of all veraPDF objects created by the PDF parser.

For example, the two different formats in your example come from two different parent objects in the model:

  • document[0] means the PDDocument object, which corresponds to the PDF Catalog dictionary, as used by PDF/UA-1 (Machine) checks
  • doc[0] means the SADocument object, which corresponds to the semantics of the PDF document, as used by WCAG checks
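
A client that needs to treat the two path families differently could branch on the first segment after `root`. This is only a sketch based on the two prefixes named above; the `ContextRoot` type and the `contextRoot` function are hypothetical and not part of veraPDF.

```typescript
// Which model object a context path starts from, as far as the two
// prefixes mentioned in this thread allow us to tell.
type ContextRoot = "PDDocument" | "SADocument" | "unknown";

function contextRoot(context: string): ContextRoot {
  // Capture the link name of the first indexed segment after "root/".
  const match = /^root\/([A-Za-z]+)\[\d+\]/.exec(context);
  if (match === null) {
    return "unknown";
  }
  const name = String(match[1]);
  if (name === "document") {
    return "PDDocument"; // PDF/UA-1 (Machine) checks
  }
  if (name === "doc") {
    return "SADocument"; // WCAG checks
  }
  return "unknown";
}
```

Returning "unknown" instead of throwing keeps the caller working if the model exposes other root objects that are not mentioned here.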

The server-side calculation of the bounding boxes for PDF/UA-1 objects is not implemented yet, but it is on our radar for future development.
