Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Verification of use of remote resources #869

Closed
mattgarrish opened this issue Nov 12, 2018 · 14 comments
Closed

Verification of use of remote resources #869

mattgarrish opened this issue Nov 12, 2018 · 14 comments
Assignees
Labels
spec: EPUB 3.2 Impacting the support of EPUB 3.2 status: completed Work completed, can be closed
Milestone

Comments

@mattgarrish
Copy link
Member

Scripts are allowed to reference remote resources, so whether a remote foreign resource listed in the manifest is in actual use or not may be complicated to check. It might make sense to only issue an informative message that no reference could be found, or emit nothing at all, if all resources are checked to see if they are in use.

Reference: https://w3c.github.io/publ-epub-revision/epub32/spec/epub-spec.html#sec-resource-locations-script

@mattgarrish mattgarrish added spec: EPUB 3.2 Impacting the support of EPUB 3.2 status: accepted Ready to be further processed labels Nov 12, 2018
@rdeltour rdeltour added status: blocked Blocked by another issue or situation and removed status: accepted Ready to be further processed labels Feb 25, 2019
@rdeltour rdeltour added this to the 4.2.0-RC milestone Feb 25, 2019
@rdeltour rdeltour added status: accepted Ready to be further processed and removed status: blocked Blocked by another issue or situation labels Feb 25, 2019
@rdeltour
Copy link
Member

rdeltour commented Mar 1, 2019

So the spec says:

Resources retrieved by scripts MAY be located outside the EPUB Container.

My understanding is that it only applies to foreign resources, correct? Maybe this could be clarified in the spec?

@mattgarrish
Copy link
Member Author

My understanding is that it only applies to foreign resources, correct?

No, a script can grab whatever type of resource it needs. They can be core media types or foreign resources.

@rdeltour
Copy link
Member

rdeltour commented Mar 1, 2019

Oh OK. So a remote CMT would be allowed if not referenced directly (e.g. remote images allowed if not in an img element).

@mattgarrish
Copy link
Member Author

Right, if a script can grab a resource by an xhr call or fetch, it's all legitimate. The additional requirement that resources used by scripts in the rendering have to be CMTs or provide an adequate fallback was intended to ensure rendering regardless of which type of resource is retrieved, even though it's not really testable.

But scripting generally undermines all content restrictions, as you can write and rewrite just about anything after the fact, including what the src of an img is. You just do it at your own risk since the rules exist to ensure rendering consistency.

In theory, all these resources have to be listed in the manifest and the remote-resources flag applied to their content document, but given that we can't test what scripts actually do, you could even ignore doing any of that and still have a legit epub.

Sorry, that was probably more rambling than you wanted... :)

@rdeltour
Copy link
Member

rdeltour commented Mar 1, 2019

scripting generally undermines all content restrictions

Yeah, exactly… which kinda makes me wonder why we have any restrictions on remote resources. I could use a remote image, as long as it is somehow added by a script. What's the point of disallowing using remote images then? And not even images, your EPUB could actually consist of empty-shell XHTML content documents downloading remote XHTML documents.

@rdeltour
Copy link
Member

rdeltour commented Mar 1, 2019

To be clear: I thought this new exception was intended for scripts to be able to download external data resources, to be later rendered via dynamically-generated HTML or SVG. I wasn't aware it was intended to allow directly-rendered remote content (mostly images or HTML).

Is there really a big diff between:

<body>
  <img id="img" src="https://www.w3.org/2008/site/images/logo-w3c" alt="w3c"/>
</body>

and

<body onload="document.getElementById('img').setAttribute('src','https://www.w3.org/2008/site/images/logo-w3c')">
    <img id="img" src="#" alt="w3c"/>
</body>

That the former is non-conforming and the latter conforming feels like an arbitrary restriction.

@mattgarrish
Copy link
Member Author

I thought this new exception was intended for scripts to be able to download external data resources, to be later rendered via dynamically-generated HTML or SVG.

Right.

I wasn't aware it was intended to allow directly-rendered remote content (mostly images or HTML).

It's not, but that's where I kind of went off on a rant. It's trivial to work around everything EPUB recommends or requires with a bit of script. It's only the general lack of support for network access that works as a deterrent.

@mattgarrish
Copy link
Member Author

That the former is non-conforming and the latter conforming feels like an arbitrary restriction.

The latter isn't conforming, but it's not easily detectable so how do we prevent it?

You can't reference remote scripts, but what stops you from fetching a script and writing it into the dom? You don't even have to declare what you're accessing is a script. Call it a text resource.

But let's not go down this rabbit hole...

@rdeltour
Copy link
Member

rdeltour commented Mar 1, 2019

The latter isn't conforming,

I'm not sure I understand: why is it not conforming with the current wording?

but it's not easily detectable so how do we prevent it?

If we want to be more restrictive we could for instance forbid referencing of remote text/html, img/*, and all CMTs (except videos and fonts).

@mattgarrish
Copy link
Member Author

I'm not sure I understand: why is it not conforming with the current wording?

The script rewrites the src attribute, but is not involved in retrieving the resource.

@rdeltour
Copy link
Member

rdeltour commented Mar 1, 2019

The script rewrites the src attribute, but is not involved in retrieving the resource.

ok, right.

@mattgarrish
Copy link
Member Author

If we want to be more restrictive we could for instance forbid referencing of remote text/html, img/*, and all CMTs (except videos and fonts).

Could you do this without executing the script, though? If you don't know what the script is doing, and I don't declare it in the manifest, then the epub is valid.

I think we've opened an exploit here, for what was proposed as needed for remote data sets, but it is what it is. We really need a more detailed JS model for EPUB, but we can't solve that here. Reading systems can prevent remote access calls, so that's ultimately what will determine how usable this is. History suggests significant interop problems.

@mattgarrish
Copy link
Member Author

mattgarrish commented Mar 1, 2019

And it's not all new to this version. You could always try to rewrite src attributes or use xhr to pull files.

I just wonder sometimes what the point of the content restrictions are when they only exist until run time.

@rdeltour
Copy link
Member

rdeltour commented Mar 1, 2019

Could you do this without executing the script, though?

Well, only if the resources are also declared in the OPF. It's true that if they're not even declared, there's absolutely nothing EPUBCheck can do (as was the case before).

I think we've opened an exploit here

Yeah, this exploit have always kinda been there, as it was never (and probably never will be) reported by EPUBCheck.

Still, the current spec wording doesn't really express the intent I think. But that's probably not an issue for this tracker :-)

rdeltour added a commit that referenced this issue Mar 7, 2019
Message changes:

- message `OPF-018` (`WARNING`) is only reported when a Content Doc is
  declared in the Package Document with the property `remote-resource`,
  and neither remote references nor scripts are found.
- new message `OPF-018b` (`USAGE`) reported when a Content Doc is
  declared in the Package Document with the property `remote-resource`,
  and no remote references are found but the Content Doc has scripted
  content.
- message `RSC-006` (`ERROR`) is only reported when a remote publication
  resource is declared in the Package Document with a type that doesn't
  specifically allow it to be located outside the container, and when no
  scripted content is found.
- new message `RSC-006b` (`USAGE`) is reported when a remote publication
  resource is declared in the Package Document with a type that doesn't
  specifically allow it to be located outside the container, and
  scripted content is found.

Internal changes:

- new utility static method `OPFChecker.isScriptType(String)`
  tells if a media type is a JavaScript MIME type essence match
  (see https://mimesniff.spec.whatwg.org/#javascript-mime-type-essence-match)
- better identify scripted content: from event handler attributes, or
  script element with a type matching a JavaScript MIME type essence.
- add a couple tests and fix existing tests

Fix #869.
@rdeltour rdeltour added status: has PR The issue is being processed in a pull request and removed status: accepted Ready to be further processed labels Mar 7, 2019
rdeltour added a commit that referenced this issue Mar 10, 2019
Message changes:

- message `OPF-018` (`WARNING`) is only reported when a Content Doc is
  declared in the Package Document with the property `remote-resource`,
  and neither remote references nor scripts are found.
- new message `OPF-018b` (`USAGE`) reported when a Content Doc is
  declared in the Package Document with the property `remote-resource`,
  and no remote references are found but the Content Doc has scripted
  content.
- message `RSC-006` (`ERROR`) is only reported when a remote publication
  resource is declared in the Package Document with a type that doesn't
  specifically allow it to be located outside the container, and when it
  is directly referenced from a Content Document, or used in the spine,
  or else if there is no scripted content in the publication.
- new message `RSC-006b` (`USAGE`) is reported when a remote publication
  resource is declared in the Package Document with a type that doesn't
  specifically allow it to be located outside the container, and
  scripted content is found.

Internal changes:

- new utility static method `OPFChecker.isScriptType(String)`
  tells if a media type is a JavaScript MIME type essence match
  (see https://mimesniff.spec.whatwg.org/#javascript-mime-type-essence-match)
- better identify scripted content: from event handler attributes, or
  script element with a type matching a JavaScript MIME type essence.
- add a couple tests and fix existing tests

Fix #869.

fixup: better check remote resources
rdeltour added a commit that referenced this issue Mar 10, 2019
Message changes:

- message `OPF-018` (`WARNING`) is only reported when a Content Doc is
  declared in the Package Document with the property `remote-resource`,
  and neither remote references nor scripts are found.
- new message `OPF-018b` (`USAGE`) reported when a Content Doc is
  declared in the Package Document with the property `remote-resource`,
  and no remote references are found but the Content Doc has scripted
  content.
- message `RSC-006` (`ERROR`) is only reported when a remote publication
  resource is declared in the Package Document with a type that doesn't
  specifically allow it to be located outside the container, and when it
  is directly referenced from a Content Document, or used in the spine,
  or else if there is no scripted content in the publication.
- new message `RSC-006b` (`USAGE`) is reported when a remote publication
  resource is declared in the Package Document with a type that doesn't
  specifically allow it to be located outside the container, and
  scripted content is found.

Internal changes:

- new utility static method `OPFChecker.isScriptType(String)`
  tells if a media type is a JavaScript MIME type essence match
  (see https://mimesniff.spec.whatwg.org/#javascript-mime-type-essence-match)
- better identify scripted content: from event handler attributes, or
  script element with a type matching a JavaScript MIME type essence.
- add a couple tests and fix existing tests

Fix #869.

fixup: better check remote resources
@rdeltour rdeltour self-assigned this Mar 10, 2019
@rdeltour rdeltour added status: completed Work completed, can be closed and removed status: has PR The issue is being processed in a pull request labels Mar 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spec: EPUB 3.2 Impacting the support of EPUB 3.2 status: completed Work completed, can be closed
Projects
None yet
Development

No branches or pull requests

2 participants