fix: do not treat escaped <a> elements as hyperlinks in HTM-053
Fix the regex used to report "file:" hyperlinks as `HTM-053` (informative)
so that it only matches HTML `<a>` elements, not escaped markup in plain text.

This regex-based parsing is still brittle, but we'll refactor this whole
package later. For now this simple fix will do.

Fixes #1182
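To make "still brittle" concrete, here is a small sketch that runs the pattern introduced by this commit against three hypothetical inputs. None of them come from the commit's test suite, and the class and method names are made up for the example. A commented-out link is still reported. Two legitimate links are missed: one with whitespace around `=` (legal in XML) and one with an upper-case scheme (URI schemes are case-insensitive).

```java
import java.util.regex.Pattern;

public class FileLinkPatternLimits {
    // Same pattern as the one this commit puts in FileLinkSearch.
    static final Pattern FILE_LINK =
        Pattern.compile("<a\\s([^<>]*\\s)?href=[\"']file://");

    // Hypothetical helper: true if the pattern would trigger HTM-053 on this text.
    static boolean reported(String xhtml) {
        return FILE_LINK.matcher(xhtml).find();
    }

    public static void main(String[] args) {
        // False positive: a link inside an XML comment is still matched.
        String commented = "<!-- <a href=\"file:///C:/x.pdf\">x</a> -->";
        // False negative: whitespace around '=' is well-formed XML but not matched.
        String spacedEquals = "<a href = \"file:///C:/x.pdf\">x</a>";
        // False negative: the pattern only matches the lower-case scheme "file://".
        String upperScheme = "<a href=\"FILE:///C:/x.pdf\">x</a>";

        System.out.println(reported(commented));    // true
        System.out.println(reported(spacedEquals)); // false
        System.out.println(reported(upperScheme));  // false
    }
}
```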
rdeltour committed Feb 26, 2021
1 parent 5ee72e7 commit 5949b6c
Showing 7 changed files with 56 additions and 1 deletion.
2 changes: 1 addition & 1 deletion src/main/java/com/adobe/epubcheck/ctc/FileLinkSearch.java
@@ -20,7 +20,7 @@
* ========================================================<br/>
*/
public class FileLinkSearch extends TextSearch {
- private static final Pattern fileLinkPattern = Pattern.compile("href=[\"']file://");
+ private static final Pattern fileLinkPattern = Pattern.compile("<a\\s([^<>]*\\s)?href=[\"']file://");

public FileLinkSearch(EPUBVersion version, ZipFile zip, Report report)
{
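A minimal sketch comparing the old and new `fileLinkPattern` regexes. It applies both to escaped markup modeled on the new test fixture and to a real `<a>` element. Only the two regexes come from the diff. The class name and sample strings are illustrative.

```java
import java.util.regex.Pattern;

public class FileLinkPatternComparison {
    // Before this commit: "href=" followed by a file:// URL matches anywhere, even in text.
    static final Pattern OLD = Pattern.compile("href=[\"']file://");
    // After this commit: "href=" must appear inside an "<a ...>" start tag.
    static final Pattern NEW = Pattern.compile("<a\\s([^<>]*\\s)?href=[\"']file://");

    public static void main(String[] args) {
        // Raw source of escaped markup: no literal '<' character, only the &lt; entity.
        String escaped =
            "&lt;a class=\"external\" href=\"file:///C:/path/file.pdf\"&gt;link to local file";
        // A genuine hyperlink element.
        String element =
            "<a class=\"external\" href=\"file:///C:/path/file.pdf\">link to local file</a>";

        System.out.println(OLD.matcher(escaped).find()); // true: the false positive behind #1182
        System.out.println(NEW.matcher(escaped).find()); // false
        System.out.println(NEW.matcher(element).find()); // true
    }
}
```

The optional `([^<>]*\s)?` group lets `href` follow other attributes such as `class`. Because it excludes `<` and `>`, the `href` must also belong to the same start tag as `<a`.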
6 changes: 6 additions & 0 deletions src/test/resources/epub3/content-publication.feature
@@ -66,6 +66,12 @@ Feature: EPUB 3 ▸ Content Documents ▸ Full Publication Checks
When checking EPUB 'content-xhtml-link-to-local-file-valid'
Then info HTM-053 is reported
And no errors or warnings are reported

Scenario: Do not report escaped hyperlinks to resources in the local file system
See issue #1182
When checking EPUB 'content-xhtml-link-to-local-file-escaped-valid'
Then info HTM-053 is reported 0 times
And no errors or warnings are reported

Scenario: Report a hyperlink to a resource missing from the publication
When checking EPUB 'content-xhtml-link-to-missing-doc-error'
@@ -0,0 +1,12 @@
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="utf-8"/>
<title>Minimal EPUB</title>
</head>
<body>
<h1>Loomings</h1>
<p>Call me Ishmael.</p>
&lt;a class="external" href="file:///C:/path/file.pdf"&gt;link to local file&lt;/a&gt;
</body>
</html>
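The commit message says the regex-based package will be refactored later but does not say how. One possible direction is to inspect parsed elements instead of raw text. This is a hypothetical alternative, not epubcheck's actual design. After XML parsing, the escaped markup in the fixture above is just a text node, so it never appears as an `a` element. The class and method names below are made up. Only standard JAXP/DOM calls are used.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class DomFileLinkCheck {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical helper: counts XHTML <a> elements whose href starts with "file:",
    // ignoring case because URI schemes are case-insensitive.
    static int countFileLinks(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            // Escaped markup such as "&lt;a ...&gt;" is parsed as character data,
            // so it never shows up in this element list.
            NodeList anchors = doc.getElementsByTagNameNS(XHTML_NS, "a");
            int count = 0;
            for (int i = 0; i < anchors.getLength(); i++) {
                String href = ((Element) anchors.item(i)).getAttribute("href");
                if (href.regionMatches(true, 0, "file:", 0, 5)) {
                    count++;
                }
            }
            return count;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        String head = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>";
        String tail = "</body></html>";
        // Modeled on the escaped line in the new test fixture.
        String escaped = head
            + "&lt;a class=\"external\" href=\"file:///C:/path/file.pdf\"&gt;link to local file&lt;/a&gt;"
            + tail;
        String element = head
            + "<a class=\"external\" href=\"file:///C:/path/file.pdf\">link to local file</a>"
            + tail;
        System.out.println(countFileLinks(escaped)); // 0
        System.out.println(countFileLinks(element)); // 1
    }
}
```

Checking the `href` attribute of a parsed element also sidesteps the quoting, spacing, and scheme-case issues that a text-level regex has to handle explicitly.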
@@ -0,0 +1,14 @@
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" xml:lang="en" lang="en">
<head>
<meta charset="utf-8"/>
<title>Minimal Nav</title>
</head>
<body>
<nav epub:type="toc">
<ol>
<li><a href="content_001.xhtml">content 001</a></li>
</ol>
</nav>
</body>
</html>
@@ -0,0 +1,16 @@
<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" xml:lang="en" unique-identifier="q">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title id="title">Minimal EPUB 3.0</dc:title>
<dc:language>en</dc:language>
<dc:identifier id="q">NOID</dc:identifier>
<meta property="dcterms:modified">2017-06-14T00:00:01Z</meta>
</metadata>
<manifest>
<item id="content_001" href="content_001.xhtml" media-type="application/xhtml+xml"/>
<item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
</manifest>
<spine>
<itemref idref="content_001" />
</spine>
</package>
@@ -0,0 +1,6 @@
<?xml version="1.0" encoding="UTF-8" ?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
<rootfiles>
<rootfile full-path="EPUB/package.opf" media-type="application/oebps-package+xml"/>
</rootfiles>
</container>
@@ -0,0 +1 @@
application/epub+zip
