HTML API: Add test suite from html5lib #5794

Closed
70 commits
e2468d2
Add test cases from html5lib-tests tree-construction
sirreal Apr 5, 2024
491acc1
Add html5lib test class
sirreal Dec 18, 2023
37ff9f8
Skip unhandled tests
sirreal Dec 18, 2023
70088f1
Avoid WPCS lint nags; skip tests for unsupported input or fragment co…
dmsnell Dec 18, 2023
01dd880
Add line number to test case label
dmsnell Dec 18, 2023
647e086
WPCS Nags
dmsnell Dec 18, 2023
94a6c83
1-index test case numbering
sirreal Dec 19, 2023
bf55265
Skip doctype and comments in test dom tree
sirreal Dec 19, 2023
dc8ad56
Print nicer tests names
sirreal Dec 19, 2023
e5cdeaf
Better tag finding
sirreal Dec 19, 2023
b37739a
Remove space from test identifier, easier copy/paste filtering
sirreal Dec 19, 2023
8724ea4
Add skipping of certain tests
sirreal Dec 19, 2023
9d50600
Add files crediting html5lib-tests project
sirreal Dec 19, 2023
e80bf95
Fix lint
sirreal Dec 19, 2023
01f4149
Add ignores for formatting elements
sirreal Dec 19, 2023
c2d0e1f
Move test data to test data dir
sirreal Dec 19, 2023
aff5cd6
Fix expect/actual ordering, add test message
sirreal Dec 20, 2023
493cf5c
Add extra skipped tests
dmsnell Dec 20, 2023
273479f
Avoid running tests that expect anything in <head>
sirreal Dec 22, 2023
fd603fa
Use line numbers for test IDs
sirreal Jan 15, 2024
e57f7a8
Use padded line number
sirreal Jan 15, 2024
25bd659
Fix HTML input processing
sirreal Jan 15, 2024
b647629
Update ignores
sirreal Jan 16, 2024
47dc0f4
Skip incomplete token tests
sirreal Jan 16, 2024
0fbcfd1
Mark unsupported markup tests as incomplete, not skipped
sirreal Jan 16, 2024
9d5b180
Fix lints
sirreal Jan 16, 2024
5907bc7
Fix strlen paren bug
sirreal Jan 16, 2024
5e399f2
Fix some comments
sirreal Jan 16, 2024
867f109
Skip head tests
sirreal Jan 16, 2024
659eebd
Fix lint
sirreal Jan 16, 2024
544a4a6
Add attributes to html5lib tests
sirreal Jan 16, 2024
825c97f
Clean up and refactor test document parsing
sirreal Jan 16, 2024
9fd69f3
Fixing more lints
sirreal Jan 17, 2024
8ece64b
Rename class and test function
sirreal Jan 17, 2024
f49bbf3
Add skip for known bug - all tests passing or skipped
sirreal Jan 17, 2024
6b6c3dc
Ignore another P tag test
sirreal Jan 21, 2024
08d51e6
Use DIR_TESTDATA
sirreal Jan 23, 2024
a251cd8
Update covers
sirreal Jan 29, 2024
40da3e8
Add todo comments
sirreal Jan 30, 2024
252e37a
Remove covers
sirreal Jan 30, 2024
07fa7ad
Remove comment test skip
sirreal Jan 30, 2024
672bb47
Add much more HTML to tests
sirreal Jan 30, 2024
89aed66
Skip entities tests
sirreal Jan 30, 2024
8d4cb2a
Better variable name
sirreal Jan 30, 2024
3dce944
Skip all entities for now
sirreal Jan 30, 2024
1d113a7
Update skips
sirreal Jan 30, 2024
4148612
Remove leading class body space
sirreal Jan 30, 2024
7598fb7
Fix void tag indenting
sirreal Jan 31, 2024
286803b
Replace $p with $processor
sirreal Jan 31, 2024
a51363c
Expand README and add update instructions
sirreal Jan 31, 2024
1878f93
Add description to test class
sirreal Jan 31, 2024
8a45383
fix some test skipping
sirreal Jan 31, 2024
bec7014
Handle CDATA lookalike comment types
sirreal Jan 31, 2024
33f6a4b
Throw on unhandled token types
sirreal Jan 31, 2024
035f0ee
Skip whitespace test
sirreal Jan 31, 2024
f892705
Fix lint
sirreal Jan 31, 2024
168bba6
var_export our token type
sirreal Feb 2, 2024
429e933
Rename tree representation method, make private
sirreal Feb 2, 2024
dd0d56d
Skip tests with known issues
sirreal Feb 5, 2024
f0bcccd
Add ticket to dataProvider
sirreal Feb 27, 2024
084def5
Use assertSame over assertEquals
sirreal Feb 27, 2024
49c80f4
Extract should_skip_test function
sirreal Feb 27, 2024
09ce469
Rename result variable to expected_tree
sirreal Feb 27, 2024
cc1ed71
Mark unsupported tests as skipped
sirreal Feb 27, 2024
d73836e
Add and exclude html-api-html5lib-tests group
sirreal Feb 27, 2024
209af9a
Also exclude html5lib tests from multisite
sirreal Mar 7, 2024
b23ecda
Update since annotation
sirreal Apr 5, 2024
9717fff
Abort when error at end of token loop
sirreal Apr 5, 2024
c64034e
Explicitly mark test files as text, despite the .dat file extension.
dmsnell Apr 16, 2024
bbe7c21
Merge branch 'trunk' into add-html5lib-tests
dmsnell Apr 16, 2024
1 change: 1 addition & 0 deletions phpunit.xml.dist
@@ -28,6 +28,7 @@
<group>ms-files</group>
<group>ms-required</group>
<group>external-http</group>
<group>html-api-html5lib-tests</group>
</exclude>
</groups>
<logging>
1 change: 1 addition & 0 deletions tests/phpunit/data/html5lib-tests/.gitattributes
@@ -0,0 +1 @@
*.dat -text diff
34 changes: 34 additions & 0 deletions tests/phpunit/data/html5lib-tests/AUTHORS.rst
@@ -0,0 +1,34 @@
Credits
=======

The ``html5lib`` test data is maintained by:

- James Graham
- Geoffrey Sneddon


Contributors
------------

- Adam Barth
- Andi Sidwell
- Anne van Kesteren
- David Flanagan
- Edward Z. Yang
- Geoffrey Sneddon
- Henri Sivonen
- Ian Hickson
- Jacques Distler
- James Graham
- Lachlan Hunt
- lantis63
- Mark Pilgrim
- Mats Palmgren
- Ms2ger
- Nolan Waite
- Philip Taylor
- Rafael Weinstein
- Ryan King
- Sam Ruby
- Simon Pieters
- Thomas Broyer
21 changes: 21 additions & 0 deletions tests/phpunit/data/html5lib-tests/LICENSE
@@ -0,0 +1,21 @@
Copyright (c) 2006-2013 James Graham, Geoffrey Sneddon, and
other contributors

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
25 changes: 25 additions & 0 deletions tests/phpunit/data/html5lib-tests/README.md
@@ -0,0 +1,25 @@
# html5lib-tests

This directory contains a third-party test suite used for testing the WordPress HTML API.

`html5lib-tests` can be found on GitHub at [html5lib/html5lib-tests](https://github.com/html5lib/html5lib-tests).

The necessary files have been copied to this directory:

- `AUTHORS.rst`
- `LICENSE`
- `README.md`
- `tree-construction/README.md`
- `tree-construction/*.dat`

The version of these files was taken from the git commit with
SHA [`a9f44960a9fedf265093d22b2aa3c7ca123727b9`](https://github.com/html5lib/html5lib-tests/commit/a9f44960a9fedf265093d22b2aa3c7ca123727b9).

## Updating

If there have been changes to the html5lib-tests repository, this test suite can be updated. In
order to update:

1. Check out the latest version of the git repository mentioned above.
1. Copy the files listed above into this directory.
1. Update the SHA mentioned in this README file with the new html5lib-tests SHA.
108 changes: 108 additions & 0 deletions tests/phpunit/data/html5lib-tests/tree-construction/README.md
@@ -0,0 +1,108 @@
Tree Construction Tests
=======================

Each file containing tree construction tests consists of any number of
tests separated by two newlines (LF) and a single newline before the end
of the file. For instance:

    [TEST]LF
    LF
    [TEST]LF
    LF
    [TEST]LF

Where [TEST] is the following format:

Each test must begin with a string "\#data" followed by a newline (LF).
All subsequent lines until a line that says "\#errors" are the test data
and must be passed to the system being tested unchanged, except with the
final newline (on the last line) removed.

Then there must be a line that says "\#errors". It must be followed by
one line per parse error that a conformant checker would return. It
doesn't matter what those lines are, although they can't be
"\#new-errors", "\#document-fragment", "\#document", "\#script-off",
"\#script-on", or empty; the only thing that matters is that there be
the right number of parse errors.

Then there \*may\* be a line that says "\#new-errors", which works like
the "\#errors" section adding more errors to the expected number of
errors.

Then there \*may\* be a line that says "\#document-fragment", which must
be followed by a newline (LF), followed by a string of characters that
indicates the context element, followed by a newline (LF). If the string
of characters starts with "svg ", the context element is in the SVG
namespace and the substring after "svg " is the local name. If the
string of characters starts with "math ", the context element is in the
MathML namespace and the substring after "math " is the local name.
Otherwise, the context element is in the HTML namespace and the string
is the local name. If this line is present the "\#data" must be parsed
using the HTML fragment parsing algorithm with the context element as
context.
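
As an illustration of the context-element rule above, here is a minimal Python sketch. The function name `parse_fragment_context` and the `(namespace, local_name)` return shape are assumptions made for this example; they are not part of html5lib-tests or of the WordPress test class.

```python
from typing import Tuple


def parse_fragment_context(line: str) -> Tuple[str, str]:
    """Split a #document-fragment context line into (namespace, local name).

    The namespace is reported as the short label "svg", "math", or "html";
    a real harness would likely map these to namespace URIs.
    """
    if line.startswith("svg "):
        return ("svg", line[len("svg "):])
    if line.startswith("math "):
        return ("math", line[len("math "):])
    # No designator: the context element is in the HTML namespace.
    return ("html", line)
```

For example, a context line of `svg title` yields `("svg", "title")`, while `div` yields `("html", "div")`.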

Then there \*may\* be a line that says "\#script-off" or
"\#script-on". If a line that says "\#script-off" is present, the
parser must set the scripting flag to disabled. If a line that says
"\#script-on" is present, it must set it to enabled. Otherwise, the
test should be run in both modes.
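
The section markers described so far, together with the "\#document" marker covered in the next paragraph, could be read with a small parser like the Python sketch below. It is only an illustration: `ParsedTest`, `parse_test`, and the field names are hypothetical, and the sketch assumes it receives the text of a single test without the blank separator line.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

SECTION_MARKERS = {
    "#data", "#errors", "#new-errors", "#document-fragment",
    "#script-off", "#script-on", "#document",
}


@dataclass
class ParsedTest:
    data: str = ""
    error_count: int = 0
    fragment_context: Optional[str] = None
    scripting: str = "both"  # "on", "off", or "both"
    document: str = ""


def parse_test(text: str) -> ParsedTest:
    """Parse the text of one test into its sections."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # ignore a single trailing newline

    sections: Dict[str, List[str]] = {}
    current = None
    for line in lines:
        if current == "#data":
            # Test data runs until the "#errors" line, whatever it contains.
            is_marker = line == "#errors"
        else:
            is_marker = line in SECTION_MARKERS
        if is_marker:
            current = line
            sections[current] = []
        elif current is not None:
            sections[current].append(line)

    test = ParsedTest()
    # Joining with "\n" drops the final newline of the data, as required.
    test.data = "\n".join(sections.get("#data", []))
    # Only the number of error lines matters, not their content.
    test.error_count = (
        len(sections.get("#errors", [])) + len(sections.get("#new-errors", []))
    )
    fragment = sections.get("#document-fragment")
    if fragment:
        test.fragment_context = fragment[0]
    if "#script-off" in sections:
        test.scripting = "off"
    elif "#script-on" in sections:
        test.scripting = "on"
    test.document = "\n".join(sections.get("#document", []))
    return test
```

A test without either scripting marker is reported as `"both"`, matching the rule that such tests should be run in both modes.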

Then there must be a line that says "\#document", which must be followed
by a dump of the tree of the parsed DOM. Each node must be represented
by a single line. Each line must start with "| ", followed by two spaces
per parent node that the node has before the root document node.

- Element nodes must be represented by a "`<`" then the *tag name
string* "`>`", and all the attributes must be given, sorted
lexicographically by UTF-16 code unit according to their *attribute
name string*, on subsequent lines, as if they were children of the
element node.
- Attribute nodes must have the *attribute name string*, then an "="
sign, then the attribute value in double quotes (").
- Text nodes must be the string, in double quotes. Newlines aren't
escaped.
- Comments must be "`<`" then "`!-- `" then the data then "` -->`".
- DOCTYPEs must be "`<!DOCTYPE `" then the name then if either of the
system id or public id is non-empty a space, public id in
double-quotes, another space and the system id in double-quotes, and
then in any case "`>`".
- Processing instructions must be "`<?`", then the target, then a
space, then the data and then "`>`". (The HTML parser cannot emit
processing instructions, but scripts can, and the WebVTT to DOM
rules can emit them.)
- Template contents are represented by the string "content" with the
children below it.
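
The serialization rules above can be sketched in Python as follows. The `Element`, `Text`, and `Comment` classes and the `dump` function are hypothetical names invented for this example, and the sketch covers only elements, attributes, text, and comments; DOCTYPEs, processing instructions, and template contents are left out for brevity. Sorting on the big-endian UTF-16 encoding is one way to get the required code-unit ordering, since comparing those bytes compares code units in order.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Text:
    data: str


@dataclass
class Comment:
    data: str


@dataclass
class Element:
    tag_name: str  # assumed to already be the "tag name string"
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


Node = Union[Element, Text, Comment]


def utf16_key(name: str) -> bytes:
    # Big-endian UTF-16 bytes compare in the same order as UTF-16 code units.
    return name.encode("utf-16-be")


def dump(node: Node, depth: int = 0) -> List[str]:
    """Return the "| "-prefixed lines for a node and its descendants."""
    prefix = "| " + "  " * depth
    if isinstance(node, Text):
        return [f'{prefix}"{node.data}"']
    if isinstance(node, Comment):
        return [f"{prefix}<!-- {node.data} -->"]
    lines = [f"{prefix}<{node.tag_name}>"]
    # Attributes are listed like children, one level deeper, in sorted order.
    for name in sorted(node.attributes, key=utf16_key):
        lines.append(f'{prefix}  {name}="{node.attributes[name]}"')
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines
```

Joining the returned lines with `"\n"` gives text that can be compared directly against a test's "\#document" section.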

The *tag name string* is the local name prefixed by a namespace
designator. For the HTML namespace, the namespace designator is the
empty string, i.e. there's no prefix. For the SVG namespace, the
namespace designator is "svg ". For the MathML namespace, the namespace
designator is "math ".

The *attribute name string* is the local name prefixed by a namespace
designator. For no namespace, the namespace designator is the empty
string, i.e. there's no prefix. For the XLink namespace, the namespace
designator is "xlink ". For the XML namespace, the namespace designator
is "xml ". For the XMLNS namespace, the namespace designator is "xmlns
". Note the difference between "xlink:href" which is an attribute in no
namespace with the local name "xlink:href" and "xlink href" which is an
attribute in the xlink namespace with the local name "href".
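
The two naming rules above amount to a lookup from namespace to designator. The Python sketch below shows one possible shape; the function names and the use of namespace URIs as dictionary keys are assumptions made for illustration, with `None` standing for "no namespace".

```python
from typing import Optional

HTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
XLINK_NS = "http://www.w3.org/1999/xlink"
XML_NS = "http://www.w3.org/XML/1998/namespace"
XMLNS_NS = "http://www.w3.org/2000/xmlns/"

# Designators include their trailing space, as in the README text.
ELEMENT_DESIGNATORS = {HTML_NS: "", SVG_NS: "svg ", MATHML_NS: "math "}
ATTRIBUTE_DESIGNATORS = {
    None: "",
    XLINK_NS: "xlink ",
    XML_NS: "xml ",
    XMLNS_NS: "xmlns ",
}


def tag_name_string(namespace: str, local_name: str) -> str:
    """Local name prefixed by the element namespace designator."""
    return ELEMENT_DESIGNATORS[namespace] + local_name


def attribute_name_string(namespace: Optional[str], local_name: str) -> str:
    """Local name prefixed by the attribute namespace designator."""
    return ATTRIBUTE_DESIGNATORS[namespace] + local_name
```

With these helpers, `attribute_name_string(None, "xlink:href")` gives `xlink:href`, while `attribute_name_string(XLINK_NS, "href")` gives `xlink href`, which is the distinction the paragraph above calls out.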

If there is also a "\#document-fragment" the bit following "\#document"
must be a representation of the HTML fragment serialization for the
context element given by "\#document-fragment".

For example:

    #data
    <p>One<p>Two
    #errors
    3: Missing document type declaration
    #document
    | <html>
    |   <head>
    |   <body>
    |     <p>
    |       "One"
    |     <p>
    |       "Two"
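
Returning to the file layout described at the top of this README, a `.dat` file holding several tests like the example above could be split with the Python sketch below. The `split_tests` name is hypothetical, and the rule it uses (a new test starts at a "\#data" line that is either the first line or preceded by an empty line) is a heuristic assumed for this example rather than something the README spells out; test data that itself contains an empty line followed by a "\#data" line would be split incorrectly.

```python
from typing import List, Optional


def _finish(lines: List[str]) -> str:
    # Drop the blank separator (or end-of-file) line that trails each test.
    if lines and lines[-1] == "":
        lines = lines[:-1]
    return "\n".join(lines)


def split_tests(contents: str) -> List[str]:
    """Split the contents of a tree-construction file into individual tests."""
    tests: List[str] = []
    current: Optional[List[str]] = None
    previous = ""
    for line in contents.split("\n"):
        if line == "#data" and previous == "":
            if current is not None:
                tests.append(_finish(current))
            current = [line]
        elif current is not None:
            current.append(line)
        previous = line
    if current is not None:
        tests.append(_finish(current))
    return tests
```

Each returned string starts with the "\#data" line and ends with the last line of the tree dump, without the separating blank line.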