xml2rfc non-determinism #553

Closed
ietf-svn-bot opened this issue Sep 24, 2020 · 19 comments
Labels
html Issues in HTML output minor

Comments

@ietf-svn-bot

owner:krathnayake@ietf.org resolution_fixed type_enhancement | by normingtong@vmware.com


A colleague and I are both using xml2rfc v3.1.1, but the HTML output varies, which makes diffs bigger than they need to be.

An example in which unchanged XML produces changed HTML is here.

The nub of the problem seems to be that classes appear in an unpredictable order. For example sometimes xml2rfc produces:

<ul class="ulEmpty compact toc">

and other times it produces:

<ul class="ulEmpty toc compact">

A suggested fix would be to sort the classes.
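
As a minimal sketch of what "sort the classes" could mean at the level of a single attribute value (the helper name `sort_classes` is hypothetical and is not part of xml2rfc), both orderings shown above collapse to one canonical string:

```python
def sort_classes(value: str) -> str:
    """Return a class attribute value with its class names in sorted order."""
    return " ".join(sorted(value.split()))


# The two orderings reported above normalize to the same string.
first = sort_classes("ulEmpty compact toc")
second = sort_classes("ulEmpty toc compact")
print(first)            # compact toc ulEmpty
print(first == second)  # True
```

Sorting changes only the textual order of the names; CSS class matching does not depend on that order, so the rendered page should be unaffected.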


Issue migrated from trac:553 at 2022-02-08 07:09:58 +0000

@ietf-svn-bot

@henrik@levkowetz.com commented


Hi Glyn,

At the bottom of this is how dictionary key order is handled by Python, and how it has changed between versions. There were changes between 3.5 and 3.6, and additional changes between 3.6 and 3.7. I've put quite a bit of effort into minimizing the effect, using ordered dicts in some places and trying to be consistent about insertion order, but there are still differences, which seem to be related to the interaction between Python's dictionary key ordering and the lxml library's code.

I have to take exactly the issue you point at into consideration in the xml2rfc test suite, making things more complex there than would otherwise be needed.

The best I can suggest is that you both use the same version of Python 3. The exact version shouldn't matter for this issue, although I'd suggest 3.7 or 3.8 because of the added consistency they guarantee in this area.

@ietf-svn-bot

@julian.reschke@gmx.de commented


I see the same problem on a single machine, with always using the same Python version (I believe the only thing that changes between runs is the system time...).

@ietf-svn-bot

ietf-svn-bot commented Sep 25, 2020

@henrik@levkowetz.com commented


Replying to ietf-svn-conversion/xml2rfc#553 (comment:2):

I see the same problem on a single machine, with always using the same Python version (I believe the only thing that changes between runs is the system time...).

This would be helpful if it provided sample XML showing the issue and gave the python version. None of my test files show this behaviour when the same python version is being used, and this has been the case since I first observed that different python versions gave different attribute orderings, more than a year ago.

Until solid data is provided, I believe you're mistaken in indicating this can happen with always using the same Python version, Julian.

@ietf-svn-bot

@julian.reschke@gmx.de uploaded file xref-tests-rfc7991.xml (49.5 KiB)

@ietf-svn-bot

@julian.reschke@gmx.de commented


test file attached

python version:


> python3 --version
Python 3.6.4

As you can see, multiple runs create different HTML (of the same length; the difference is indeed the ordering of class names).

Test runs:


> xml2rfc --html --v3 --legacy-date-format -P xref-tests-rfc7991.xml -o xref-tests-rfc7991.xml2rfcv3.html
 Created file xref-tests-rfc7991.xml2rfcv3.html
> cksum xref-tests-rfc7991.xml2rfcv3.html
645618257 137844 xref-tests-rfc7991.xml2rfcv3.html

> xml2rfc --html --v3 --legacy-date-format -P xref-tests-rfc7991.xml -o xref-tests-rfc7991.xml2rfcv3.html
 Created file xref-tests-rfc7991.xml2rfcv3.html
> cksum xref-tests-rfc7991.xml2rfcv3.html
842635941 137844 xref-tests-rfc7991.xml2rfcv3.html

> xml2rfc --html --v3 --legacy-date-format -P xref-tests-rfc7991.xml -o xref-tests-rfc7991.xml2rfcv3.html
 Created file xref-tests-rfc7991.xml2rfcv3.html
> cksum xref-tests-rfc7991.xml2rfcv3.html
1449545 137844 xref-tests-rfc7991.xml2rfcv3.html

@ietf-svn-bot

@julian.reschke@gmx.de commented


BTW, the problem reproduces with "elements.xml" as well.

@ietf-svn-bot

@julian.reschke@gmx.de commented


Out of curiosity: what's the relevance of lxml here? The different ordering happens with CSS class names in a single attribute, not with the ordering of multiple HTML attributes...
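
One mechanism that would produce exactly this pattern, independent of lxml, is a Python `set` of class names being joined into the attribute. This is an assumption: the thread does not show how xml2rfc builds the attribute. String hashing is randomized per process by default, so the iteration order of a set of strings can change from run to run even with the same Python version. The sketch below (the helper `class_order` is hypothetical) starts fresh interpreters with different `PYTHONHASHSEED` values and collects the orderings they produce:

```python
import os
import subprocess
import sys

# Joins a set of class names in whatever order the set happens to iterate.
SNIPPET = 'print(" ".join({"ulEmpty", "compact", "toc"}))'


def class_order(seed: int) -> str:
    """Run SNIPPET in a fresh interpreter with a fixed PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Different seeds stand in for different runs of the same program.
orders = {class_order(seed) for seed in range(1, 21)}
print(orders)
```

Each ordering contains the same three names, so sorting before joining yields the same string whatever the seed.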

@ietf-svn-bot

@rjsparks@nostrum.com changed priority from medium to minor

@ietf-svn-bot

@rjsparks@nostrum.com changed status from new to under_review

@ietf-svn-bot

@julian.reschke@gmx.de commented


I'm now running this:

#!/bin/sh
# Post-process xml2rfc HTML in place: force one ToC class order, strip CRs.
sed \
  -e 's|<li class="compact ulEmpty toc"|<li class="compact toc ulEmpty"|' \
  -e 's|<li class="toc compact ulEmpty"|<li class="compact toc ulEmpty"|' \
  -e 's|<li class="toc ulEmpty compact"|<li class="compact toc ulEmpty"|' \
  -e 's|<li class="ulEmpty compact toc"|<li class="compact toc ulEmpty"|' \
  -e 's|<li class="ulEmpty toc compact"|<li class="compact toc ulEmpty"|' \
  -e 's|<ul class="compact ulEmpty toc"|<ul class="compact toc ulEmpty"|' \
  -e 's|<ul class="toc compact ulEmpty"|<ul class="compact toc ulEmpty"|' \
  -e 's|<ul class="toc ulEmpty compact"|<ul class="compact toc ulEmpty"|' \
  -e 's|<ul class="ulEmpty compact toc"|<ul class="compact toc ulEmpty"|' \
  -e 's|<ul class="ulEmpty toc compact"|<ul class="compact toc ulEmpty"|' \
  "$1" |
tr -d "\r" > "$1.$$"
mv "$1.$$" "$1"

as post-processor.

@ietf-svn-bot

@julian.reschke@gmx.de commented


Any chance this might get fixed some time soon? It's really annoying, and recent changes to the HTML generated for the ToC made it worse...

@ietf-svn-bot

@kesara@staff.ietf.org changed status from under_review to assigned

@ietf-svn-bot

@kesara@staff.ietf.org set owner to krathnayake@ietf.org

@ietf-svn-bot

@kesara@staff.ietf.org commented


Looks like sorting class values in the HTML post-processing step might be a quick fix for this. On to it. :)
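
For illustration only, a post-processing pass of this kind could look roughly like the sketch below. It is not the code from the eventual commit, which this thread does not show. It uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml, and the function name `sort_class_values` is hypothetical:

```python
import xml.etree.ElementTree as ET


def sort_class_values(root: ET.Element) -> None:
    """Rewrite every class attribute in the tree with its names sorted."""
    for element in root.iter():
        classes = element.get("class")
        if classes:
            element.set("class", " ".join(sorted(classes.split())))


# Two elements whose class names arrive in different orders.
root = ET.fromstring(
    '<nav><ul class="ulEmpty toc compact">'
    '<li class="toc ulEmpty compact"/></ul></nav>'
)
sort_class_values(root)
print(ET.tostring(root, encoding="unicode"))
```

Doing the sort once on the finished tree, rather than at every place a class is added, keeps the change small and covers class attributes produced anywhere in the renderer.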

@ietf-svn-bot

@kesara@staff.ietf.org changed status from assigned to closed

@ietf-svn-bot

@kesara@staff.ietf.org set resolution to fixed

@ietf-svn-bot

@kesara@staff.ietf.org commented


Fixed in 65b5445:

Sort class values in HTML output. Fixes #553. Commit ready for merge.

@ietf-svn-bot

@julian.reschke@gmx.de commented


Awesome, thanks!

@ietf-svn-bot

@rjsparks@nostrum.com commented


Fixed in 34d039e:

Merged in 65b5445 from krathnayake@ietf.org: Sort class values in HTML output. Fixes #553.
