Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Table-related infinite loop triggered in versions >= 1.0.2 #615

Closed
bsmelo opened this issue Dec 3, 2020 · 9 comments
Closed

Table-related infinite loop triggered in versions >= 1.0.2 #615

bsmelo opened this issue Dec 3, 2020 · 9 comments

Comments

@bsmelo
Copy link

bsmelo commented Dec 3, 2020

Hello!

We found an infinite loop being triggered using openhtmltopdf 1.0.4. Actually, after some tests, we found our code succeeds using openhtmltopdf 1.0.1, but from 1.0.2 onwards it enters the infinite loop. The smallest input we could construct was:

<table style="float:left; margin:1px;">
   <tr></tr>
</table>
<p>NOME</p>

Since we have a preprocessing phase using jsoup, the actual input to openhtmltopdf is:

<html>
 <head>
  <style>  body { word-wrap: break-word } table { width: 100% }  </style>
 </head>
 <body>
  <table style="float:left; margin:1px;"> 
   <tbody>
    <tr></tr> 
   </tbody>
  </table> 
  <p>NOME</p>
 </body>
</html>

We still couldn't figure out what exactly causes the program to loop, but if we remove any piece of the above input (e.g.: float:left; or even <p>NOME</p>) it doesn't loop at all. The following code can be used to reproduce the issue:

            ByteArrayOutputStream streamPdfa = new ByteArrayOutputStream();
            PdfRendererBuilder geradorPdf = new PdfRendererBuilder();
            htmlConteudoAjustado = "<THE INPUT ABOVE>";
            
            geradorPdf.withProducer("Nome Produtor");
            geradorPdf.useFastMode();
            geradorPdf.withHtmlContent(htmlConteudoAjustado, "");
            geradorPdf.toStream(streamPdfa);
            geradorPdf.run();

We're introducing a local, pre-processing fix, while also debugging and trying to figure out if we can contribute a fix at the openhtmltopdf side.

Thanks for the lib!

@syjer
Copy link
Contributor

syjer commented Dec 4, 2020

hi @bsmelo I can confirm the issue.

Looks like a problem inside the Layout algorithm for Inline text.

(More specifically it seems to start from InlineBoxing.layoutText() )

@syjer
Copy link
Contributor

syjer commented Dec 4, 2020

@syjer
Copy link
Contributor

syjer commented Dec 4, 2020

@syjer
Copy link
Contributor

syjer commented Dec 4, 2020

even simpler reproducer:

<html>
<head>
</head>
<body style="word-wrap:break-word">
<div style="width:100%;float:left;margin:1px"><span></span></div>
<p>a</p>
</body>
</html>

To be noted: margin:1px is important, the empty span is also important.

danfickle added a commit that referenced this issue Dec 4, 2020
Test kindly provided by @bsmelo and further refined by @syjer.
@danfickle
Copy link
Owner

Thanks both, strongly suspect it is hitting continue on L202 in endless loop. Will further debug now.

@LAlves91
Copy link

LAlves91 commented Dec 4, 2020

Greetings @danfickle!

If you guys find the solution to this issue, will it be released in the next planned version or do you think it would be possible to release something like a "bugfix" version?

Cheers and thank you for the great work!

@danfickle
Copy link
Owner

The core problem is that float clearing is not working. If you take out the break-word style, one can see that the paragraph is still incorrectly on the right side of the screen.

This then feeds into a too fragile test (my fault) in the break-word breaker:

                    if (totalWidth == 0 &&
                        avail == lineWidth) {
                        // We are at the start of the line but could not fit a single character!
                        totalWidth += context.getWidth();
                        break LOOP;
                    } else {
                        // We may be at the end of the line, so pick up at next line.
                        context.setEnd(savedEnd);
                        break LOOP;
                    }

Here, avail == lineWidth is never true as the float clearance failed, leading to an endless loop.

Anyway, got to continue tomorrow...

Of course, I'll do a release once this serious issue is fixed (perhaps also #614 regression).

danfickle added a commit that referenced this issue Dec 10, 2020
This now triggers the endless loop detection code and crashes instead of infinite loop. To fix this completely we need to also fix the float clearing algorithm.
danfickle added a commit that referenced this issue Dec 10, 2020
Trying to make breaker more robust again.
danfickle added a commit that referenced this issue Dec 11, 2020
Also move sanity checks for LineBreakContext to class and make them asserts.
danfickle added a commit that referenced this issue Dec 18, 2020
With test proofs (new and updated for transform float which was also not taking account of margin).
@LAlves91
Copy link

Hey @danfickle! Firstly, I wanna thank you a lot for working on this issue!

So, I know this one is closed already, but I wanted to use the space to make a suggestion: do you guys think it would be possible to add some kind of 'Developer Guide' to the wiki?

I'm speaking for myself, but I really wanted to help solving this issue, but the architecture is really complex (totally fitting to the extremelly complex problem it solves, of course), so it's kind of intimidating to start. I believe an architecture documentation (even only a few diagrams showing layers and flow) would help bringing more contributions!

Thank you!

@syjer
Copy link
Contributor

syjer commented Dec 20, 2020

Hi @LAlves91 ,

do you guys think it would be possible to add some kind of 'Developer Guide' to the wiki?

Good idea.

so it's kind of intimidating to start.

It's a big project, but you can identify a few logical "modules" and phases. Generally, for developing, I add a test and follow with the debugger the various points that I'm interested with.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants