Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed blog readtime calculation to ignore non-content text #7370

Merged
merged 1 commit into from
Jul 16, 2024

Conversation

sisp
Copy link
Contributor

@sisp sisp commented Jul 16, 2024

I've fixed the blog readtime calculation to ignore non-content text, i.e. any text nodes that are inside the following tags are omitted from calculating the readtime:

  • <object>
  • <script>
  • <style>
  • <svg>

The implementation is inspired by the search plugin parser.

Note that I've duplicated the void set from the search plugin. It might be cleaner to deduplicate it, but I wasn't sure where to best move it to import from. If you'd prefer deduplication, I'd appreciate your guidance on where to factor void out, @squidfunk.

Fixes #7367.

@squidfunk
Copy link
Owner

Note that I've duplicated the void set from the search plugin. It might be cleaner to deduplicate it, but I wasn't sure where to best move it to import from. If you'd prefer deduplication, I'd appreciate your guidance on where to factor void out, @squidfunk.

There's no need to integrate the void tags at all, as for them, the data handler is not called. Void tags are tags without children, i.e., leaf components. Please remove the void handling logic, as it does not influence the result.

@squidfunk
Copy link
Owner

Ah wait, no, we might need them to maintain the context. Can you please just import it from the search plugin for now and make a note in the code that we might refactor it at some point? Please don't do the refactoring now, we're heavily working behind the curtains right now and there will be fundamental changes to Material for MkDocs, which is why we currently try to keep things as stable and unchanging as possible.

@squidfunk
Copy link
Owner

... you might check if the _endtag handler is called for void tags. If it is, remove the void logic. If not, keep it. It's been too long since I've worked with Python's HTML SAX parser.

@sisp
Copy link
Contributor Author

sisp commented Jul 16, 2024

I've checked: The _endtag is not called for void tags. Now, the void set is imported from the search plugin as you suggested.

@squidfunk
Copy link
Owner

Okay great, thanks for investigating. Note that this is an HTML4 parser, not HTML5, which we chose for speed and package size (as no further dependency is needed), so as I remember, this is why we need to check for void tags by ourselves. We'll check in the future if we can swap this out for an approach that directly operates on the HTML AST, which would be much more efficient, but for now, I think we can keep it.

@squidfunk
Copy link
Owner

From what I can see, this looks great, ready to merge! Thanks for your time!

@squidfunk squidfunk merged commit 6b13c56 into squidfunk:master Jul 16, 2024
4 checks passed
@sisp
Copy link
Contributor Author

sisp commented Jul 16, 2024

Perfect, thanks for the helpful review and feedback! 🙏

@sisp sisp deleted the fix/blog-readtime branch July 16, 2024 14:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Blog readtime includes inline SVG text content
2 participants