Internal URLs not parsed in summaries #3265

Closed
arturlangner opened this issue Dec 9, 2023 · 11 comments · Fixed by #3280

Comments

@arturlangner

Internal links (i.e., {filename}/stuff/more_stuff.md) are not parsed in a summary, so the URL in the output HTML is broken: it contains the original text verbatim.

I ran into this while using https://github.com/MinchinWeb/minchin.pelican.plugins.summary, but the bug itself is in the Pelican codebase.

The problematic code in get_summary() is:

if "summary" in self.metadata:
    return self.metadata["summary"]

A fix that works for me is:

if "summary" in self.metadata:
    return self._update_content(self.metadata["summary"], siteurl)

I would be happy if this fix or something similar goes upstream. 🙂

@avaris
Member

avaris commented Dec 18, 2023

Your suggestion looks correct to me. PR would be welcome :).

Actually... This should already be handled in

self._summary = self._update_content(self._summary, self.get_siteurl())

And it works as expected. I can't reproduce this:

$ cat pelicanconf.py
SITEURL = 'http://example.com'
$ cat content/articles/test.md
title: article1
date: 02.01.2023
summary: [another article]({filename}test2.md)

Article 1 content
$ cat content/articles/test2.md
title: article2
date: 03.01.2023

Article 2 content
$ pelican content -s pelicanconf.py
[00:32:55] WARNING  No timezone information specified in the settings. Assuming your timezone is UTC for feed generation. Check                                                        log.py:89
                    https://docs.getpelican.com/en/latest/settings.html#TIMEZONE for more information
Done: Processed 2 articles, 0 drafts, 0 hidden articles, 0 pages, 0 hidden pages and 0 draft pages in 0.06 seconds.
$ grep "another article" output/index.html
</footer><!-- /.post-info -->                <p><a href="http://example.com/article2.html">another article</a></p>

@justinmayer
Member

@arturlangner: Any follow-up comments about this issue that you submitted?

@arturlangner
Author

I uninstalled and reinstalled Pelican 4.9.1 with pip in a virtualenv and I still have the issue. Output HTML in the summaries contains links with "raw" URLs like: <a href="{filename}/articles/2021/stuff.md">stuff</a>.

I checked my contents.py:527 and it does contain the code @avaris has mentioned above.

The only somewhat special thing about my setup is that I also use minchin.pelican.plugins.summary (1.2.1, also via pip).
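
To check whether this symptom affects other pages too, the whole output directory can be scanned for links that still carry a placeholder. This is a minimal sketch using only the standard library; the function name `find_unresolved_links` is hypothetical, and it assumes the generated site is plain HTML files under a single directory.

```python
import re
from pathlib import Path

# Matches href/src attributes whose value still starts with an unresolved
# placeholder such as {filename}, {static} or {attach}.
RAW_LINK_RE = re.compile(r'(?:href|src)="(\{(?:filename|static|attach)\}[^"]*)"')


def find_unresolved_links(output_dir):
    """Return (file path, raw link) pairs for every leftover placeholder link.

    Hypothetical helper for troubleshooting; it is not part of Pelican.
    """
    hits = []
    for html_file in sorted(Path(output_dir).rglob("*.html")):
        text = html_file.read_text(encoding="utf-8", errors="replace")
        for match in RAW_LINK_RE.finditer(text):
            hits.append((str(html_file), match.group(1)))
    return hits
```

Calling `find_unresolved_links("output")` after a build should return an empty list when every internal link was resolved.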

@justinmayer
Member

@avaris already said that he can't reproduce the issue you describe. Do you not think it would make more sense to try without the minchin.pelican.plugins.summary plugin installed and see if the issue persists? Doesn't that seem like a troubleshooting step that should have been performed from the very beginning?

@arturlangner
Author

I retested with the plugin disabled and the output is okay. The plugin only extracts the summary and does not do any processing.

justinmayer removed the bug label Jan 16, 2024
@justinmayer
Member

So… If the problem goes away when the plugin is disabled, doesn't this seem like something to ask @MinchinWeb about since it appears related to the plugin instead of Pelican itself?

@MinchinWeb
Contributor

MinchinWeb commented Jan 19, 2024

@arturlangner Is your site set up somewhere online where I can look at it? In particular, it makes a difference what version of Pelican you're trying to run this with...

cf. MinchinWeb/minchin.pelican.plugins.summary#4

@arturlangner
Author

It is not online, but I have attached a minimal subset here (run make devserver): example.zip

I installed everything with pip in a virtualenv:

$ pip list 
Package                            Version
---------------------------------- ------------
anyio                              4.1.0
blinker                            1.7.0
docutils                           0.20.1
feedgenerator                      2.1.0
idna                               3.4
Jinja2                             3.1.2
Markdown                           3.5.1
markdown-it-py                     3.0.0
MarkupSafe                         2.1.3
mdurl                              0.1.2
minchin.pelican.plugins.autoloader 1.2.1
minchin.pelican.plugins.summary    1.2.1
ordered-set                        4.1.0
pelican                            4.9.1
pelican-more-categories            0.1.0
pelican-sitemap                    1.1.0
pip                                23.3.2
Pygments                           2.17.2
python-dateutil                    2.8.2
pytz                               2023.3.post1
rich                               13.7.0
semantic-version                   2.10.0
setuptools                         68.2.2
six                                1.16.0
sniffio                            1.3.0
Unidecode                          1.3.7
watchfiles                         0.21.0

Everything runs on Python 3.11.7.

@MinchinWeb
Contributor

@arturlangner this is brilliant, thank you!

MinchinWeb added a commit to MinchinWeb/minchin.pelican.plugins.summary that referenced this issue Jan 22, 2024
@MinchinWeb
Contributor

So the good news is that I can replicate the issue.

Some history:

Back in 2014, Plugins Issue #314 was raised covering this very issue (links weren't being resolved). The solution was to create a new signal (all_generators_finalized, see PR 1616) and move this plugin over to it (see Plugins PR #410). Those changes shipped with Pelican 3.6. The plugin is using the same signals it did then.

Pelican 3.6 works as expected, as does Pelican 3.7. However, v4.0 breaks, as above.

Between v3.7.1 (the last of the 3 series) and v4.0.0, there are over 250 commits. It will take some time to figure out exactly where it broke. If anyone else wants to look into this, you'll need Python <= 3.9, Jinja2 <= 3.0.1, and Markdown 2.6.11 (other versions of Markdown may work).

The breaking change was PR #2226 (it was actually #2288 that was merged, but that one is based on #2226). The change removes the automatic link processing of the summary and instead applies link processing to every field listed in FORMATTED_FIELDS, whose default value is FORMATTED_FIELDS = ["summary"] (see the Settings docs).
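
For reference, this setting lives in pelicanconf.py. The fragment below is only an illustration: "description" is a hypothetical extra metadata field and does not come from this thread.

```python
# pelicanconf.py (fragment)
# ["summary"] is the default mentioned above; "description" is a hypothetical
# extra metadata field, added here only to show that the list can hold more
# than one entry.
FORMATTED_FIELDS = ["summary", "description"]
```

Per the description above, any field named in this list should go through the same link processing as the summary.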

After all the above, is the flaw just that the links are updated before the plugin is run? Can the order of these just be swapped? The current code:

# pelican/__init__.py

        for p in generators:
            if hasattr(p, "refresh_metadata_intersite_links"):
                p.refresh_metadata_intersite_links()  # <-- links get updated

        signals.all_generators_finalized.send(generators)  # <-- summary plugin called

I've created a Pull Request (to be pushed here shortly) that will do just that.
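
To make the ordering problem concrete, here is a toy model of the two steps. It is a sketch under loud assumptions: `Article`, `refresh_links`, `summary_plugin` and `build` are hypothetical stand-ins, not Pelican's real classes or functions, and the link rewrite simply swaps the .md extension for .html, whereas Pelican resolves links to each article's actual output URL.

```python
import re

# Hypothetical stand-ins; none of these names are Pelican internals.
LINK_RE = re.compile(r"\{filename\}/?([\w/.-]+)\.md")
SITEURL = "http://example.com"


class Article:
    def __init__(self, body):
        self.body = body
        self.summary = None  # filled in later by the "plugin"

    def refresh_links(self):
        """Resolve {filename} placeholders in the summary, if one exists yet."""
        if self.summary is not None:
            self.summary = LINK_RE.sub(
                lambda m: f"{SITEURL}/{m.group(1)}.html", self.summary
            )


def summary_plugin(articles):
    """Stand-in for a plugin run on all_generators_finalized.

    It copies the first line of the body into the summary, unprocessed.
    """
    for article in articles:
        article.summary = article.body.splitlines()[0]


def build(articles, plugin_first):
    """Run both steps in the chosen order and return the resulting summaries."""
    if plugin_first:
        summary_plugin(articles)
    for article in articles:
        article.refresh_links()
    if not plugin_first:
        summary_plugin(articles)
    return [article.summary for article in articles]


BODY = '<a href="{filename}/articles/test2.md">another article</a>\nMore text'

# Current order: links are refreshed before the plugin sets the summary.
broken = build([Article(BODY)], plugin_first=False)
# Swapped order: the plugin sets the summary, then links are refreshed.
fixed = build([Article(BODY)], plugin_first=True)

print(broken[0])  # still contains the raw {filename} placeholder
print(fixed[0])   # placeholder replaced with a SITEURL-based link
```

With the current order the refresh step finds no summary to process, so whatever the plugin writes afterwards keeps its raw links; with the swapped order the plugin's summary exists by the time links are resolved.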

@MinchinWeb
Contributor

OK, the Pull Request has been pushed! #3280 solves it for me locally. A test site is available as part of the summary plugin if you want to test it locally yourself.
