Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix metadata intrasite links #2288

Merged
merged 4 commits into from
Feb 9, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions pelican/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -170,6 +170,10 @@ def run(self):
if hasattr(p, 'generate_context'):
p.generate_context()

for p in generators:
if hasattr(p, 'refresh_metadata_intersite_links'):
p.refresh_metadata_intersite_links()

signals.all_generators_finalized.send(generators)

writer = self.get_writer()
Expand Down
19 changes: 14 additions & 5 deletions pelican/contents.py
Original file line number Diff line number Diff line change
Expand Up @@ -140,9 +140,8 @@ def __init__(self, content, metadata=None, settings=None,
if not hasattr(self, 'status'):
self.status = getattr(self, 'default_status', None)

# store the summary metadata if it is set
if 'summary' in metadata:
self._summary = metadata['summary']
if len(self._context.get('filenames', [])) > 0:
self.refresh_metadata_intersite_links()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charlesfleche @justinmayer As I commented in #2162 (comment), this line causes link replacement to be dependent on the order in which files are processed -- because it's done during parsing of a single article/page and not later when everything is ready (so e.g. links to pages in metadata fields that were not yet processed are reported as Unable to find).

I investigated further and realized refresh_metadata_intersite_links() is called again later below, after all articles/pages are processed -- and in that second step the links that were previously Unable to find are then replaced correctly. I don't know the code very well, but in my case when I commented the above two lines out, the warnings were gone (and links in metadata fields still correctly replaced). I imagine removing the above might cause some plugin to cease working (since e.g. the summary metadata field won't be present yet), so not really sure how to fix that. On the other hand, I don't think calling refresh_metadata_intersite_links() twice on everything is a good idea (at least from a perf PoV, but I could imagine some potential issues with {attach} as well).

Any ideas how to fix this?

Copy link
Member

@avaris avaris Jul 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this is not ideal.

Removing these won't make summary vanish, but it'll be unprocessed if one accesses before everything is processed. And that is actually fine, since accessing .summary in a plugin earlier than all_generators_finalized is a "bug" in itself for this same reason (it'll spew Unable to find warnings).

Another issue is that, this removed _summary. Even though it's "internal", some plugins used that.

From what I can see, this modification is all for preventing summary being processed twice.
Edit: Everything is processed twice anyway. I don't know what this is for.

I'd say, reverting this bit and get_summary should be enough.

And maybe excluding summary from refresh_metadata_intersite_links, if one is worried enough about the double processing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally got back to this (sorry for the delay), the proposed change is in #2406.

And maybe excluding summary from refresh_metadata_intersite_links, if one is worried enough about the double processing.

Not sure what you mean, reverting the above bit is (as far as I can see) enough to stop the double processing.

Copy link
Member

@avaris avaris Oct 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant, if you revert the above part and get_summary then summary will be processed twice.

  • Once when refresh_metadata_intersite_links is called
  • Once when content.summary is accessed (or any other way .get_summary is called).

Furthermore, reverting only this part is pointless, since you end up with a ._summary attribute that's not used anywhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Um, I'm afraid I lost all the context over the time we're discussing this 🙈

Furthermore, reverting only this part is pointless, since you end up with a ._summary attribute that's not used anywhere.

I did that only to fix this:

Another issue is that, this removed _summary. Even though it's "internal", some plugins used that.

So, what I ended up doing in #2406 is:

  • keeping _summary, because some plugins used that, as you said
  • not calling refresh_metadata_intersite_links too soon

I could also just delete this part altogether (and thus _summary would be no longer). Can we move the discussion to #2406?

Copy link
Member

@avaris avaris Oct 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but _summary is not used in get_summary anymore. This PR also changes that. I mean those should be reverted together.

Plugins modifying _summary do so in order to change .summary, since you can't directly access .summary too early. Now they will access and change something irrelevant. Modifying _summary will leave .summary unchanged.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! Now I get you. I thought _summary was accessed only for read, never for write. Now it all makes sense, sorry for my denseness.

#2406 is updated (the CI is failing for an unrelated reason, same on master), but instead of just reverting the second part I realized the changes to _summary won't be picked up by refresh_metadata_intersite_links() and the _summary handling would need to creep there as well. So now I introduced a _summary property getter/setter that's basically just a "view" onto metadata["summary"]. Also, "because some plugins used that" I assume it's not a workflow worth preserving so the getter/setter now prints a deprecation warning when the property is used.

I hope this is finally getting closer to how it should be :) And sorry for the delays, I'm extremely busy.


signals.content_object_init.send(self)

Expand Down Expand Up @@ -337,8 +336,8 @@ def get_summary(self, siteurl):
This is based on the summary metadata if set, otherwise truncate the
content.
"""
if hasattr(self, '_summary'):
return self._update_content(self._summary, siteurl)
if 'summary' in self.metadata:
return self.metadata['summary']

if self.settings['SUMMARY_MAX_LENGTH'] is None:
return self.content
Expand Down Expand Up @@ -413,6 +412,16 @@ def relative_dir(self):
os.path.abspath(self.source_path),
os.path.abspath(self.settings['PATH']))))

def refresh_metadata_intersite_links(self):
for key in self.settings['FORMATTED_FIELDS']:
if key in self.metadata:
value = self._update_content(
self.metadata[key],
self.get_siteurl()
)
self.metadata[key] = value
setattr(self, key.lower(), value)


class Page(Content):
mandatory_properties = ('title',)
Expand Down
15 changes: 15 additions & 0 deletions pelican/generators.py
Original file line number Diff line number Diff line change
Expand Up @@ -581,6 +581,14 @@ def generate_output(self, writer):
self.generate_pages(writer)
signals.article_writer_finalized.send(self, writer=writer)

def refresh_metadata_intersite_links(self):
for e in chain(self.articles,
self.translations,
self.drafts,
self.drafts_translations):
if hasattr(e, 'refresh_metadata_intersite_links'):
e.refresh_metadata_intersite_links()


class PagesGenerator(CachingGenerator):
"""Generate pages"""
Expand Down Expand Up @@ -649,6 +657,13 @@ def generate_output(self, writer):
override_output=hasattr(page, 'override_save_as'))
signals.page_writer_finalized.send(self, writer=writer)

def refresh_metadata_intersite_links(self):
for e in chain(self.pages,
self.hidden_pages,
self.hidden_translations):
if hasattr(e, 'refresh_metadata_intersite_links'):
e.refresh_metadata_intersite_links()


class StaticGenerator(Generator):
"""copy static paths (what you want to copy, like images, medias etc.
Expand Down
14 changes: 9 additions & 5 deletions pelican/tests/test_contents.py
Original file line number Diff line number Diff line change
Expand Up @@ -319,17 +319,21 @@ def test_intrasite_link(self):
)

# also test for summary in metadata
args['metadata']['summary'] = (
parsed = (
'A simple summary test, with a '
'<a href="|filename|article.rst">link</a>'
)
args['context']['localsiteurl'] = 'http://notmyidea.org'
p = Page(**args)
self.assertEqual(
p.summary,
linked = (
'A simple summary test, with a '
'<a href="http://notmyidea.org/article.html">link</a>'
)
args['settings']['FORMATTED_FIELDS'] = ['summary', 'custom']
args['metadata']['summary'] = parsed
args['metadata']['custom'] = parsed
args['context']['localsiteurl'] = 'http://notmyidea.org'
p = Page(**args)
self.assertEqual(p.summary, linked)
self.assertEqual(p.custom, linked)

def test_intrasite_link_more(self):
# type does not take unicode in PY2 and bytes in PY3, which in
Expand Down