Symlinks for stable etc. copy files on Github with potential performance loss #1914

Closed
jkrumbiegel opened this issue Aug 23, 2022 · 3 comments

@jkrumbiegel
Contributor

jkrumbiegel commented Aug 23, 2022

The Makie docs had a very long deploy time: 38 min for the build itself plus 20 min just for pushing the built site to gh-pages.
The artifact size was about 6 GB. I tried to reduce the size by removing all lower patch versions within the same minor version and replacing them with symlinks, so as not to break old links. However, I noticed that the gh-pages build time on GitHub Actions was not affected by this.
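
For reference, the patch-version pruning described above can be sketched roughly as follows. This is a minimal illustration, not the script actually used for Makie: the function name `symlink_old_patches` and the assumption that version directories are named `vMAJOR.MINOR.PATCH` directly under the site root are both hypothetical.

```python
import os
import re
import shutil

# Assumption: version directories are named like "v0.17.3".
VERSION_DIR = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def symlink_old_patches(site_dir):
    """Replace every non-latest patch directory with a symlink to the
    latest patch of the same minor version.

    Returns a dict {replaced directory name: directory name it now points to}.
    """
    versions = []
    for name in os.listdir(site_dir):
        match = VERSION_DIR.match(name)
        path = os.path.join(site_dir, name)
        # Only consider real directories; existing symlinks are left alone.
        if match and os.path.isdir(path) and not os.path.islink(path):
            major, minor, patch = (int(part) for part in match.groups())
            versions.append((major, minor, patch, name))

    # Find the highest patch for each (major, minor) pair.
    latest = {}
    for major, minor, patch, name in versions:
        key = (major, minor)
        if key not in latest or patch > latest[key][0]:
            latest[key] = (patch, name)

    replaced = {}
    for major, minor, _patch, name in versions:
        keep = latest[(major, minor)][1]
        if name != keep:
            path = os.path.join(site_dir, name)
            shutil.rmtree(path)
            # Relative link, so the site stays relocatable.
            os.symlink(keep, path)
            replaced[name] = keep
    return replaced
```

This shrinks the gh-pages branch checkout, but any step that later dereferences the links recreates the full copies.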

I then noticed that GitHub Actions prepares the upload of a built site with the command `tar --dereference --hard-dereference ...`, which means that every symlink to a directory is replaced with a full copy of that directory.
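
The effect of dereferencing is easy to reproduce locally. The sketch below uses Python's `tarfile` module, whose `dereference` option behaves analogously to the `tar` flag; it is not the command GitHub Actions runs. The helper names `make_demo_site` and `archive_member_types` and the tiny demo layout are made up for illustration.

```python
import os
import tarfile
import tempfile


def make_demo_site(root):
    """Create root/v1.2.3/index.html plus a `stable` symlink to v1.2.3."""
    os.makedirs(os.path.join(root, "v1.2.3"))
    with open(os.path.join(root, "v1.2.3", "index.html"), "w") as f:
        f.write("<h1>docs</h1>")
    os.symlink("v1.2.3", os.path.join(root, "stable"))


def archive_member_types(site_dir, dereference):
    """Archive site_dir and return {member name: "symlink" | "dir" | "file"}."""
    with tempfile.TemporaryDirectory() as tmp:
        tar_path = os.path.join(tmp, "site.tar")
        with tarfile.open(tar_path, "w", dereference=dereference) as tar:
            tar.add(site_dir, arcname="site")
        with tarfile.open(tar_path) as tar:
            kinds = {}
            for member in tar.getmembers():
                if member.issym():
                    kinds[member.name] = "symlink"
                elif member.isdir():
                    kinds[member.name] = "dir"
                else:
                    kinds[member.name] = "file"
            return kinds


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        site = os.path.join(root, "site")
        make_demo_site(site)
        print(archive_member_types(site, dereference=False))
        print(archive_member_types(site, dereference=True))
```

Without dereferencing, `site/stable` is stored as a single symlink entry. With it, `site/stable` becomes a real directory holding its own copy of `index.html`, so every link costs as much as the directory it points to.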

I know that this probably doesn't make much of a difference right now for most projects, which have purely text-based docs and don't use my patch-version-symlink hack (so all their patch version docs would accumulate over time anyway). But maybe you weren't aware that the symlink technique for stable, dev, etc, doesn't save build time or artifact bloat, so over time the build process can become quite slow.

Here's a response from GitHub explaining why the behavior is what it is, along with ways to mitigate it: community/community#9104 (reply in thread)

It could make sense to replace the stable etc. symlinks with client side redirects.

@mortenpi
Member

Thanks for bringing this up, I wasn't aware of this behavior, and it's certainly good to know that GitHub Pages works this way.

> But maybe you weren't aware that the symlink technique for stable, dev, etc, doesn't save build time or artifact bloat, so over time the build process can become quite slow.

I wasn't, but that was never the goal anyway. Mostly, the logic with the symlinks is there to make sure you have all patch versions stashed away for posterity, and offers the easiest way to organize canonical URLs. It does save space on the gh-pages branch though, which is the one that usually matters (well, Git should also de-duplicate, so maybe it doesn't save space actually).

However, I am not sure if there is anything actionable here for Documenter.

  • As you say, this is not really something that affects the average repo, or even Julia's own docs (for the latter, it takes 4min for 840MB artifacts). So I am not sure it's worth maintaining a complex workaround for this here.
  • I don't think client-side redirects (via `<meta http-equiv="refresh">`, I presume) would be very practical. First, you'd have to generate a separate .html file for each possible URL the user could land on, which is a hassle. Second, it would break the URLs, since a client-side redirect changes the URL in the address bar (our goal with the symlinks is to make e.g. the stable/ or v1/ URL canonical).
  • Server-side redirects would be great, but GitHub hasn't implemented anything for this yet, so I wouldn't hold my breath.

For Makie, in some ways the problem is just that the docs are too big, no? For standard documentation setups, we generate only a few symlinks anyway (relative to the number of unique patch versions), so even if we removed the symlinks, the artifact size wouldn't change much, I think. If you removed all the patch version symlinks (patch version URLs are not usually canonical anyway), it should help quite a bit in your case.

It might be possible to do something with the versions keyword of deploydocs though, to not keep separate patch versions around.

@jkrumbiegel
Contributor Author

Yes, our docs are quite big, but that's because we have to show lots of images. There's not really a good way around that.

> First, you'd have to have to generate a separate .html file for each possible URL the user could land on, which is a hassle.

Ah yeah, I didn't consider the subdirectories. OK, then that's not a possibility.
I can delete all the old patch version symlinks; I just didn't like that I would break tons of old links this way. But it probably needs to be done.

> I am not sure if there is anything actionable here for Documenter

I agree now, so because server-side redirects are not in your scope, I'll close. Just wanted to bring this to your attention.

@mortenpi
Member

> > First, you'd have to have to generate a separate .html file for each possible URL the user could land on, which is a hassle.
>
> Ah yeah I didn't consider the subdirectories.. Ok then that's not a possibility.

I wouldn't say it's not a possibility, just that I think it's too complicated to have here, when symlinks work well in basically all cases.

It would not be too hard to do, actually: just walk through the target directory and, for each HTML file, generate an HTML file in the source directory with the appropriate `<meta http-equiv="refresh" ...>` tag. This might actually be perfect for you to keep your patch version links working. In that case you're also going from a non-canonical URL to a canonical one, so the client-side URL change would not be a problem.
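
A rough sketch of that walk, in Python rather than as part of Documenter. The function name `write_redirect_stubs`, the stub template, and the choice of relative redirect URLs are all assumptions for illustration. Here the "source directory" is the old patch-version path, which used to be a symlink, and the "target directory" is the version it pointed to.

```python
import html
import os
from urllib.parse import quote

# Minimal stub page: an immediate client-side redirect plus a fallback link.
STUB = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="refresh" content="0; url={url}">
</head>
<body><a href="{url}">This page has moved.</a></body>
</html>
"""


def write_redirect_stubs(target_dir, source_dir):
    """Mirror every .html file of target_dir into source_dir as a redirect stub.

    Returns the stub paths relative to source_dir, sorted.
    """
    # If source_dir is still a symlink, drop the link first; writing through
    # it would overwrite the real pages in target_dir.
    if os.path.islink(source_dir):
        os.unlink(source_dir)

    written = []
    for dirpath, _dirnames, filenames in os.walk(target_dir):
        rel_dir = os.path.relpath(dirpath, target_dir)
        stub_dir = os.path.normpath(os.path.join(source_dir, rel_dir))
        for filename in filenames:
            if not filename.endswith(".html"):
                continue
            os.makedirs(stub_dir, exist_ok=True)
            target_file = os.path.join(dirpath, filename)
            # Relative URL from the stub's directory to the real page.
            rel_url = os.path.relpath(target_file, start=stub_dir)
            rel_url = rel_url.replace(os.sep, "/")
            url = html.escape(quote(rel_url), quote=True)
            stub_path = os.path.join(stub_dir, filename)
            with open(stub_path, "w", encoding="utf-8") as f:
                f.write(STUB.format(url=url))
            rel_stub = os.path.normpath(os.path.join(rel_dir, filename))
            written.append(rel_stub.replace(os.sep, "/"))
    return sorted(written)
```

For example, `write_redirect_stubs("site/v0.1.2", "site/v0.1.0")` would leave `site/v0.1.0/sub/page.html` redirecting to `../../v0.1.2/sub/page.html`. Only HTML pages get stubs in this sketch, so direct links to images or other assets under the old path would still break.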
