Setup an asset repository #5

Open
white-gecko opened this issue Feb 21, 2023 · 4 comments

Comments

@white-gecko
Member

We should also set up an asset repository for static resources, especially images.

@white-gecko changed the title from "Setup another asset repository" to "Setup an asset repository" on Feb 22, 2023
@white-gecko
Member Author

The repository should have some way to walk the graph, fetch all images (depictions of people, project logos, partner logos, …) and serve them under a predictable ID (e.g. a hash of the IRI). That way we can retrieve all images via HTTPS and from our own domain, where we can make sure that visitors are not tracked.
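
A minimal sketch of the "hash of the IRI" idea, assuming SHA-256 as the hash function and a hypothetical asset host `https://assets.example.org/img/`. Neither is decided anywhere in this issue; both are placeholders.

```python
import hashlib

# Hypothetical base URL of the asset repository (an assumption, not decided in this issue).
ASSET_BASE = "https://assets.example.org/img/"


def asset_url(iri: str, base: str = ASSET_BASE) -> str:
    """Map an image IRI from the graph to a predictable URL on the asset host."""
    # The digest depends only on the IRI string, so any client that knows
    # the IRI can compute the URL without a lookup table.
    digest = hashlib.sha256(iri.encode("utf-8")).hexdigest()
    return base + digest


if __name__ == "__main__":
    print(asset_url("http://example.org/people/alice.jpg"))
```

The same IRI always yields the same URL, which is what makes the ID predictable from the graph alone.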

@KonradHoeffner
Member

KonradHoeffner commented Mar 21, 2023

I would not use the hash of the IRI, because if the content behind the IRI changes, the hash stays the same.
This is not optimal for caching.
Instead, I propose hashing the content.
This enables two performance-critical optimizations:

  1. An ETag response header with a strong ETag, see https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag.
    If the cached copy is still current, 0 bytes of the content are transferred; only the request and response headers (response code 304 Not Modified) go over the wire.
    However, there is still the lag from the round trip. That shouldn't matter much because the page should load in parallel, but it is even better to add:
  2. A Cache-Control header with "immutable" along with the maximum max-age. With this, the browser serves the image from its cache without sending a request at all, so we don't even have the latency from the round trip.
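
The two optimizations could be sketched roughly like this. The function names, the use of SHA-256, and the one-year max-age are my assumptions for illustration; the sketch is framework-independent and only computes the status, headers and body.

```python
import hashlib
from typing import Dict, Optional, Tuple

# One year in seconds, a commonly used "effectively forever" max-age (an assumption).
ONE_YEAR = 31536000


def strong_etag(content: bytes) -> str:
    """Strong ETag derived from the content: a quoted hex digest without the W/ prefix."""
    return '"' + hashlib.sha256(content).hexdigest() + '"'


def respond(content: bytes, if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET request of one asset."""
    etag = strong_etag(content)
    headers = {
        "ETag": etag,
        "Cache-Control": f"public, max-age={ONE_YEAR}, immutable",
    }
    if if_none_match is not None:
        for tag in if_none_match.split(","):
            tag = tag.strip()
            # If-None-Match uses weak comparison, so a W/ prefix is ignored.
            if tag.startswith("W/"):
                tag = tag[2:]
            if tag == "*" or tag == etag:
                return 304, headers, b""  # client copy is current, send no body
    return 200, headers, content
```

Note that "immutable" is only safe if the URL changes whenever the content changes, for example because the content hash is part of the URL. With a URL that stays fixed per IRI, only the ETag part applies.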

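Also, this is a perfect way to deal with duplicates and renamings, because identical content always gets the same hash, no matter which IRI or file name it comes from.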
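
To illustrate the point about duplicates and renamings, here is a small sketch of a content-addressed store. The `store` helper and the directory layout are hypothetical, and SHA-256 is again just an assumed choice.

```python
import hashlib
from pathlib import Path


def store(content: bytes, root: Path, suffix: str = "") -> Path:
    """Write content under its SHA-256 digest and return the resulting path."""
    root.mkdir(parents=True, exist_ok=True)
    target = root / (hashlib.sha256(content).hexdigest() + suffix)
    # Identical bytes map to the same file name, so a duplicate or a
    # renamed copy of an image does not create a second file.
    if not target.exists():
        target.write_bytes(content)
    return target
```

Storing the same image twice, even if it was fetched from two different IRIs, returns the same path both times.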

See also:

@KonradHoeffner
Member

However, the question is whether the AKSW website has such heavy traffic and low-latency requirements that this is even worth the effort.
Given that a static asset repository introduces more work and another potential point of failure, my recommendation would be to first check whether it is necessary.
Or do you mean it is legally required?
How many HTTP partner image URIs are there currently?
Would it be less work to just save them manually in a subdirectory?

@white-gecko
Member Author

My main concern is a page like https://aksw.github.io/aksw.org.jekyllrdf/Team where we have images from various sources.

The asset repo would:

  • keep images even if they are no longer available at their original source
  • provide all images from one host
  • let us add HTTPS
  • let me take the IRI that references an image in the aksw.org graph and calculate its URL on our asset repo from that IRI alone

Sure, the host would not always be up to date, but we could have a CI job that updates the files periodically. We could then have an ETag based on the ID of the git commit in which the file was last updated.
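
A rough sketch of what such a periodic job could do, assuming the image IRIs have already been extracted from the graph and that files are named by the SHA-256 of the IRI. `fetch`, `refresh` and the directory layout are hypothetical names for illustration, not an existing tool.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Callable, Iterable, List


def fetch(iri: str) -> bytes:
    """Download an image; network failures surface as OSError (URLError is a subclass)."""
    with urllib.request.urlopen(iri, timeout=30) as response:
        return response.read()


def refresh(iris: Iterable[str], root: Path,
            fetcher: Callable[[str], bytes] = fetch) -> List[str]:
    """Update the local copies and return the IRIs that could not be fetched."""
    root.mkdir(parents=True, exist_ok=True)
    failed = []
    for iri in iris:
        target = root / hashlib.sha256(iri.encode("utf-8")).hexdigest()
        try:
            data = fetcher(iri)
        except OSError:
            # Source is unavailable: keep whatever copy we already have.
            failed.append(iri)
            continue
        # Only rewrite the file when the bytes actually changed.
        if not target.exists() or target.read_bytes() != data:
            target.write_bytes(data)
    return failed
```

Because unchanged files are not rewritten, a git commit only touches a file when its content really changed. The ID of that commit (e.g. from `git log -1 --format=%H -- <path>`) then stays stable between changes, which is what an ETag needs.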
