Implementation guidance on ETags #60
Interesting topic! @acoburn :
It does seem that it would be legitimate to do so, from the spec:
So, it seems we can stretch it not only to cases where the order of triples changes, but also to different media types. However, it is clearly not the case for all RDF representations, as an RDFa media type may contain content that is not represented in e.g. Turtle, even if the graphs expressed are the same. This could resolve some of the issues you mention, but certainly not all. Also, I don't think weak validators are sufficient for our purpose (e.g. conditional writes). Therefore, I speculate that perhaps we should undertake a new type of validator with semantics better suited to this?
Yeah, I agree with Kjetil - for simplicity I would say it is. One consequence would be that PATCHing a comment on a Turtle resource wouldn't update the ETag, but I'm OK with that, as I'd strip comments anyway (i.e. I'd just persist the graph). If the client wants comments preserved, then POST it as a NonRS! I'm not sure how to treat RDFa (in general :) !), but my feeling is it'll just have to be special-cased too (and in LDP terms, also treated as a NonRS, but with internal awareness of its 'partial' RDF-ness). I'm sure I'm missing loads of nuance here, but I'm just trying to keep the server dumb! I'm also fine with @kjetilk's suggestion to begin undertaking a new validator, but that's a separate issue, for possible inclusion in v2.0 of the spec!
re:
Given https://www.w3.org/TR/rdf11-concepts/#dfn-rdf-source :
Only the underlying RDF graph is semantically significant. Snapshots of the same state have isomorphic graphs. All other information in the serialization is out of scope. What's semantically significant is mentioned in https://tools.ietf.org/html/rfc7232#section-2.1 with a recommended exception:
Hence, it is legitimate to generate a strong ETag. It is also legitimate to use the same ETag value for different representations of the same RDF source.
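As a rough illustration of an ETag that depends only on the graph and not on triple order, here is a minimal Python sketch. It is not taken from any implementation discussed in this thread. The function name `graph_etag` and the example IRIs are hypothetical, and it assumes ground triples (no blank nodes) that are already written in one normalized N-Triples form. Graphs with blank nodes would need a proper canonicalization step, which is not shown.

```python
import hashlib


def graph_etag(triples):
    """Return a strong entity-tag derived from N-Triples statement strings.

    Assumption: statements contain no blank nodes and are already normalized,
    so sorting the distinct statements yields a canonical byte sequence.
    """
    canonical = "\n".join(sorted(set(triples))).encode("utf-8")
    digest = hashlib.sha256(canonical).hexdigest()[:16]
    return f'"{digest}"'


# Hypothetical example data: the same two statements in different orders.
a = [
    '<http://example.org/doc> <http://purl.org/dc/terms/title> "The Trouble with Bob" .',
    '<http://example.org/doc> <http://purl.org/dc/terms/created> "2011-09-10" .',
]
b = list(reversed(a))

same = graph_etag(a) == graph_etag(b)         # order of triples does not matter
changed = graph_etag(a) != graph_etag(a[:1])  # removing a triple changes the tag
```

Under these assumptions the same value could be sent for every RDF serialization of the resource, which is the behaviour argued for above.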
"may contain content" is not part of the same graph comparison as per https://www.w3.org/TR/rdf11-concepts/#graph-isomorphism . Only the information that can be stated as an RDF graph is compared. |
@csarven I really appreciate this guidance. It is incredibly helpful. I would, however, like to bring up another very practical consideration:
Suppose a browser fetches a resource as, say, Turtle, and the response includes an ETag (strong or weak doesn't matter here):
< GET /resource
< Accept: text/turtle
> HTTP/1.1 200
> Content-Type: text/turtle; charset=UTF-8
> ETag: "opaque-etag-value"
Then, in a subsequent request by that browser for the same resource but as, e.g., JSON-LD or with a different Prefer header, the request sends the ETag value in an If-None-Match header:
< GET /resource
< Accept: application/ld+json
< If-None-Match: "opaque-etag-value"
> HTTP/1.1 304
If the client here is expecting JSON-LD (instead of Turtle), the graph parsing would likely fail. Is this too much of an edge case? I.e. a work-around would be for the client to explicitly not send the If-None-Match header. Again, any guidance would be appreciated.
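The exchange above can be reproduced with a toy handler. This is only a sketch under stated assumptions: `handle_get` and its `per_variant` switch are hypothetical names, the state string stands in for whatever the server derives from the current graph, and a real server would also have to deal with weak tags, lists of tags, and cache-related headers.

```python
def handle_get(accept, if_none_match=None, per_variant=False):
    """Toy GET handler for a single resource.

    Returns (status, content_type, etag). With per_variant=False one ETag is
    shared by all serializations; with per_variant=True the media type is
    folded into the ETag (a hypothetical scheme, for illustration only).
    """
    state = "opaque-etag-value"  # stands in for a hash of the current graph
    etag = f'"{state}-{accept}"' if per_variant else f'"{state}"'
    if if_none_match == etag:
        return 304, None, etag  # client is told to reuse its cached body
    return 200, accept, etag


# Shared ETag: the JSON-LD request is answered with 304, although the
# cached body is Turtle.
_, _, shared = handle_get("text/turtle")
shared_status = handle_get("application/ld+json", shared)[0]

# Per-variant ETag: the Turtle tag does not match, so a full JSON-LD
# response is returned.
_, _, turtle_tag = handle_get("text/turtle", per_variant=True)
variant_status = handle_get("application/ld+json", turtle_tag, per_variant=True)[0]
```

The per-variant scheme avoids the 304 here, but it brings back the question of which value a client should use in conditional writes.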
@acoburn Good point. Would adding |
@dmitrizagidulin I have already done that (though The easy way to resolve this would be to have different representations producing different ETag values, but that also leads to the issue described above w/r/t which value to use with The best browser-based work-around that I have found is to use the |
Very important insight, @acoburn , and while I agree with @csarven that strong validators are legitimate from the wording of the specifications, I think such issues are going to cause problems. The use of conditional requests will be very important for performance optimizations in Solid, and is therefore something that has to Just Work. Assuming that the parsing is done by Solid-near code, perhaps we could work around it. Just braindumping here: My general sense is that the use of a cache is so important, and parsers are so generally available, that having a different serialization in the cache than the one an app requested shouldn't be an impediment to using the cache. Generic client-side libraries should be able to get stuff from cache and transform it to whatever the app wants. So, even if the browser cache contains a Turtle serialization, and the strong validator makes the browser return that Turtle, perhaps the client-side libraries should be capable of transforming the RDF, so that the app gets the requested media type?
Yes, but then, I would argue that it is not the same representation. For example, I would say it is a stretch to say that:
<h2 property="dct:title">The Trouble with Bob</h2>
<p>Date: <span property="dct:created">2011-09-10</span></p>
is fully represented by
<> dct:title "The Trouble with Bob" ;
   dct:created "2011-09-10" .
even though they are isomorphic graphs. In the latter, the header and paragraph semantics are lost. It is definitely breaking it to say that
<h2>The Trouble with Bob</h2>
<p>Date: <span property="dct:created">2011-09-10</span></p>
is fully represented by
<> dct:created "2011-09-10" .
even though they are again isomorphic. Graph isomorphism isn't sufficient to decide whether two representations are equivalent for all RDF serializations, even though it is for most.
Citation needed. The equivalence of representations is in the context of RDF Sources, on which LDP-RS is based. Markup languages like HTML, SVG, and MathML are host languages for RDFa. What's relevant in the context of an RDF Source is that RDFa happens to be a way to materialise the RDF graph. All other information eg.
It seems the argument was provided below? 🙂 |
I was requesting to see material from the world of specs that would dismiss what I've argued (and cited) and support his argument. I could "argue" that infinitely many different Turtle, JSON-LD, etc. serializations with the same underlying graph are different representations, all while being interpreted in the context of LDP-RS/RDF Source. But that would make no sense (to me). 😐
If there are no named graphs; yes. Otherwise, triple-based formats are not lossless compared to quad-based. |
Right, we're citing exactly the same specs here, that is not the issue. My point is that an RDFa representation cannot simply be considered an LDP-RS, because that would discard the semantics of the host language. I suppose we have to go to the Webarch definition of representation. The host language and RDF combined can represent the resource state fully, in a way that neither of them does alone. So, if we do not take the academic argument here, but the pragmatic one, just imagine if the user requests an RDFa document with rich host language markup and content, and we give them just a few triples that represent a tiny part of the original document, because a strong validator has told us that the two are equivalent. I think users would be very upset.
Hmmm, OK, rereading the thread, I think I understand your argument better, @csarven , because if we assume that it is known that the resource is an LDP-RS, then indeed, data that are not represented by RDF are of no relevance. OK, I can go with that. However, my point is that in the general case, we'd have a situation with RDFa where it is not clear if we have an LDP-RS, and then the host language semantics and content matter.
If a content publisher deems that only the information that's encapsulated in RDFa is intended to persist, e.g. through other RDF representations, that's their call. This is why content publishers wanting to persist whatever is of relevance for a graph should take a lossless approach as much as possible. There is nothing stopping them from describing the complete structure and the content such that the resulting graph contains all information. It is sensible to treat a resource as an LDP-RS given RDF Source, which says "any web document that has an RDF-bearing representation may be considered an RDF source." This holds true for any syntax used to convey information with RDF. If the host language's semantics and the content that's not encapsulated in RDFa are important for the publisher, i.e. at least more important than having it emit an RDF graph, and need to persist, then they should instead consider treating the resource as an LDP-NR. It's their call as to how they wish to handle their resources.
Right, I can see your point. I realize it is essentially an argument from ignorance, but I tend to think about the mistakes that I might make and try to design robustness around them. I'm not confident that I would know to make an RDFa document an LDP-NR, nor am I sure I should have to. So, the implication of all this is that a strong validator can be used for an LDP-RS with different serializations with the practical caveats around how browsers treat caches. We could be in a situation where the difference between an LDP-RS and an LDP-NR lies only in the markup (e.g. |
Good point about mistakes and reducing the chances of them happening. I suppose this is where authoring/sharing tools (aka Solid applications) get to decide a bit on behalf of the user. If the source document is intended to be graph-like, then that's all there is to it. We could try to dissect the rationale further but I think it would suffice to treat that as an axiom. Ultimately only an application and its user would know whether something is intended to be an LDP-RS or -NR. That is also why LDP servers are instructed to honour the client's interaction model in the request.
@acoburn ,
Assuming that the server responded to an LDP-RS with a strong ETag, then
So:
Especially so:
Am I missing something?
Excellent question. I don't think so. It is an implementation detail re https://tools.ietf.org/html/rfc7232#section-2.3.1 and the point that if a change to a representation "can be reasonably and consistently determined", there will be a new ETag value. It is an open-ended criterion, so it comes down to a set of dimensions that is deemed to uniquely identify a representation.
Strong ETag from the original request.
No, it shouldn't because one of the key identifiers ie.
I'm not sure how that works in implementations. Do they combine the information from the cached entry with the new request and factor them in the process eg. graph parsing? The intention to use This is a bit fuzzy for me at the moment but it is probably worthwhile to specify that a server should be capable of generating both weak and strong entity-tags so that however it decides to come up with one for an ETag, it can potentially be used for
Wouldn't the server perhaps include
I did not have good experience in reusing cache after |
My concern is that if we are to realize the use cases we are promoting, where the app gathers data from many different sources, then we have very little control over the optimizations at the origin (as opposed to e.g. Facebook). We will have to rely on every single tool in the shed to realize the performance goals, and I'm pretty sure caching based on conditional requests, whether on the client side or in proxies, will be very important. I'm therefore wary that taking the easy way out by using "no-store" will become a problem not easily corrected down the road.
@kjetilk I agree. Given the current state of user-agents and the degree of discrepancies with the specifications, perhaps TSE can have an informative text using a should- or may-like language to note the potholes that applications may encounter (and maybe ways to get around them.) |
The LDP specification requires the use of ETag headers for GET and HEAD responses. There is some subtlety to how ETags work in the context of RDF, and some implementation guidance (e.g. in a non-normative section) might be useful.
For example, RFC 7232, section 2.3 is clear that different ETags should be produced for different representations of a resource:
Furthermore, RFC 7232, section 2.1 describes the difference between weak and strong ETags. For RDF serializations where RDF semantics may be more important than byte-for-byte consistency, is it legitimate to generate a strong ETag for an LDP-RS even if the server does not guarantee the order of the triples?
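For readers less familiar with the weak/strong distinction, the two comparison functions defined in RFC 7232, section 2.3.2 can be sketched in Python as follows. The helper names are hypothetical and the parsing is deliberately simplistic (it does not validate the quoted-string syntax).

```python
def parse_etag(value):
    """Split an entity-tag into (is_weak, opaque_tag)."""
    value = value.strip()
    if value.startswith("W/"):
        return True, value[2:]
    return False, value


def strong_match(a, b):
    """Strong comparison: both tags must be strong and identical."""
    weak_a, tag_a = parse_etag(a)
    weak_b, tag_b = parse_etag(b)
    return not weak_a and not weak_b and tag_a == tag_b


def weak_match(a, b):
    """Weak comparison: opaque tags must be identical, weakness is ignored."""
    return parse_etag(a)[1] == parse_etag(b)[1]
```

If-Match is evaluated with the strong comparison and If-None-Match with the weak one, which is why a weak ETag alone cannot protect a conditional PUT or PATCH.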
Consider also the case where a server can produce text/turtle, application/ld+json and application/n-triples, each of which generates a different ETag value. Then consider also that the server supports Prefer headers as well as Content-Encoding negotiation, each of which could produce an additional dimension for ETag generation. In that case, suppose a client wishes to send a PATCH request along with an If-Match header as part of a conditional request. What value should be included in the If-Match header? And given the various permutations of ETags that could be generated, does a server need to check all such permutations before accepting the conditional request? (Does If-Match make sense in the context of PATCH?)
This question is simpler in the context of PUT, but what if a client retrieves an LDP-RS as JSON-LD using a custom profile (Accept: application/ld+json; profile="https://...."), modifies the RDF graph and then, via PUT, replaces the resource as JSON-LD. What value should be used with If-Match? Does this change if the resource is retrieved as JSON-LD and replaced as Turtle?
There are clearly some nuances here, and it may be helpful to provide some guidance to implementers.
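One conceivable way to avoid checking every permutation is to give each variant's ETag a common prefix derived from the graph state, and compare only that prefix for If-Match. The Python sketch below illustrates this idea. It is an assumption for discussion, not something the LDP specification or RFC 7232 prescribes. The `"<state>:<variant>"` layout and both function names are hypothetical, and the naive comma split assumes tags contain no commas.

```python
def variant_etag(state_hash, media_type):
    """Hypothetical scheme: a strong ETag of the form "<state hash>:<media type>"."""
    return f'"{state_hash}:{media_type}"'


def if_match_passes(header, state_hash):
    """Evaluate an If-Match header against the current graph state.

    Assumes the resource exists and that state_hash contains no ':'.
    """
    if header.strip() == "*":
        return True
    for candidate in header.split(","):
        tag = candidate.strip()
        if tag.startswith("W/"):
            continue  # If-Match uses strong comparison; weak tags never match
        if tag.strip('"').split(":", 1)[0] == state_hash:
            return True  # same graph state, whichever variant was retrieved
    return False


# A client that fetched JSON-LD can send that tag with a PATCH or PUT.
jsonld_tag = variant_etag("abc123", "application/ld+json")
accepted = if_match_passes(jsonld_tag, "abc123")
rejected = if_match_passes(jsonld_tag, "def456")  # graph changed in between
```

With such a scheme the tag obtained from any serialization would be acceptable for a conditional write, at the cost of the server interpreting its own ETags rather than treating them as fully opaque.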