Container Data Augmentation #144

Open
kjetilk opened this issue Jan 30, 2020 · 8 comments

Comments

@kjetilk
Member

kjetilk commented Jan 30, 2020

From discussion with @timbl, we found that container metadata is intended to be quite sparse: mostly containment triples, minimal server data, and possibly other metadata such as timestamps of last modification. It is not intended to host extensive metadata about the container or the resources in the container that apps may find useful.

Such data is what I refer to as augmented data. Hitherto (I love that word :-) ), the databrowser has used a resource called index.ttl for such data.

I am opening this issue because the topic is orthogonal (or at least 75 degrees :-) ) to the container listing issue in #116.

@csarven
Member

csarven commented Jan 30, 2020

Suggestions (more or less in the following order):

  • Can the use case be covered by one (or more) of the proposed resource metadata types?
  • If not, does it warrant a new metadata category eg. "stats", "auxiliary", "stuff"?

If neither of those holds, there should still be a way to discover the resource through a relation (via the container), and not just rely on a fixed name (eg. index.ttl). After all, if a fixed name is important, /.well-known/augmentation would work just as well (although fugly!). The name "index.ttl" is problematic in that it clashes with potential representations of / with names under index.*, or at the very least index.html. Its use case also differs from that of index.html, for instance. It is evidently a cause of confusion to us, and I can only imagine the amount of confusion elsewhere. We should steer away from that situation, ie. index.html for representation but index.ttl for container data augmentation. Doing so will free up the name index.ttl for container representation. If a server wants to use a specific property and still use index.ttl, that's their call, and so the naming is an implementation detail.

If the kind of data that's expected to appear in this resource is significant, it should have a specific property for discovery - I don't care what it is called. While rdfs:seeAlso can be useful, I find it a bit too generic. There may be other seeAlso uses in a Container and so it should be easily distinguishable. However, if it is not that significant, seeAlso would be fine.
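A minimal Python sketch of discovery through a relation rather than a fixed name, under stated assumptions: triples are plain `(subject, predicate, object)` tuples of IRI strings, and `AUGMENTED_BY` is a hypothetical property invented here for illustration (no such term has been agreed on). A dedicated property is preferred, and rdfs:seeAlso is only consulted as a fallback, since a container may carry other seeAlso links.

```python
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
# Hypothetical property, made up for this sketch only.
AUGMENTED_BY = "http://example.org/ns#augmentedBy"


def find_augmentation(triples, container):
    """Return the IRIs of augmentation resources linked from `container`.

    Links made with the dedicated property win. rdfs:seeAlso is used
    only when no dedicated link exists, because it is too generic to
    tell augmentation data apart from other "see also" targets.
    """
    specific = [o for s, p, o in triples if s == container and p == AUGMENTED_BY]
    if specific:
        return specific
    return [o for s, p, o in triples if s == container and p == RDFS_SEE_ALSO]


# Example data; all IRIs under alice.example are placeholders.
container = "https://alice.example/photos/"
triples = [
    (container, "http://www.w3.org/ns/ldp#contains", container + "cat.jpg"),
    (container, RDFS_SEE_ALSO, "https://alice.example/about"),
    (container, AUGMENTED_BY, container + "summary"),
]
print(find_augmentation(triples, container))
```

Note that the name of the target resource ("summary" here, but it could equally be "index.ttl") never appears in the lookup logic, which is the sense in which naming stays an implementation detail.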

@kjetilk
Member Author

kjetilk commented Jan 30, 2020

Agreed with all those points!

I think a decision item here is that there could be more than one such resource, to allow different ACLs to apply to them.

I suppose it could be a metadata category, by itself, yes.

@elf-pavlik
Member

elf-pavlik commented Jan 30, 2020

timestamps of last modification

This sounds to me like Server Managed metadata described in https://github.com/solid/data-interoperability-panel/pull/32/files#diff-a2cbd45fd836a7e442e877ff247f2118R105

Can the use case be covered by one (or more) of the proposed resource metadata types?

👍 I hope that proposed resource metadata can fully address it

It is not intended to host extensive metadata about the container or resources in the container that apps may find useful.

I think one of the common examples for applications like a generic data browser is a human-readable label (sometimes with i18n). I think it should work in the same predictable way for any resource. To provide human-readable labels for non-RDF sources (eg images or videos), I think Resource Description (solid/data-interoperability-panel#32) provides a clear location for the label. In case a client doesn't find a label for a container in the container representation itself, it should try to follow rel="describedBy" just as for any non-RDF source.
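As a rough illustration of following rel="describedBy", here is a Python sketch that extracts the target from an HTTP Link header. It assumes the server advertises the description resource in a Link header, and it uses a simplified parser that does not handle commas inside quoted parameter values. The function name and example URLs are made up for this sketch. Relation types are compared case-insensitively, so "describedBy" and "describedby" are treated the same.

```python
import re
from urllib.parse import urljoin

# One link-value: <target> followed by its ;-separated parameters.
# Simplification: parameter values that contain commas are not supported.
_LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
_REL = re.compile(r';\s*rel\s*=\s*"?([^";]+)"?', re.IGNORECASE)


def described_by(link_header, base_url):
    """Return the absolute IRI of the first describedby link, or None."""
    for target, params in _LINK_VALUE.findall(link_header):
        match = _REL.search(params)
        # A rel parameter may list several space-separated relation types.
        if match and "describedby" in match.group(1).lower().split():
            return urljoin(base_url, target)
    return None


header = '<cat.jpg.meta>; rel="describedBy", <http://www.w3.org/ns/ldp#Resource>; rel="type"'
print(described_by(header, "https://alice.example/photos/cat.jpg"))
```

A client looking for a label would first check the resource's own representation, and only dereference the returned IRI when no label is found there.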

I think we could discuss in solid/data-interoperability-panel#32 cases where a server would want to optimize responses by including in the representation of one resource some information from another one (possibly one or more of its metadata resources). I'll post a comment about it in that PR.

Edit: I've created a new issue solid/data-interoperability-panel#39 (Embedding metadata in resource representation)

@kjetilk
Member Author

kjetilk commented Jan 30, 2020

timestamps of last modification

This sounds to me like Server Managed metadata described in https://github.com/solid/data-interoperability-panel/pull/32/files#diff-a2cbd45fd836a7e442e877ff247f2118R105

Yes, most of the container representation will indeed be server managed. Server managed is a bigger class of things though, as it could be audit logs and that kind of stuff too, which wouldn't be a container representation per se…

Can the use case be covered by one (or more) of the proposed resource metadata types?

+1 I hope that proposed resource metadata can fully address it

It is not intended to host extensive metadata about the container or resources in the container that apps may find useful.

I think one of the common examples for applications like a generic data browser is a human-readable label (sometimes with i18n). I think it should work in the same predictable way for any resource. To provide human-readable labels for non-RDF sources (eg images or videos), I think Resource Description (solid/data-interoperability-panel#32) provides a clear location for the label. In case a client doesn't find a label for a container in the container representation itself, it should try to follow rel="describedBy" just as for any non-RDF source.

I think we could discuss in solid/data-interoperability-panel#32 cases where a server would want to optimize responses by including in the representation of one resource some information from another one (possibly one or more of its metadata resources). I'll post a comment about it in that PR.

There's an important nuance here that I feel is lost: it is decidedly not about embedded metadata, as it is not part of the container representation. On the contrary, it is about data that can augment the container and also better sum up certain traits of the contained resources. So I am reluctant to frame it under the label of embedded metadata, or, even more generally, as metadata at all :-) Now, one person's metadata is somebody else's data, that's the nature of things, so simply from that perspective the division is somewhat arbitrary. The resource is quite likely to contain data that is also found in the contained resources; as such, it may be used as an optimization.

I think that perhaps we should defer the discussion of this topic somewhat: it can be done later, and index.ttl is used for the purpose right now, so there's no urgency in defining it.

For this feature, I think it makes sense to have use cases on the table before going further.

@elf-pavlik
Member

For this feature, I think it makes sense to have use cases on the table before going further.

👍 💯

@acoburn
Member

acoburn commented Jan 30, 2020

In the context of how this sort of special resource fits into the Solid specification, it might be useful to generalize this pattern a bit.

That is, this container data augmentation could be one type of resource that is able to customize the way data is presented to a user. @justinwb has called these "smart resources" in a different context; for example, for a container that includes lots of activity data, a "smart resource" might be able to do certain types of aggregation or other processing of the contained data. The details will clearly be application specific and depend a lot on what a given server can support, but it would be good to think about a general pattern for this.

@csarven
Member

csarven commented Jan 31, 2020

timestamps of last modification

This sounds to me like Server Managed metadata described in https://github.com/solid/data-interoperability-panel/pull/32/files#diff-a2cbd45fd836a7e442e877ff247f2118R105

I'd prefer to see a clarification on what exactly is getting updated, eg. the resource representation or a particular resource that is described in a representation. Right off the bat, "last modification" sounds to me like something best handled with the Last-Modified header. So, let's be careful not to dump everything on resource metadata by default.
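For the representation-level case, a small Python sketch of reading the Last-Modified header with the standard library. It assumes `headers` is a plain dict keyed by the exact header name; real HTTP clients usually offer case-insensitive lookup. The helper names are made up for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def last_modified(headers):
    """Return the Last-Modified header as an aware datetime, or None if absent."""
    value = headers.get("Last-Modified")
    return parsedate_to_datetime(value) if value else None


def changed_since(headers, seen):
    """True when the representation is newer than `seen` or has no timestamp."""
    stamp = last_modified(headers)
    return stamp is None or stamp > seen


headers = {"Last-Modified": "Thu, 30 Jan 2020 12:00:00 GMT"}
print(last_modified(headers))
print(changed_since(headers, datetime(2020, 1, 1, tzinfo=timezone.utc)))
```

This only answers when the representation as a whole last changed. A timestamp for one particular resource described inside a representation would still need to be expressed as data.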

I think it should work in the same predictable way for any resource. To provide human-readable labels for non-RDF sources (eg images or videos), I think Resource Description (solid/data-interoperability-panel#32) provides a clear location for the label.

That would be fine as that's the only real option; however, it doesn't hold for an RDF-bearing representation that is capable of self-describing, eg. a container. As mentioned before, a human-readable "label" is not "metadata" in that case.

I think we could discuss in solid/data-interoperability-panel#32 cases where a server would want to optimize responses by including in the representation of one resource some information from another one (possibly one or more of its metadata resources).

That's precisely what this issue is about... to potentially have a distinct resource encapsulating the required information. However, it doesn't entail the information in that particular resource metadata type should or can appear elsewhere - in all likelihood, it shouldn't. This is in fact how we managed to move away from dumping RDF in container's "metadata" into a resource but having it also appear in container. If they are deemed to be distinct resources and equally updateable, they can be self-describing.


I am reluctant to frame it under the label of embedded metadata, or, even more generally, as metadata at all

I generally agree with that view. I think the "resource metadata" is stretching the purpose of metadata or at the very least giving the wrong impression - at risk of being a misnomer. Perhaps the data interop panel can consider a different term.


For this feature, I think it makes sense to have use cases on the table before going further.

👍 💯

Literally why this issue exists in the first place! #116 (comment) , #116 (comment) ;)


it would be good to think about a general pattern for this.

We have many ways of expressing the relationship between a primary resource to a secondary or auxiliary resource. I would suggest that the data interop panel should (continue to) investigate this further out and report back. For the time being, the spec (or at least the next drafts) should focus on concrete practices. We can always generalise patterns but need a representative set to get there (as opposed to going from a sample of one).

@kjetilk
Member Author

kjetilk commented Jun 24, 2021

I've been rethinking this after getting a bit of distance. I now feel we are complicating things unnecessarily here. I don't think this data augmentation should be an auxiliary resource; it should be a normal resource like any other, one that another resource recommends that the client dereference.

To issue that recommendation, we could simply use the rdfs:seeAlso predicate, which has meant "make sure you get this resource" for two decades now. Or, we could make a subproperty of it that carries even stronger semantics.

Clients could then write their data into that resource; they may decide to duplicate statements from other resources, or servers may decide to do that for them.

For the container case, a client or server may include the rdfs:seeAlso relation in the container. That'd be all. A single predicate should be all that's needed for this.
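To show how little is involved, here is a Python sketch that serializes the single statement as one N-Triples line. The function name and the example IRIs are made up for illustration, and the sketch assumes both IRIs are already absolute and need no escaping.

```python
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def see_also_statement(container, target):
    """Serialize `container rdfs:seeAlso target` as one N-Triples line."""
    return f"<{container}> <{RDFS_SEE_ALSO}> <{target}> ."


# Placeholder IRIs; the target is an ordinary resource, not an auxiliary one.
print(see_also_statement(
    "https://alice.example/photos/",
    "https://alice.example/photos/summary",
))
```

A subproperty with stronger semantics would only change the predicate IRI; the shape of the statement stays the same.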

4 participants