rdfpipe adds a graph name to triples in the default graph, consequently breaking round tripping #1804

Closed
aucampia opened this issue Apr 11, 2022 · 10 comments · Fixed by #2406
Labels
bug Something isn't working concept: RDF dataset Relates to the RDF datasets concept. core Relates to core functionality of RDFLib, i.e. `rdflib.{graph,store,term}` critical

Comments

@aucampia
Member

I would expect to be able to round-trip quads with rdfpipe, but this does not work because rdfpipe labels quads in the default graph with a graph name and then serializes them with the injected graph name.

~/sw/d/github.com/iafork/rdflib.cleanish
$ cat ./test/w3c/trig/trig-turtle-03.trig 
# Turtle is TriG
prefix : <http://example/> 

[ :p 123 ; :q 456 ] :r 1 .
~/sw/d/github.com/iafork/rdflib.cleanish
$ pipx run --spec git+https://github.com/RDFLib/rdflib.git@master#egg=rdflib rdfpipe -i trig -o nquads  ./test/w3c/trig/trig-turtle-03.trig
_:n33872095eabf41d58f85dc3a7c672883b1 <http://example/p> "123"^^<http://www.w3.org/2001/XMLSchema#integer> <file:///home/iwana/sw/d/github.com/iafork/rdflib.cleanish/test/w3c/trig/trig-turtle-03.trig> .
_:n33872095eabf41d58f85dc3a7c672883b1 <http://example/q> "456"^^<http://www.w3.org/2001/XMLSchema#integer> <file:///home/iwana/sw/d/github.com/iafork/rdflib.cleanish/test/w3c/trig/trig-turtle-03.trig> .
_:n33872095eabf41d58f85dc3a7c672883b1 <http://example/r> "1"^^<http://www.w3.org/2001/XMLSchema#integer> <file:///home/iwana/sw/d/github.com/iafork/rdflib.cleanish/test/w3c/trig/trig-turtle-03.trig> .
~/sw/d/github.com/iafork/rdflib.cleanish
$ pipx run --spec git+https://github.com/RDFLib/rdflib.git@master#egg=rdflib rdfpipe -i trig -o nquads  ./test/w3c/trig/trig-turtle-03.trig | pipx run --spec git+https://github.com/RDFLib/rdflib.git@master#egg=rdflib rdfpipe -i nquads -o trig -

@prefix ns1: <file:///home/iwana/sw/d/github.com/iafork/rdflib.cleanish/test/w3c/trig/> .
@prefix ns2: <http://example/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ns1:trig-turtle-03.trig {
    [] ns2:p 123 ;
        ns2:q 456 ;
        ns2:r 1 .
}
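
To make the difference between the two outputs easy to check mechanically, here is a minimal sketch (not part of rdfpipe or RDFLib) that reports the graph label of a single N-Quads statement. The `TERM` regex and the `graph_label` helper are illustrative names, the regex only approximates the N-Quads grammar, and the sample lines use a shortened blank node label and a made-up file path.

```python
import re

# One N-Quads term: an IRI, a blank node label, or a literal with an
# optional datatype or language tag. This only approximates the grammar.
TERM = re.compile(
    r'<[^>]*>'
    r'|_:[^\s]+'
    r'|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?'
)


def graph_label(line):
    """Return the graph label of one N-Quads statement, or None when the
    statement has only three terms, i.e. it sits in the default graph."""
    terms = TERM.findall(line)
    if len(terms) == 4:
        return terms[3]
    if len(terms) == 3:
        return None
    raise ValueError(f"not an N-Quads statement: {line!r}")


XSD_INT = "<http://www.w3.org/2001/XMLSchema#integer>"
# Shape of the output reported above: a file IRI is injected as graph label.
labelled = f'_:b1 <http://example/p> "123"^^{XSD_INT} <file:///tmp/example.trig> .'
# Shape a round-trippable conversion would need: no graph label at all.
unlabelled = f'_:b1 <http://example/p> "123"^^{XSD_INT} .'

print(graph_label(labelled))    # <file:///tmp/example.trig>
print(graph_label(unlabelled))  # None
```

Running this over every line of the `rdfpipe -o nquads` output gives a quick way to see whether any statement that started in the default graph picked up a label.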
@aucampia aucampia added the bug Something isn't working label Apr 11, 2022
@ghost

ghost commented Apr 14, 2022

The added graph name is a "feature" of RDFLib parsing. If you don't need to record the source of the data, just pipe the data into rdfpipe:

cat ./test/w3c/trig/trig-turtle-03.trig | rdfpipe -i trig -o nquads -
_:n1f504c660679440782137e22f180766eb1 <http://example/p> "123"^^<http://www.w3.org/2001/XMLSchema#integer>  .
_:n1f504c660679440782137e22f180766eb1 <http://example/r> "1"^^<http://www.w3.org/2001/XMLSchema#integer>  .
_:n1f504c660679440782137e22f180766eb1 <http://example/q> "456"^^<http://www.w3.org/2001/XMLSchema#integer>  .

Perhaps this is more of a documentation issue.

@aucampia
Member Author

As per discussion on gitter: https://gitter.im/RDFLib/rdflib?at=62586bb3e9cb3c52ae651573

I somewhat understand the idea behind what is happening here, but to me this still seems like it is a bug. The default graph should have no name [ref], and the example has triples outside of a graph statement, which according to TriG spec makes them part of the default graph [ref].

So to me, RDFLib is outputting something which is not the same as the input. It may be useful in some context, but I would say it should not be the default behaviour, or at the very least there should be a way to disable this behaviour, as the primary thing I want to be able to do with rdfpipe is graph preserving conversions.

Not sure what others think about it though.

@ghost

ghost commented Apr 15, 2022

> I would say it should not be the default behaviour

Perhaps not of rdfpipe, but unfortunately it has been the default behaviour of Graph.parse() (via create_input_source) for over 17 years, so it's a long-established feature of the public API. I haven't checked, but I believe that if it were not the default behaviour, issue #130 might rear its ugly head again.

> The default graph should have no name

You are correct, the default graph of a Dataset should have no name. As a minor nitpick, though, I'm not sure that the RDF 1.1 spec is relevant here: the current implementation of rdfpipe uses ConjunctiveGraph, which (AIUI) doesn't claim to conform to RDF 1.1 and in which the default context has an exposed BNode identifier.

@aucampia
Member Author

I'm still not convinced this is not a bug, as IMO we should be targeting RDF 1.1, which is 6 years old by now. We can also fix these things without removing ConjunctiveGraph; they are real pain points with RDFLib to me. I would say one of our highest priorities should be to make sure we are RDF 1.1 compliant before we add any more features or look at something like rdf-star, as rdf-star support on a base that has some serious compliance issues is not that valuable.

But if we are not targeting RDF 1.1, we should also clarify that in our documentation. I fear, however, that we would then essentially be mothballing rdflib, as what I want is an RDF 1.1 compatible library in Python, and RDF 1.1 compatible tooling.

@aucampia
Member Author

aucampia commented Apr 15, 2022

Maybe the right solution is to take another look at how #130 works. Quite independently of this, however, rdfpipe/rdflib should ideally not be resolving relative URIs IMO.

@ghost

ghost commented Apr 15, 2022

Alternatively, given that this change:

diff --git a/rdflib/tools/rdfpipe.py b/rdflib/tools/rdfpipe.py
index 6b53cc8b..1a17b48c 100644
--- a/rdflib/tools/rdfpipe.py
+++ b/rdflib/tools/rdfpipe.py
@@ -49,6 +49,8 @@ def parse_and_serialize(
             fpath = sys.stdin
         elif not input_format and guess:
             use_format = guess_format(fpath) or DEFAULT_INPUT_FORMAT
+        if fpath != sys.stdin:
+            kws["publicID"] = fpath if use_format != "trig" else ""
         dataset.parse(fpath, format=use_format, **kws)
 
     if outfile:

yields

$ rdfpipe -i trig -o nquads ./test/w3c/trig/trig-turtle-03.trig
_:n67a6e43a45f44eb18daf43e9211e3c42b1 <http://example/p> "123"^^<http://www.w3.org/2001/XMLSchema#integer>  .
_:n67a6e43a45f44eb18daf43e9211e3c42b1 <http://example/q> "456"^^<http://www.w3.org/2001/XMLSchema#integer>  .
_:n67a6e43a45f44eb18daf43e9211e3c42b1 <http://example/r> "1"^^<http://www.w3.org/2001/XMLSchema#integer>  .

there may be an opportunity to set publicID to "" by default and expose it as an on/off arg to rdfpipe for when users want to have a specific document location.
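
One way to picture that suggestion is the sketch below. It assumes an opt-in switch: the `--use-source-as-public-id` flag, the `rdfpipe-sketch` program name, and the `choose_public_id` helper are all hypothetical and are not existing rdfpipe options or RDFLib functions. It only decides which `publicID` value would be passed on to the parser.

```python
import argparse


def build_parser():
    """Build a toy argument parser; this is not rdfpipe's real CLI."""
    parser = argparse.ArgumentParser(prog="rdfpipe-sketch")
    parser.add_argument("path", help="input file, or '-' for stdin")
    # Hypothetical flag, named here only for illustration.
    parser.add_argument(
        "--use-source-as-public-id",
        action="store_true",
        help="record the document location as the publicID",
    )
    return parser


def choose_public_id(path, use_source):
    """Pick the publicID to hand to the parser.

    Returns None for stdin (leave publicID unset, as the unedited code
    does), "" by default for files, and the path itself only on opt-in.
    """
    if path == "-":
        return None
    return path if use_source else ""


args = build_parser().parse_args(["./test/w3c/trig/trig-turtle-03.trig"])
print(repr(choose_public_id(args.path, args.use_source_as_public_id)))  # ''
```

Unlike the diff above, this sketch does not special-case the `trig` format; it applies the empty `publicID` to every file input unless the user opts in.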

fwiw, using the unedited code:

$ cat ./test/w3c/trig/trig-turtle-03.trig | rdfpipe -i trig -o nquads -
_:n87d3a44a5cca4dbabeb1fab34f999ea1b1 <http://example/r> "1"^^<http://www.w3.org/2001/XMLSchema#integer>  .
_:n87d3a44a5cca4dbabeb1fab34f999ea1b1 <http://example/p> "123"^^<http://www.w3.org/2001/XMLSchema#integer>  .
_:n87d3a44a5cca4dbabeb1fab34f999ea1b1 <http://example/q> "456"^^<http://www.w3.org/2001/XMLSchema#integer>  .

@ghost

ghost commented Apr 15, 2022

> IMO we should be targeting RDF 1.1 which is 6 years old by now. ... I would say one of our highest priorities should be to make sure we are RDF 1.1 compliant
>
> But if we are not targeting RDF 1.1, we should also clarify in our documentation that we are not. I fear then however we are essentially mothballing rdflib as what I want is an RDF 1.1 compatible library in python, and RDF 1.1 compatible tooling.

Uh-huh, I'm not disagreeing in any way, but RDFLib 6 is quite close to compliance according to David Wood's What’s New in RDF 1.1. The remaining significant issue is a compliant implementation of Dataset. However, the implications of the W3C's bland statement that the “RDF Working Group did not standardize the semantics of RDF datasets” are pretty profound from an implementation perspective. I've posted further on this in the comments to #1814.

@aucampia
Member Author

Regarding relative uris:

@aucampia aucampia mentioned this issue Apr 17, 2022
4 tasks
@aucampia aucampia added the core Relates to core functionality of RDFLib, i.e. `rdflib.{graph,store,term}` label Aug 21, 2022
@aucampia aucampia added the concept: RDF dataset Relates to the RDF datasets concept. label May 20, 2023
@aucampia
Member Author

I don't think this is related to relative URIs or paths; it is really only related to what happens to triples in the default graph [ref].

In trying to address #2393 this issue came up again, as I'm having trouble detecting if named graphs are present, as the default graph is also named when loading from a file.
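
The detection problem can be shown with a toy model in plain Python. This is not RDFLib code: `load` and `named_graphs` are hypothetical helpers, and the file IRI is made up. The loader mimics the behaviour reported in this issue by labelling default-graph triples with the source location when one is given.

```python
def load(triples, source=None):
    """Toy loader returning quads (s, p, o, graph).

    With `source` set, triples that belong to the default graph are
    labelled with it, mimicking the behaviour reported in this issue.
    With `source` left as None they stay unlabelled.
    """
    return [(s, p, o, source) for (s, p, o) in triples]


def named_graphs(quads):
    """Collect every graph label that is not the (unnamed) default graph."""
    return {g for (_, _, _, g) in quads if g is not None}


triples = [("_:b", "http://example/p", "123")]
from_stdin = load(triples)
from_file = load(triples, source="file:///tmp/example.trig")

print(named_graphs(from_stdin))  # set()
print(named_graphs(from_file))   # {'file:///tmp/example.trig'}
```

The same input looks like it contains a named graph only when it is loaded from a file, which is why a plain "are there any graph labels?" check gives a false positive.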

@aucampia
Member Author

Fix is more or less ready, please have a look:
