ConjunctiveGraph doesn't handle parsing datasets with default graphs properly #436

Closed
niklasl opened this issue Nov 22, 2014 · 5 comments · Fixed by #2406
Labels
bug (Something isn't working); concept: RDF dataset (Relates to the RDF datasets concept); id-as-cntxt (tracking related issues); parsing (Related to parsing); store (Related to a store)

Comments

@niklasl
Member

niklasl commented Nov 22, 2014

When ConjunctiveGraph.parse is called, it wraps its underlying store in a regular Graph instance. This causes problems for parsers of datasets, e.g. NQuads, TriG and JSON-LD.

Specifically, the triples in the default graph of a dataset haphazardly end up in bnode-named contexts.

Example:

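from rdflib import ConjunctiveGraph

cg = ConjunctiveGraph()
# One statement in the default graph, one in the named graph <http://example.org/b/>.
cg.parse(format="nquads", data="""
<http://example.org/a> <http://example.org/ns#label> "A" .
<http://example.org/b> <http://example.org/ns#label> "B" <http://example.org/b/> .
""")
# Expected: the first statement ends up in the default context.
assert len(cg.default_context) == 1  # fails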

While I've attempted to overcome this by using the underlying graph.store in these parsers, they cannot access the default_context of ConjunctiveGraph through this store. It is there in the underlying store, but its identifier is inaccessible to the parser without further changes to the parse method of ConjunctiveGraph.

This becomes tricky because the contract for ConjunctiveGraph's parse method is:

    Parse source adding the resulting triples to its own context
    (sub graph of this graph).

    See :meth:`rdflib.graph.Graph.parse` for documentation on arguments.

    :Returns:

    The graph into which the source was parsed. In the case of n3
    it returns the root context.

I am not sure how we can change this behaviour, since client code may rely on this. We could either add a new method, e.g. parse_dataset, or a flag. That would not be obvious to all users though, and somehow I would like to change the behaviour to handle datasets as well. It is always possible to get/create a named graph from a conjunctive graph and parse data into that.

I have gotten further by adding publicID=cg.default_context.identifier to the parse invocation. This causes the TriG parser to behave properly (and it is easy to adapt the nquads parser to work from there on). But I am not sure if this is a wise solution to the problem.

I'll mull this over when time permits, but it would be good to have more people consider a proper revision of the parsing mechanism for datasets.

This underlies the problems described in #432 and #433 (and is related to #428).

(Obviously, this in turn causes the serializers for the same formats to emit unexpected bnode-named graphs when data has been read through these parsers.)

@niklasl
Member Author

niklasl commented Aug 4, 2016

It might make sense that one should simply parse into the default_context of a ConjunctiveGraph or Dataset, like:

import rdflib

cg = rdflib.ConjunctiveGraph()
# `data` is assumed to hold a TriG document as a string.
cg.default_context.parse(data=data, format='trig')
print(cg.serialize(format='trig'))

By doing it like this (along with a bunch of fairly recent fixes on RDFLib master), this could be considered good enough. It doesn't seem intuitive though.

Leaving this open in case we want to redesign the parsing of datasets to make this more obvious.

@joernhees
Member

joernhees commented Aug 4, 2016

Hmm, so maybe the 6.0.0 label was wrong? Can this go into 4.2.2 then (so no backwards incompatibility) and just be closed, and re-opened if desired?

@niklasl
Member Author

niklasl commented Aug 4, 2016

Telling users to parse into default_context would require no change; it just seems unintuitive.

I'd say leave this open (but for 5.0.0 maybe?) since it is about changing the parsing usage/behaviour when parsing dataset syntaxes (nquads, trig, json-ld and trix). The current wiring of graphs, contexts and underlying stores could really do with such an overhaul.

@nicholascar
Member

nicholascar commented Dec 7, 2021

This issue is still a problem in RDFLib 6.0.2. The workaround of publicID=cg.default_context.identifier does work but is indeed unintuitive.

We really do need to be able to say:

from rdflib import Dataset

cg = Dataset()
cg.parse("some-quads-file.trig")   # RDF file type worked out by guess_format()

... and then have default_context hold whatever the TriG file declared as its default graph.

@ghost added the id-as-cntxt (tracking related issues) label Dec 24, 2021
@aucampia
Member

Fix is more or less ready, please have a look:
