ConjunctiveGraph doesn't handle parsing datasets with default graphs properly #436

niklasl · 2014-11-22T20:38:58Z

When ConjunctiveGraph.parse is called, it wraps its underlying store in a regular Graph instance. This causes problems for parsers of datasets, e.g. NQuads, TriG and JSON-LD.

Specifically, the triples in the default graph of a dataset haphazardly end up in bnode-named contexts.

Example:

from rdflib import ConjunctiveGraph

cg = ConjunctiveGraph()
cg.parse(format="nquads", data="""
<http://example.org/a> <http://example.org/ns#label> "A" .
<http://example.org/b> <http://example.org/ns#label> "B" <http://example.org/b/> .
""")
# The first statement has no graph name, so it belongs to the default graph.
assert len(cg.default_context) == 1  # fails
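
To make the mechanism concrete, here is a toy model in plain Python. It is not RDFLib code: `ToyConjunctiveGraph`, `fresh_bnode` and `default_size` are hypothetical names, and the store is reduced to a dict of context identifier to triples. It only sketches the behaviour described above, where parsing targets a new graph whose identifier is a fresh blank node.

```python
import itertools

_counter = itertools.count()


def fresh_bnode():
    # Toy stand-in for a blank-node identifier.
    return f"_:b{next(_counter)}"


class ToyConjunctiveGraph:
    """Hypothetical model: context identifier -> set of triples."""

    def __init__(self):
        self.contexts = {}
        self.default_id = fresh_bnode()

    def parse(self, quads, public_id=None):
        # Parsing targets a new graph whose identifier is a fresh blank
        # node unless the caller supplies one.
        sink_id = public_id or fresh_bnode()
        for s, p, o, ctx in quads:
            # Quads without a graph name fall back to the sink's identifier.
            self.contexts.setdefault(ctx or sink_id, set()).add((s, p, o))
        return sink_id

    def default_size(self):
        return len(self.contexts.get(self.default_id, ()))


quads = [
    ("ex:a", "ex:label", "A", None),     # default-graph statement
    ("ex:b", "ex:label", "B", "ex:b/"),  # named-graph statement
]

cg = ToyConjunctiveGraph()
sink_id = cg.parse(quads)
print(cg.default_size())       # 0: nothing reached the default context
print(sink_id in cg.contexts)  # True: the triple sits under a new bnode name
```

In this model the default-graph statement is stored, but under an identifier the caller never asked for, which is the symptom reported above.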

While I've attempted to overcome this by using the underlying graph.store in these parsers, they cannot access the default_context of ConjunctiveGraph through this store. It is there in the underlying store, but its identifier is inaccessible to the parser without further changes to the parse method of ConjunctiveGraph.

This becomes tricky because the contract for ConjunctiveGraph's parse method is:

    Parse source adding the resulting triples to its own context
    (sub graph of this graph).

    See :meth:`rdflib.graph.Graph.parse` for documentation on arguments.

    :Returns:

    The graph into which the source was parsed. In the case of n3
    it returns the root context.

I am not sure how we can change this behaviour, since client code may rely on this. We could either add a new method, e.g. parse_dataset, or a flag. That would not be obvious to all users though, and somehow I would like to change the behaviour to handle datasets as well. It is always possible to get/create a named graph from a conjunctive graph and parse data into that.

I have gotten further by adding publicID=cg.default_context.identifier to the parse invocation. This causes the TriG parser to behave properly (and it is easy to adapt the nquads parser to work from there on). But I am not sure if this is a wise solution to the problem.

I'll mull this over when I have time, but it would be good to have more people consider a proper revision of the parsing mechanism for datasets.

This underlies the problems described in #432 and #433 (and is related to #428).

(Obviously, this in turn causes the serializers for the same formats to emit unexpected bnode-named graphs when data has been read through these parsers.)


niklasl · 2016-08-04T12:59:27Z

It might make sense that one should simply parse into the default_context of a ConjunctiveGraph or Dataset, like:

import rdflib

# `data` is assumed to hold a TriG document as a string.
cg = rdflib.ConjunctiveGraph()
cg.default_context.parse(data=data, format='trig')
print(cg.serialize(format='trig'))

By doing it like this (along with a bunch of fairly recent fixes on RDFLib master), this could be considered good enough. It doesn't seem intuitive though.

Leaving this open in case we want to redesign the parsing of datasets to make this more obvious.

joernhees · 2016-08-04T14:39:07Z

Hmm, so maybe the 6.0.0 label was wrong? Can this go into 4.2.2 then (so no backwards incompatibility), and just be closed and re-opened if desired?

niklasl · 2016-08-04T15:09:58Z

Telling users to parse into default_context would require no change; it just seems unintuitive.

I'd say leave this open (but for 5.0.0 maybe?) since it is about changing the parsing usage/behaviour when parsing dataset syntaxes (nquads, trig, json-ld and trix). The current wiring of graphs, contexts and underlying stores could really do with such an overhaul.

nicholascar · 2021-12-07T00:39:33Z

This issue is still a problem in RDFLib 6.0.2. The workaround of publicID=cg.default_context.identifier does work but is indeed unintuitive.

We really do need to be able to say:

from rdflib import Dataset

cg = Dataset()
cg.parse("some-quads-file.trig")   # RDF file type worked out by guess_format()

... and then have the default_context == whatever the TriG file said the default graph was.

aucampia · 2023-05-24T21:41:33Z

Fix is more or less ready, please have a look:

BREAKING CHANGE: Don't use publicID as the name for the default graph. #2406

niklasl mentioned this issue Apr 25, 2015

RDF 1.1 tests & support #450

Closed

7 tasks

niklasl mentioned this issue Jul 15, 2015

jsonld-0.3 breaks rdflib tests RDFLib/rdflib-jsonld#30

Open

joernhees added the bug, parsing and store labels Jul 15, 2015

joernhees added this to the rdflib 5.0.0 milestone Jul 15, 2015

niklasl mentioned this issue Oct 29, 2015

nquads parser creates one context per triple in the default graph #535

Closed

joernhees modified the milestones: rdflib 5.0.0, rdflib 6.0.0 Jan 28, 2016

This was referenced Aug 4, 2016

Trig parser can creating multiple contexts for the default graph #432

Closed

Trig serialiser writing empty named graph name for default graph #433

Closed

joernhees modified the milestones: rdflib 5.0.0, rdflib 6.0.0 Aug 4, 2016

niklasl mentioned this issue Sep 1, 2016

default graph is not handled correctly RDFLib/rdflib-jsonld#34

Closed

gromgull mentioned this issue Jan 24, 2017

Added trig unit tests to highlight some current parsing/serializing issues #431

Closed

joernhees mentioned this issue Jan 26, 2017

Revert "skip round-trip test, unfixable until 5.0" #702

Closed

This was referenced Jan 2, 2018

not loading jsonld file RDFLib/rdflib-jsonld#53

Open

jsonld file not loading #799

Closed

danbri mentioned this issue Mar 27, 2018

Triples not loaded when using @context and @graph RDFLib/rdflib-jsonld#40

Open

white-gecko modified the milestones: rdflib 5.0.0, rdflib 5.1.0 Apr 6, 2020

white-gecko modified the milestones: rdflib 5.1.0, rdflib 6.0.0 May 1, 2020

ghost added the id-as-cntxt label Dec 24, 2021

ghost mentioned this issue Jan 6, 2022

Move Store API to work with identifiers, not graphs #1646

Closed

white-gecko modified the milestones: rdflib 6.x.x, 2022 June release Jun 20, 2022

aucampia added the "concept: RDF dataset" label May 20, 2023

This was referenced May 22, 2023

Triples from the default graph in RDF documents do not go into the default_context of Dataset or ConjunctiveGraph #2404

Closed

BREAKING CHANGE: Don't use publicID as the name for the default graph. #2406

Merged

aucampia closed this as completed in 4b96e9d Jun 8, 2023
