Trig serialisation handling prefixes incorrectly #428

Closed
dschallis opened this issue Oct 15, 2014 · 8 comments
Labels
bug Something isn't working serialization Related to serialization.

@dschallis

In version 4.1.2 (under Python 2.7), running the Trig test detailed in #317 produces invalid Trig output.

Running the code:

from rdflib import ConjunctiveGraph

data = """
@prefix ns1: <http://chartex.org/chartex-schema#> .
<http://chartex.org/graphid/document1> = {
    ns1:Person_A a ns1:Person ;
        ns1:TextSpan "Simon" .
    ns1:Person_B a ns1:Person ;
        ns1:TextSpan "Walter" .
}
<http://chartex.org/graphid/document2> = {
    ns1:Person_C a ns1:Person ;
        ns1:TextSpan "Agnes" .
    ns1:Person_D a ns1:Person ;
        ns1:TextSpan "Emma" .
}
<http://chartex.org/graphid/Person_Atenant_ofPerson_B> = {
    ns1:Person_A ns1:tenant_of ns1:Person_B .
}
<http://chartex.org/graphid/Person_Ctenant_ofPerson_D> = {
    ns1:Person_C ns1:tenant_of ns1:Person_D .
}
<http://chartex.org/graphid/ConfidenceMetrics> = {
    <http://chartex.org/graphid/Person_Atenant_ofPerson_B> ns1:confidence 0.9265 .
    <http://chartex.org/graphid/Person_Ctenant_ofPerson_D> ns1:confidence 0.8765 .
}
"""

cg = ConjunctiveGraph()
cg.parse(data=data, format='trig')
print cg.serialize(format='trig')

produces the invalid Trig output:

@prefix ns1: <http://chartex.org/chartex-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<ns2:Person_Atenant_ofPerson_B> = {
    ns1:Person_A ns1:tenant_of ns1:Person_B .
}

<ns2:ConfidenceMetrics> = {
    ns2:Person_Atenant_ofPerson_B ns1:confidence 0.9265 .

    ns2:Person_Ctenant_ofPerson_D ns1:confidence 0.8765 .
}

<ns2:Person_Ctenant_ofPerson_D> = {
    ns1:Person_C ns1:tenant_of ns1:Person_D .
}

<ns2:document2> = {
    ns1:Person_C a ns1:Person ;
        ns1:TextSpan "Agnes" .

    ns1:Person_D a ns1:Person ;
        ns1:TextSpan "Emma" .
}

<ns2:document1> = {
    ns1:Person_A a ns1:Person ;
        ns1:TextSpan "Simon" .

    ns1:Person_B a ns1:Person ;
        ns1:TextSpan "Walter" .
}

The Trig output uses the prefix "ns2" for graph names, but this prefix is never declared. In addition, a prefixed graph name shouldn't be enclosed in angle brackets. The correct output for each named graph would be either:

@prefix ns2: <http://chartex.org/graphid/> .

ns2:document1 = {
    ...
}

Or:

<http://chartex.org/graphid/document1> = {
    ...
}
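
Both symptoms can be spotted mechanically in serializer output. The following is a rough sketch, assuming prefixed names follow a simple `prefix:local` pattern; `find_prefix_problems` and the regexes are hypothetical helpers, not part of rdflib, and they can misfire on prefixed-looking text inside string literals or on bracketed IRIs whose scheme is not followed by `/` (such as `urn:` IRIs).

```python
import re

# Simplified patterns (assumption): they ignore string literals, comments and
# the full PN_PREFIX/PN_LOCAL grammar, so treat the result as a hint only.
PREFIX_DECL = re.compile(r"@prefix\s+([A-Za-z][\w.-]*)?:\s*<[^>]*>\s*\.")
QNAME = re.compile(r"(?<![\w<:/#])([A-Za-z][\w-]*):([A-Za-z_][\w-]*)")
BRACKETED_QNAME = re.compile(r"<([A-Za-z][\w-]*):([^/>][^>]*)>")


def find_prefix_problems(text):
    """Return (undeclared prefixes, prefixed names wrapped in <...>) found in TriG text."""
    declared = set(PREFIX_DECL.findall(text))
    used = {prefix for prefix, _local in QNAME.findall(text)}
    bracketed = [f"{prefix}:{local}" for prefix, local in BRACKETED_QNAME.findall(text)]
    return sorted(used - declared), bracketed


# An excerpt of the invalid output reported above.
sample = """@prefix ns1: <http://chartex.org/chartex-schema#> .

<ns2:ConfidenceMetrics> = {
    ns2:Person_Atenant_ofPerson_B ns1:confidence 0.9265 .
}
"""
print(find_prefix_problems(sample))  # → (['ns2'], ['ns2:ConfidenceMetrics'])
```

Either of the two correct forms shown above should come back as `([], [])`.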

@jjon

jjon commented Oct 29, 2014

Hi @dschallis,

Inasmuch as that test was from my own data, I got curious. It seems like back then I didn't notice the problem you found. I tried it again with my development clone and got the same effect you did. But then things got weird and I don't have sufficient experience to interpret what I'm seeing.

With an updated version of 4.2.dev (and Python 2.7.1), I ran exactly the same experiment and got the same result. The list of namespaces was this:

>>> print cg.serialize(format='trig')
@prefix ns1: <http://chartex.org/chartex-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

so I tried this:

>>> list(cg.namespaces())
[('xml', rdflib.term.URIRef(u'http://www.w3.org/XML/1998/namespace')), ('rdfs', rdflib.term.URIRef(u'http://www.w3.org/2000/01/rdf-schema#')), ('rdf', rdflib.term.URIRef(u'http://www.w3.org/1999/02/22-rdf-syntax-ns#')), ('xsd', rdflib.term.URIRef(u'http://www.w3.org/2001/XMLSchema#')), (u'ns1', rdflib.term.URIRef(u'http://chartex.org/chartex-schema#')), ('ns2', rdflib.term.URIRef(u'http://chartex.org/graphid/'))]

and thought, "that's odd, there's ns2 right there, and it is defined properly." Baffled, I ran the serialization again and got this:

>>> print cg.serialize(format='trig')
@prefix ns1: <http://chartex.org/chartex-schema#> .
@prefix ns2: <http://chartex.org/graphid/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

I can reproduce this effect reliably, but I have no idea what causes it. It appears that cg.namespaces() lacks ('ns2', rdflib.term.URIRef(u'http://chartex.org/graphid/')) until the graph is first serialized as trig; only then does cg.namespaces() contain the generated namespace.

I hope someone with more experience than I will be able to explain why this happens.

@dschallis
Author

Hi @jjon,
I've found the cause and have been working on a fix. Hopefully I'll get the chance to write some tests and send a pull request today or tomorrow.

@satra
Contributor

satra commented Nov 16, 2014

@dschallis, are there any updates on this? We have run into this while trying to serialize/deserialize the W3C PROV model as trig.

@dschallis
Author

@satra Unfortunately not. I looked into the issue and found some deeper problems with rdflib's trig handling (#432 and #433), which make it hard to use or test the fix I'd originally intended. I'm still planning to look into these when I get a chance, but I'll first need to spend some time understanding more of rdflib's internals.

@RinkeHoekstra
Contributor

RinkeHoekstra commented Apr 17, 2015

I just stumbled upon the same problem. It occurs when the TurtleSerializer serializes an IRI that does not belong to a known namespace. A namespace is then created based on some guesswork, but because the TrigSerializer has already written the namespace declarations by that point (the Turtle serializer is essentially called for every graph in the store), the newly added namespace is not picked up.

For now my workaround is just to serialize the graph twice in succession.
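
The ordering problem described above, and why serializing twice helps, can be illustrated with a toy model. This is a simplified sketch, not rdflib's code: `ToyNamespaceManager` and `serialize_trig_buggy` are hypothetical names, and guessing a namespace by cutting at the last `/` or `#` is an assumption. The header is written before graph names are turned into qnames, so a generated `ns2` only appears in the header on the next run, once the persistent bindings already contain it.

```python
class ToyNamespaceManager:
    """Hypothetical stand-in for a graph's prefix bindings; state persists between calls."""

    def __init__(self, bindings):
        self.bindings = dict(bindings)  # prefix -> namespace IRI

    def qname(self, iri):
        for prefix, ns in self.bindings.items():
            if iri.startswith(ns):
                return f"{prefix}:{iri[len(ns):]}"
        # Unknown namespace: guess it by cutting after the last '/' or '#'
        # and bind it to the first unused nsN prefix (this mutates self.bindings).
        cut = max(iri.rfind("/"), iri.rfind("#")) + 1
        n = 1
        while f"ns{n}" in self.bindings:
            n += 1
        self.bindings[f"ns{n}"] = iri[:cut]
        return f"ns{n}:{iri[cut:]}"


def serialize_trig_buggy(nsm, graphs):
    """Mimics the reported ordering: header first, graph-name qnames generated afterwards."""
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in sorted(nsm.bindings.items())]
    for name, triples in graphs.items():
        # Bug 1: a prefix generated here is never declared in the header above.
        # Bug 2: the qname is wrapped in angle brackets, e.g. <ns2:document1>.
        lines.append(f"<{nsm.qname(name)}> = {{")
        for s, p, o in triples:
            lines.append(f"    {nsm.qname(s)} {nsm.qname(p)} {nsm.qname(o)} .")
        lines.append("}")
    return "\n".join(lines)


SCHEMA = "http://chartex.org/chartex-schema#"
GRAPHID = "http://chartex.org/graphid/"
graphs = {
    GRAPHID + "Person_Atenant_ofPerson_B": [
        (SCHEMA + "Person_A", SCHEMA + "tenant_of", SCHEMA + "Person_B"),
    ],
}
nsm = ToyNamespaceManager({"ns1": SCHEMA})
first = serialize_trig_buggy(nsm, graphs)   # header lacks ns2
second = serialize_trig_buggy(nsm, graphs)  # ns2 is now bound, so it is declared
print(first)
print(second)
```

In this model the second run declares ns2 but still writes the graph name as `<ns2:...>`, so the workaround fixes only the missing declaration.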

@drewp
Contributor

drewp commented Jul 21, 2015

I was running into something like this. These patches may help:
drewp@5d9a7b0
drewp@bd57db7
drewp@f167ecf

I don't remember why I was doing these or why I stopped trying to get them merged back.

@white-gecko
Member

white-gecko commented Jun 7, 2016

Even with @RinkeHoekstra's workaround, the remaining issue is that qnames are generated but printed inside <…>.

Edit: I've checked this again on the current master branch, and @drewp's patches seem to have been integrated. Can this issue be closed?
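
One way to avoid both problems is to buffer the graph bodies, so that every graph name and term gets its qname (and any generated prefix) before the `@prefix` header is written, and to emit prefixed graph names without angle brackets. The sketch below is a hypothetical illustration of that ordering, not the patch that was eventually merged into rdflib. It assumes namespaces are guessed by cutting at the last `/` or `#`, and that a local name must match a simple identifier pattern; otherwise the full `<IRI>` is written, which is the second valid form from the original report.

```python
import re

# Simplified local-name rule (assumption): real Turtle/TriG allows more characters.
LOCAL_NAME = re.compile(r"[A-Za-z_][\w-]*")


def split_iri(iri):
    """Guess a namespace by splitting after the last '/' or '#'."""
    cut = max(iri.rfind("/"), iri.rfind("#")) + 1
    return iri[:cut], iri[cut:]


def serialize_trig_buffered(bindings, graphs):
    """Hypothetical TriG writer that emits the @prefix header only after all qnames exist."""
    bindings = dict(bindings)  # prefix -> namespace; copied so the caller's dict is untouched
    ns_to_prefix = {ns: prefix for prefix, ns in bindings.items()}

    def term(iri):
        ns, local = split_iri(iri)
        if not LOCAL_NAME.fullmatch(local):
            return f"<{iri}>"  # no valid qname: write the full IRI instead
        if ns not in ns_to_prefix:
            n = 1
            while f"ns{n}" in bindings:
                n += 1
            bindings[f"ns{n}"] = ns
            ns_to_prefix[ns] = f"ns{n}"
        return f"{ns_to_prefix[ns]}:{local}"

    # Step 1: render every graph into a buffer; this may bind new nsN prefixes.
    body = []
    for name, triples in graphs.items():
        body.append(f"{term(name)} = {{")  # qname or <IRI>, never <qname>
        for s, p, o in triples:
            body.append(f"    {term(s)} {term(p)} {term(o)} .")
        body.append("}")

    # Step 2: the prefix set is now complete, so the header can declare all of it.
    header = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in sorted(bindings.items())]
    return "\n".join(header + [""] + body) if header else "\n".join(body)


SCHEMA = "http://chartex.org/chartex-schema#"
GRAPHID = "http://chartex.org/graphid/"
print(serialize_trig_buffered(
    {"ns1": SCHEMA},
    {GRAPHID + "document1": [(SCHEMA + "Person_A", SCHEMA + "tenant_of", SCHEMA + "Person_B")]},
))
```

Buffering trades a little memory for a header that is guaranteed to cover every prefix the body uses.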

gromgull added this to the rdflib 4.2.2 milestone on Jan 24, 2017

@gromgull
Member

This works fine in master now.

gromgull added three commits that referenced this issue on Jan 24, 2017
joernhees added the bug and serialization labels on Jan 25, 2017
joernhees added a commit that referenced this issue Jan 25, 2017
* master: (44 commits)
  quote cleanup OCD
  serializer/parser alias for 'ntriples'
  serializer/parser alias for 'ttl'
  cleanup
  remove outdated always skipped test
  a bit of changelog
  add a NTSerializer sub-class for nt11 (#700)
  Restrict normalization to unicode-compatible values (#674)
  fixes for turtle/trig namespace handling
  skip serialising empty default graph
  skip round-trip test, unfixable until 5.0
  prefix test for #428
  Added additional trig unit tests to highlight some currently occurring issues.
  remove ancient and broken 2.3 support code. (#681)
  updating deprecated testing syntax (#697)
  docs: clarify the use of an identifier when persisting a triplestore (#654)
  removing pyparsing version requirement (#696)
  made min/max aggregate functions support all literals (#694)
  actually fix projection from sub-queries
  added dawg tests for #607
  ...