Unexpected behaviour in Filter Query #532

Closed
ldibanyez opened this issue Oct 21, 2015 · 4 comments
Labels: bug, fix-in-progress, SPARQL

@ldibanyez

Hi,
rdflib 4.2.1 on Ubuntu 14.04 LTS, python 2.7.10

I'm running the following query on this data in MEPs.n3:

@base <http://purl.org/linkedpolitics/MembersOfParliament_background> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix lpv: <vocabulary/> .
@prefix NationalParty: <NationalParty/> .

<EUmember_1026>
    a lpv:MemberOfParliament ;
    lpv:MEP_ID "1026" ;
    lpv:countryOfRepresentation <EUCountry_FR> ;
    lpv:dateOfBirth "1946-07-13"^^xsd:date ;
    lpv:politicalFunction [
        a lpv:PoliticalFunction ;
        lpv:beginning "1989-10-13"^^xsd:date ;
        lpv:end "1992-01-14"^^xsd:date ;
        lpv:institution NationalParty:sans_etiquette
    ] , [
        a lpv:PoliticalFunction ;
        lpv:beginning "2005-02-24"^^xsd:date ;
        lpv:end "2007-01-30"^^xsd:date ;
        lpv:institution NationalParty:Union_pour_la_democratie_francaise
    ] ;
    foaf:familyName "Bourlanges" ;
    foaf:givenName "Jean-Louis" .

query:

from rdflib import Graph
g = Graph()
g.load("MEPs.n3", format='n3')

getnewMeps ="""
PREFIX lpv: <http://purl.org/linkedpolitics/vocabulary/> 
prefix foaf: <http://xmlns.com/foaf/0.1/> 
prefix xsd: <http://www.w3.org/2001/XMLSchema#> 

SELECT DISTINCT ?name ?lastname WHERE {
  ?member a lpv:MemberOfParliament .
  ?member foaf:givenName ?name .
  ?member foaf:familyName ?lastname .
  ?member lpv:politicalFunction ?function .
  ?function lpv:beginning ?date .

  FILTER (?date >= "2004-06-20"^^xsd:date)
} 
"""

result = g.query(getnewMeps)
for bind in result:
    # is empty!! 
    print bind

The dataset contains only one entity, and I expect it to be returned in the result, but that is not the case. If I remove the FILTER clause, the result is not empty, as expected.

The query works in other libraries (Corese, Jena, a Virtuoso endpoint).

@joernhees added the bug and SPARQL labels on Oct 23, 2015
@joernhees
Member

(i took the liberty to make the report self-contained, reformat and fix some minor bugs in it.)

i can reproduce this problem... the same query without FILTER clause or with != instead of >= works and produces [(rdflib.term.Literal(u'Jean-Louis'), rdflib.term.Literal(u'Bourlanges'))]

as the parsed query contains a Literal('2004-06-20') (without any datatype), i guess the error lies in the parser not correctly parsing literals with xsd:date datatypes...

Here's what i think is the relevant part of the parsed query:

RelationalExpression_{'_vars': set([rdflib.term.Variable(u'date')]), 'expr': rdflib.term.Variable(u'date'), 'other': rdflib.term.Literal(u'2004-06-20'), 'op': '>='}

Digging into the parsing and finding XSD.date missing from XSD_DTs in https://github.com/RDFLib/rdflib/blob/master/rdflib/plugins/sparql/datatypes.py#L7 i tried adding it and voila, that fixes this issue...

@gromgull i'm not entirely sure that what i'm doing there is the right thing, as i guess it should be possible to in general parse Literals with unknown datatype IRIs, which made it a bit surprising to me that this worked... any idea what's going on?

@gromgull
Member

I cannot replicate the missing DT, if I do:

import rdflib.plugins.sparql
rdflib.plugins.sparql.prepareQuery('prefix xsd: <http://www.w3.org/2001/XMLSchema#> select * where { ?a ?b ?c FILTER ( ?a > "123"^^xsd:date ) }').algebra.p.p

I get:

Filter_{' [... blah ... ] 'other': rdflib.term.Literal(u'123',
datatype=rdflib.term.URIRef(u'http://www.w3.org/2001/XMLSchema#date')),
'op': '>'}}

The problem is here:

https://github.com/RDFLib/rdflib/blob/master/rdflib/plugins/sparql/operators.py#L795

XSD.date is not in the list, so this check triggers and the filter
expression raises an error, which in a FILTER has the same effect as "false".
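
The mechanism described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not rdflib's actual code: `SUPPORTED_DATATYPES`, `FilterError`, `compare_ge` and `apply_filter` are hypothetical names standing in for the datatype list and the filter evaluation.

```python
from datetime import date

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical whitelist standing in for the list in operators.py;
# xsd:date is deliberately absent to mirror the reported behaviour.
SUPPORTED_DATATYPES = {XSD + "integer", XSD + "dateTime", XSD + "string"}


class FilterError(Exception):
    """Raised when a filter expression cannot be evaluated."""


def compare_ge(left, right):
    """Compare two (value, datatype) pairs, refusing unlisted datatypes."""
    for _value, datatype in (left, right):
        if datatype not in SUPPORTED_DATATYPES:
            raise FilterError("cannot compare datatype %s" % datatype)
    return left[0] >= right[0]


def apply_filter(rows, threshold):
    """Keep rows >= threshold; a row whose comparison errors is dropped."""
    kept = []
    for row in rows:
        try:
            if compare_ge(row, threshold):
                kept.append(row)
        except FilterError:
            pass  # an erroring filter behaves like "false", silently
    return kept


rows = [(date(2005, 2, 24), XSD + "date"), (date(1989, 10, 13), XSD + "date")]
threshold = (date(2004, 6, 20), XSD + "date")

print(apply_filter(rows, threshold))  # prints [] although one row qualifies

# Listing the datatype makes the same comparison succeed.
SUPPORTED_DATATYPES.add(XSD + "date")
print(apply_filter(rows, threshold))
```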

Out of the basic date/time types (http://www.w3.org/TR/xmlschema-2/#isoformats),
duration, dateTime, time, date, gYearMonth, gMonthDay, gDay, gMonth and gYear,
only dateTime is present.

Digging a bit deeper, this is because only these are required by the
SPARQL spec:

http://www.w3.org/TR/2013/REC-sparql11-query-20130321/#operandDataTypes

And therefore the tests also only use these.

Both date and time are correctly handled by rdflib Literals, so I see no
reason not to add them to the SPARQL list.

duration could probably be mapped to datetime.timedelta, but that is
more work; the others I don't know enough about.
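
As a rough idea of what such a mapping could involve, here is a minimal sketch that handles only the day/time part of xsd:duration. Year and month components have no fixed length, so timedelta cannot represent them and the sketch rejects them. `parse_day_time_duration` is a hypothetical helper, not an rdflib function.

```python
import re
from datetime import timedelta

# Matches the day/time subset of xsd:duration, e.g. "P1DT2H30M" or "-PT45.5S".
_DURATION_RE = re.compile(
    r"^(?P<sign>-)?P"
    r"(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?"
    r"(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def parse_day_time_duration(lexical):
    """Convert a day/time duration string into a datetime.timedelta."""
    match = _DURATION_RE.match(lexical)
    # "P", "PT" and "P1DT" match the pattern but carry no complete component.
    if match is None or lexical.endswith(("P", "T")):
        raise ValueError("unsupported duration: %r" % lexical)
    parts = match.groupdict()
    delta = timedelta(
        days=int(parts["days"] or 0),
        hours=int(parts["hours"] or 0),
        minutes=int(parts["minutes"] or 0),
        seconds=float(parts["seconds"] or 0),
    )
    return -delta if parts["sign"] else delta


print(parse_day_time_duration("P1DT2H30M"))
```

Durations with years or months, such as "P1Y", raise ValueError here rather than being approximated.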

Do all tests pass if you add date/time? :)

  • Gunnar


http://gromgull.net

@joernhees added this to the rdflib 4.2.2 milestone on Oct 24, 2015
@joernhees self-assigned this on Oct 24, 2015
@joernhees
Member

thanks, seems the debugger tricked me by pretty printing unicode subtypes without properly using their __repr__ :(

the PR in #533 adds XSD.date to the list of types and works.

I'm still not sure if for all other types the comparison result should be a silent False as it currently is... an alternative would be falling back to syntactic string comparison of the literals, which would've worked in this case... but maybe that's equally confusing. Maybe we should at least raise a warning or even fail when we can't compare the types and do something unexpected instead?
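
To make the trade-off of a string-comparison fallback concrete, here is a small standard-library sketch (an illustration, not rdflib behaviour). For plain xsd:date lexical forms with four-digit years and no timezone, string order and chronological order agree; for numeric lexical forms they do not.

```python
from datetime import date

dates = ["1989-10-13", "2005-02-24", "2004-06-20"]

# Lexical order and chronological order coincide for these date strings.
lexical_order = sorted(dates)
chronological_order = sorted(dates, key=date.fromisoformat)
print(lexical_order == chronological_order)

# The same trick breaks for numbers: as strings, "10" sorts before "9".
numbers = ["9", "10"]
string_order = sorted(numbers)
numeric_order = sorted(numbers, key=int)
print(string_order, numeric_order)
```

So a blanket fallback would have produced the expected answer for this report while still giving surprising results for other unlisted datatypes.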

joernhees added a commit that referenced this issue Oct 24, 2015
SPARQL can now compare xsd:date type as well, fixes #532
@ldibanyez
Author

Thanks for this. I applied the patch and it works.

IMHO, it should raise an error; I think xsd:date is the only one that works when cast to a string. I should then be able to implement the comparison I need through a custom eval.

Perhaps it is also worth updating the docs to state clearly that you support querying only on the specified SPARQL 1.1 types plus xsd:date. I incorrectly believed that I could query on all the xsd types supported by Literal (https://rdflib.readthedocs.org/en/stable/rdf_terms.html).
