Duplicate results in alphabetical listing #275

tfrancart · 2015-08-19T10:09:28Z

(using code from master branch)

I am testing SKOSMOS with a multilingual vocabulary, specifically the Unesco thesaurus available in SKOS at http://skos.um.es/unescothes/. I use Jena Fuseki (1.3.0) with the JenaText dialect.
Concepts appear duplicated in the alphabetical listing on the left.

The underlying SPARQL query looks like this:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX text: <http://jena.apache.org/text#>
SELECT ?s ?label ?alabel WHERE {   
    GRAPH <http://graph> {     
        {       
            { 
                ?s text:query (skos:prefLabel 'C*' 100000) 
            }
            ?s skos:prefLabel ?label .
            FILTER ( 
                strstarts(lcase(str(?label)), 'c')
                &&
                langMatches(lang(?label), 'en')
            )
        }
        UNION
        {
            { 
                ?s text:query (skos:altLabel 'C*' 100000) 
            }
            {
                ?s skos:altLabel ?alabel .
                FILTER (
                    strstarts(lcase(str(?alabel)), 'c')
                    &&
                    langMatches(lang(?alabel), 'en')
                )
            }
            {
                ?s skos:prefLabel ?label .
                FILTER (
                    langMatches(lang(?label), 'en')
                )
            }
        }
        ?s a ?type .
        FILTER NOT EXISTS {
            ?s owl:deprecated true
        }
    }
    VALUES (?type) { (<http://www.w3.org/2004/02/skos/core#Concept>) }
}
ORDER BY LCASE(IF(BOUND(?alabel), STR(?alabel), STR(?label)))

It returns duplicate rows, and should probably have a DISTINCT keyword added.

tfrancart · 2015-08-19T11:44:23Z

The problem is really tied to the Jena text search: with the generic SPARQL dialect, the alphabetical listing is displayed correctly, without duplicates.

osma · 2015-08-20T07:23:08Z

Thank you for reporting this and also testing without jena-text.

My guess is that this is due to changes introduced in the jena-text module of Jena 3.0.0 / Fuseki 1.3.0 / Fuseki 2.3.0. The text index can now optionally report more information (scores and original literals), but some low-level suppression of duplicates had to be skipped due to the richer information - I happen to know because I made those changes in jena-text.

We are currently still running Fuseki 1.1.x, so we hadn't noticed the problem yet. I will have to test with a newer Fuseki. As you say, the fix might be as simple as adding a DISTINCT clause to the query. You can of course also try a slightly older Fuseki; at least that is better than not using a text index at all.

Actually making use of the new features in jena-text 3.0.0 is described in #273, but that will not yet be implemented in the Skosmos 1.2 development cycle.

osma · 2015-08-20T10:47:30Z

I was able to reproduce this with Fuseki 1.3.0. DISTINCT seems to correct the problem.
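
For illustration, a sketch of the query from the report with DISTINCT added to the SELECT line. It assumes that this keyword is the only change needed and that the duplicate rows are fully identical; the actual commit may differ in detail. The graph name and the search term 'C*' are taken from the report above.

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX text: <http://jena.apache.org/text#>
# DISTINCT collapses identical (?s, ?label, ?alabel) rows, which can arise
# when the text index returns the same subject more than once.
SELECT DISTINCT ?s ?label ?alabel WHERE {
    GRAPH <http://graph> {
        {
            # Concepts whose preferred label starts with the letter
            { ?s text:query (skos:prefLabel 'C*' 100000) }
            ?s skos:prefLabel ?label .
            FILTER (
                strstarts(lcase(str(?label)), 'c')
                && langMatches(lang(?label), 'en')
            )
        }
        UNION
        {
            # Concepts whose alternative label starts with the letter
            { ?s text:query (skos:altLabel 'C*' 100000) }
            {
                ?s skos:altLabel ?alabel .
                FILTER (
                    strstarts(lcase(str(?alabel)), 'c')
                    && langMatches(lang(?alabel), 'en')
                )
            }
            {
                ?s skos:prefLabel ?label .
                FILTER ( langMatches(lang(?label), 'en') )
            }
        }
        ?s a ?type .
        FILTER NOT EXISTS { ?s owl:deprecated true }
    }
    VALUES (?type) { (<http://www.w3.org/2004/02/skos/core#Concept>) }
}
ORDER BY LCASE(IF(BOUND(?alabel), STR(?alabel), STR(?label)))
```

DISTINCT only removes rows that are identical in all three projected variables, so a concept with several matching labels would still appear once per label.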

osma added the bug label Aug 20, 2015

osma added this to the 1.2 milestone Aug 20, 2015

osma self-assigned this Aug 20, 2015

osma closed this as completed in d9fff00 Aug 20, 2015
