Duplicate results in alphabetical listing #275

Closed
tfrancart opened this issue Aug 19, 2015 · 3 comments

@tfrancart
Contributor

(using code from master branch)

I am testing SKOSMOS with a multilingual vocabulary, specifically the Unesco thesaurus available in SKOS at http://skos.um.es/unescothes/. I am using Jena Fuseki 1.3.0 with the JenaText dialect.
Concepts appear duplicated in the alphabetical listing on the left, as shown below:

[Screenshot: duplicated concepts in the Skosmos alphabetical listing]

The underlying SPARQL query looks like this:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX text: <http://jena.apache.org/text#>
SELECT ?s ?label ?alabel WHERE {   
    GRAPH <http://graph> {     
        {       
            { 
                ?s text:query (skos:prefLabel 'C*' 100000) 
            }
            ?s skos:prefLabel ?label .
            FILTER ( 
                strstarts(lcase(str(?label)), 'c')
                &&
                langMatches(lang(?label), 'en')
            )
        }
        UNION
        {
            { 
                ?s text:query (skos:altLabel 'C*' 100000) 
            }
            {
                ?s skos:altLabel ?alabel .
                FILTER (
                    strstarts(lcase(str(?alabel)), 'c')
                    &&
                    langMatches(lang(?alabel), 'en')
                )
            }
            {
                ?s skos:prefLabel ?label .
                FILTER (
                    langMatches(lang(?label), 'en')
                )
            }
        }
        ?s a ?type .
        FILTER NOT EXISTS {
            ?s owl:deprecated true
        }
    }
    VALUES (?type) { (<http://www.w3.org/2004/02/skos/core#Concept>) }
}
ORDER BY LCASE(IF(BOUND(?alabel), STR(?alabel), STR(?label)))

It returns duplicate rows, so it should probably use SELECT DISTINCT.

@tfrancart
Contributor Author

The problem is specific to the Jena text search: with the generic SPARQL dialect, the alphabetical listing is displayed correctly, without duplicates.

osma added the bug label on Aug 20, 2015
osma added this to the 1.2 milestone on Aug 20, 2015
osma self-assigned this on Aug 20, 2015
@osma
Member

osma commented Aug 20, 2015

Thank you for reporting this, and for also testing without jena-text.

My guess is that this is due to changes introduced in the jena-text module of Jena 3.0.0 / Fuseki 1.3.0 / Fuseki 2.3.0. The text index can now optionally report more information (scores and original literals), but some low-level suppression of duplicates had to be dropped to accommodate the richer information. I happen to know because I made those changes in jena-text.
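A minimal Python sketch of how per-hit results would turn into duplicate rows, assuming (not verified here) that the jena-text index now returns one hit per matching literal instead of one per subject. The concept IRIs, labels, and the helpers `text_query` and `alphabetical` are hypothetical; they model only the prefLabel branch of the query above.

```python
from typing import NamedTuple, Optional

# Hypothetical data, illustrative only: ex:chemistry has three prefLabels
# starting with "C" (en, fr, es), so all three match the text query 'C*'.
PREF_LABELS = {
    "ex:chemistry": [("Chemistry", "en"), ("Chimie", "fr"), ("Ciencias químicas", "es")],
    "ex:culture": [("Culture", "en"), ("Cultura", "es")],
}


class Row(NamedTuple):
    s: str
    label: str
    alabel: Optional[str]


def text_query(prefix: str, dedupe_subjects: bool) -> list[str]:
    """Stand-in for `?s text:query (skos:prefLabel 'C*' 100000)`.

    Assumption: one hit per matching literal, in any language, so a subject
    with several matching labels is bound to ?s several times.
    """
    hits = [s for s, labels in PREF_LABELS.items()
            for text, _ in labels if text.lower().startswith(prefix)]
    if dedupe_subjects:
        # Assumed older behaviour: each subject reported at most once.
        hits = list(dict.fromkeys(hits))
    return hits


def alphabetical(prefix: str, lang: str, dedupe_subjects: bool, distinct: bool) -> list[Row]:
    rows = []
    for s in text_query(prefix, dedupe_subjects):
        # Join with `?s skos:prefLabel ?label` plus the FILTER; plain equality
        # is a simplified stand-in for langMatches().
        for text, label_lang in PREF_LABELS[s]:
            if label_lang == lang and text.lower().startswith(prefix):
                rows.append(Row(s, text, None))
    if distinct:
        # SELECT DISTINCT drops solutions identical in every projected variable.
        rows = list(dict.fromkeys(rows))
    # ORDER BY LCASE(IF(BOUND(?alabel), STR(?alabel), STR(?label)))
    return sorted(rows, key=lambda r: (r.alabel or r.label).lower())


if __name__ == "__main__":
    for dedupe, distinct in [(False, False), (True, False), (False, True)]:
        labels = [r.label for r in alphabetical("c", "en", dedupe, distinct)]
        print(f"dedupe_subjects={dedupe}, distinct={distinct}: {labels}")
```

With per-literal hits, "Chemistry" is listed three times and "Culture" twice; either collapsing hits per subject or adding DISTINCT yields one row each. Note that DISTINCT compares whole solution rows, so a concept reached through both the prefLabel and altLabel branches still appears once per distinct ?alabel value.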

We are still running Fuseki 1.1.x, so we hadn't noticed the problem yet; I will have to test with a newer Fuseki. As you say, fixing this might be as easy as adding DISTINCT to the query. In the meantime you could also try a slightly older Fuseki; that is at least better than not using a text index at all.

Actually making use of the new features in jena-text 3.0.0 is discussed in #273, but that will not be implemented in the Skosmos 1.2 development cycle.

@osma
Member

osma commented Aug 20, 2015

I was able to reproduce this with Fuseki 1.3.0. DISTINCT seems to correct the problem.

osma closed this as completed in d9fff00 on Aug 20, 2015