Optimize alphabet letter query: use GRAPH instead of FROM #1098

osma · 2020-12-01T13:46:03Z

It turns out that the existing SPARQL query to generate the letters of the alphabet for the alphabetical index is quite slow when using Fuseki TDB2. Switching to GRAPH instead of FROM makes it a lot faster.

See this thread on the Jena users list for some more background.

…seki

osma · 2020-12-01T14:13:26Z

Some benchmarking of the query time for the KANTO/finaf vocabulary, using the dev.finto.fi SPARQL endpoint.

Before this PR: 16 seconds
After this PR: 1.7 seconds

The new query is almost 10x faster.

codecov · 2020-12-01T19:48:36Z

Codecov Report

Merging #1098 (77f4b41) into master (91a30e5) will decrease coverage by 0.01%.
The diff coverage is 57.14%.

@@             Coverage Diff              @@
##             master    #1098      +/-   ##
============================================
- Coverage     60.32%   60.31%   -0.02%     
  Complexity     1592     1592              
============================================
  Files            32       32              
  Lines          4429     4430       +1     
============================================
  Hits           2672     2672              
- Misses         1757     1758       +1

Impacted Files	Coverage Δ	Complexity Δ
model/sparql/GenericSparql.php	`67.98% <57.14%> (-0.06%)`	`318.00 <0.00> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 91a30e5...6db0e1e. Read the comment docs.

… on Fuseki

sonarcloud · 2020-12-01T21:12:45Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

No Coverage information
0.0% Duplication

kouralex

Looks good and works, which is the most important part. I had trouble with this PR at first since there was no similar treatment for single alphabetical letter query so I added one. Feel free to re-review, git diff went bad since there are similar blocks.

One note worth thinking now that this is faster, I think #729 and #751 could be fixed with a simple skos:prefLabel|skos:altLabel change in the appropriate spot which you have already identified in the comment section of the latter. I leave this for you to decide if you want to add that to this PR given that I presume the ten-fold increase in performance could handle that/those use cases with no problems, now.

Lastly, I checked the code for JenaTextSparql and there was one place in code where the original issue was already fixed in the exact same way it is in this PR, so no further modifications are necessary.

osma · 2020-12-03T14:09:11Z

Looks good and works, which is the most important part. I had trouble with this PR at first since there was no similar treatment for single alphabetical letter query so I added one. Feel free to re-review, git diff went bad since there are similar blocks.

I benchmarked this change using KANTO and the alphabetical index page for the letter A. Before the PR, the query took 48s. After the change it was 12s. So a 4x improvement.

But the change you made is only relevant when using the Generic dialect. In the JenaText version of the alphabetical letter query, GRAPH is already used. I suspect you are using the Generic dialect, which is maybe not a great idea with large vocabularies such as KANTO/finaf.

One note worth thinking now that this is faster, I think #729 and #751 could be fixed with a simple skos:prefLabel|skos:altLabel change in the appropriate spot which you have already identified in the comment section of the latter. I leave this for you to decide if you want to add that to this PR given that I presume the ten-fold increase in performance could handle that/those use cases with no problems, now.

I leave that to another PR which must be separately benchmarked.

Lastly, I checked the code for JenaTextSparql and there was one place in code where the original issue was already fixed in the exact same way it is in this PR, so no further modifications are necessary.

Right. I think there is some room for refactoring now that the Generic and JenaText variants of the query are closer aligned.

Use GRAPH, not FROM in alphabetical letters query - much faster on Fu…

77f4b41

…seki

osma added enhancement performance needs backport labels Dec 1, 2020

osma added this to the 2.9 milestone Dec 1, 2020

osma self-assigned this Dec 1, 2020

osma requested a review from kouralex December 1, 2020 15:04

use GRAPH, not FROM in single alphabetical letter query - much faster…

6db0e1e

… on Fuseki

kouralex reviewed Dec 1, 2020

View reviewed changes

osma merged commit 254c2c6 into master Dec 7, 2020

osma deleted the fix-first-character-query-graph branch December 7, 2020 14:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize alphabet letter query: use GRAPH instead of FROM #1098

Optimize alphabet letter query: use GRAPH instead of FROM #1098

osma commented Dec 1, 2020

osma commented Dec 1, 2020

codecov bot commented Dec 1, 2020 •

edited

Loading

sonarcloud bot commented Dec 1, 2020

kouralex left a comment

osma commented Dec 3, 2020

Optimize alphabet letter query: use GRAPH instead of FROM #1098

Optimize alphabet letter query: use GRAPH instead of FROM #1098

Conversation

osma commented Dec 1, 2020

osma commented Dec 1, 2020

codecov bot commented Dec 1, 2020 • edited Loading

Codecov Report

sonarcloud bot commented Dec 1, 2020

kouralex left a comment

Choose a reason for hiding this comment

osma commented Dec 3, 2020

codecov bot commented Dec 1, 2020 •

edited

Loading