GitHub - dkw-aau/sparql-optimization

Optimizing SPARQL Queries using Shape Statistics

Cardinality estimates are essential for finding a good join order and thereby improving query performance. To assess the impact of shapes statistics of RDF graphs on cardinality estimation, we performed the experiments described here. We generated global and shapes statistics and proposed a join ordering technique that uses these statistics to estimate cardinalities and derive efficient query plans. We used two synthetic datasets (LUBM and WATDIV) and one real dataset (YAGO-4). We compared our plans against those proposed by the Jena ARQ query engine, GraphDB, Characteristic Sets, and SumRDF. On this page we present the technical details of our experiments: how to generate the statistics, how to run the experiments, links to the datasets, and the results.
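To illustrate how cardinality estimates drive join ordering, here is a minimal greedy sketch: it starts with the triple pattern that has the smallest estimate and then repeatedly appends the cheapest pattern sharing a variable with those already joined, which avoids Cartesian products. This is a generic heuristic under stated assumptions, not the join ordering technique from the paper; the class `GreedyJoinOrder`, its `Pattern` record, and the caller-supplied estimate map are hypothetical (Java 16+ for records).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Greedy join ordering driven by caller-supplied cardinality estimates (illustrative only). */
final class GreedyJoinOrder {

    /** A triple pattern, identified by a label, together with the variables it mentions. */
    record Pattern(String label, Set<String> vars) {}

    static List<String> order(List<Pattern> patterns, Map<String, Long> estimate) {
        List<Pattern> remaining = new ArrayList<>(patterns);
        List<String> plan = new ArrayList<>();
        Set<String> bound = new HashSet<>();
        // Patterns without an estimate are treated as the most expensive.
        Comparator<Pattern> byEstimate = Comparator.comparingLong(
                (Pattern p) -> estimate.getOrDefault(p.label(), Long.MAX_VALUE));
        while (!remaining.isEmpty()) {
            // Prefer patterns sharing a variable with the patterns already joined,
            // so that no Cartesian product is introduced.
            List<Pattern> candidates = new ArrayList<>();
            for (Pattern p : remaining) {
                if (plan.isEmpty() || !Collections.disjoint(p.vars(), bound)) {
                    candidates.add(p);
                }
            }
            if (candidates.isEmpty()) {
                candidates = remaining; // disconnected query: fall back to all remaining patterns
            }
            Pattern next = Collections.min(candidates, byEstimate);
            plan.add(next.label());
            bound.addAll(next.vars());
            remaining.remove(next);
        }
        return plan;
    }
}
```

For example, with `t1 = ?s a Student` (estimate 1000), `t2 = ?s takesCourse ?c` (5000), and `t3 = ?c name "X"` (1), the sketch yields `[t3, t2, t1]`: it starts from the most selective pattern and follows the shared variables. The quality of such a plan depends entirely on the estimates, which is why better statistics matter.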

Persistent URI & Licence:

All of the data and results presented in our experimental study are available at https://github.com/Kashif-Rabbani/sparql-optimization/ under the Apache License 2.0.

Datasets, Queries and the Statistics used:

We used the following datasets, queries, and the statistics:

Dataset	RDF Dump	Queries	Stats
LUBM	Download	See LUBM Queries	Global and Shapes Statistics
YAGO-4	Download	See YAGO-4 Queries	Global and Shapes Statistics
WATDIV-100M	Download	See WATDIV Queries	Global and Shapes Statistics
WATDIV-1Billion	Download	See WATDIV Queries	Global and Shapes Statistics

How does it work?

1. Generating SHACL Shapes Graph:

  Given an RDF graph, we used the shaclgen library (https://pypi.org/project/shaclgen/) to generate its SHACL shapes graph.
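As a rough picture of what a shapes graph captures, the sketch below derives one node shape per class (its `sh:targetClass`) and collects the predicates used by that class's instances as candidate property shapes (`sh:property`). This is a simplified illustration under stated assumptions, not shaclgen's actual algorithm; the class `ShapeSketch` is hypothetical, and triples are plain strings instead of RDF library terms.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Simplified shape extraction: one node shape per class, listing the predicates its instances use. */
final class ShapeSketch {

    /** A triple with plain string terms; a real implementation would use an RDF library. */
    record Triple(String s, String p, String o) {}

    static final String RDF_TYPE = "rdf:type";

    /** Returns target class -> sorted predicates (candidate property shapes) observed on its instances. */
    static Map<String, Set<String>> extract(List<Triple> triples) {
        Map<String, Set<String>> typesOf = new TreeMap<>();
        Map<String, Set<String>> shapes = new TreeMap<>();
        // Pass 1: record the classes of every typed subject; each class gets a node shape.
        for (Triple t : triples) {
            if (t.p().equals(RDF_TYPE)) {
                typesOf.computeIfAbsent(t.s(), k -> new TreeSet<>()).add(t.o());
                shapes.computeIfAbsent(t.o(), k -> new TreeSet<>());
            }
        }
        // Pass 2: attach each non-type predicate to the node shapes of its subject's classes.
        for (Triple t : triples) {
            if (t.p().equals(RDF_TYPE)) {
                continue;
            }
            for (String cls : typesOf.getOrDefault(t.s(), Set.of())) {
                shapes.get(cls).add(t.p());
            }
        }
        return shapes;
    }
}
```

Real SHACL shapes also carry constraints such as datatypes, classes of object values, and cardinalities; this sketch keeps only the class-to-property structure that the statistics in the next step attach to.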

2. Generating Shapes Statistics:

  We use the Shapes Annotator component to extend the SHACL shapes graph with statistics of the RDF graph. For example, for the YAGO-4 dataset, we set generateStatistics=true in the https://github.com/Kashif-Rabbani/sparql-optimization/blob/main/code/yagoConfig.properties file.
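To show why statistics attached to shapes can help, the sketch below computes per-class instance counts (node-shape level), triple counts per class and property (property-shape level), and global per-property counts, then estimates the pattern `?x rdf:type C . ?x P ?y` both ways. The exact statistics produced by the Shapes Annotator and the estimation formulas used in the paper may differ; the class `ShapeStatsSketch` and the global formula (instances of C times the average number of P-values per subject) are assumptions chosen for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Contrasts a shape-level estimate with a global-statistics estimate (illustrative only). */
final class ShapeStatsSketch {

    record Triple(String s, String p, String o) {}

    static final String RDF_TYPE = "rdf:type";

    final Map<String, Long> classCount = new HashMap<>();              // node shape: #instances of class
    final Map<String, Long> shapePropCount = new HashMap<>();          // property shape: #triples per class|property
    final Map<String, Long> globalPropCount = new HashMap<>();         // global: #triples per property
    final Map<String, Set<String>> globalPropSubjects = new HashMap<>(); // global: distinct subjects per property

    ShapeStatsSketch(List<Triple> triples) {
        Map<String, Set<String>> typesOf = new HashMap<>();
        for (Triple t : triples) {
            if (t.p().equals(RDF_TYPE)) {
                classCount.merge(t.o(), 1L, Long::sum);
                typesOf.computeIfAbsent(t.s(), k -> new HashSet<>()).add(t.o());
            }
        }
        for (Triple t : triples) {
            if (t.p().equals(RDF_TYPE)) {
                continue;
            }
            globalPropCount.merge(t.p(), 1L, Long::sum);
            globalPropSubjects.computeIfAbsent(t.p(), k -> new HashSet<>()).add(t.s());
            // Count the triple once for every class of its subject.
            for (String cls : typesOf.getOrDefault(t.s(), Set.of())) {
                shapePropCount.merge(cls + "|" + t.p(), 1L, Long::sum);
            }
        }
    }

    /** Global estimate: assumes every instance of cls has prop, with the graph-wide average fan-out. */
    double globalEstimate(String cls, String prop) {
        int subjects = globalPropSubjects.getOrDefault(prop, Set.of()).size();
        if (subjects == 0) {
            return 0.0;
        }
        double avgValuesPerSubject = (double) globalPropCount.get(prop) / subjects;
        return classCount.getOrDefault(cls, 0L) * avgValuesPerSubject;
    }

    /** Shape estimate: the number of prop-triples whose subject is an instance of cls. */
    long shapeEstimate(String cls, String prop) {
        return shapePropCount.getOrDefault(cls + "|" + prop, 0L);
    }
}
```

With two students holding three `takesCourse` triples and three professors holding none, the global estimate for `?x a Professor . ?x takesCourse ?y` is 4.5, while the shape-level estimate is 0, because the property shape records that professors never use `takesCourse`.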

3. Running Experiments:

We loaded all datasets into Jena TDB, bundled the code in a JAR, and created a config file for each type of experiment. For example, we followed these steps to run the experiments using:

1. Shapes Statistics

> Set the appropriate paths to the Jena TDB store and to the directory containing the queries in the config file, e.g., for the YAGO-4 dataset https://github.com/Kashif-Rabbani/sparql-optimization/blob/main/code/yagoConfig.properties
> Set shapeExec=true and set the number of times each query should run.
> Run java -jar code.jar yagoConfig.properties YAGO &> output.log
> Logs are saved in the OUTPUT_QUERY directory as benchmarks.csv and also in the output.log file.
> Use these logs to plot the results.

2. Global Statistics

> Follow the same steps as for Shapes Statistics, except set shapeExec=false and globalStatsExec=true.

3. Jena

> Follow the same steps as above, except set both shapeExec and globalStatsExec to false and set jenaExec=true.
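The three config-driven modes above differ only in their flags. The sketch below makes the flag combinations explicit using `java.util.Properties`. It is a hypothetical helper, not part of code.jar; in particular, rejecting configs where more or fewer than one of shapeExec, globalStatsExec, and jenaExec is true is an assumption about how the flags are meant to be combined.

```java
import java.util.Properties;

/** Hypothetical helper mapping the config flags described above to an experiment mode. */
final class ExperimentMode {

    enum Mode { SHAPES, GLOBAL, JENA }

    static Mode select(Properties cfg) {
        // Missing flags default to false.
        boolean shape = Boolean.parseBoolean(cfg.getProperty("shapeExec", "false"));
        boolean global = Boolean.parseBoolean(cfg.getProperty("globalStatsExec", "false"));
        boolean jena = Boolean.parseBoolean(cfg.getProperty("jenaExec", "false"));
        int enabled = (shape ? 1 : 0) + (global ? 1 : 0) + (jena ? 1 : 0);
        if (enabled != 1) {
            // Assumption: exactly one estimator is run per invocation.
            throw new IllegalArgumentException(
                    "enable exactly one of shapeExec, globalStatsExec, jenaExec");
        }
        return shape ? Mode.SHAPES : global ? Mode.GLOBAL : Mode.JENA;
    }
}
```

A config loaded with `Properties.load(...)` from a file such as yagoConfig.properties could be passed to `select` to check which experiment it will run before launching a long benchmark.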

4. GraphDB

> We loaded each dataset into GraphDB and used its onto:explain feature (https://graphdb.ontotext.com/documentation/standard/explain-plan.html) to inspect the query plans and their cardinalities.
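GraphDB returns the query plan instead of the query results when the query is evaluated against the special `onto:explain` graph (prefix `onto:` bound to `<http://www.ontotext.com/>`). The hypothetical helper below rewrites a SELECT query accordingly by inserting `FROM onto:explain` before the first `WHERE` keyword. It is a naive string-based sketch under the assumption that the query uses an explicit `WHERE` and that no earlier token (e.g., an IRI containing "where") matches first; consult the GraphDB documentation linked above for the authoritative syntax.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Naive rewrite of a SELECT query into a GraphDB explain-plan query (illustrative only). */
final class GraphDbExplain {

    private static final Pattern WHERE = Pattern.compile("\\bWHERE\\b", Pattern.CASE_INSENSITIVE);

    static String explainQuery(String selectQuery) {
        Matcher m = WHERE.matcher(selectQuery);
        if (!m.find()) {
            throw new IllegalArgumentException("expected an explicit WHERE keyword");
        }
        // Add the onto: prefix to the prologue and name onto:explain as the dataset.
        return "PREFIX onto: <http://www.ontotext.com/>\n"
                + selectQuery.substring(0, m.start())
                + "FROM onto:explain\n"
                + selectQuery.substring(m.start());
    }
}
```

The rewritten query is then submitted to the repository like any other query (e.g., in the GraphDB Workbench), and the returned plan shows the join order together with GraphDB's cardinality estimates.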

5. Characteristic Sets

> We used the extended characteristic sets implementation from https://github.com/gmontoya/federatedOptimizer to generate the characteristic sets for each dataset and then generated their query plans.
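For readers unfamiliar with characteristic sets, the sketch below shows the basic idea: group subjects by the exact set of predicates they use, count the subjects per set, and estimate the number of distinct subjects matching a star pattern by summing the counts of all sets containing every queried predicate. This is the basic, distinct-subject form only, not the extended implementation from federatedOptimizer used in our experiments; the class `CharacteristicSets` is hypothetical, and the full technique additionally tracks per-predicate occurrence counts to estimate non-distinct results.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Basic characteristic sets: subjects grouped by the exact set of predicates they use. */
final class CharacteristicSets {

    record Triple(String s, String p, String o) {}

    /** Maps each characteristic set to the number of distinct subjects having exactly that set. */
    static Map<Set<String>, Long> build(List<Triple> triples) {
        Map<String, Set<String>> predicatesOf = new HashMap<>();
        for (Triple t : triples) {
            predicatesOf.computeIfAbsent(t.s(), k -> new TreeSet<>()).add(t.p());
        }
        // The predicate sets are complete before they are used as map keys.
        Map<Set<String>, Long> counts = new HashMap<>();
        for (Set<String> cs : predicatesOf.values()) {
            counts.merge(cs, 1L, Long::sum);
        }
        return counts;
    }

    /** Distinct subjects matching a star pattern: sum over sets containing every queried predicate. */
    static long estimateDistinctSubjects(Map<Set<String>, Long> sets, Set<String> starPredicates) {
        long total = 0;
        for (Map.Entry<Set<String>, Long> e : sets.entrySet()) {
            if (e.getKey().containsAll(starPredicates)) {
                total += e.getValue();
            }
        }
        return total;
    }
}
```

Because every subject belongs to exactly one characteristic set, the estimate for a star pattern with distinct subjects is exact for the data it was built from; errors arise when star patterns are joined with each other.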

6. SumRDF Cardinality Estimator (official link)

> We implemented our join ordering algorithm using SumRDF cardinality estimator. The code is available in the folder https://github.com/Kashif-Rabbani/sparql-optimization/tree/main/sumRDF

Evaluation Results:

The results are discussed in the paper and are available in the results_data folder.
