diff --git a/README.md b/README.md index 87283b4..2c3f77e 100644 --- a/README.md +++ b/README.md @@ -1,247 +1,171 @@ -# rdf-validator +# RDF Validator -Runner for RDF test cases +A simple runner for RDF test cases & validations. -## Installation +RDF Validator runs a collection of test cases against a SPARQL endpoint. The endpoint can be either an HTTP(s) SPARQL +endpoint or a file or directory of RDF files on disk. Test cases can be specified as a SPARQL query file containing either +an `ASK` or a `SELECT` query, or a suite of such files with a suite manifest. -Install [leiningen](https://leiningen.org/) and then run +Main features: -``` -lein uberjar -``` - -This will build a standalone jar in the `target/uberjar` directory. - -## Usage - -`rdf-validator` runs a collection of test cases against a SPARQL endpoint. The endpoint can be either a HTTP(s) SPARQL -endpoint or a file or directory on disk. Test cases can be specified as either a SPARQL query file, or a directory -of such files. - -The repository contains versions of the well-formed cube validation queries defined in the [RDF data cube specification](https://www.w3.org/TR/vocab-data-cube/#wf). -These are defined as SPARQL SELECT queries rather than the ASK queries defined in the specification to enable more detailed error reporting. +- 👍 SPARQL `SELECT` or `ASK` queries as validations +- 👌🏾 Package suites as git dependencies with a simple manifest format +- 🏃 Run 3rd party validations via git or Maven dependencies (thanks to the Clojure CLI tools) +- 🏃🏾 Run validations against SPARQL endpoints or files of RDF +- 🚴 Optionally dynamically generate queries with handlebars-like [selmer](https://github.com/yogthos/Selmer) templates -To run these tests against a local SPARQL endpoint: +## Quick Start - $ java -jar rdf-validator-standalone.jar --endpoint http://localhost/sparql/query --suite ./queries - -This will run all test cases in the queries directory against the endpoint. 
Test cases can be run individually: +The recommended way to install and run the RDF Validator as an application is via the Clojure command line tools. - $ java -jar rdf-validator-standalone.jar --endpoint http://localhost/sparql/query --suite ./queries/01_SELECT_Observation_Has_At_Least_1_Dataset.sparql - -SPARQL endpoints can also be loaded from a file containing serialised RDF triples: +The advantage of this method is that suites of validations can be included as git deps, which +will be automatically fetched and installed on first usage, and cached thereafter. This means you can use Clojure's +`deps.edn` file to fetch suites of validations from 3rd parties easily. - $ java -jar rdf-validator-standalone.jar --endpoint data.ttl --suite ./queries - -Multiple test cases can be specified: - - $ java -jar rdf-validator-standalone.jar --endpoint data.ttl --suite test1.sparql --suite test2.sparql - -The RDF dataset can also be specified: +To do this follow these steps: - $ java -jar rdf-validator-standalone.jar --endpoint data.ttl --graph http://graph1 --graph http://graph2 --suite test1.sparql - -Graphs are added a named graphs and included in the default graph. +First [install the Clojure CLI tools](https://clojure.org/guides/getting_started#_clojure_installer_and_cli_tools). -## Writing test cases +Then specify a `deps.edn` file like this: -Test cases are expressed as either SPARQL ASK or SELECT queries. These queries are run against the target endpoint and the outcome of the test is based on the -result of the query execution. - -### ASK queries - -SPARQL ASK queries are considered to have failed if they evaluate to `true` so should be written to find invalid statements. -This is consistent with the queries defined in the RDF data cube specification. 
+```clojure +{ + :aliases {:rdf-validator {:extra-deps { swirrl/rdf-validator {:git/url "https://github.com/Swirrl/rdf-validator.git" + :sha "fd848fabc5718f876f99ee4ee5a3f89ea8529571"}} + :main-opts ["-m" "rdf-validator.core"]} + } + } +``` -### SELECT queries +This then lets you run the command line validator like so: -SPARQL SELECT queries are considered to have failed if they return any matching solutions. Like ASK queries they should return bindings describing invalid resources. + $ clojure -A:rdf-validator -### Query variables +You'll then want to configure some validation suites and supply the validator with the location of some RDF (either via a SPARQL endpoint or as a file of triples). -Validation queries can be parameterised with query variables which must be provided when the test suite is run. Query variables have the format `{{variable-name}}` -within a query file. For example to validate no statements exist with a specified predicate, the following query could be defined: +### Including a test suite -*bad_predicate.sparql* -```sparql -SELECT ?this WHERE { - ?this <{{bad-predicate}}>> ?o . -} -``` +The easiest way to include a test suite is to include an existing one as a dependency in your `deps.edn`. `deps.edn` supports +[various ways of fetching and resolving dependencies](https://clojure.org/reference/deps_and_cli#_dependencies) and putting them +on the classpath, such as via git deps, Maven packaged jars, or just dependencies at a `:local/root`. -when running this test case, the value of `bad-predicate` must be provided. This is done by providing an EDN file containing variable -bindings. The EDN document should contain a map from keywords to the corresponding string values e.g. 
+To do this we can include a 3rd party suite [such as those found in this repo](https://github.com/Swirrl/pmd-rdf-validations) like this: -*variables.edn* ```clojure -{ :bad-predicate "http://to-be-avoided" - :other-variable "http://other" } + {:deps {;; NOTE each dep here is a validation suite + swirrl/validations.qb {:git/url "git@github.com:Swirrl/pmd-rdf-validations.git" + :sha "63479f200a7c3d1b0e63bc43b2617181644c846b" + :deps/manifest :deps + :deps/root "qb"} + } + :aliases {:rdf-validator {:extra-deps { swirrl/rdf-validator {:git/url "https://github.com/Swirrl/rdf-validator.git" + :sha "fd848fabc5718f876f99ee4ee5a3f89ea8529571"}} + :main-opts ["-m" "rdf-validator.core"]} + } + } ``` -the file of variable bindings is specified when running the test case(s) using the `--variables` parameter e.g. +This particular repository contains multiple suites, each defined as its own dep within the same repo. The `:deps/root` key essentially +lets us point to a directory containing a dep; here the dep is a copy of the [integrity constraints](https://www.w3.org/TR/vocab-data-cube/#wf-rules) +from the [RDF Data Cube Vocabulary](https://www.w3.org/TR/vocab-data-cube/). -## Defining test suites +Once these are specified we can run them against a repository containing data cubes, e.g. -A test suite defines a group of tests to be run. A test suite can be created from a single test file or a directory containing test files as shown in the -examples above. A test suite can also be defined within an EDN file that lists the tests it contains. The minimal form of this EDN file is: + $ clojure -A:rdf-validator --endpoint http://some.domain/sparql/query -```clojure -{ - :suite-name ["test1.sparql" - "dir/test2.sparql"] - :suite2 ["suite2/test3.sparql"] -} -``` +Note that this command will first fetch the validation suite dependency, cache it locally for future use, and then run all the validation suites +we put on the classpath (here just the data cube validations). 
-Each key in the top-level map defines a test suite and the corresponding value contains the suite definition. Each test definition in the associated -list should be a path to a test file relative to the suite definition file. The type and name of each test is derived from the test file name. These -can be stated explicitly by defining tests within a map: +### Writing your own validation suites -```clojure -{ - :suite-name [{:source "test1.sparql" - :type :sparql - :name "first"} - {:source "test2.sparql" - :name "second"} - {:source "test3.sparql"} - "dir/test4.sparql"] -} -``` +Validations can be supplied on the command line as just a directory of `.sparql` files, or specified on the JVM's classpath via your `deps.edn` file. -When defining test definitions explicitly, only the `:source` key is required, the type and name will be derived from the test file name if not -provided. The two styles of defining tests can be combined within a test suite definition as defined above. +Here we demonstrate writing a simple classpath suite, as it is the easiest way to manage suites of validations that can be included as libraries. Other +supported methods are described in the more detailed docs. 
-### Combining test suites - -Test suites can selectively include test cases from other test suites: +To do this first add a `:paths ["src"]` key to your `deps.edn`: ```clojure -{ - :suite1 ["test1.sparql" - "test2.sparql"] - :suite2 ["test3.sparql"] - :suite3 {:import [:suite1 :suite2] - :exclude [:suite1/test1] - :tests [{:source "test4.txt" - :type :sparql}]} -} + {:paths ["src"] + :deps {;; NOTE each dep here is a validation suite + swirrl/validations.qb {:git/url "git@github.com:Swirrl/pmd-rdf-validations.git" + :sha "63479f200a7c3d1b0e63bc43b2617181644c846b" + :deps/manifest :deps + :deps/root "qb"} + } + :aliases {:rdf-validator {:extra-deps { swirrl/rdf-validator {:git/url "https://github.com/Swirrl/rdf-validator.git" + :sha "fd848fabc5718f876f99ee4ee5a3f89ea8529571"}} + :main-opts ["-m" "rdf-validator.core"]} + } + + } ``` -Test suites can import any number of other suites - this includes each test from the referenced suite into the importing suite. Any tests defined -in the imported suites can be selectively excluded by referencing them in the `:exclude` list. Each entry should contain a keyword of the form -`:suite-name/test-name`. By default test names are the stem of the file name up to the file extension e.g. the test for file `"test1.sparql"` -will be named `"test"`. +This essentially says when running the validator to include the `"src"` directory on the JVM's classpath. Next create the suite with the following +directory structure: -Test suite extensions must be acyclic e.g. `:suite1` importing `:suite2` which in turn imports `:suite1` is an error. -An error will be raised if any suite listed within an extension list is not defined, but suites do not need to be defined within the -same suite file. 
For example given two test files: + /your/validation/repo + |---- deps.edn + |---- src + |---- rdf-validator-suite.edn + |---- myorg + |---- mysuite + |---- test1.sparql + |---- test2.sparql -#### suite1.edn -```clojure -{:suite1 ["test1.sparql"]} -``` +Then in the `rdf-validator-suite.edn` file, which must be at a classpath root (i.e. at the root of the "src" directory), specify the suite's name and the relative paths +to the SPARQL files to include in the suite, e.g. -#### suite2.edn ```clojure -{:suite2 {:import [:suite1] - :tests ["test2.sparql"]}} +{ + :suite-name ["myorg/mysuite/test1.sparql" + "myorg/mysuite/test2.sparql"] +} ``` -this is valid as long as `suite1.edn` is provided as a suite whenever `suite2.edn` is required e.g. +Next write your SPARQL validations and run like so: - java -jar rdf-validator-standalone.jar --endpoint data.ttl --suite suite1.edn --suite suite2.edn - -### Locating suites on the Java classpath + $ clojure -A:rdf-validator --endpoint http://some.domain/sparql/query -In addition to test suites explicitly provided through the `--suite` parameter, rdf-validator also searches the classpath for test -suite EDN definitions. The searched test suite files should be called `rdf-validator-suite.edn` and follow the format detailed above. -When running from the command line, the containing directory should be added to the Java classpath using the `-classpath` option. 
-Given an `rdf-validator-suite.edn` file: +[More on defining test suites](/docs/DEFINING_TEST_SUITES.md) -#### rdf-validator-suite.edn -```clojure -{:cp-suite ["test1.sparql" - "test2.sparql"]} -``` - -If this file is placed alongside the referenced `test1.sparql` and `test2.sparql` files in the directory `/tmp/rdf/my-suite` it can -be run as follows: +### Writing SPARQL validations - java -cp "/tmp/rdf/my-suite:rdf-validator-standalone.jar" clojure.main -m rdf-validator.core --endpoint data.ttl +Validations are written as either SPARQL `SELECT` queries which should find and return validation failures, or +ASK queries which fail when returning `false`. -Use of the `-jar` option overrides any specified `-classpath` value, so the command above explicitly adds `rdf-validator.jar` to -the classpath and invokes `clojure.main` instead (which in turn executes the `rdf-validator` main method). +We recommend prefering the `SELECT` style as they provide more information to users on what went wrong. For example +this query is a port of IC-1 from the RDF Datacube spec into `SELECT` style. -### Running via the Clojure tool +It will return any `qb:Observation`s that are not also in a `qb:dataSet`: -Manually building a Java classpath as shown above is tedious and error-prone. The [Clojure command-line tool](https://clojure.org/reference/deps_and_cli) -can automate the generation of the classpath and allows test suite directories to be packaged an distributed through `.jar` files or -remote `git` repositories. To run `rdf-validator` through the `clojure` tool, first create a new directory containing a `deps.edn` file: - -#### deps.edn -```clojure -{:deps {swirrl/rdf-validator {:local/root "/path/to/rdf-validator.jar"} - suite {:local/root "/path/to/test/suite"}} - :aliases {:rdf-validator {:main-opts ["-m" "rdf-validator.core"]}}} +```sparql +PREFIX qb: + +SELECT (?obs AS ?obsWithNoDataset) +WHERE { + { + # Check observation has a data set + ?obs a qb:Observation . 
+ FILTER NOT EXISTS { ?obs qb:dataSet ?dataset1 . } + } +} ``` -The `/path/to/test/suite` directory should contain `deps.edn` file along with a `src` directory containing an `rdf-validator-suite.edn` file with the -format described above i.e. - - /path/to/test/suite - |---- deps.edn - |---- src - |---- rdf-validator-suite.edn - |---- test1.sparql - |---- test2.sparql - -The `deps.edn` file can be empty, although it can also be used to reference dependencies such as other test suites it imports -from (see below on how to specify dependencies). The `clojure` tool will put the `/path/to/test/suite/src` directory on the java classpath and the -`:rdf-validator` alias will invoke `clojure.main` with the required arguments. - -Now `rdf-validator` can be run with: +Some more example `SELECT` queries for validating RDF Data cubes can be [found here](https://github.com/Swirrl/pmd-rdf-validations/tree/master/pmd-qb/src/swirrl/validations/pmd-qb) - clj -A:rdf-validator --endpoint data.ttl - -This will run the test cases defined in `/path/to/test/suite/src/rdf-validator-suite.edn` +Additionally RDF Validator supports an advanced feature which usually needn't be used, to pre-process queries with [selmer](https://github.com/yogthos/Selmer) by replacing "handlebars like" variables (e.g `{{dataset-uri}}`) with any bound `--variables` provided via an `.edn` map of bindings, e.g. -The `suite` dependency does not necessarily need to be defined locally. The `clojure` tool allows dependencies to be specified -in remote `git` repositories or `.jar` files. If the test suite was hosted in a `git` repository instead, `deps.edn` could be -modified to refer to the desired commit. 
Similarly, the `rdf-validator` dependency can refer to a version on Github rather than -a local `.jar` file: - -#### deps.edn ```clojure -{:deps {swirrl/rdf-validator {:git/url "https://github.com/Swirrl/rdf-validator.git" :sha "9e87347db0784cca974ad140b5091e1b3ae3c4f8"} - suite {:git/url "https://github.com/my/rdf/validator/suite" :sha "0f95c170d3799af13f51a5945339cae972866ff0"}} - :aliases {:rdf-validator {:main-opts ["-m" "rdf-validator.core"]}}} +{:dataset-uri "http://my.domain/data/my-dataset"} ``` -### Running individual suites - -By default all test cases within all test suites will be executed when running `rdf-validator`. -This may be undesirable if many test suites are defined, or if one suite imports from another since -this will cause imported test cases to be executed multiple times. - -Individual test suites can be executed by providing the suite names to be run in an argument list -to the command-line invocation e.g. +[More on writing test cases](/docs/WRITING_TEST_CASES.md) -#### tests.edn -```clojure -{:suite1 ["test1.sparql" "test2.sparql" "test3.sparql"] - :suite2 {:import [:suite1] - :exclude [:suite1/test2] - :tests ["test4.sparql"] - :suite3 ["test5.sparql"]} -``` +## Usage - java -jar rdf-validator-standalone.jar --endpoint data.ttl --suite tests.edn suite2 suite3 - -This will execute the tests defined within `suite2` and `suite3` within `tests.edn`. +[More on command line options and usage](/docs/USAGE.md) - $ java -jar rdf-validator-standalone.jar --endpoint data.ttl --suite bad_predicate.sparql --variables variables.edn - ## License Copyright © 2018 Swirrl IT Ltd. 
diff --git a/docs/COMPILING.md b/docs/COMPILING.md new file mode 100644 index 0000000..7a62793 --- /dev/null +++ b/docs/COMPILING.md @@ -0,0 +1,15 @@ +# Compiling + +As an alternative to running via the [Clojure CLI tools](https://clojure.org/guides/getting_started#_clojure_installer_and_cli_tools) it +is possible to AOT compile the RDF Validator as an uberjar, and run it with: `java -jar rdf-validator.jar`. + +This has the small advantage that it reduces start up time a little; however, it also makes it substantially harder to assemble dependencies via +the command line tools. Hence this mechanism is no longer recommended. + +To compile an uberjar though, you need to first install [leiningen](https://leiningen.org/) and then run: + +``` +lein uberjar +``` + +This will build a standalone jar in the `target/uberjar` directory. diff --git a/docs/DEFINING_TEST_SUITES.md b/docs/DEFINING_TEST_SUITES.md new file mode 100644 index 0000000..1a43186 --- /dev/null +++ b/docs/DEFINING_TEST_SUITES.md @@ -0,0 +1,94 @@ +# Defining test suites + +A test suite defines a group of tests to be run. A test suite can be created from a single test file or a directory containing test files as shown in the +[usage examples](/docs/USAGE.md). A test suite can also be defined within an EDN file that lists the tests it contains. The minimal form of this EDN file is: + +```clojure +{ + :suite-name ["test1.sparql" + "dir/test2.sparql"] + :suite2 ["suite2/test3.sparql"] +} +``` + +Each key in the top-level map defines a test suite and the corresponding value contains the suite definition. Each test definition in the associated +list should be a path to a test file relative to the suite definition file. The type and name of each test are derived from the test file name. 
These +can be stated explicitly by defining tests within a map: + +```clojure +{ + :suite-name [{:source "test1.sparql" + :type :sparql + :name "first"} + {:source "test2.sparql" + :name "second"} + {:source "test3.sparql"} + "dir/test4.sparql"] +} +``` + +When defining test definitions explicitly, only the `:source` key is required; the type and name will be derived from the test file name if not +provided. The two styles of defining tests can be combined within a test suite definition as shown above. + +## Combining test suites + +Test suites can selectively include test cases from other test suites: + +```clojure +{ + :suite1 ["test1.sparql" + "test2.sparql"] + :suite2 ["test3.sparql"] + :suite3 {:import [:suite1 :suite2] + :exclude [:suite1/test1] + :tests [{:source "test4.txt" + :type :sparql}]} +} +``` + +Test suites can import any number of other suites - this includes each test from the referenced suite into the importing suite. Any tests defined +in the imported suites can be selectively excluded by referencing them in the `:exclude` list. Each entry should be a keyword of the form +`:suite-name/test-name`. By default test names are the stem of the file name up to the file extension e.g. the test for file `"test1.sparql"` +will be named `"test1"`. + +Test suite imports must be acyclic e.g. `:suite1` importing `:suite2` which in turn imports `:suite1` is an error. +An error will be raised if any suite listed within an `:import` list is not defined, but suites do not need to be defined within the +same suite file. For example given two suite files: + +### suite1.edn +```clojure +{:suite1 ["test1.sparql"]} +``` + +### suite2.edn +```clojure +{:suite2 {:import [:suite1] + :tests ["test2.sparql"]}} +``` + +This is valid as long as `suite1.edn` is provided as a suite whenever `suite2.edn` is required e.g. 
+ + clojure -A:rdf-validator --endpoint data.ttl --suite suite1.edn --suite suite2.edn + +### Running individual suites + +By default all test cases within all test suites will be executed when running `rdf-validator`. +This may be undesirable if many test suites are defined, or if one suite imports from another since +this will cause imported test cases to be executed multiple times. + +Individual test suites can be executed by providing the suite names to be run in an argument list +to the command-line invocation, e.g. given a `tests.edn` file containing: + +```clojure +{:suite1 ["test1.sparql" "test2.sparql" "test3.sparql"] + :suite2 {:import [:suite1] + :exclude [:suite1/test2] + :tests ["test4.sparql"]} + :suite3 ["test5.sparql"]} +``` + + clojure -A:rdf-validator --endpoint data.ttl --suite tests.edn suite2 suite3 + +This will execute the tests defined within `suite2` and `suite3` within `tests.edn`. + + $ clojure -A:rdf-validator --endpoint data.ttl --suite bad_predicate.sparql --variables variables.edn diff --git a/docs/USAGE.md b/docs/USAGE.md new file mode 100644 index 0000000..8d1ddf1 --- /dev/null +++ b/docs/USAGE.md @@ -0,0 +1,30 @@ +# Usage (command line options) + +RDF Validator runs a collection of test cases against a SPARQL endpoint. The endpoint can be either an HTTP(s) SPARQL +endpoint or a file or directory on disk. Test cases can be specified as either a SPARQL query file, or a directory +of such files. + +The repository contains versions of the well-formed cube validation queries defined in the [RDF data cube specification](https://www.w3.org/TR/vocab-data-cube/#wf). +These are defined as SPARQL SELECT queries rather than the ASK queries defined in the specification to enable more detailed error reporting. + +To run these tests against a local SPARQL endpoint: + + $ clojure -A:rdf-validator --endpoint http://localhost/sparql/query --suite ./queries + +This will run all test cases in the queries directory against the endpoint. 
Test cases can be run individually: + + $ clojure -A:rdf-validator --endpoint http://localhost/sparql/query --suite ./queries/01_SELECT_Observation_Has_At_Least_1_Dataset.sparql + +SPARQL endpoints can also be loaded from a file containing serialised RDF triples: + + $ clojure -A:rdf-validator --endpoint data.ttl --suite ./queries + +Multiple test cases can be specified: + + $ clojure -A:rdf-validator --endpoint data.ttl --suite test1.sparql --suite test2.sparql + +The RDF dataset can also be specified: + + $ clojure -A:rdf-validator --endpoint data.ttl --graph http://graph1 --graph http://graph2 --suite test1.sparql + +Graphs are added as named graphs and included in the default graph. diff --git a/docs/WRITING_TEST_CASES.md b/docs/WRITING_TEST_CASES.md new file mode 100644 index 0000000..28556f8 --- /dev/null +++ b/docs/WRITING_TEST_CASES.md @@ -0,0 +1,36 @@ +# Writing test cases + +Test cases are expressed as either SPARQL ASK or SELECT queries. These queries are run against the target endpoint and the outcome of the test is based on the +result of the query execution. + +## ASK queries + +SPARQL ASK queries are considered to have failed if they evaluate to `true`, so they should be written to find invalid statements. +This is consistent with the queries defined in the RDF data cube specification. + +## SELECT queries + +SPARQL SELECT queries are considered to have failed if they return any matching solutions. Like ASK queries they should be written to match invalid data; the returned bindings describe the invalid resources. + +## Query variables + +Validation queries can be parameterised with query variables which must be provided when the test suite is run. Query variables have the format `{{variable-name}}` +within a query file. For example to validate no statements exist with a specified predicate, the following query could be defined: + +*bad_predicate.sparql* +```sparql +SELECT ?this WHERE { + ?this <{{bad-predicate}}> ?o . 
+} +``` + +when running this test case, the value of `bad-predicate` must be provided. This is done by providing an EDN file containing variable +bindings. The EDN document should contain a map from keywords to the corresponding string values e.g. + +*variables.edn* +```clojure +{ :bad-predicate "http://to-be-avoided" + :other-variable "http://other" } +``` + +the file of variable bindings is specified when running the test case(s) using the `--variables` parameter e.g.
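As a rough illustration of the substitution step described above, the Python sketch below fills a `{{variable-name}}` placeholder from a map of bindings. It is not the validator's implementation (the README in this change says templating is handled by Selmer): `render_query` is a hypothetical helper, the dictionary stands in for the keyword map in `variables.edn`, and raising on an unbound variable is an assumption based on the statement that variables must be provided.

```python
import re

# Matches {{name}} placeholders; single SPARQL braces are left untouched.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z0-9_-]+)\s*\}\}")


def render_query(template: str, bindings: dict) -> str:
    """Replace each {{name}} with its bound value (hypothetical helper)."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in bindings:
            # Assumption: an unbound query variable is treated as an error.
            raise KeyError(f"no binding supplied for query variable {name!r}")
        return bindings[name]

    return PLACEHOLDER.sub(substitute, template)


# A query shaped like bad_predicate.sparql, with the bindings from variables.edn.
template = "SELECT ?this WHERE {\n  ?this <{{bad-predicate}}> ?o .\n}"
bindings = {
    "bad-predicate": "http://to-be-avoided",
    "other-variable": "http://other",
}

print(render_query(template, bindings))
```

The matching command line appears elsewhere in this diff, at the end of `docs/DEFINING_TEST_SUITES.md`: `clojure -A:rdf-validator --endpoint data.ttl --suite bad_predicate.sparql --variables variables.edn`.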