Merge pull request #5 from OP-TED/feature/TED-1422
Feature/ted 1422
Dragos0000 committed Sep 28, 2023
2 parents 9b3f199 + b7c431f commit 1d96730
Showing 31 changed files with 1,358 additions and 89 deletions.
2 changes: 1 addition & 1 deletion docs/antora/antora.yml
@@ -1,6 +1,6 @@
name: ted-rdf-docs
version: master
title: TED-RDF Conversion Pipeline
title: TED-SWS documentation
start_page: ROOT:index.adoc
asciidoc:
attributes:
26 changes: 9 additions & 17 deletions docs/antora/modules/ROOT/nav.adoc
@@ -1,27 +1,19 @@
[.separated]#**TED-SWS**#

* xref:index.adoc[Home]
** What is TED SWS
** What is sample app?
** What is mapping?
** How to use TED SWS
** What you’ll find in this documentation
** How to contribute to TED SWS
* xref:mapping_suite/index.adoc[Mapping Suites]
** Getting started
** Who are these docs written for
** Glossary
** Assumptions we make about the skills of the reader
*** Prerequisites
** What the user can achieve through these pages
* xref:mapping_suite/index.adoc[Mapping Suite Docs]
** xref:mapping_suite/repository-structure.adoc[Repository structure]
** xref:mapping_suite/mapping-suite-structure.adoc[Mapping suite anatomy]
** xref:mapping_suite/code-list-resources.adoc[Code list mappings]
** xref:mapping_suite/preparing-test-data.adoc[Data samples]
** xref:mapping_suite/versioning.adoc[Versioning]
** References
* xref:sample_app/index.adoc[TED Data Sample application]
** xref:sample_app/jupyter_notebook.adoc[Jupyter Notebook]
** xref:sample_app/ms_excell.adoc[MS Excel]
* xref:sample_app/index.adoc[Sample application Docs]
** xref:sample_app/jupyter_notebook_python.adoc[Python Jupyter Notebook]
** xref:sample_app/jupyter_notebook_r.adoc[R Jupyter Notebook]
** xref:sample_app/ms_excel.adoc[MS Excel]
** xref:sample_app/sparql_queries.adoc[SPARQL Queries]
41 changes: 32 additions & 9 deletions docs/antora/modules/ROOT/pages/index.adoc
@@ -1,22 +1,45 @@
= TED-RDF Conversion Pipeline Documentation
= TED-SWS End-User Documentation

The TED-RDF Conversion Pipeline is part of the TED Semantic Web Services (TED-SWS system) and provides tools and infrastructure to convert TED notices available in XML format into RDF. This conversion pipeline is designed to work with the https://docs.ted.europa.eu/rdf-mapping/index.html[TED-SWS Mapping Suites]: self-contained packages with transformation rules and resources.
The TED Semantic Web Service (TED-SWS) is a pipeline system that continuously
converts the public procurement notices (in XML format) available on the
TED Website into RDF format based on the eProcurement Ontology, and publishes
them into the CELLAR repository, hence making them available to the public
through CELLAR’s SPARQL endpoint.

== What is TED SWS
The TED Semantic Web Service (TED-SWS) connects
the TED infrastructure for the collection and publication of public procurement
notices with the infrastructure of http://data.europa.eu/[data.europa.eu],
in order to make public procurement data accessible and reusable as
Linked Open Data (LOD) by users and stakeholders (see xref:motivation.adoc[the detailed motivation]).

== Audience

== What is sample app?
This documentation is written for a wide audience, with different interests in the TED-SWS project and different levels of expertise in Semantic Web, EU e-Procurement and software infrastructure. More specifically, this documentation can be of interest to:

- *End-Users*, such as *Semantic Web Practitioners* or *Experts in eProcurement Domain*, who are interested in understanding what the RDF representation of the e-procurement notices looks like, and how this representation conforms to the eProcurement Ontology (ePO);
- *Software Engineers* interested in integrating mapping suite packages into processing pipelines;
- *Semantic Engineers* interested in understanding and writing mappings from XML to RDF, in particular in the EU eProcurement domain.

== What is mapping?
== Contents

[.tile-container]
--

== How to use TED SWS
[.tile]
.Mapping Suites
****
The TED-RDF Mappings are the transformation rules needed by the TED-RDF Conversion Pipeline (both of which are part of the TED Semantic Web Services, aka TED-SWS system) to convert TED notices available in XML format to RDF.
<<ted-rdf-docs:ROOT:mapping_suite/index.adoc#, Read the docs>>
****

== What you’ll find in this documentation

[.tile]
.Sample applications
****
The sample applications are a set of examples that show how to interact with TED RDF data (available in CELLAR) using tools like Python, R or Excel.
== How to contribute to TED SWS

<<ted-rdf-docs:ROOT:sample_app/index.adoc#, Read the docs>>
****

--
@@ -1,4 +1,4 @@
=== Resources for Code List Mappings
== Resources for Code List Mappings

The table below provides a list of resources that are used to map the various code lists used in the XML files to URIs in the RDF representation.

15 changes: 2 additions & 13 deletions docs/antora/modules/ROOT/pages/mapping_suite/index.adoc
@@ -1,16 +1,7 @@
= What is a Mapping suite?
= Mapping suite documentation

A *mapping suite* is a set of "mappings" that defines how an XML document representing an e-Procurement Notice will be transformed into an equivalent RDF graph representation. These mappings are materialized in different forms, as explained later, and a mapping suite has all its relevant components organized in a package, which we refer to as a *mapping suite package*.

== Who are these docs written for?

This documentation is written for a wide audience, with different interests in the TED-SWS project and different levels of expertise in Semantic Web, EU e-Procurement and software infrastructure. More specifically, this documentation can be of interest to:

- *Semantic Engineers* interested in understanding and writing mappings from XML to RDF, in particular in the EU eProcurement domain;
- *Software Engineers* interested in integrating mapping suite packages into processing pipelines;
- *End-Users*, such as *Semantic Web Practitioners* or *Experts in eProcurement Domain*, who are interested in understanding what the RDF representation of the e-procurement notices looks like, and how this representation conforms to the eProcurement Ontology (ePO).


== Prerequisites

To allow for a proper understanding of the Mapping Suite Documentation, the reader should have:
@@ -61,10 +52,8 @@ https://op.europa.eu/en/web/eu-vocabularies/e-procurement/tedschemas
== Further readings
Depending on the interest of the reader the following pages can be explored (in this logical order):

** xref:mapping_suite/ted-sws-introduction.adoc[]
** xref:mapping_suite/repository-structure.adoc[GitHub Repository structure]
** xref:mapping_suite/mapping-suite-structure.adoc[Mapping suite anatomy]
** xref:mapping_suite/code-list-resources.adoc[Code list mappings]
** xref:mapping_suite/preparing-test-data.adoc[Data samples]
** xref:mapping_suite/versioning.adoc[Versioning]
** xref:mapping_suite/ [References]
** xref:mapping_suite/versioning.adoc[Versioning]
@@ -1,8 +1,8 @@
= Representative sample data selection
== Representative sample data selection

This section describes the TED notice data samples and the methods used to generate them. Sampling is first performed on notices from 2021, and then on a wider set.

== Sample TED notices from 2021
=== Sample TED notices from 2021

This data sample (`test_data/sampling_2021`) contains carefully selected TED notices based on the following criteria: maximise representativeness, minimise the number of selected documents. The selected notices are guaranteed to cover all possible XPath configurations available in the data. The sampling was performed automatically using a custom algorithm available in the https://github.com/OP-TED/ted-rdf-conversion-pipeline[TED-RDF Conversion Pipeline] repository.
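
The selection criteria above (cover every XPath configuration while keeping the number of notices small) describe a set-cover problem. The sketch below shows one common heuristic, greedy set cover, purely as a hypothetical illustration: it is not the custom algorithm from the TED-RDF Conversion Pipeline repository, it assumes the XPaths of each notice have already been extracted, and the notice IDs and XPaths in the example are made up. Greedy selection guarantees full coverage but not the smallest possible sample.

```python
"""Greedy set-cover sketch for choosing a small, XPath-complete notice sample.

Hypothetical illustration only: this is not the custom sampling algorithm
from the TED-RDF Conversion Pipeline repository.
"""
from typing import Dict, List, Set


def select_representative_notices(xpaths_by_notice: Dict[str, Set[str]]) -> List[str]:
    """Pick notice IDs until every XPath seen in the data is covered.

    At each step the notice adding the most not-yet-covered XPaths is chosen;
    ties go to the smallest notice ID so the result is deterministic.
    """
    uncovered: Set[str] = set().union(*xpaths_by_notice.values())
    candidates = sorted(xpaths_by_notice)
    selected: List[str] = []
    while uncovered:
        # max() returns the first maximal candidate, i.e. the smallest ID on ties.
        best = max(candidates, key=lambda nid: len(xpaths_by_notice[nid] & uncovered))
        selected.append(best)
        uncovered -= xpaths_by_notice[best]
    return selected


if __name__ == "__main__":
    # Made-up notice IDs and XPaths, assumed to be extracted beforehand.
    sample = {
        "n1": {"/a", "/a/b"},
        "n2": {"/a", "/a/c"},
        "n3": {"/a", "/a/b", "/a/c"},
        "n4": {"/a/d"},
    }
    print(select_representative_notices(sample))  # ['n3', 'n4']
```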

@@ -1,4 +1,4 @@
= Repository structure
== Repository structure

Transformation rules and other artefacts for the https://github.com/OP-TED/ted-rdf-conversion-pipeline[TED Semantic Web Services (TED-SWS)] system are organised in https://github.com/OP-TED/ted-rdf-mapping[this repository].

62 changes: 62 additions & 0 deletions docs/antora/modules/ROOT/pages/motivation.adoc
@@ -0,0 +1,62 @@
= TED-SWS motivation

In its Strategic Plan for 2020-2024, the Publications Office has
defined specific Objective 1 on the "European public procurement space"
as part of its general Objective 2 "A Europe fit for the digital age".

In this context the Publications Office has identified the need for reliable and
complete data on public procurement in the EU as being essential
for transparency and accountability of public spending. The ongoing
investments of the Publications Office for the transition to eForms,
and the continued development of the eProcurement Ontology are
identified by the Strategic Plan as being central for
improved data quality and enhanced automation of data processing
and interoperability.

Additionally, in the context of specific Objective 2 on the
"European data space", the Publications Office identifies the gap that
still exists between the available wealth of open data, spread across
multiple outlets, and the effort required to discover, access and reuse it.

To bridge this gap, the Strategic Plan for 2020-2024 commits to
generate and share new knowledge as linked open data, through
an ecosystem of datasets, data models, ontologies and specialised services
accessible through a single entry point (http://data.europa.eu/[data.europa.eu])
following a "data-as-a-public-service" approach.

Although TED notice data is already available to the general public
through the search API provided by the TED website, the current offering
has many limitations that impede access to and reuse of the data. One
important impediment, for example, is the current format of the data.

Historical TED data come in various XML formats that evolved together
with the standard TED XML schema. The imminent introduction of eForms
will also introduce further diversity in the XML data formats available
through TED's search API. This makes it practically impossible for users
to consume and process data that span across several years, as
their information systems must be able to process several different
flavours of the available XML schemas as well as to keep up with the
schema's continuous evolution. Their search capabilities are therefore
confined to a very limited set of metadata.

The TED Semantic Web Service removes these barriers by providing one
common format for accessing and reusing all TED data. Coupled with the
eProcurement Ontology, the TED data will also have semantics attached to
them allowing users to directly link them with other datasets.
Moreover, users will now be able to perform much more elaborate
queries directly on the data source (through the SPARQL endpoint). This
will reduce their need for data warehousing in order to perform complex
queries.

These developments, by lowering the barriers, will give rise to a vast
number of new use-cases that will enable stakeholders and end-users to
benefit from increased availability of analytics. The ability to perform
complex queries on public procurement data will be equally open to large
information systems as well as to simple desktop users with a copy of
Excel and an internet connection.

To summarize, the TED Semantic Web Service (TED SWS) is a pipeline
system that continuously converts the public procurement notices (in XML
format) available on the TED Website into RDF format, publishes them
into CELLAR and makes them available to the public through CELLAR’s
SPARQL endpoint.
59 changes: 59 additions & 0 deletions docs/antora/modules/ROOT/pages/sample_app/index.adoc
@@ -0,0 +1,59 @@
= Sample app documentation

A sample application, often referred to as a demo or prototype, is a functional representation of a software program or system that demonstrates its basic features, functionalities, and capabilities. In the context of TED-SWS (TED Semantic Web Services), a sample application would refer to a functional representation of how to access data processed by the system.

== Glossary

* *RDF* stands for Resource Description Framework. RDF is a standardized data model used to represent information on the web. RDF plays a crucial role in xref:ROOT:index.adoc[TED-SWS] because it provides a standardized and structured format for representing the procurement data made available through the service. This allows for efficient querying, processing, and integration of the data into various applications and systems.

* *SPARQL Query* refers to a query written in SPARQL, the query language used to retrieve and manipulate data stored in RDF format.

* *Jupyter Notebook* is an interactive computing environment that allows users to create and share documents containing live code, equations, visualizations, and explanatory text. It's particularly useful for working with data and performing data analysis, making it a valuable tool for xref:ROOT:mapping_suite/index.adoc[accessing and processing] data from TED-SWS.

* *Python* is a widely used programming language that can be employed to retrieve data and xref:ROOT:sample_app/jupyter_notebook_python.adoc[perform operations] on the RDF data provided by TED-SWS.

* *R language* refers to a popular programming language and environment for statistical computing, xref:ROOT:sample_app/jupyter_notebook_r.adoc[data analysis], and graphical representation; it can be used to retrieve and perform operations on the RDF data provided by TED-SWS.

* *MS Excel* refers to Microsoft Excel, which is a widely used spreadsheet program developed by Microsoft used as a versatile tool for xref:ROOT:sample_app/ms_excel.adoc[handling and analysing] data obtained from TED-SWS.

* *Code Editor* refers to a software tool or environment where users can write, edit, and execute code. It allows users to easily create scripts or programs to retrieve data from TED-SWS and perform operations on the RDF data.

* *Jupyter Notebook Kernel* refers to the computational engine that executes the code within a Jupyter Notebook. It determines which programming language is used to run the code in the notebook. For example, if you're working with TED-SWS in a Jupyter Notebook, you might choose to use a Python kernel, which means that you'll be writing and executing Python code.

* *Business Questions* (BQ) refer to specific inquiries or information needs that pertain to business operations, procurement activities, or related aspects. These questions are typically posed by organizations, researchers, or individuals seeking to gain insights, make informed decisions, or conduct analyses based on the data provided by TED-SWS.

== Prerequisites

To use TED-SWS sample apps, you will need the following:

Understanding of RDF and SPARQL:: Familiarity with RDF (Resource Description Framework) and SPARQL (SPARQL Protocol and RDF Query Language) is crucial. TED-SWS provides data in RDF format and utilizes SPARQL for querying.

Access to a Programming Language:: You should have proficiency in a programming language capable of making HTTP requests and processing data. Common choices include Python or R.

Knowledge of Semantic Web Technologies:: A basic understanding of Semantic Web concepts and technologies is beneficial. This includes knowledge of RDF triples, ontologies, and linked data principles.

Development Environment:: Set up a development environment for your chosen programming language or, at a minimum, ensure that MS Excel is installed.

Understanding of EU Procurement Data:: If your goal is to work with specific types of EU procurement data, such as contract notices or award notices, it's important to have a basic understanding of these concepts and the associated https://docs.ted.europa.eu/EPO/latest/index.html[ontology].
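
To illustrate what "making HTTP requests" means for a SPARQL endpoint, the sketch below builds a query request following the W3C SPARQL 1.1 Protocol with the Python standard library: the query travels as the `query` URL parameter of a GET request, and the `Accept` header asks for SPARQL JSON results. The endpoint URL is an assumption about CELLAR's public SPARQL endpoint and should be checked before use; the helper name is hypothetical. For long queries, the protocol also allows sending the query in a POST body.

```python
"""Build a SPARQL 1.1 Protocol query request (HTTP GET) with the standard library."""
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: CELLAR's public SPARQL endpoint; verify the URL before relying on it.
CELLAR_SPARQL_ENDPOINT = "https://publications.europa.eu/webapi/rdf/sparql"


def build_sparql_request(query: str, endpoint: str = CELLAR_SPARQL_ENDPOINT) -> Request:
    """Return a GET request carrying `query` as a URL-encoded parameter.

    The Accept header asks the endpoint for SPARQL JSON results.
    """
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    request = build_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 1")
    print(request.get_method(), request.full_url)
    # Sending the request needs network access, for example:
    # from urllib.request import urlopen
    # with urlopen(request, timeout=60) as response:
    #     print(response.read()[:200])
```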

== Using Jupyter Notebook

* <<ted-rdf-docs:ROOT:sample_app/jupyter_notebook_python.adoc#, Jupyter Notebook - Python>>

Example of using the Python language to access data.

* <<ted-rdf-docs:ROOT:sample_app/jupyter_notebook_r.adoc#, Jupyter Notebook - R>>

Example of using the R language to access data.

== Using MS Excel

* <<ted-rdf-docs:ROOT:sample_app/ms_excel.adoc#, MS Excel Workbook>>

Example of accessing data in an MS Excel workbook.

== SPARQL Query examples

* <<ted-rdf-docs:ROOT:sample_app/sparql_queries.adoc#, SPARQL Query examples>>

Examples of SPARQL queries for accessing data.
Empty file.
@@ -0,0 +1,92 @@
== Jupyter Notebook - Python

This document shows an example of using a Jupyter Notebook with Python. The
Jupyter Notebook is an application for creating and sharing
computational documents, and Python is the programming language in which
the notebook's code is written. To run the proposed scenario, you need to
install the tools listed below and use the Python code that queries
CELLAR and displays the results in tabular
form.

Example query:

**Who are the contract winners for a given date?**

[source,sparql]
----
PREFIX epo: <http://data.europa.eu/a4g/ontology#>
PREFIX org: <http://www.w3.org/ns/org#>
PREFIX cccev: <http://data.europa.eu/m8g/>
select distinct
  ?Lot
  ?Winner
  ?WinnerCountryCode
  ?LotAwardetAmountValue
  ?LotAwardetValueCurrency
where {
  values ?NoticePublicationDate {
    "20230921"
  }
  ?NoticeId a epo:ResultNotice;
    epo:hasPublicationDate ?NoticePublicationDate;
    epo:refersToLot ?Lot.
  ?Lot a epo:Lot.
  ?LotAwardOutcome epo:describesLot ?Lot;
    a epo:LotAwardOutcome;
    epo:comprisesTenderAwardOutcome ?TenderAwardOutcome.
  ?TenderAwardOutcome a epo:TenderAwardOutcome;
    epo:indicatesAwardOfLotToWinner / epo:playedBy ?Winner.
  ?Winner a org:Organization.
  optional {
    ?Winner cccev:registeredAddress / epo:hasCountryCode ?WinnerCountryCode.
  }
  ?LotAwardOutcome epo:hasAwardedValue ?LotAwardetValue.
  ?LotAwardetValue a epo:MonetaryValue;
    epo:hasAmountValue ?LotAwardetAmountValue;
    epo:hasCurrency ?LotAwardetValueCurrency.
}
----
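
The notebook displays the answer to this query as a table. The sketch below illustrates that last step with the Python standard library only, instead of the pandas package the notebook installs: it assumes the results have already been fetched in the W3C SPARQL 1.1 JSON results format, and the helper names are hypothetical. Variables left unbound, such as the `optional` `?WinnerCountryCode`, become `None`.

```python
"""Flatten SPARQL JSON results into rows and render them as a plain-text table."""
from typing import Dict, List, Optional


def bindings_to_rows(results: dict) -> List[Dict[str, Optional[str]]]:
    """Return one dict per solution, keyed by the variables in results["head"]["vars"].

    Unbound variables (e.g. from an OPTIONAL pattern) are mapped to None.
    """
    variables = results["head"]["vars"]
    return [
        {var: binding.get(var, {}).get("value") for var in variables}
        for binding in results["results"]["bindings"]
    ]


def rows_to_text_table(rows: List[Dict[str, Optional[str]]], columns: List[str]) -> str:
    """Render rows as a left-aligned, pipe-separated text table."""
    widths = {c: max([len(c)] + [len(row[c] or "") for row in rows]) for c in columns}
    header = " | ".join(c.ljust(widths[c]) for c in columns).rstrip()
    rule = "-+-".join("-" * widths[c] for c in columns)
    body = [" | ".join((row[c] or "").ljust(widths[c]) for c in columns).rstrip() for row in rows]
    return "\n".join([header, rule, *body])


if __name__ == "__main__":
    # Made-up response in the SPARQL JSON results format.
    response = {
        "head": {"vars": ["Lot", "WinnerCountryCode"]},
        "results": {"bindings": [{"Lot": {"type": "uri", "value": "http://example.org/lot/1"}}]},
    }
    rows = bindings_to_rows(response)
    print(rows_to_text_table(rows, response["head"]["vars"]))
```

The same list of row dicts can also be passed to `pandas.DataFrame` when pandas is available.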

To run the sample application using Python language follow the steps below:

[arabic]
. https://github.com/OP-TED/ted-rdf-docs/blob/main/notebooks/query_cellar_python.ipynb[Download Jupyter Notebook]


[arabic, start=2]
. Download and install Python 3.8
[loweralpha]
.. Windows 64-bit:
https://www.python.org/ftp/python/3.8.10/python-3.8.10-amd64.exe[[.underline]#download#]

.. Windows 32-bit (x86):
https://www.python.org/ftp/python/3.8.10/python-3.8.10.exe[[.underline]#download#]

. Open the downloaded Jupyter Notebook file in the code editor

. In the code editor, select the Python interpreter that was installed in the previous step

.Interpreter selection
image::user_manual/jupyter_notebook/image1.png[image,width=817,height=204]
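
To confirm that the notebook kernel really uses the interpreter selected above, a first cell like the following can be run. It prints the interpreter path and checks the version against the Python 3.8 installed earlier; treating later versions as acceptable is an assumption, and the helper name is hypothetical.

```python
"""Check which interpreter the notebook kernel is running."""
import sys

REQUIRED = (3, 8)  # version installed in the previous steps


def interpreter_is_supported(version_info=sys.version_info, required=REQUIRED) -> bool:
    """True if the (major, minor) version is at least `required`."""
    return tuple(version_info[:2]) >= required


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Version:", sys.version.split()[0])
    print("Supported:", interpreter_is_supported())
```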


[arabic, start=5]
. Install dependencies

* Open a command-line terminal and run the following command:
[source,shell]
pip3 install sparqlwrapper pandas Jinja2 matplotlib

NOTE: After installation, restart the kernel from Jupyter Notebook to update it with the new dependencies. This can be done by clicking the "Restart" button in your code editor.
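
After restarting the kernel, one way to confirm that the packages are visible to it is to try locating their modules with `importlib.util.find_spec`, which returns `None` for modules that cannot be found. The sketch below maps each pip package name from the command above to the name it is usually imported as (for example, `sparqlwrapper` is imported as `SPARQLWrapper`); the helper name is hypothetical.

```python
"""Report which of the required packages the current kernel cannot import."""
import importlib.util

# pip package name -> module name used in `import` statements
REQUIRED_MODULES = {
    "sparqlwrapper": "SPARQLWrapper",
    "pandas": "pandas",
    "Jinja2": "jinja2",
    "matplotlib": "matplotlib",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of packages whose top-level module cannot be found."""
    return [pkg for pkg, module in required.items() if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```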

[arabic, start=6]
. Run all Jupyter Notebook Cells

.Button that runs all cells
image::user_manual/jupyter_notebook/image2.png[image,width=501,height=84]

[arabic, start=7]
. After all the cells in the Jupyter Notebook have run successfully, the result table is displayed

.Result table
image::user_manual/jupyter_notebook/image3.png[image,width=987,height=420]
