Merge pull request IQSS#9972 from IQSS/9714-files-api-extension-filters
Reverts revert of 9714-files-api-extension-filters and adds tabular file tag filtering to getVersionFiles endpoint and new endpoint for tagging tab files
kcondon authored Oct 9, 2023
2 parents 7e0738e + 94fe709 commit a209f43
Showing 22 changed files with 1,232 additions and 178 deletions.
14 changes: 14 additions & 0 deletions doc/release-notes/9714-files-api-extension-filters.md
@@ -0,0 +1,14 @@
The getVersionFiles endpoint (/api/datasets/{id}/versions/{versionId}/files) has been extended to support optional filtering by:

- Access status: through the `accessStatus` query parameter, which supports the following values:

  - Public
  - Restricted
  - EmbargoedThenRestricted
  - EmbargoedThenPublic


- Category name: through the `categoryName` query parameter, to return only files to which the specified category has been added.


- Content type: through the `contentType` query parameter, to return only files matching the requested content type (for example, "image/png").
3 changes: 3 additions & 0 deletions doc/release-notes/9785-files-api-extension-search-text.md
@@ -0,0 +1,3 @@
The getVersionFiles endpoint (/api/datasets/{id}/versions/{versionId}/files) has been extended to support optional filtering by search text through the `searchText` query parameter.

The search will be applied to the labels and descriptions of the dataset files.
6 changes: 6 additions & 0 deletions doc/release-notes/9834-files-api-extension-counts.md
@@ -0,0 +1,6 @@
Implemented the following new endpoints:

- getVersionFileCounts (/api/datasets/{id}/versions/{versionId}/files/counts): Given a dataset and its version, retrieves file counts based on different criteria (Total count, per content type, per access status and per category name).


- setFileCategories (/api/files/{id}/metadata/categories): Updates the categories (by name) for an existing file. If the specified categories do not exist, they will be created.
@@ -0,0 +1,14 @@
Implemented the following new endpoints:

- userFileAccessRequested (/api/access/datafile/{id}/userFileAccessRequested): Returns true or false depending on whether or not the calling user has requested access to a particular file.


- hasBeenDeleted (/api/files/{id}/hasBeenDeleted): Returns whether a particular file that existed in a previous version of the dataset no longer exists in the latest version.


In addition, the DataFile API payload has been extended to include the following fields:

- tabularData: Boolean field indicating whether the DataFile is tabular.


- fileAccessRequest: Boolean field indicating whether file access requests are enabled on the Dataset that owns the DataFile.
3 changes: 3 additions & 0 deletions doc/release-notes/9972-files-api-filter-by-tabular-tags.md
@@ -0,0 +1,3 @@
- New query parameter `tabularTagName` added to the getVersionFiles endpoint (/api/datasets/{id}/versions/{versionId}/files) to return files to which the particular tabular tag has been added.

- New endpoint to set tabular file tags via API: /api/files/{id}/metadata/tabularTags.
12 changes: 12 additions & 0 deletions doc/sphinx-guides/source/api/dataaccess.rst
@@ -404,6 +404,18 @@ A curl example using an ``id``::

    curl -H "X-Dataverse-key:$API_TOKEN" -X GET http://$SERVER/api/access/datafile/{id}/listRequests

User Has Requested Access to a File:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``/api/access/datafile/{id}/userFileAccessRequested``

This method returns true or false depending on whether or not the calling user has requested access to a particular file.

A curl example using an ``id``::

    curl -H "X-Dataverse-key:$API_TOKEN" -X GET "http://$SERVER/api/access/datafile/{id}/userFileAccessRequested"


Get User Permissions on a File:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

250 changes: 244 additions & 6 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -970,6 +970,53 @@ This endpoint supports optional pagination, through the ``limit`` and ``offset``
  curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?limit=10&offset=20"

Category name filtering is also optionally supported, to return only files to which the requested category has been added.

Usage example:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?categoryName=Data"

Tabular tag name filtering is also optionally supported, to return only files to which the requested tabular tag has been added.

Usage example:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?tabularTagName=Survey"

Content type filtering is also optionally supported, to return only files matching the requested content type.

Usage example:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?contentType=image/png"

Filtering by search text is also optionally supported. The search is applied to the labels and descriptions of the dataset files, and returns the files that contain the search text in either of those fields.

Usage example:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?searchText=word"

File access status filtering is also optionally supported, using the following possible values:

* ``Public``
* ``Restricted``
* ``EmbargoedThenRestricted``
* ``EmbargoedThenPublic``

If no filter is specified, files with any of the above access statuses are returned.

Usage example:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?accessStatus=Public"

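Because the ``accessStatus`` values are case-sensitive, a client script may want to catch a mistyped value before making the request. The sketch below is not part of the Dataverse API: ``files_url_for_access_status`` is a hypothetical Bash helper that accepts only the four values listed above and prints the corresponding getVersionFiles URL.

```shell
# Hypothetical helper (not part of Dataverse): validate an accessStatus value
# case-sensitively and print the getVersionFiles URL that filters by it.
files_url_for_access_status() {
  local server="$1" id="$2" version="$3" status="$4"
  case "$status" in
    Public|Restricted|EmbargoedThenRestricted|EmbargoedThenPublic) ;;
    *)
      # Reject anything else, including wrong capitalization such as "public".
      echo "Unsupported accessStatus: $status" >&2
      return 1
      ;;
  esac
  printf '%s/api/datasets/%s/versions/%s/files?accessStatus=%s\n' \
    "$server" "$id" "$version" "$status"
}
```

For example, ``url=$(files_url_for_access_status https://demo.dataverse.org 24 1.0 Public) && curl "$url"`` requests only public files, while a value such as ``public`` is rejected locally without contacting the server.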
Ordering criteria for sorting the results are also optionally supported, using the following possible values:

* ``NameAZ`` (Default)
@@ -979,14 +1026,42 @@ Ordering criteria for sorting the results is also optionally supported. In parti
* ``Size``
* ``Type``

Please note that these values are case sensitive and must be correctly typed for the endpoint to recognize them.

Usage example:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?orderCriteria=Newest"

Please note that both filtering and ordering criteria values are case sensitive and must be correctly typed for the endpoint to recognize them.

Keep in mind that you can combine all of the above query parameters, depending on the results you are looking for.
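When several parameters are combined, values such as ``searchText`` may contain spaces or other characters that must be percent-encoded. The following Bash sketch assumes two hypothetical helpers, ``urlencode`` and ``build_files_url``, which encode each value and join the ``NAME=VALUE`` pairs with ``&``; it only builds the URL and does not call the API.

```shell
# Hypothetical helper: percent-encode a string for use as a query value.
# Bytes outside the RFC 3986 unreserved set (A-Z a-z 0-9 - . _ ~) become %XX.
urlencode() {
  local LC_ALL=C   # work byte by byte so non-ASCII input is encoded per byte
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9._~-]) out+="$c" ;;
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s' "$out"
}

# Hypothetical helper: build a getVersionFiles URL from NAME=VALUE pairs.
# Parameter names are used as-is; only the values are encoded.
build_files_url() {
  local base="$1"; shift
  local query="" pair
  for pair in "$@"; do
    query+="${query:+&}${pair%%=*}=$(urlencode "${pair#*=}")"
  done
  printf '%s?%s\n' "$base" "$query"
}
```

For example, ``curl "$(build_files_url "$SERVER_URL/api/datasets/$ID/versions/$VERSION/files" categoryName=Data 'searchText=climate data' orderCriteria=Newest)"`` sends the search text as ``climate%20data``. Alternatively, curl can do the encoding itself with its ``-G`` and ``--data-urlencode`` options.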

Get File Counts in a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Get file counts for the given dataset and version.

The returned file counts are based on different criteria:

- Total (The total file count)
- Per content type
- Per category name
- Per access status (Possible values: Public, Restricted, EmbargoedThenRestricted, EmbargoedThenPublic)

.. code-block:: bash

  export SERVER_URL=https://demo.dataverse.org
  export ID=24
  export VERSION=1.0

  curl "$SERVER_URL/api/datasets/$ID/versions/$VERSION/files/counts"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files/counts"

View Dataset Files and Folders as a Directory Index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -2832,13 +2907,13 @@ A curl example using an ``ID``
  export SERVER_URL=https://demo.dataverse.org
  export ID=24

  curl "$SERVER_URL/api/files/$ID/downloadCount"
  curl -H "X-Dataverse-key:$API_TOKEN" -X GET "$SERVER_URL/api/files/$ID/downloadCount"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/files/24/downloadCount"
  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/24/downloadCount"

A curl example using a ``PERSISTENT_ID``
@@ -2848,16 +2923,53 @@ A curl example using a ``PERSISTENT_ID``
  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_ID=doi:10.5072/FK2/AAA000

  curl "$SERVER_URL/api/files/:persistentId/downloadCount?persistentId=$PERSISTENT_ID"
  curl -H "X-Dataverse-key:$API_TOKEN" -X GET "$SERVER_URL/api/files/:persistentId/downloadCount?persistentId=$PERSISTENT_ID"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/files/:persistentId/downloadCount?persistentId=doi:10.5072/FK2/AAA000"
  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/:persistentId/downloadCount?persistentId=doi:10.5072/FK2/AAA000"

If you are interested in download counts for multiple files, see :doc:`/api/metrics`.

File Has Been Deleted
~~~~~~~~~~~~~~~~~~~~~

Returns whether a particular file that existed in a previous version of the dataset no longer exists in the latest version.

A curl example using an ``ID``

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export ID=24

  curl -H "X-Dataverse-key:$API_TOKEN" -X GET "$SERVER_URL/api/files/$ID/hasBeenDeleted"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/24/hasBeenDeleted"

A curl example using a ``PERSISTENT_ID``

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_ID=doi:10.5072/FK2/AAA000

  curl -H "X-Dataverse-key:$API_TOKEN" -X GET "$SERVER_URL/api/files/:persistentId/hasBeenDeleted?persistentId=$PERSISTENT_ID"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/:persistentId/hasBeenDeleted?persistentId=doi:10.5072/FK2/AAA000"

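The file endpoints in this section are each shown in two forms, one with a database ``ID`` and one with ``:persistentId``. A small Bash helper can pick the form automatically. This is a sketch under stated assumptions: ``file_api_url`` is hypothetical, and it treats any identifier starting with ``doi:`` or ``hdl:`` as a persistent ID and passes it through without URL-encoding.

```shell
# Hypothetical helper: print the URL for a file-level API path, using the
# ":persistentId" form for identifiers that look like a DOI or Handle and
# the database-id form otherwise. The persistent ID is not URL-encoded.
file_api_url() {
  local server="$1" file_id="$2" path="$3"
  case "$file_id" in
    doi:*|hdl:*)
      printf '%s/api/files/:persistentId/%s?persistentId=%s\n' \
        "$server" "$path" "$file_id"
      ;;
    *)
      printf '%s/api/files/%s/%s\n' "$server" "$file_id" "$path"
      ;;
  esac
}
```

For example, ``curl -H "X-Dataverse-key:$API_TOKEN" -X GET "$(file_api_url "$SERVER_URL" "$PERSISTENT_ID" hasBeenDeleted)"`` issues the same request as the persistent ID example above.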
Updating File Metadata
~~~~~~~~~~~~~~~~~~~~~~
@@ -2907,6 +3019,132 @@ Also note that dataFileTags are not versioned and changes to these will update t
.. _EditingVariableMetadata:

Updating File Metadata Categories
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Updates the categories for an existing file where ``ID`` is the database id of the file to update or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file. Requires a ``jsonString`` expressing the category names.

Although categories can also be updated with the previous endpoint, this endpoint is more convenient when only the categories, and not other metadata fields, need to be updated.

The JSON representation of file categories (``categories.json``) looks like this::

    {
      "categories": [
        "Data",
        "Custom"
      ]
    }

A curl example using an ``ID``

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export ID=24
  export FILE_PATH=categories.json

  curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
    "$SERVER_URL/api/files/$ID/metadata/categories" \
    -H "Content-type:application/json" --upload-file $FILE_PATH

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST \
    "https://demo.dataverse.org/api/files/24/metadata/categories" \
    -H "Content-type:application/json" --upload-file categories.json

A curl example using a ``PERSISTENT_ID``

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_ID=doi:10.5072/FK2/AAA000
  export FILE_PATH=categories.json

  curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
    "$SERVER_URL/api/files/:persistentId/metadata/categories?persistentId=$PERSISTENT_ID" \
    -H "Content-type:application/json" --upload-file $FILE_PATH

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST \
    "https://demo.dataverse.org/api/files/:persistentId/metadata/categories?persistentId=doi:10.5072/FK2/AAA000" \
    -H "Content-type:application/json" --upload-file categories.json

Note that if the specified categories do not exist, they will be created.

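Instead of writing ``categories.json`` by hand, a script can generate it from a list of names. The following Bash sketch uses two hypothetical helpers, ``json_escape`` (which escapes only double quotes and backslashes, assuming the names contain no control characters) and ``make_categories_json``, to print a payload in the format shown above.

```shell
# Hypothetical helper: escape a string for use inside a JSON string literal.
# Only double quotes and backslashes are escaped (assumes no control chars).
json_escape() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      '"') out+='\"' ;;
      '\') out+='\\' ;;
      *) out+="$c" ;;
    esac
  done
  printf '%s' "$out"
}

# Hypothetical helper: print a categories.json payload for the given names.
make_categories_json() {
  local out="" name
  for name in "$@"; do
    out+="${out:+,}\"$(json_escape "$name")\""
  done
  printf '{"categories":[%s]}\n' "$out"
}
```

For example, ``make_categories_json Data Custom > categories.json`` writes ``{"categories":["Data","Custom"]}``, which can then be uploaded with the curl commands above. If ``jq`` 1.6 or later is installed, ``jq -n '{categories: $ARGS.positional}' --args Data Custom`` produces an equivalent payload.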
Updating File Tabular Tags
~~~~~~~~~~~~~~~~~~~~~~~~~~

Updates the tabular tags for an existing tabular file where ``ID`` is the database id of the file to update or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file. Requires a ``jsonString`` expressing the tabular tag names.

The JSON representation of tabular tags (``tags.json``) looks like this::

    {
      "tabularTags": [
        "Survey",
        "Genomics"
      ]
    }

A curl example using an ``ID``

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export ID=24
  export FILE_PATH=tags.json

  curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
    "$SERVER_URL/api/files/$ID/metadata/tabularTags" \
    -H "Content-type:application/json" --upload-file $FILE_PATH

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST \
    "https://demo.dataverse.org/api/files/24/metadata/tabularTags" \
    -H "Content-type:application/json" --upload-file tags.json

A curl example using a ``PERSISTENT_ID``

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_ID=doi:10.5072/FK2/AAA000
  export FILE_PATH=tags.json

  curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
    "$SERVER_URL/api/files/:persistentId/metadata/tabularTags?persistentId=$PERSISTENT_ID" \
    -H "Content-type:application/json" --upload-file $FILE_PATH

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST \
    "https://demo.dataverse.org/api/files/:persistentId/metadata/tabularTags?persistentId=doi:10.5072/FK2/AAA000" \
    -H "Content-type:application/json" --upload-file tags.json

Note that the specified tabular tags must be valid. The supported tags are:

* ``Survey``
* ``Time Series``
* ``Panel``
* ``Event``
* ``Genomics``
* ``Network``
* ``Geospatial``

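Since only the tags listed above are supported, a script can refuse unsupported names before uploading ``tags.json``. In this Bash sketch, ``make_tabular_tags_json`` is a hypothetical helper; it accepts only the exact spellings in the list above (it does not attempt case-insensitive matching) and prints a payload in the format shown earlier.

```shell
# Hypothetical helper: print a tags.json payload, refusing any name that is
# not spelled exactly as in the supported tabular tag list.
make_tabular_tags_json() {
  local out="" tag
  for tag in "$@"; do
    case "$tag" in
      "Survey"|"Time Series"|"Panel"|"Event"|"Genomics"|"Network"|"Geospatial") ;;
      *)
        echo "Unsupported tabular tag: $tag" >&2
        return 1
        ;;
    esac
    # Supported names contain no quotes or backslashes, so no escaping is needed.
    out+="${out:+,}\"$tag\""
  done
  printf '{"tabularTags":[%s]}\n' "$out"
}
```

For example, ``make_tabular_tags_json Survey "Time Series" > tags.json`` writes ``{"tabularTags":["Survey","Time Series"]}``, while a misspelled name such as ``survey`` makes the helper fail before any request is sent.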
Editing Variable Level Metadata
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3 changes: 3 additions & 0 deletions modules/dataverse-parent/pom.xml
@@ -200,6 +200,9 @@

<!-- Container related -->
<fabric8-dmp.version>0.43.4</fabric8-dmp.version>

<!-- Persistence -->
<querydsl.version>5.0.0</querydsl.version>
</properties>

<pluginRepositories>
14 changes: 14 additions & 0 deletions pom.xml
@@ -252,6 +252,20 @@
<artifactId>expressly</artifactId>
<scope>provided</scope>
</dependency>

<dependency>
<groupId>com.querydsl</groupId>
<artifactId>querydsl-apt</artifactId>
<version>${querydsl.version}</version>
<classifier>jakarta</classifier>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.querydsl</groupId>
<artifactId>querydsl-jpa</artifactId>
<version>${querydsl.version}</version>
<classifier>jakarta</classifier>
</dependency>

<dependency>
<groupId>commons-io</groupId>