Feature Request/Idea: BagIt Support - Add automatic checksum validation on upload #8608

Closed
abujeda opened this issue Apr 13, 2022 · 6 comments · Fixed by #8677
Labels
HDC Harvard Data Commons HDC: 2 Harvard Data Commons Obj. 2
Milestone

Comments

@abujeda
Contributor

abujeda commented Apr 13, 2022

Overview of the Feature Request
Add automatic checksum validation if the file uploaded to Dataverse is in BagIt format.
Upon file upload, we will detect if it is a Bag and perform file checksums against the Bag manifest.
If there are checksum errors, we will display a warning message to the end-user.
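As a rough illustration of the check described above, here is a minimal sketch in Python. It is not the Dataverse implementation: it assumes the Bag has already been unpacked into a directory, that the manifest follows the plain `checksum path` line layout from the BagIt specification, and the function names (`is_bag`, `validate_bag`, and so on) are made up for this example.

```python
import hashlib
from pathlib import Path


def is_bag(root: Path) -> bool:
    """Treat a directory as a Bag if it contains the bagit.txt declaration."""
    return (root / "bagit.txt").is_file()


def read_manifest(manifest: Path) -> dict[str, str]:
    """Parse 'checksum path' lines from a manifest-<algorithm>.txt file."""
    entries = {}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        checksum, path = line.split(maxsplit=1)
        entries[path.strip()] = checksum.lower()
    return entries


def file_digest(path: Path, algorithm: str) -> str:
    """Hash a file in chunks so large payload files are not loaded into memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_bag(root: Path, algorithm: str = "sha256") -> list[str]:
    """Return a list of problems; an empty list means every listed file matched."""
    problems = []
    manifest = root / f"manifest-{algorithm}.txt"
    for rel_path, expected in read_manifest(manifest).items():
        target = root / rel_path
        if not target.is_file():
            problems.append(f"missing file: {rel_path}")
        elif file_digest(target, algorithm) != expected:
            problems.append(f"checksum mismatch: {rel_path}")
    return problems
```

A caller would first check `is_bag(root)` and only then run `validate_bag(root)`; any returned problems would feed the warning shown to the end user.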

UI changes still to be discussed.

What kind of user is the feature intended for?
Curator and Depositor

What inspired the request?
As part of the Harvard Data Commons project there is a requirement to add support for BagIt. The first iteration is to simply inspect the manifest files and perform a file validation when a new deposit is made.

What existing behavior do you want changed?
We want to enhance the upload functionality to add file validation based on the BagIt manifest.

Any brand new behavior do you want to add to Dataverse?
No

Any related open or closed issues to this feature request?
N/A

@shlake
Contributor

shlake commented Apr 13, 2022

To support this, issue #8449 needs to be fixed first.

The checksum in the manifest is for the original uploaded file, but the ingested "tab" file is what is actually in the bag.

@qqmyers
Member

qqmyers commented Apr 13, 2022

@shlake - FWIW: my understanding right now is that the import is for vanilla Bags (in particular, no OAI-ORE file describing the dataset), not ones generated by Dataverse (which have the issue you describe and also contain file metadata that is not in a vanilla Bag.)

@abujeda changed the title from "Feature Request/Idea: BagIt Support - Add automatic checksum validation" to "Feature Request/Idea: BagIt Support - Add automatic checksum validation on upload" Apr 13, 2022
@abujeda
Contributor Author

abujeda commented Apr 29, 2022

We had a first attempt at the UI to provide feedback to the users when uploading BagIt packages.
Summary:

  • An error message will be shown at the top of the page.
  • At most, we will show 5 errors to the end user.
  • The back end (BE) will stop validating files once 5 errors are found.
  • If errors are found, we will not allow the file/files to be uploaded.
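The error cap in the summary above could be sketched roughly as follows, in Python. This is only an illustration under assumptions, not the code in the PR: `entries` stands for (path, expected checksum) pairs read from the manifest, `compute` is a hypothetical callable that hashes one file, and `MAX_ERRORS` mirrors the limit of 5 mentioned above.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator

MAX_ERRORS = 5  # limit taken from the summary above


def checksum_errors(
    entries: Iterable[tuple[str, str]],
    compute: Callable[[str], str],
) -> Iterator[str]:
    """Lazily yield one message per file whose checksum does not match."""
    for path, expected in entries:
        if compute(path) != expected:
            yield f"checksum mismatch: {path}"


def first_errors(
    entries: Iterable[tuple[str, str]],
    compute: Callable[[str], str],
    limit: int = MAX_ERRORS,
) -> list[str]:
    """Collect at most `limit` errors; later files are never hashed."""
    # islice stops pulling from the generator once `limit` items were produced.
    return list(islice(checksum_errors(entries, compute), limit))


def upload_allowed(errors: list[str]) -> bool:
    """The upload is rejected as soon as any error was found."""
    return not errors
```

Because the generator is lazy, validation stops at the fifth failing file instead of hashing the whole package, which matters for large Bags.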

Here is a screenshot of what is shown when we find errors in the checksums:

[Image: Screenshot 2022-04-29 at 16 05 27]

@abujeda
Contributor Author

abujeda commented May 5, 2022

Created a draft PR to start the conversation about the implementation.

@abujeda
Contributor Author

abujeda commented May 10, 2022

A demo of the functionality and an AWS environment have been shared with a group from the Instituto Brasileiro de Informação em Ciência e Tecnologia.

@abujeda
Contributor Author

abujeda commented May 10, 2022

The final implementation is ready for review. PR: #8677

6 participants