Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iqss/7140 google cloud archiver #7292

Merged

Conversation

qqmyers
Copy link
Member

@qqmyers qqmyers commented Oct 2, 2020

What this PR does / why we need it: This PR includes an archiver class to send Bags with datacite.xml and OAI_ORE metadata files to Google cloud storage that is analogous to the existing DuraCloud and Local File archivers. It also includes an update to the DuraCloud archiver to add some timing checks developed when testing the Google version that should improve robustness. There is also a one character fix in the documentation of the Local File archiver to replace a non-functional smart quote in a curl command needed to configure it (the smart quote breaks cut/paste from the doc).

The Google Cloud Archiver was developed by/for QDR but should be useful for other community members.

Which issue(s) this PR closes:

Closes #7140

Special notes for your reviewer:

Suggestions on how to test this: Testing requires a Google Cloud account - we can probably coordinate to add the config/key for QDR's dev bucket to an ansible-launched test machine and screen share to verify the Bag arrives at Google.

Does this PR introduce a user interface change? If mockups are available, please link/include them here: no

Is there a release notes update needed for this change?: could note that archiving to Google is possible

Additional documentation:

@coveralls
Copy link

coveralls commented Oct 2, 2020

Coverage Status

Coverage decreased (-0.03%) to 19.433% when pulling 3484bb6 on QualitativeDataRepository:IQSS/7140-GoogleCloudArchiver into 8564ff6 on IQSS:develop.

Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good but I left a couple comments.

Comment on lines 841 to 853
``curl http://localhost:8080/api/admin/settings/:ArchiverClassName -X PUT -d "edu.harvard.iq.dataverse.engine.command.impl.GoogleCloudSubmitToArchiveCommand"``

``curl http://localhost:8080/api/admin/settings/:ArchiverSettings -X PUT -d ":":GoogleCloudBucket, :GoogleCloudProject"``

The Google Cloud Archiver defines two custom settings, both are required:

\:GoogleCloudBucket - the name of the bucket to use. For example:

``curl http://localhost:8080/api/admin/settings/:GoogleCloudBucket -X PUT -d "qdr-archive"``

\:GoogleCloudProject - the name of the project managing the bucket. For example:

``curl http://localhost:8080/api/admin/settings/:GoogleCloudProject -X PUT -d "qdr-project"``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When all the BagIt export stuff was initially merged, it didn't dawn on me that we lost our comprehensive list of settings in one place.

I think it would be nice to document :ArchiverSettings in the big list and probably all the various sub-settings like :GoogleCloudBucket, :GoogleCloudProject, and the older ones (:BagItLocalPath, etc.).

From a code perspective, I think I'd also like to see strings like :GoogleCloudBucket, :GoogleCloudProject, :BagItLocalPath, etc. added to the "Key" enum in SettingsServiceBean.java. That way, developers can also have a comprehensive list of all settings.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add them to the guides list. The :ArchiverSettings is a Key already. For the others, I think earlier discussions questioned whether the archive-specific classes would end up being deployed separately, in which case having their settings in the core would be odd. Since that's not on the roadmap at this point, I could move them temporarily if you think its worth the effort.

Comment on lines +107 to +108
return new Failure("Error in transferring DataCite.xml file to GoogleCloud",
"GoogleCloud Submission Failure: incomplete metadata transfer");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do failures like this show up in the UI? If so, do they need to be translatable into languages other than English?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They don't appear anywhere except the log AFAIK, even if you run them as part of a post-publish workflow instead of the API. (Since they run asynchronously, the API just returns an OK, it's launched response.

The /api/datasets/{id}/actions/:publish also runs archiving, synchronously, if configured, and it does have internationalizable success/failure messages.

@djbrooke
Copy link
Contributor

djbrooke commented Oct 5, 2020

Thanks @pdurbin for the code review. @qqmyers, I added some release notes.

Is this ready for QA?

Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple typos

doc/sphinx-guides/source/installation/config.rst Outdated Show resolved Hide resolved
doc/sphinx-guides/source/installation/config.rst Outdated Show resolved Hide resolved
djbrooke and others added 2 commits October 5, 2020 13:29
Co-authored-by: Philip Durbin <philip_durbin@harvard.edu>
Co-authored-by: Philip Durbin <philip_durbin@harvard.edu>
@djbrooke
Copy link
Contributor

djbrooke commented Oct 5, 2020

@pdurbin got 'em, thanks

@pdurbin
Copy link
Member

pdurbin commented Oct 5, 2020

@djbrooke I think it's very close. I read the extra docs @qqmyers added (thanks!) and found a couple typos so I put suggestions in my latest review. I haven't built the docs or code but they seem to be in good shape.

@kcondon kcondon self-assigned this Oct 6, 2020
@kcondon
Copy link
Contributor

kcondon commented Oct 8, 2020

Smoke tested, tested local archive, and with Jim's help set up google cloud archive. Waiting some doc updates before merging.

@kcondon kcondon merged commit 62b7fdf into IQSS:develop Oct 9, 2020
@djbrooke djbrooke added this to the 5.2 milestone Oct 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Archiver to store to Google Cloud
5 participants