6522 duplicate dvobjects #6612
Conversation
Diagnostics script plus a "pre-release note", to be sent preemptively to partner installations, so that if the issue with the datafile duplicates is detected, we can start assisting them with cleanup before they upgrade to the next release.
…hat we'll need to send out to the remote installations). #6522
@landreev The script URL in the release notes results in a 404: https://github.com/IQSS/dataverse/raw/develop/scripts/issues/6522/find_duplicates.sh but this one in the pre-release notes works: https://github.com/IQSS/dataverse/raw/6522-duplicate-dvobjects/scripts/issues/6522/find_duplicates.sh Also getting a Flyway failure due to a checksum mismatch. That usually happens when a script has already been applied, right? Does this script need to be renamed for 4.20?
@kcondon The link is just a matter of changing one of them to "develop" right before we merge, as discussed.
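For anyone hitting the same checksum failure: Flyway records every applied migration in its history table, so a mismatch can be confirmed with a query along these lines (a minimal sketch; flyway_schema_history is Flyway's default table name, not something specific to this PR):

```sql
-- List applied migrations with their recorded checksums; a mismatch on an
-- existing version means the script's contents changed after it was applied.
SELECT version, description, checksum, installed_on
  FROM flyway_schema_history
 ORDER BY installed_rank;
```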
@kcondon checked in
…we are merging it! (#6522)
Doh - we may need to revert this PR, on account of harvested files. Those often have remote URLs in the storageidentifier field that are not guaranteed to be unique... Our own production database has a ton of these, so the constraint doesn't work. We don't really use these URLs for any practical purpose, so we can probably stop putting them into the field... But we can't proceed with adding the constraint until we figure this out.
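As a rough way to gauge how widespread this is (a hypothetical query, not part of this PR; it assumes harvested datafiles carry their remote location as an http(s) URL in storageidentifier):

```sql
-- Count datafile rows whose storage identifier is a remote URL and
-- therefore cannot be assumed unique within a dataset.
SELECT COUNT(*)
  FROM dvobject
 WHERE dtype = 'DataFile'
   AND storageidentifier LIKE 'http%';
```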
Looks reasonable. I don't have anything locally to trigger all of the possible results.
What this PR does / why we need it:
This addresses the issue of duplicate datafiles/dvobjects created on upload under some (unknown) conditions, referencing the same physical file. The code change adds a unique constraint on the storageidentifier within a dataset, which should prevent this from happening in the future.
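A minimal sketch of what such a constraint could look like (the constraint name and exact column pairing here are illustrative; the Flyway migration in this PR is authoritative):

```sql
-- One row per (owning dataset, storage identifier) pair; owner_id is the
-- containing dataset for a datafile's dvobject row.
ALTER TABLE dvobject
  ADD CONSTRAINT unq_dvobject_storageidentifier
  UNIQUE (owner_id, storageidentifier);
```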
Before it can be deployed, however, the database must be checked for any existing duplicates. We've already done this in our production database, but we'll have to notify other installations and guide them through the cleanup process.
In addition to the normal release note, the PR contains a "pre-release note", plus the diagnostic script, for sending out to partner installations. This should be done ahead of the next release, as soon as the branch is merged into develop.
Which issue(s) this PR closes:
Closes #6522
Special notes for your reviewer:
Up for review is the approach I chose for dealing with this on remote installations. Specifically, I concluded it was not practical to try to produce a script that would fix any potentially affected installation. There are too many variables involved: whether an affected file is published vs. draft, ingested as tabular, whether the physical file is intact, and so on. So my approach is to send out the diagnostics script (which has already been tested on one remote installation) that will detect whether this condition is present. If any duplicates are found, I'm asking installations to send us the output, and then we'll assist them with cleanup.
This involves committing to more support work down the road, but I feel like this is the best and safest way to address this.
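At its core, the detection boils down to a grouped query along these lines (a simplified sketch; the find_duplicates.sh script in this PR is the authoritative version):

```sql
-- Report any dataset that contains more than one datafile pointing at
-- the same physical storage identifier.
SELECT owner_id, storageidentifier, COUNT(*) AS copies
  FROM dvobject
 WHERE dtype = 'DataFile'
 GROUP BY owner_id, storageidentifier
HAVING COUNT(*) > 1;
```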
Suggestions on how to test this:
We do not know of a way to reproduce the condition that leads to the creation of duplicates on upload, so there is no clear way to test it directly.
Mostly, QA should be about reviewing the diagnostic script and the information note that we'll be sending out to the remote installations.
Does this PR introduce a user interface change?:
No
Is there a release notes update needed for this change?:
Yes
Additional documentation: