Is there an option to batch file replace? #5924

Closed
amberleahey opened this issue Jun 6, 2019 · 8 comments · Fixed by #9018
Labels
Component: JSF Involves modifying JSF (Jakarta Server Faces) code, which is being replaced with React. Feature: API Guide Feature: File Upload & Handling Type: Suggestion an idea User Role: Depositor Creates datasets, uploads data, etc.
Comments

@amberleahey

Hello, related to file replace, we are wondering if there is a way to replace ALL files in a dataset at once?

"My question is how do I replace a dataset that has over 47 files? I have deposited my dataset in geodatabase format which has over 46 separate files in the structure. Is there a way to do a bulk replace?"

For now we see that individual files can be replaced using the file replace feature, but is there a better way to replace many files? Deleting the old files and uploading new ones is, I guess, another option.

Any other thoughts?
Thanks,
Amber

@pdurbin
Member

pdurbin commented Jun 6, 2019

@amberleahey I have this problem too. If you compare version 2 and version 3 of my dataset below you'll see that the file names are very similar.

I'm lazy so this is what I've done so far:

  • Make a new version.
  • Delete all the files.
  • Upload the new files, many of which are technically the same file, just newer.
  • Publish.

What I think I'd rather do since there's currently no support in the UI for batch file replace is:

  • Make a new version.
  • For each of the files that are the same, probably based on the filename, use the File Replace API one by one.
  • For files that should be deleted, delete them via API.
  • For files that are new, add them via API.

This is enough work that I haven't bothered.
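
The per-file plan above could be worked out on the client side before any API call is made. What follows is only a minimal sketch, assuming files are matched purely by filename; `plan_batch_update`, `UpdatePlan`, and the file ids are hypothetical names for illustration, and the sketch does not call Dataverse at all.

```python
from typing import Dict, List, NamedTuple


class UpdatePlan(NamedTuple):
    replace: Dict[str, int]  # filename -> id of the existing file to replace
    delete: List[int]        # ids of existing files with no new counterpart
    add: List[str]           # new filenames with no existing counterpart


def plan_batch_update(existing: Dict[str, int], new_names: List[str]) -> UpdatePlan:
    """Split a batch into replace/delete/add steps by exact filename match.

    `existing` maps filenames in the current draft to their file ids
    (hypothetical input; a real client would fetch this from the dataset).
    """
    new_set = set(new_names)
    replace = {name: fid for name, fid in existing.items() if name in new_set}
    delete = sorted(fid for name, fid in existing.items() if name not in new_set)
    add = sorted(name for name in new_set if name not in existing)
    return UpdatePlan(replace, delete, add)


if __name__ == "__main__":
    existing = {"projects.tsv": 101, "old_notes.txt": 102}
    print(plan_batch_update(existing, ["projects.tsv", "reproduce.ipynb"]))
```

Each entry in `replace` would then go through the File Replace API one by one, while `delete` and `add` map to the delete and add steps in the list.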

Lately I've been talking about an idea for what I started calling "sidecar" files.

[Image: sidecars]

What if I could upload a zip file that contains something like this:

  • code/reproduce.ipynb
  • data/projects.tsv
  • data/primary/foo.json
  • data/primary/bar.json
  • .sidecars/filemetadata.json (descriptions, tags, etc, per file)
  • .sidecars/provenance/data/projects.tsv.prov.json

The idea here is that Dataverse would be smart enough while unpacking the zip file to populate file descriptions based on the filemetadata.json sidecar. This would be a fix for #723. If the file is the same file, an automatic File Replace could happen. If there's provenance information for a file, great, add it.
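
To make the sidecar idea concrete, here is a rough sketch of how a client or server might read such a zip. Nothing like this exists in Dataverse as far as this thread says; the layout of `filemetadata.json` (a JSON object keyed by file path) and the function names are assumptions made purely for illustration.

```python
import io
import json
import zipfile
from typing import Dict

SIDECAR_DIR = ".sidecars/"
# Hypothetical schema: {"<path in zip>": {"description": ..., "tags": [...]}}
METADATA_PATH = SIDECAR_DIR + "filemetadata.json"


def read_sidecar_metadata(zip_bytes: bytes) -> Dict[str, dict]:
    """Map each data file in the zip to its sidecar metadata (empty dict if none)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        names = zf.namelist()
        metadata = {}
        if METADATA_PATH in names:
            metadata = json.loads(zf.read(METADATA_PATH).decode("utf-8"))
        # Skip the sidecar files themselves and any directory entries.
        return {
            name: metadata.get(name, {})
            for name in names
            if not name.startswith(SIDECAR_DIR) and not name.endswith("/")
        }


def build_example_zip() -> bytes:
    """Build a small in-memory zip that follows the layout sketched above."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("data/projects.tsv", "name\tlicense\n")
        zf.writestr("code/reproduce.ipynb", "{}")
        zf.writestr(
            METADATA_PATH,
            json.dumps({"data/projects.tsv": {"description": "List of projects", "tags": ["Data"]}}),
        )
    return buf.getvalue()


if __name__ == "__main__":
    for path, meta in read_sidecar_metadata(build_example_zip()).items():
        print(path, meta)
```

Files without an entry in the sidecar simply get empty metadata, so a zip with no `.sidecars/` directory would behave like a plain upload.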

Maybe something like BagIt does some of this stuff already? Or ORE? I've been meaning to ask @qqmyers about this.

Anyway, here are versions 2 and 3 of my dataset so you can see what I'm talking about in terms of opportunity to use the File Replace feature, if I weren't so lazy. 😄 🛌

Version 2

[Screenshot: "Open Source at Harvard" dataset file list, 2019-06-06 13:53:17]

Version 3 (similar files)

[Screenshot: "Open Source at Harvard" dataset file list, 2019-06-06 13:53:41]

@amberleahey
Author

Yes, my sense too. It would be nice to see DV get smarter about automatically replacing files (if newer) via the regular file upload (which would also support batch and zip upload, and replacing on unpacking). I think it's hard to tell when DV will accept new versions of files via the regular file upload, so replace is nice to have. A batch file replace option from the main dataset landing page could also work.

For now, I've recommended as you say, delete and upload all new (doesn't remove files from previously published versions viewed under 'versions').

@pdurbin
Member

pdurbin commented Oct 14, 2022

"This PR adds a /replaceFiles api call to allow bulk direct upload/out-of-band upload replace operations."

Also...

"For now, I've recommended as you say, delete and upload all new (doesn't remove files from previously published versions viewed under 'versions')."

I'm still making this recommendation because it's way easier than other options. Most recently @atrisovic asked about this and implemented a delete step in her GitHub Action uploader: IQSS/dataverse-uploader@3e5c567 . In that context the uploader (the client, basically) could take its best guess of which files are being replaced (probably based on filenames) but I'm sure there could be tricky edge cases. Git does something similar based on the content of the file but even it gets confused.

Perhaps the focus should be on the most straightforward case: all filenames match exactly. If not, tell the user that bulk replace is not available. It would be a step in the right direction, at least.
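
A client-side guard for that straightforward case might look like the sketch below. It assumes the only criterion is that the two sets of filenames are identical; `check_exact_filename_match` and `BulkReplaceUnavailable` are hypothetical names, not part of any existing uploader.

```python
from typing import Iterable


class BulkReplaceUnavailable(Exception):
    """Raised when the new files do not line up one-to-one with the existing ones."""


def check_exact_filename_match(existing: Iterable[str], new: Iterable[str]) -> None:
    """Return silently if both sets of filenames are identical, else raise."""
    existing_set, new_set = set(existing), set(new)
    if existing_set != new_set:
        missing = sorted(existing_set - new_set)
        unexpected = sorted(new_set - existing_set)
        raise BulkReplaceUnavailable(
            f"Bulk replace is not available: missing={missing}, unexpected={unexpected}"
        )


if __name__ == "__main__":
    check_exact_filename_match(["a.csv", "b.csv"], ["b.csv", "a.csv"])  # order does not matter
    try:
        check_exact_filename_match(["a.csv"], ["a.csv", "c.csv"])
    except BulkReplaceUnavailable as exc:
        print(exc)
```

Failing loudly here keeps the tricky edge cases (renamed or split files) out of scope, which is the point of starting with exact matches only.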

On more thought, in addition to the GitHub Action, there are two other clients, both created by @qqmyers, where the logic could be placed:

Anyway, the point is that perhaps the logic could be figured out in some client code first, and then maybe Dataverse itself could follow.

@pdurbin pdurbin added Type: Suggestion an idea User Role: Depositor Creates datasets, uploads data, etc. Component: JSF Involves modifying JSF (Jakarta Server Faces) code, which is being replaced with React. Feature: File Upload & Handling labels Oct 14, 2022
@qqmyers
Member

qqmyers commented Dec 13, 2022

@amberleahey - once #9018 is merged, should we consider this closed? (#9018 provides a bulk replace for s3/direct upload)

@amberleahey
Author

"@amberleahey - once #9018 is merged, should we consider this closed? (#9018 provides a bulk replace for s3/direct upload)"

Potentially yes, can anyone run the API call? If it's not as important to have this available in the UI, then we will promote the API call, thanks for flagging!

@qqmyers
Member

qqmyers commented Dec 13, 2022

It's an option in the S3 direct upload API (which is a sequence of 3 calls). Those are open to anyone (who can upload to a given dataset). It could eventually be added to DVUploader, etc.

I raised the question because I was asked if merging should auto-close this issue and didn't want to just say yes and have this issue disappear. Up to you whether you think #9018 covers what you wanted, or not, or perhaps means this issue should close and a new issue opened for just a UI option, etc.

@amberleahey
Author

"It's an option in the S3 direct upload API (which is a sequence of 3 calls). Those are open to anyone (who can upload to a given dataset). It could eventually be added to DVUploader, etc.

I raised the question because I was asked if merging should auto-close this issue and didn't want to just say yes and have this issue disappear. Up to you whether you think #9018 covers what you wanted, or not, or perhaps means this issue should close and a new issue opened for just a UI option, etc."

yes close it :)

@pdurbin
Member

pdurbin commented Dec 13, 2022

@amberleahey you're the boss! Closing.

@pdurbin pdurbin closed this as completed Dec 13, 2022
@pdurbin pdurbin added this to the 5.13 milestone Jan 23, 2023

3 participants