Large file download support #7524

Closed
philippconzett opened this issue Jan 22, 2021 · 8 comments · Fixed by #8891
Assignees
Labels
NIH OTA: 1.1.1 1 | 1.1.1 | Minimum Viable Product (MVP) for registering metadata in the repository and connectin... pm.GREI-d-1.1.1 NIH, yr1, aim1, task1: MVP for registering metadata in the repository
Milestone

Comments

@philippconzett
Contributor

I just noticed the following desired characteristics in the COAR Community Framework for Good Practices in Repositories (https://doi.org/10.5281/zenodo.4110829); cf.:

2.5 The repository provides a mechanism to make very large files available to users outside of the normal user-interface (in cases where the size of the file becomes unwieldy for the user).

There are already a number of open issues related to upload support for large files, but I couldn't find any issue dealing with download support for large files.

@pdurbin
Member

pdurbin commented Jan 22, 2021

This is S3-specific, but from the guides ( https://guides.dataverse.org/en/5.3/installation/config.html#second-configure-dataverse-to-use-s3-storage ):

"Optionally, you can have users download files from S3 directly rather than having files pass from S3 through Payara to your users. To accomplish this, set dataverse.files.<id>.download-redirect to true like this"

We use this feature in Harvard Dataverse and it works well via both GUI and API. The idea is that the file is streamed directly from S3 to the user's computer.
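
As a rough illustration of the API side, the sketch below streams a publicly downloadable file to disk using only Python's standard library. It assumes the Data Access API path `/api/access/datafile/{id}`; the base URL and file id are placeholders, and `build_download_url` and `download_file` are hypothetical helper names, not part of Dataverse. `urlopen` follows HTTP redirects on its own, so when download-redirect is enabled the bytes should come from wherever the server redirects to, with no extra client code.

```python
import shutil
import urllib.request


def build_download_url(base_url: str, file_id: int) -> str:
    """Build the download URL for one file (API path is an assumption)."""
    return f"{base_url.rstrip('/')}/api/access/datafile/{file_id}"


def download_file(base_url: str, file_id: int, dest_path: str,
                  chunk_size: int = 1024 * 1024) -> str:
    """Stream a public file to dest_path and return the final URL.

    The file is copied in chunks, so it never has to fit in memory.
    Restricted files would need authentication, which is not shown here.
    """
    url = build_download_url(base_url, file_id)
    # urlopen follows redirects by default, so a redirect to the
    # storage backend is transparent to this code.
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
        return response.geturl()  # URL after any redirects


if __name__ == "__main__":
    # Placeholder host and id: replace with a real installation and file.
    print(build_download_url("https://dataverse.example.org", 42))
```

The value returned by `download_file` is the URL the bytes actually came from, which is a quick way to check whether a redirect happened.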

@philippconzett
Contributor Author

Sounds good! How does the GUI alternative work from the end user side: Is there a button called "Download from S3 directly" or similar?

@qqmyers
Member

qqmyers commented Jan 23, 2021

Once an admin has enabled direct downloads for a given store, a download from the UI simply has the browser follow a redirect, and the file downloads from S3. The user sees no difference (except, hopefully, speed). (Note that direct uploads and downloads use pre-signed URLs: they enable the user's browser to upload to or download from S3 without having access to the overall bucket and without making the file URLs public.)
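
To make the pre-signed URL idea concrete, here is a toy sketch of the general principle: the server signs the object path together with an expiry time, and the storage side accepts the request only if the signature matches and the link has not expired. This is not the actual AWS signing scheme (S3 uses Signature Version 4, which is more involved); `presign`, `verify`, and the `expires` and `signature` query parameters are hypothetical names chosen for illustration.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse


def _sign(path: str, expires_at: int, secret: bytes) -> str:
    """HMAC-SHA256 over the path and the expiry timestamp."""
    message = f"{path}\n{expires_at}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def presign(path: str, secret: bytes, expires_at: int) -> str:
    """Return the path with an expiry and a signature appended."""
    query = urlencode({"expires": expires_at,
                       "signature": _sign(path, expires_at, secret)})
    return f"{path}?{query}"


def verify(url: str, secret: bytes, now: int) -> bool:
    """Accept the URL only if it is correctly signed and not expired."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        expires_at = int(query["expires"][0])
        signature = query["signature"][0]
    except (KeyError, ValueError):
        return False  # missing or malformed parameters
    expected = _sign(parsed.path, expires_at, secret)
    # compare_digest avoids leaking information through timing
    return hmac.compare_digest(signature, expected) and now <= expires_at
```

Because the signature covers the path, a link issued for one file cannot be reused for another, and because it covers the expiry, the link stops working without the bucket ever being made public.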

@scolapasta
Contributor

Note: we do have some logic for rsync-uploaded package files where, instead of following the redirect, we show the link and ask the user to use their favorite web downloader (which could, for example, allow pausing and resuming). The user could, of course, still use their browser with the link.

We've discussed wanting to use this popup more generally for large files.
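
For readers wondering how pausing and resuming works in such a downloader, a minimal sketch follows. It assumes the server behind the link supports HTTP range requests; `range_header` and `resume_download` are hypothetical helper names. The client asks only for the bytes it does not have yet and appends them to the partial file.

```python
import os
import urllib.request


def range_header(bytes_on_disk: int) -> dict:
    """Ask for everything from the first missing byte onwards."""
    if bytes_on_disk > 0:
        return {"Range": f"bytes={bytes_on_disk}-"}
    return {}  # nothing on disk yet: plain full download


def resume_download(url: str, dest_path: str,
                    chunk_size: int = 1024 * 1024) -> None:
    """Download url to dest_path, continuing a partial file if present.

    If the file is already complete, the server may answer 416 and
    urlopen raises HTTPError; that case is not handled in this sketch.
    """
    offset = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    request = urllib.request.Request(url, headers=range_header(offset))
    with urllib.request.urlopen(request) as response:
        # 206 means the range was honoured, so append; any other success
        # status means the whole file is being sent, so start over.
        mode = "ab" if response.status == 206 else "wb"
        with open(dest_path, mode) as out:
            while chunk := response.read(chunk_size):
                out.write(chunk)
```

Calling `resume_download` again after an interruption picks up where the partial file ends, which is the behaviour a dedicated download tool provides out of the box.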

@pdurbin
Member

pdurbin commented Aug 23, 2022

@philippconzett hi! I'm looking at the Globus pull request (#8891), which also promises to support large file download. Globus works outside the normal user interface (from the characteristic you quoted).

What would you consider the "definition of done" for this issue? Are two options, S3 and Globus, enough? Or did you have something else in mind? Thanks!

@philippconzett
Contributor Author

Hi Phil! I don't really know what exactly COAR means by "outside of the normal user-interface", but I think options like S3 and Globus (or even just API calls through the command line) should qualify for this. So, as for me, we can close this issue once large file download via S3 and Globus works. Thanks!

@pdurbin
Member

pdurbin commented Aug 25, 2022

@philippconzett sounds good to me! I just marked this issue to be automatically closed when the pull request I mentioned is merged.

At some point in the future we should move the "Big Data Support" content from the Dev Guide to the Admin Guide (or maybe the Installation Guide) to indicate that it's more official, less of a dev thing to play with. But we can create a separate issue for that some day. 😄

@pdurbin pdurbin added this to the 5.12 milestone Sep 19, 2022
@pdurbin
Member

pdurbin commented Sep 19, 2022

OK, I just merged #8891 (Globus support), so this issue is closed. Please open follow-up issues as needed. Thanks.

@mreekie mreekie added the NIH OTA: 1.1.1 1 | 1.1.1 | Minimum Viable Product (MVP) for registering metadata in the repository and connectin... label Oct 8, 2022
@mreekie mreekie added the pm.GREI-d-1.1.1 NIH, yr1, aim1, task1: MVP for registering metadata in the repository label Mar 20, 2023
5 participants