Implement asynchronous/parallel transfers #202

Open
qubixes opened this issue Jun 21, 2024 · 4 comments

Comments

@qubixes
Collaborator

qubixes commented Jun 21, 2024

Currently iBridges relies on the parallelism in python-irodsclient for large transfers. This works well for upload/download speed in general, but there are some limitations:

  • Multiple smaller files are not transferred in parallel.
  • We can't update our progress bar while the transfer is taking place.
  • For large files, obtaining the checksum can take a very long time. It would be nice if this could overlap with other transfers.
  • Currently, we are limited in our ability to manage the timeouts that happen during transfers/checksum calculations (see "timeout on big files", #197). Performing the operations asynchronously might help with this.

Doing this properly and solving all of these limitations would require quite a bit of work (and possibly help from python-irodsclient). We could also consider solutions that resolve only some of the limitations.

qubixes added the enhancement label (New feature or request) on Jun 21, 2024
qubixes added this to the v2.0 milestone on Jun 21, 2024
@trel
Contributor

trel commented Jun 21, 2024

Sounds like a client-side transfer manager: a queue of files to send/receive, a set of worker threads to do that work in parallel, and some knobs to control the number of workers (pool size, min/max).
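
For illustration only, here is a minimal sketch of that shape using Python's standard library. It is not iBridges or python-irodsclient code: `run_transfers`, `transfer_one`, `max_workers` and `on_progress` are hypothetical names, and `transfer_one` stands in for whatever function actually uploads or downloads a single file.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_transfers(paths, transfer_one, max_workers=4, on_progress=None):
    """Run transfer_one(path) for every path on a bounded pool of worker threads.

    transfer_one is a hypothetical callable that transfers a single file.
    Returns (results, errors), both keyed by path.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # The executor keeps an internal queue; at most max_workers run at once.
        futures = {pool.submit(transfer_one, path): path for path in paths}
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # keep going; report failures at the end
                errors[path] = exc
            if on_progress is not None:
                # Called once per finished file, so a progress bar can update.
                on_progress(done, len(futures))
    return results, errors
```

This only gives per-file progress; progress within one large file would still need support from the underlying client. Submitting every path up front also creates one future per file, so a very large listing would probably be fed to the pool in batches.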

Better progress indicators could be provided by PRC... happy to discuss.

@qubixes
Collaborator Author

qubixes commented Jun 21, 2024

@trel Yes, that would be a client-side transfer manager. The alternative might be to just send all requests asynchronously, but with millions of files/data objects that might result in some issues (depending on the server configuration).

Thanks for being open to better progress indicators in the PRC! Let's discuss it once we know a bit better what we actually need. I will also look in some detail at how it could be integrated into PRC.

@trel
Contributor

trel commented Jun 29, 2024

The other reason a transfer manager is preferable to async requests is that each worker can retry a number of times, and/or a failed transfer can carry a counter and just be thrown back in the queue instead of waiting for time to pass.
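
A rough sketch of the "throw it back in the queue" idea, again with only the standard library. All names here (`run_with_retries`, `transfer_one`, `num_workers`, `max_attempts`) are hypothetical and not part of iBridges or PRC; `transfer_one` is assumed to raise an exception on failure.

```python
import queue
import threading


def run_with_retries(paths, transfer_one, num_workers=4, max_attempts=3):
    """Transfer every path; a failed item is requeued with an attempt counter.

    Returns a dict mapping each path that failed max_attempts times to its
    last exception.
    """
    tasks = queue.Queue()
    for path in paths:
        tasks.put((path, 1))  # (path, attempt number)
    failed = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = tasks.get()
            if item is None:  # sentinel: no more work
                tasks.task_done()
                return
            path, attempt = item
            try:
                transfer_one(path)
            except Exception as exc:
                if attempt < max_attempts:
                    # Requeue behind the other pending work instead of sleeping.
                    tasks.put((path, attempt + 1))
                else:
                    with lock:
                        failed[path] = exc
            finally:
                tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    tasks.join()  # returns once every item, including requeued ones, is done
    for _ in threads:
        tasks.put(None)
    for thread in threads:
        thread.join()
    return failed
```

Because the retry is put back before the failed attempt is marked done, `tasks.join()` cannot return while a retry is still pending. A requeued file naturally waits behind the rest of the queue, which gives a transient server problem some time to clear without blocking a worker.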

@qubixes
Collaborator Author

qubixes commented Jul 1, 2024

@trel Yes, I agree.

qubixes self-assigned this on Aug 14, 2024