timeout on big files #197
Comments
Note that it doesn't work if you add the timeout after the line …
True. That said, if you invoke `cleanup()` on the session object, the increased timeout should always take effect: it empties the idle and active pools of existing connections.
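A minimal sketch of that reset, assuming python-irodsclient's `iRODSSession` API (host and credentials below are placeholders, not values from this thread):

```python
from irods.session import iRODSSession

# Placeholder connection details; iBridges normally builds the session
# from irods_environment.json instead.
with iRODSSession(host="irods.example.org", port=1247, user="alice",
                  password="secret", zone="tempZone") as session:
    # Raise the timeout, then empty the idle and active pools so that no
    # connection created with the old 120 s timeout can be reused.
    session.connection_timeout = 250000
    session.cleanup()
    # Subsequent operations create fresh sockets that inherit the new timeout.
```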
@wurDevTim Thanks for the bug report. From the discussion and reading through the PRC code, if I understand correctly, the following happens: …
As for the solution, I'm not sure what the best durable one would be. As far as I understand, there is no mechanism to ensure that connections already in the pool will get the new connection_timeout, since old/idle connections can be reused. My first idea would be to set up a whole new connection pool when iBridges detects a file that might be too large; this pool would then have the new connection timeout. Another option would be to reset the session (e.g. with the `cleanup()` call mentioned above). For now, a workaround would be to increase the connection_timeout in the configuration JSON.
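For illustration, that workaround could look like the excerpt below, assuming the key is added to the `irods_environment.json` that iBridges reads (which file is meant is an assumption; the key/value pair is the one reported in the next comment, and the other keys are standard placeholders):

```json
{
    "irods_host": "irods.example.org",
    "irods_port": 1247,
    "irods_zone_name": "tempZone",
    "irods_user_name": "alice",
    "connection_timeout": 250000
}
```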
I might have another suggestion, have a look here: https://github.com/irods/python-irodsclient/blob/c3963c21914ac859a9ae65d5834681c566d5ada3/irods/pool.py#L65

Adding the timeout at ibridges/data_operations.py, line 389 in d7d0a34, did not work in my tests, and I also foresee issues with asynchronous transfers; therefore I have not run any additional tests.

Adding `"connection_timeout": 250000` to the json does solve the issue; while not ideal, it can be used as a workaround. It has the same effect as setting the connection_timeout of the session object directly after its creation.
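In code, that equivalent direct setting might look like this (a sketch; the `Session` parameters and the `irods_session` attribute follow iBridges' documented usage but should be treated as illustrative):

```python
from ibridges import Session

# Path and password are placeholders.
session = Session(irods_env_path="~/.irods/irods_environment.json",
                  password="secret")
# Same effect as the json key: set the timeout directly after creation,
# before the pool has handed out any socket with the 120 s default.
session.irods_session.connection_timeout = 250000
```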
@wurDevTim What did you try exactly in your tests? From the code, I would indeed expect that if …
@qubixes My main challenge with debugging further is that I am not sure whether what we expect is indeed the intended behavior of the 'refresh_time'. If it is, then it looks like we found another bug. @d-w-moore, can you shed some light on this?

A description of my tests:
- Test 2 (newly added) …
- Test 3 …
- Test 4 …
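For reference, the behavior under discussion is roughly the following. This is a hypothetical illustration of what we expect 'refresh_time' to do, not PRC's actual implementation (see the pool.py link above for the real code):

```python
import time

def get_idle_connection(idle_pool, refresh_time, make_connection):
    """Illustrative only: hand out a pooled connection unless it is stale."""
    while idle_pool:
        conn = idle_pool.pop()
        # If the connection is older than refresh_time, drop it instead of
        # reusing it, so it cannot carry a stale (short) timeout back into use.
        if refresh_time is not None and time.time() - conn.create_time > refresh_time:
            conn.disconnect()
            continue
        return conn
    return make_connection()  # pool empty: create a fresh connection
```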
Setup
- Server: Ubuntu 20, running iRODS 4.3.1
- Client: Windows 10 VM, running PRC 2.0.1 and iBridges 0.2.2
As the issue is time-related, it might help to know that the connection to the iRODS server is 1 GB/s and is fully utilized during the transfer.
Background information
The default connection timeout in the PRC is 120 seconds: https://github.com/irods/python-irodsclient/blob/c3963c21914ac859a9ae65d5834681c566d5ada3/irods/__init__.py#L43C33-L44C1
It is known that this leads to timeouts during the checksum calculation of larger files:
irods/python-irodsclient#564
As a solution, lines were added in iBridges at ibridges/data_operations.py, line 383 in 9ff0d55.
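The added lines themselves are elided above. As a sketch, the kind of change meant is raising the session timeout in proportion to the object size before a transfer; the function name and throughput constant below are illustrative, not iBridges' actual code:

```python
# Illustrative sketch, not the actual code at data_operations.py line 383.
MIN_CHECKSUM_SPEED = 10 * 2**20  # assumed worst-case server checksum speed, bytes/s

def bump_timeout_for(irods_session, size_in_bytes, base_timeout=120):
    """Raise the connection timeout so the server-side checksum of a large
    object can finish before the client gives up on the socket."""
    needed = size_in_bytes // MIN_CHECKSUM_SPEED
    irods_session.connection_timeout = max(base_timeout, needed)
```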
Issue
When we perform data operations, like checking whether the object already exists, two sockets are created with the default timeout. If we then upload a large file, 50 GB still goes fine; at 100 GB we consistently see this issue (we did not look for the exact file size at which the issue starts).
A breakpoint at irods/connection.py line 245 shows that two new sockets are created when the upload starts, both with a timeout of 12885 seconds.
Next, the file uploads as expected. When the transfer is complete and the server starts the checksum calculation, we get a network exception after exactly 2 minutes; the socket that throws the error has a timeout of 120 seconds, as shown in the image.
And if we wait a few minutes, we see that the checksum appears in iRODS (icommands, `ils -l`):

```
/xx/home/xx:
  xx 0 hot_1; 8589934588 2024-06-20.08:53 & 100GB_file.txt
```
Example code
If we modify iBridges by adding the timeout to ibridges/session.py, at line 201 in 50ab693 and at line 213 in 50ab693, we see that all sockets get the higher timeout and we don't experience any network exceptions.
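The exact snippet is elided above; a minimal sketch of the idea, assuming iBridges wraps a python-irodsclient `iRODSSession` (the env-file path is a placeholder):

```python
from irods.session import iRODSSession

# Sketch only: create the PRC session the way ibridges/session.py does,
# then raise the timeout before the pool hands out any socket, so every
# socket (including the one that waits on the checksum) inherits it.
irods_session = iRODSSession(irods_env_file="~/.irods/irods_environment.json")
irods_session.connection_timeout = 250000
```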