
Should we do anything about timeouts for subparts of an operation? #686

Closed
alexshpilkin opened this issue Sep 27, 2018 · 8 comments

@alexshpilkin

@njsmith has written rather a lot on timeouts and why they should be provided generically by the asynchronous processing library, instead of being passed as an additional parameter to individual calls.

Imagine my dismay when I saw not one, but two separate timeout arguments in @theelous3’s asks HTTP client library (the only one currently available with trio support). And the accompanying discussion, frankly speaking, felt extremely underwhelming from the point of view of @njsmith’s argument (“what thing is to be accomplished” as opposed to “what sequence of things is going to be performed”).

... And then I realized there was, in my particular context, no obvious way to do better. There is, at any given moment in my application, a pool of 10 to 100 HTTP connections managed by asks, and thousands of tasks trying to make an HTTP request. If I enclose the request/get/... call in trio's move_on_after, the thing that gets time-limited is not the server's response time, but that time plus the time to acquire a connection from the pool. Which might be what's needed in some situations, but not in mine.

So, I guess my question is to both @njsmith and @theelous3: how can trio's timeout API be usefully applied (or extended to apply) in the situation asks faces?

@alexshpilkin (Author)

As usual, a possible answer came to me while I was writing down the question in detail—not sure it is the answer, though:

Be honest about what's happening and separate getting a connection from the pool from making the actual request, so that a timeout can be imposed on either one separately or on both at once. I'm not sure how to prevent user code from grabbing a connection and monopolizing it, but I'm also not sure this needs to be prevented. This requires departing from a requests-like API, but... is that necessarily a problem when there is no de facto standard library for asynchronous HTTP (the way requests is for synchronous HTTP)?

P.S.: That reminds me: this, of course, is equally relevant to the design of requests3, so maybe @kennethreitz has already thought of something.
P.P.S.: I also forgot to mention in the original post that this also appears to apply equally to anything that involves thread pools, which are currently in the design phase in trio IIUC.

theelous3 commented Sep 27, 2018

To further your dismay, there are actually three. One is applied to each iteration of a stream's body.

We could do something nuts like expose a series of steps around which the user of asks can implement their own timeouts with their async lib of choice, but to what gain? Is it worth it?

One option is breaking it into an optional second API that treats creating connections, getting them from the pool, sending the request, reading the headers and status line, reading the body, etc. as separate steps.

Treating those steps individually, we can get this "trio timeout" ethos into asks, or any other lib.

Fundamentally we write libs to simplify. I don't think it's a great idea to remove the current style of API, but I'm also not against exposing a second, more granular API for greater user control.

smurfix (Contributor) commented Sep 27, 2018 via email

@alexshpilkin (Author)

@theelous3 Well, if the question is why I don't like having a timeout argument in asks, the answer is explained at length in my first link above (including the specific case of the original requests library). The short version is that the composability sucks: it becomes extremely tedious to say "I want such-and-such done in five seconds" when such-and-such involves async operations beyond a single asks call, and the common result is that nobody bothers to do so (and ultimately software that appears to hang). Your example of a one-second network operation taking more like one and a half seconds, in particular, sounds like a bug to be fixed in the underlying network API rather than harsh reality to be dealt with. (Let's assume we're not CPU-bound and keep the OS's preemptive scheduling out of the picture.)

If you find it curious that I'm speaking about bunching things together while suggesting that the asks request call be split, well, so do I; but if we (as trio users) have settled on a design principle (here: don't roll your own timeouts), we'd better apply it across the board, and my point is that we can't, really, given the current asks API.

@alexshpilkin (Author)

@smurfix Well, yeah, but this appears to be a separate point: reserving a connection explicitly, as opposed to separating the time to reserve from the time to make a request; I was specifically thinking of any way to do the latter without doing the former. The rabbit hole of a general API for reserving connections is just too deep: I’ll then want to reserve five connections out of the common pool and put them into a smaller pool to use on my own, or pool connections to separate domains separately (as web browsers commonly do), &c. I’d rather not try to solve that without an actual use case.

njsmith (Member) commented Sep 28, 2018

I totally get why asks has the connection limit (and aiohttp does something similar), but over time I'm coming to believe that this is just the wrong level to do connection limiting. If you set the asks connection pool to unlimited, and then use your own CapacityLimiter (which could be per-host or whatever), or some more generic tool to control concurrency level, then that part of the problem goes away.

I'm a bit less dogmatic about the "read timeout" than I used to be... it's absurd that there's no good way for requests to provide an overall timeout, but even once we solve that, the "read timeout" is still somewhat useful: there are lots of cases where it will successfully catch things, and you don't have to think about the size of the file being downloaded when you set it.

oremanj changed the title from "There are timeouts and then there are timeouts" to "Should we do anything about timeouts for subparts of an operation?" May 1, 2019
oremanj (Member) commented May 1, 2019

Renamed this to reflect the open question. I think the answer is "no" -- individual APIs of outside libraries can take a timeout argument if there's some subpart they want to have a separate timeout on; I can't think of a way to make subpart timeouts "better" analogously to how Trio's cancel scopes make entire-operation timeouts "better". But leaving open for now in case anyone has a different opinion.

Zac-HD (Member) commented Mar 17, 2023

Closing because we haven't seen discussion of this - but if you've found this with an idea please do post it below!

Zac-HD closed this as completed Mar 17, 2023