-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bridge: call connection_lost "soon" #20881
bridge: call connection_lost "soon" #20881
Conversation
Note: in #20880 I speculated that this might be behind the issue we had with fswatch1 being "closed twice". ie: my idea was that it wasn't closed twice, but had actually never finished being opened. I don't think that this is true, though, and here's why: Imagine a scenario where someone creates a fswatch1 channel and immediately slams the transport closed. Then when the def do_open(self, options: JsonObject) -> None:
self._path = get_str(options, 'path')
self._tag = None
self._active = False
self._watch = PathWatch(self._path, self)
self._active = True
self.ready() ie: by the time we call For that reason, I didn't revert #20634 here. |
I smoke-tested this against c-files (see cockpit-project/bots#6730 (comment)) After a few iterations I still get the crash which the first commit in #20880 fixed:
For devensiveness we really should initialize this variable. |
Yes. This was meant as an alternative for the second commit, indeed. The uninitialized variable is still wrong. |
The standard Python asyncio Transport classes protect their caller from having to deal with exceptions when calling .write(). In case an exception does occur, they capture it and close the transport, passing the exception on to the connection_lost() handler. This makes it a lot easier to interact with the transport: because OS-level errors are handled out of band, protocols don't have to consider that any write might fail. Our Transport classes do the same thing, but there's an important difference: the standard library defers the call to .connection_lost(), but our transports invoke it immediately. This creates a weird scenario whereby your connection_lost() handler can get called in the middle of your connection_made() handler, if you should happen to do a .write() there which fails. Since connection_lost() reasonably assumes that connection_made() has run, you can get into all kinds of trouble. Worst of all, once .connection_lost() is done running, control flow returns back to your .connection_made() handler, which continues running as if nothing has happened in the meantime. Let's adopt the standard Python behaviour here and defer the .connection_lost() call. We need to adjust some unit tests which assumed the old behaviour: adjust them to ensure that we specifically *don't* have that behaviour anymore. This is something that I always knew we did differently than the Python standard library, but I didn't understand the purpose of the additional complexity until now. Thanks goes to Martin for the initial research to uncover this issue.
This is the only thing in the class which isn't initialized and it's causing issues. Fix that. Thanks for tracking this down go to Martin as well :)
fd489d8
to
562ae22
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cheers! Stop bridge from crashing beats pitti being grumpy about class variables.
686a5d3
into
cockpit-project:main
The standard Python asyncio Transport classes protect their caller from having to deal with exceptions when calling .write(). In case an exception does occur, they capture it and close the transport, passing the exception on to the connection_lost() handler. This makes it a lot easier to interact with the transport: because OS-level errors are handled out of band, protocols don't have to consider that any write might fail.
Our Transport classes do the same thing, but there's an important difference: the standard library defers the call to .connection_lost(), but our transports invoke it immediately.
This creates a weird scenario whereby your connection_lost() handler can get called in the middle of your connection_made() handler, if you should happen to do a .write() there which fails. Since connection_lost() reasonably assumes that connection_made() has run, you can get into all kinds of trouble. Worst of all, once .connection_lost() is done running, control flow returns back to your .connection_made() handler, which continues running as if nothing has happened in the meantime.
Let's adopt the standard Python behaviour here and defer the .connection_lost() call. We need to adjust some unit tests which assumed the old behaviour: adjust them to ensure that we specifically don't have that behaviour anymore.