Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PYTHON-3587 Do not perform server selection to determine sessions support #1491

Merged
merged 16 commits into from
Feb 5, 2024

Conversation

NoahStapp
Copy link
Contributor

No description provided.

@NoahStapp NoahStapp requested a review from a team as a code owner January 30, 2024 21:34
@NoahStapp NoahStapp requested review from caseyclements and removed request for a team January 30, 2024 21:34
Comment on lines 1101 to 1112
# Although the Driver Sessions Spec says we only clear stale sessions
# in return_server_session, PyMongo can't take a lock when returning
# sessions from a __del__ method (like in Cursor.__die), so it can't
# clear stale sessions there. In case many sessions were returned via
# __del__, check for stale sessions here too.
self._clear_stale(session_timeout_minutes)

def get_server_session(self) -> _ServerSession:
# The most recently used sessions are on the left.
while self:
s = self.popleft()
if not s.timed_out(session_timeout_minutes):
return s
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is necessary because we no longer have access to session_timeout_minutes at the time of session creation. Is it feasible to move this timeout check to the moment of session application instead, where we have the underlying connection and access to the timeout?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep this method the same but delay calling it until ClientSession._materialize() is called.

Comment on lines 1101 to 1112
# Although the Driver Sessions Spec says we only clear stale sessions
# in return_server_session, PyMongo can't take a lock when returning
# sessions from a __del__ method (like in Cursor.__die), so it can't
# clear stale sessions there. In case many sessions were returned via
# __del__, check for stale sessions here too.
self._clear_stale(session_timeout_minutes)

def get_server_session(self) -> _ServerSession:
# The most recently used sessions are on the left.
while self:
s = self.popleft()
if not s.timed_out(session_timeout_minutes):
return s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep this method the same but delay calling it until ClientSession._materialize() is called.

if implicit:
self._topology._check_implicit_session_support()
server_session: Union[_EmptyServerSession, _ServerSession] = _EmptyServerSession()
else:
server_session = self._get_server_session()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's have both implicit and explicit sessions start with _EmptyServerSession(). That way we can check logicalSessionTimeoutMinutes on first use in _materialize().

@@ -243,8 +243,7 @@ def is_server_type_known(self) -> bool:
def retryable_writes_supported(self) -> bool:
"""Checks if this server supports retryable writes."""
return (
self._ls_timeout_minutes is not None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove _ls_timeout_minutes and all code for it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe so, there's no more use for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, according to the spec https://github.com/mongodb/specifications/blob/master/source/sessions/driver-sessions.rst, we still need to store the lowest logicalSessionTimeoutMinutes value in the ToplogyDescription.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so we still need to track it so we can use the lowest value for the server session cache expiration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bingo.

@@ -609,8 +609,7 @@ def _check_session_support(self) -> float:
def get_server_session(self) -> _ServerSession:
"""Start or resume a server session, or raise ConfigurationError."""
with self._lock:
session_timeout = self._check_session_support()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove _check_session_support?

MONGOCRYPTD_PORT = 27020

@classmethod
@client_context.require_sessions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is for servers that don't support sessions, but require_sessions does the opposite. Can we remove this?


def setUp(self) -> None:
self.listener = OvertCommandListener()
self.listener.reset()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this reset() call since the listener was just created.

def test_explicit_session_errors_when_unsupported(self):
self.listener.reset()
with self.mongocryptd_client.start_session() as s:
with self.assertRaises(ConfigurationError):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use assertRaisesRegex to assert the error message too.

self.mongocryptd_client.db.test.find_one(session=s)
with self.assertRaises(ConfigurationError):
self.mongocryptd_client.db.test.insert_one({"x": 1}, session=s)
self.mongocryptd_client.close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of closing at the end of each test I suggest using self.addCleanup(self.mongocryptd_client.close) right after creating the client in setup.

@@ -22,6 +22,7 @@
import socket
import socketserver
import ssl
import subprocess
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused.

@@ -965,10 +965,12 @@ def _txn_read_preference(self) -> Optional[_ServerMode]:
return self._transaction.opts.read_preference
return None

def _materialize(self) -> None:
def _materialize(self, logical_session_timeout_minutes: float) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

float -> int.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original code has session_timeout_minutes as a float, is that an unfixed mistake?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that was a mistake. It's int everywhere else.

@@ -1097,7 +1104,9 @@ def pop_all(self) -> list[_ServerSession]:
ids.append(self.pop().session_id)
return ids

def get_server_session(self, session_timeout_minutes: float) -> _ServerSession:
def get_server_session(
self, session_timeout_minutes: float, old: _EmptyServerSession
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

float -> int.

@@ -1111,7 +1120,7 @@ def get_server_session(self, session_timeout_minutes: float) -> _ServerSession:
if not s.timed_out(session_timeout_minutes):
return s

return _ServerSession(self.generation)
return _ServerSession(self.generation, old.session_id)
Copy link
Member

@ShaneHarvey ShaneHarvey Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the cache is empty we'll create a new session with the same id but if the cache isn't empty we'll return an existing session with a different id. What problem are we trying to solve here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good point if the cache is not empty. Our unified test runner expects that sessions have an id on creation, with several tests relying on that behavior. Adding a simple ping command after creation to populate the session id emits unexpected events that cause those same tests to fail.

Copy link
Member

@ShaneHarvey ShaneHarvey Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see. I think the only approach is that accessing the session_id property needs to materialize the session too either without checking the lstm field or using the topologies lstm (either way without doing server selection).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't quite understand what you mean. You're saying we need to have the session materialize when session_id is accessed, but without actually checking if sessions are supported?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The session support check would still happen later in _apply_to.

pymongo/pool.py Outdated
@@ -870,6 +871,7 @@ def _hello(
self.max_message_size = hello.max_message_size
self.max_write_batch_size = hello.max_write_batch_size
self.supports_sessions = hello.logical_session_timeout_minutes is not None
self.logical_session_timeout_minutes = hello.logical_session_timeout_minutes or 0.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or 0 since logical_session_timeout_minutes is an int.

@@ -116,16 +116,12 @@ def _test_ops(self, client, *ops):

for f, args, kw in ops:
with client.start_session() as s:
last_use = s._server_session.last_use
start = time.monotonic()
self.assertLessEqual(last_use, start)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any other tests for last_use? Are we loosing coverage here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we can fix the issue by materializing the session before accessing last_use.

pymongo/client_session.py Outdated Show resolved Hide resolved
if self._retryable and not supports_session:
sessions_supported = (
self._retryable
and self._server.description.retryable_writes_supported
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And just check conn.supports_sessions instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove two of the variables in the check, the remaining two are required for tests to pass.

with self._client._checkout(self._server, self._session) as conn:
max_wire_version = conn.max_wire_version
if self._retryable and not supports_session:
sessions_supported = (
self._server.description.retryable_writes_supported and conn.supports_sessions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is self._session never None anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added it back for safety.

@@ -245,10 +245,12 @@ def test_pool_lifo(self):
b.end_session()

s = self.client.start_session()
self.client.admin.command("ping", session=s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove these pings now that s.session_id materializes the session?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this test requires the sessions actually enter the pool, which materializing itself does not do. The test fails without these pings, sadly.

Copy link
Member

@ShaneHarvey ShaneHarvey Feb 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain? It appears this test is only checking session_id.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test checks that session a and b are both in the session pool after they are ended, with b at the top due to LIFO ordering. We expect the next session created to be b, but since session creation no longer interacts with the pool unless a timeout is present, materializing the new session instead of using a ping results in a new session entirely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's a bug in get_server_session. After it's fixed we should remove the pings here.

@@ -274,6 +276,8 @@ def test_end_sessions(self):
client = rs_or_single_client(event_listeners=[listener])
# Start many sessions.
sessions = [client.start_session() for _ in range(_MAX_END_SESSIONS + 1)]
for s in sessions:
client.admin.command("ping", session=s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is gonna make the test take a long time. Can we instead just do s.session_id?

while self:
s = self.popleft()
if not s.timed_out(session_timeout_minutes):
return s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still want to acquire a session from the pool here even if the timeout is None.

@@ -245,10 +245,12 @@ def test_pool_lifo(self):
b.end_session()

s = self.client.start_session()
self.client.admin.command("ping", session=s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's a bug in get_server_session. After it's fixed we should remove the pings here.

Copy link
Member

@ShaneHarvey ShaneHarvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you schedule the mockupdb tests in EVG? It would also be good to run the motor test suite after merging this to see if anything breaks there.

if session_timeout_minutes is None:
return False
else:
idle_seconds = time.monotonic() - self.last_use
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: since the if-statement returns we don't need the else. The benefit is that:

  1. code is easier to read with less indentation
  2. there will be less code churn in the PR and in the git history.

What do you think?

@NoahStapp
Copy link
Contributor Author

Could you schedule the mockupdb tests in EVG? It would also be good to run the motor test suite after merging this to see if anything breaks there.

mockupdb tests timed out on test_rsghost: https://spruce.mongodb.com/version/65bd38bbc9ec443a931ddce7/tasks?sorts=STATUS%3AASC%3BBASE_STATUS%3ADESC

@ShaneHarvey
Copy link
Member

ShaneHarvey commented Feb 2, 2024

Looks like the test gets stuck attempting to run endSessions when closing the client in test_rsghost:

 [2024/02/02 11:19:49.085] Thread 0x00007f64de62f740 (most recent call first):
 [2024/02/02 11:19:49.085]   File "/data/mci/1adae7ce73c33f49ebb23345c8812e53/src/pymongo/socket_checker.py", line 66 in select
 [2024/02/02 11:19:49.085]   File "/data/mci/1adae7ce73c33f49ebb23345c8812e53/src/pymongo/network.py", line 321 in wait_for_read
 [2024/02/02 11:19:49.085]   File "/data/mci/1adae7ce73c33f49ebb23345c8812e53/src/pymongo/network.py", line 340 in _receive_data_on_socket
 [2024/02/02 11:19:49.085]   File "/data/mci/1adae7ce73c33f49ebb23345c8812e53/src/pymongo/network.py", line 263 in receive_message
 [2024/02/02 11:19:49.085]   File "/data/mci/1adae7ce73c33f49ebb23345c8812e53/src/pymongo/network.py", line 186 in command
 [2024/02/02 11:19:49.085]   File "/data/mci/1adae7ce73c33f49ebb23345c8812e53/src/pymongo/pool.py", line 997 in command
 [2024/02/02 11:19:49.085]   File "/data/mci/1adae7ce73c33f49ebb23345c8812e53/src/pymongo/helpers.py", line 322 in inner
 [2024/02/02 11:19:49.085]   File "/data/mci/1adae7ce73c33f49ebb23345c8812e53/src/pymongo/mongo_client.py", line 1193 in _end_sessions
 [2024/02/02 11:19:49.085]   File "/data/mci/1adae7ce73c33f49ebb23345c8812e53/src/pymongo/mongo_client.py", line 1216 in close
 [2024/02/02 11:19:49.085]   File "/data/mci/1adae7ce73c33f49ebb23345c8812e53/src/pymongo/mongo_client.py", line 2112 in __exit__
 [2024/02/02 11:19:49.085]   File "/data/mci/1adae7ce73c33f49ebb23345c8812e53/src/test/mockupdb/test_rsghost.py", line 56 in test_rsghost

@NoahStapp
Copy link
Contributor Author

Looks like the test gets stuck attempting to run endSessions when closing the client in test_rsghost:

 [2024/02/02 11:19:49.085] Thread 0x00007f64de62f740 (most recent call first):
 [2024/02/02 11:19:49.085]   File "/data/mci/1adae7ce73c33f49ebb23345c8812e53/src/pymongo/socket_checker.py", line 66 in select
 [2024/02/02 11:19:49.085]   File "/data/mci/1adae7ce73c33f49ebb23345c8812e53/src/pymongo/network.py", line 321 in wait_for_read
 [2024/02/02 11:19:49.085]   File "/data/mci/1adae7ce73c33f49ebb23345c8812e53/src/pymongo/network.py", line 340 in _receive_data_on_socket
 [2024/02/02 11:19:49.085]   File "/data/mci/1adae7ce73c33f49ebb23345c8812e53/src/pymongo/network.py", line 263 in receive_message
 [2024/02/02 11:19:49.085]   File "/data/mci/1adae7ce73c33f49ebb23345c8812e53/src/pymongo/network.py", line 186 in command
 [2024/02/02 11:19:49.085]   File "/data/mci/1adae7ce73c33f49ebb23345c8812e53/src/pymongo/pool.py", line 997 in command
 [2024/02/02 11:19:49.085]   File "/data/mci/1adae7ce73c33f49ebb23345c8812e53/src/pymongo/helpers.py", line 322 in inner
 [2024/02/02 11:19:49.085]   File "/data/mci/1adae7ce73c33f49ebb23345c8812e53/src/pymongo/mongo_client.py", line 1193 in _end_sessions
 [2024/02/02 11:19:49.085]   File "/data/mci/1adae7ce73c33f49ebb23345c8812e53/src/pymongo/mongo_client.py", line 1216 in close
 [2024/02/02 11:19:49.085]   File "/data/mci/1adae7ce73c33f49ebb23345c8812e53/src/pymongo/mongo_client.py", line 2112 in __exit__
 [2024/02/02 11:19:49.085]   File "/data/mci/1adae7ce73c33f49ebb23345c8812e53/src/test/mockupdb/test_rsghost.py", line 56 in test_rsghost

Yeah, while sending the endSessions commands back to the server.

@ShaneHarvey
Copy link
Member

test_rsghost needs to set up an auto responder for endSessions. Although I wonder why the endSession was not needed before.

@ShaneHarvey
Copy link
Member

Previously we never sent sessions to RSGhost servers but now we do. That could be a problem.

@NoahStapp
Copy link
Contributor Author

Previously we never sent sessions to RSGhost servers but now we do. That could be a problem.

We do now because of the sessions changes? That shouldn't happen, should it?

@NoahStapp
Copy link
Contributor Author

pymongo/pool.py Outdated
@@ -752,6 +752,7 @@ def __init__(
self.active = False
self.last_timeout = self.opts.socket_timeout
self.connect_rtt = 0.0
self.logical_session_timeout_minutes: int = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's default this to None or avoid confusing the 0 and None cases. (It's technically possible to set logical_session_timeout_minutes=0.)

pymongo/pool.py Outdated
self.supports_sessions = (
hello.logical_session_timeout_minutes is not None and hello.is_readable
)
self.logical_session_timeout_minutes = hello.logical_session_timeout_minutes or 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove the or 0.

Copy link
Member

@ShaneHarvey ShaneHarvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@NoahStapp NoahStapp merged commit 62c6d0f into mongodb:master Feb 5, 2024
68 of 69 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants