Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Mitigate a race where /make_join could 403 for restricted rooms #15080

Merged
merged 1 commit into from
Feb 17, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/15080.bugfix
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Reduce the likelihood of a rare race condition where rejoining a restricted room over federation would fail.
16 changes: 15 additions & 1 deletion synapse/handlers/federation.py
Original file line number Diff line number Diff line change
Expand Up @@ -952,7 +952,20 @@ async def on_make_join_request(
#
# Note that this requires the /send_join request to come back to the
# same server.
prev_event_ids = None
if room_version.msc3083_join_rules:
# Note that the room's state can change out from under us and render our
# nice join rules-conformant event non-conformant by the time we build the
# event. When this happens, our validation at the end fails and we respond
# to the requesting server with a 403, which is misleading — it indicates
# that the user is not allowed to join the room and the joining server
# should not bother retrying via this homeserver or any others, when
# in fact we've just messed up with building the event.
#
# To reduce the likelihood of this race, we capture the forward extremities
# of the room (prev_event_ids) just before fetching the current state, and
Comment on lines +965 to +966
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pedantry: there can be at most 20 prev events in a given event, so in the worst case the prev events are a subset of the forward extremities of the room. I don't think we need to say that here (but I couldn't resist pointing it out now).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's annoying, since the prev events we feed in won't always be representative of the room state the event is evaluated against.

Though I note that right now we hit an assert if there are more than 10 prev events.

if prev_event_ids is not None:
assert (
len(prev_event_ids) <= 10
), "Attempting to create an event with %i prev_events" % (
len(prev_event_ids),
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know. FWIW The spec says

Must contain less than or equal to 20 events.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's annoying, since the prev events we feed in won't always be representative of the room state the event is evaluated against.

It turns out that get_prev_events_for_room only returns up to 10 forward extremities, so the prev events we feed in to the event builder will be the ones used. I hadn't quite realised that get_prev_events_for_room was subtly different to fetching the forward extremities.

def _get_prev_events_for_room_txn(
self, txn: LoggingTransaction, room_id: str
) -> List[str]:
# we just use the 10 newest events. Older events will become
# prev_events of future events.
sql = """
SELECT e.event_id FROM event_forward_extremities AS f
INNER JOIN events AS e USING (event_id)
WHERE f.room_id = ?
ORDER BY e.depth DESC
LIMIT 10
"""

# hope that the state we fetch corresponds to the prev events we chose.
prev_event_ids = await self.store.get_prev_events_for_room(room_id)
state_ids = await self._state_storage_controller.get_current_state_ids(
room_id
)
Comment on lines 969 to 971
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to calculate the state off the prev events, but that looked like it would involve state resolution which is slow.

Expand Down Expand Up @@ -994,7 +1007,8 @@ async def on_make_join_request(
event,
unpersisted_context,
) = await self.event_creation_handler.create_new_client_event(
builder=builder
builder=builder,
prev_event_ids=prev_event_ids,
)
except SynapseError as e:
logger.warning("Failed to create join to %s because %s", room_id, e)
Expand Down