Conversation
b035b79 to f9a5e36
synapse/storage/events.py (outdated)

    sibling_events.update(pes)

    self.table = ChunkDBOrderedListStore(
This shouldn't be assigning to self.
...
f9a5e36 to f533a9e
We also calculate a consistent topological ordering within a chunk, but it isn't used yet.
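The "consistent topological ordering" mentioned here can be illustrated with a deterministic variant of Kahn's algorithm; this is a sketch only, not Synapse's actual implementation, with ties broken by event ID so every run (and every server) derives the same order:

```python
import heapq

def consistent_topological_sort(events, edges):
    """Deterministic topological sort of a chunk's events.

    `events` is a list of event IDs; `edges` maps an event ID to the set
    of its prev events within the chunk (keys assumed to be in `events`).
    Ties are broken lexicographically, making the result reproducible.
    """
    # Count unresolved prev events for each event.
    indegree = {e: 0 for e in events}
    successors = {e: [] for e in events}
    for event, prevs in edges.items():
        for prev in prevs:
            if prev in indegree:
                indegree[event] += 1
                successors[prev].append(event)

    # A min-heap yields the smallest ready event ID first (the tie-break).
    ready = [e for e, d in indegree.items() if d == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        event = heapq.heappop(ready)
        order.append(event)
        for succ in successors[event]:
            indegree[succ] -= 1
            if indegree[succ] == 0:
                heapq.heappush(ready, succ)
    return order
```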
2c492a3 to 13dbcaf
synapse/storage/events.py (outdated)

    def _compute_chunk_id_txn(self, txn, room_id, event_id, prev_event_ids):
        """Computes the chunk ID and topological ordering for an event.

        Also handles updating chunk_graph table.
This is very surprising in a function called _compute_chunk_id. Rename the function?
synapse/storage/events.py (outdated)

    @@ -1108,12 +1115,21 @@ def event_dict(event):
            ],
        )

        if event.internal_metadata.is_outlier():
doesn't this need to loop over events_and_contexts?
synapse/storage/events.py (outdated)

    #
    # 1. If all prev events have the same chunk ID then use that chunk ID
    # 2. If we have none of the prev events but do have events pointing to
    #    it, then we use their chunk ID if:
what is "it" here?
synapse/storage/events.py (outdated)

    # 1. If all prev events have the same chunk ID then use that chunk ID
    # 2. If we have none of the prev events but do have events pointing to
    #    it, then we use their chunk ID if:
    #    - They’re all in the same chunk, and
smart ' is too smart
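The two rules quoted in the comment above could be sketched roughly like this (illustrative only; the function and its inputs are hypothetical stand-ins for the real storage queries, which fetch the chunk IDs from the database):

```python
def compute_chunk_id(prev_chunk_ids, forward_chunk_ids):
    """Pick a chunk ID for a new event, per the two rules in the comment:

    1. all known prev events share a chunk -> reuse that chunk ID;
    2. no prev events known, but events pointing at this one all share a
       chunk -> reuse their chunk ID.

    Returns None when neither rule applies, meaning the caller should
    allocate a fresh chunk.
    """
    if prev_chunk_ids and len(prev_chunk_ids) == 1:
        return next(iter(prev_chunk_ids))
    if not prev_chunk_ids and len(forward_chunk_ids) == 1:
        return next(iter(forward_chunk_ids))
    return None
```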
    prev_chunk_ids = set()

    for eid in prev_event_ids:
        chunk_id = self._simple_select_one_onecol_txn(
as a general comment in this function, all these db hits look sloooow. do you plan to go via caches at some point?
The queries should all hit indices and so should be fast, though yeah, the number of them does mean that even the RTT will start adding up. I'm not sure how much caches will help, to be honest, as in most cases what we fetch will change each time (though we could possibly prefill the caches).
Really, I'd quite like to split a lot of the persist_event logic out into per-room logic, so that if something goes slow for a particular room it won't block events in other rooms being persisted. I.e., when persisting an event, it first gets added to a per-room queue to have chunk/current_state/etc. calculated, and then that result gets fed into the persist-event queue. Maybe.
yup I'm worried about the RTT, and expecting that we ought to have prefilled caches in the common case.
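A prefilled event-to-chunk-ID cache along the lines discussed might look like the following. This is a minimal sketch under stated assumptions: the class and method names are invented for illustration and are not Synapse's actual cache machinery.

```python
class ChunkIdCache:
    """Tiny in-memory map from event_id to chunk_id.

    Prefilled from rows we already have in hand (e.g. the result of a
    batch query), so later per-event lookups can skip the DB round trip.
    """

    def __init__(self):
        self._cache = {}

    def prefill(self, rows):
        # rows: iterable of (event_id, chunk_id) pairs.
        self._cache.update(rows)

    def get(self, event_id):
        # Returns None on a miss; the caller falls back to the database.
        return self._cache.get(event_id)
```

On a miss the caller would still query the database as before; the cache only removes round trips in the common case where the relevant rows were just written or fetched.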
            table="event_edges",
            keyvalues={
                "event_id": eid,
                "is_state": False,
again, why?
synapse/storage/events.py (outdated)

                "room_id": room_id,
                "chunk_id": chunk_id,
            },
            retcol="COALESCE(MAX(topological_ordering), 0)",
bit uneasy about the coalesce here. surely if we've got NULLs in here then this will give bad results, and we should fail loudly rather than subtly?
    # ChunkDBOrderedListStore about that.
    table.add_node(chunk_id)

    # We need to now update the database with any new edges between chunks
is it worth trying to optimise this, depending on which path we've taken above?
I'm not sure I see how?
example 1: if it's a new chunk, it's not going to have any existing edges
example 2: if we've established that there is exactly one prev_chunk_id, then we know that we do not need to add any new prev_chunk edges.
Ah, good point well made
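The two shortcuts the reviewer describes could be folded in roughly like this. A sketch under stated assumptions: the function and its arguments are hypothetical, written as a pure decision helper rather than the actual edge-writing code.

```python
def edges_to_add(prev_chunk_ids, existing_edges, is_new_chunk):
    """Decide which prev-chunk edges still need inserting for a chunk.

    `existing_edges` is the set of prev chunk IDs already recorded.
    """
    # Example 1: a brand-new chunk can have no edges on disk yet, so
    # every computed prev edge needs inserting -- no lookup required.
    if is_new_chunk:
        return set(prev_chunk_ids)

    # Example 2: exactly one prev chunk means the event simply joined
    # that existing chunk, so there is no new cross-chunk edge to record.
    if len(prev_chunk_ids) == 1:
        return set()

    # General case: insert only the edges that aren't already there.
    return set(prev_chunk_ids) - set(existing_edges)
```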
synapse/storage/events.py (outdated)

        if fid not in current_forward_ids and fid != chunk_id
    )

    if prev_chunk_ids:
isn't this condition redundant?
synapse/storage/events.py (outdated)

        retcol="chunk_id",
    )

    prev_chunk_ids = set(
why are we bothering to build a new set here rather than just iterating through prev_chunk_ids at line 1517? (And if you are going to build a new set, could it have a different name?)
lgtm modulo comments above
Conclusion is to come back and look at performance in a separate PR