Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Compute new chunks for new events #3240

Merged
merged 8 commits into from
May 31, 2018
Merged

Conversation

erikjohnston
Copy link
Member

We also calculate a consistent topological ordering within a chunk, but it isn't used yet.


sibling_events.update(pes)

self.table = ChunkDBOrderedListStore(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be assigning on self...

@erikjohnston erikjohnston force-pushed the erikj/events_chunks branch 2 times, most recently from f9a5e36 to f533a9e Compare May 23, 2018 09:56
We also calculate a consistent topological ordering within a chunk, but
it isn't used yet.
def _compute_chunk_id_txn(self, txn, room_id, event_id, prev_event_ids):
"""Computes the chunk ID and topological ordering for an event.

Also handles updating chunk_graph table.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very surprising in a function called _compute_chunk_id. Rename the function?

@@ -1108,12 +1115,21 @@ def event_dict(event):
],
)

if event.internal_metadata.is_outlier():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't this need to loop over events_and_contexts ?

#
# 1. If all prev events have the same chunk ID then use that chunk ID
# 2. If we have none of the prev events but do have events pointing to
# it, then we use their chunk ID if:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is "it" here?

# 1. If all prev events have the same chunk ID then use that chunk ID
# 2. If we have none of the prev events but do have events pointing to
# it, then we use their chunk ID if:
# - They’re all in the same chunk, and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

smart ' is too smart

prev_chunk_ids = set()

for eid in prev_event_ids:
chunk_id = self._simple_select_one_onecol_txn(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as a general comment in this function, all these db hits look sloooow. do you plan to go via caches at some point?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The queries should all hit indices and so should be fast, though yeah, the number of them about does mean even the RTT will start adding up. I'm not sure how much caches will help tbh, as for most cases what we fetch will change each time (though we could possible prefill the caches).

Really, I'd quite like to split a lot of the persist_event logic out into per room logic, so that if something goes slow for a particular room it won't block events in other rooms being persisted. I.e., when persisting an event it first gets added to a per room queue to have chunk/current_state/etc calculated, and then that result gets fed into the persist event queue. Maybe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup I'm worried about the RTT, and expecting that we ought to have prefilled caches in the common case.

table="event_edges",
keyvalues={
"event_id": eid,
"is_state": False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, why

"room_id": room_id,
"chunk_id": chunk_id,
},
retcol="COALESCE(MAX(topological_ordering), 0)",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bit uneasy about the coalesce here. surely if we've got NULLs in here then this will give bad results, and we should fail loudly rather than subtly?

# ChunkDBOrderedListStore about that.
table.add_node(chunk_id)

# We need to now update the database with any new edges between chunks
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it worth trying to optimise this, depending on which path we've taken above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I see how?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

example 1: if it's a new chunk, it's not going to have any existing edges
example 2: if we've established that there is exactly one prev_chunk_id, then we know that we do not need to add any new prev_chunk edges.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point well made

if fid not in current_forward_ids and fid != chunk_id
)

if prev_chunk_ids:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this condition redundant?

retcol="chunk_id",
)

prev_chunk_ids = set(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we bothering to build a new set here rather than just iterating through prev_chunk_ids at line 1517?

(and if you are going to build a new set, could it have a different name?)

@richvdh richvdh assigned erikjohnston and unassigned richvdh May 30, 2018
@erikjohnston erikjohnston assigned richvdh and unassigned erikjohnston May 30, 2018
@richvdh
Copy link
Member

richvdh commented May 30, 2018

lgtm modulo comments above

@erikjohnston
Copy link
Member Author

Conclusion is to come back and look at performance in a separate PR

@erikjohnston erikjohnston merged commit 867132f into erikj/room_chunks May 31, 2018
@hawkowl hawkowl deleted the erikj/events_chunks branch September 20, 2018 14:01
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants