Compute new chunks for new events #3240

erikjohnston · 2018-05-18T14:37:45Z

We also calculate a consistent topological ordering within a chunk, but it isn't used yet.

erikjohnston · 2018-05-21T16:01:54Z

synapse/storage/events.py

+
+            sibling_events.update(pes)
+
+        self.table = ChunkDBOrderedListStore(


This shouldn't be assigning on self...

richvdh · 2018-05-30T08:00:36Z

synapse/storage/events.py

+    def _compute_chunk_id_txn(self, txn, room_id, event_id, prev_event_ids):
+        """Computes the chunk ID and topological ordering for an event.
+
+        Also handles updating chunk_graph table.


This is very surprising in a function called _compute_chunk_id. Rename the function?

richvdh · 2018-05-30T08:00:47Z

synapse/storage/events.py

@@ -1108,12 +1115,21 @@ def event_dict(event):
            ],
        )

+        if event.internal_metadata.is_outlier():


doesn't this need to loop over events_and_contexts ?
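
A minimal sketch of the concern, assuming `event` at this point is a name left bound by an earlier loop, so the check only ever sees the last event in the batch. `FakeEvent` and both helper functions are hypothetical stand-ins, not Synapse code, and the real check goes through `event.internal_metadata.is_outlier()`:

```python
class FakeEvent:
    """Hypothetical stand-in exposing only what this sketch needs."""

    def __init__(self, event_id, outlier):
        self.event_id = event_id
        self._outlier = outlier

    def is_outlier(self):
        return self._outlier


def non_outlier_ids_last_only(events_and_contexts):
    """Shape of the problem: only the last event is ever inspected."""
    for event, _context in events_and_contexts:
        pass  # earlier per-event work would happen here
    # `event` is still bound to the final item of the loop above.
    if event.is_outlier():
        return []
    return [event.event_id]


def non_outlier_ids(events_and_contexts):
    """Check every event in the batch individually."""
    return [
        event.event_id
        for event, _context in events_and_contexts
        if not event.is_outlier()
    ]
```

With a batch whose last event is an outlier, the first version skips every event, while the second still returns the non-outliers.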

richvdh · 2018-05-30T08:17:23Z

synapse/storage/events.py

+        #
+        # 1. If all prev events have the same chunk ID then use that chunk ID
+        # 2. If we have none of the prev events but do have events pointing to
+        #    it, then we use their chunk ID if:


what is "it" here?

richvdh · 2018-05-30T08:17:35Z

synapse/storage/events.py

+        # 1. If all prev events have the same chunk ID then use that chunk ID
+        # 2. If we have none of the prev events but do have events pointing to
+        #    it, then we use their chunk ID if:
+        #     - They’re all in the same chunk, and


smart ' is too smart
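
For reference, rule 1 of the quoted comment can be sketched as below. This is an illustration only: `chunk_from_prev_events` is a hypothetical name, a missing prev event is represented as `None`, and rule 2 is left out because its conditions are cut off in the quoted hunk.

```python
def chunk_from_prev_events(prev_chunk_ids):
    """Return the shared chunk ID if every prev event has the same one.

    `prev_chunk_ids` holds one entry per prev event; None marks a prev
    event we do not have (or that has no chunk). Returns None when rule 1
    does not apply, leaving the decision to the later rules.
    """
    if not prev_chunk_ids:
        return None
    distinct = set(prev_chunk_ids)
    if len(distinct) == 1 and None not in distinct:
        return next(iter(distinct))
    return None
```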

richvdh · 2018-05-30T08:25:33Z

synapse/storage/events.py

+        prev_chunk_ids = set()
+
+        for eid in prev_event_ids:
+            chunk_id = self._simple_select_one_onecol_txn(


as a general comment in this function, all these db hits look sloooow. do you plan to go via caches at some point?

The queries should all hit indices and so should be fast, though yeah, the number of them does mean even the RTT will start adding up. I'm not sure how much caches will help tbh, as in most cases what we fetch will change each time (though we could possibly prefill the caches).

Really, I'd quite like to split a lot of the persist_event logic out into per-room logic, so that if something goes slow for a particular room it won't block events in other rooms from being persisted. I.e., when persisting an event, it first gets added to a per-room queue to have chunk/current_state/etc. calculated, and then that result gets fed into the persist event queue. Maybe.

yup I'm worried about the RTT, and expecting that we ought to have prefilled caches in the common case.
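
One way to cut the round trips discussed above, independent of caching, is to fetch the chunk IDs for all prev events in a single query. The sketch below uses the standard-library `sqlite3` module rather than Synapse's `_simple_select_*` helpers, and the `events(event_id, chunk_id)` table layout is an assumption made for illustration:

```python
import sqlite3


def chunk_ids_one_by_one(conn, event_ids):
    """One query per event: N round trips for N prev events."""
    found = {}
    for event_id in event_ids:
        row = conn.execute(
            "SELECT chunk_id FROM events WHERE event_id = ?", (event_id,)
        ).fetchone()
        if row is not None:
            found[event_id] = row[0]
    return found


def chunk_ids_batched(conn, event_ids):
    """A single query for the whole batch: one round trip."""
    event_ids = list(event_ids)
    if not event_ids:
        return {}
    placeholders = ", ".join("?" for _ in event_ids)
    rows = conn.execute(
        "SELECT event_id, chunk_id FROM events"
        " WHERE event_id IN (%s)" % (placeholders,),
        event_ids,
    ).fetchall()
    # Each row is an (event_id, chunk_id) pair.
    return dict(rows)
```

Both functions return the same mapping; events that are not in the table are simply absent from the result.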

richvdh · 2018-05-30T08:34:23Z

synapse/storage/events.py

+                table="event_edges",
+                keyvalues={
+                    "event_id": eid,
+                    "is_state": False,


richvdh · 2018-05-30T09:04:28Z

synapse/storage/events.py

+                    "room_id": room_id,
+                    "chunk_id": chunk_id,
+                },
+                retcol="COALESCE(MAX(topological_ordering), 0)",


bit uneasy about the coalesce here. surely if we've got NULLs in here then this will give bad results, and we should fail loudly rather than subtly?
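
A small sketch of the difference, using the standard-library `sqlite3` module and an assumed `events(room_id, chunk_id, topological_ordering)` table (the hunk does not show which table is queried). `MAX` returns NULL when there is no non-NULL value to aggregate, and `COALESCE` turns that into a plausible-looking 0:

```python
import sqlite3

QUERY_TAIL = " FROM events WHERE room_id = ? AND chunk_id = ?"


def max_ordering_lenient(conn, room_id, chunk_id):
    """COALESCE hides a missing or all-NULL ordering behind 0."""
    row = conn.execute(
        "SELECT COALESCE(MAX(topological_ordering), 0)" + QUERY_TAIL,
        (room_id, chunk_id),
    ).fetchone()
    return row[0]


def max_ordering_strict(conn, room_id, chunk_id):
    """Raise instead of inventing a value when nothing usable is stored."""
    row = conn.execute(
        "SELECT MAX(topological_ordering)" + QUERY_TAIL,
        (room_id, chunk_id),
    ).fetchone()
    if row[0] is None:
        raise ValueError(
            "no topological_ordering for chunk %r in room %r"
            % (chunk_id, room_id)
        )
    return row[0]
```

Note that the strict version only catches the case where no non-NULL ordering exists at all; a chunk with a mix of NULL and non-NULL orderings would still return the maximum of the non-NULL ones.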

richvdh · 2018-05-30T09:07:26Z

synapse/storage/events.py

+            # ChunkDBOrderedListStore about that.
+            table.add_node(chunk_id)
+
+        # We need to now update the database with any new edges between chunks


is it worth trying to optimise this, depending on which path we've taken above?

I'm not sure I see how?

example 1: if it's a new chunk, it's not going to have any existing edges
example 2: if we've established that there is exactly one prev_chunk_id, then we know that we do not need to add any new prev_chunk edges.

Ah, good point well made
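
The two examples could translate into something like the sketch below. All names are hypothetical, `fetch_existing_prev_ids` stands in for whatever database lookup returns the chunk's current prev edges, and example 2 is read as "the event joined its single prev chunk", which is an assumption:

```python
def new_prev_edges(chunk_id, prev_chunk_ids, is_new_chunk,
                   fetch_existing_prev_ids):
    """Return the prev-chunk edges that still need to be inserted.

    The lookup callable is only invoked when neither shortcut applies.
    """
    # An edge from a chunk to itself is never needed.
    candidates = {pid for pid in prev_chunk_ids if pid != chunk_id}

    # Example 2: the only prev chunk is the chunk the event joined,
    # so there is nothing to add and nothing to look up.
    if not candidates:
        return set()

    # Example 1: a brand new chunk cannot have existing edges, so every
    # candidate is new and the lookup can be skipped.
    if is_new_chunk:
        return candidates

    # General case: only add the edges that are not already recorded.
    return candidates - set(fetch_existing_prev_ids(chunk_id))
```

Passing the lookup in as a callable keeps the sketch free of database code and makes it visible which paths avoid the query.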

richvdh · 2018-05-30T09:12:08Z

synapse/storage/events.py

+            if fid not in current_forward_ids and fid != chunk_id
+        )
+
+        if prev_chunk_ids:


isn't this condition redundant?

richvdh · 2018-05-30T09:13:39Z

synapse/storage/events.py

+            retcol="chunk_id",
+        )
+
+        prev_chunk_ids = set(


why are we bothering to build a new set here rather than just iterating through prev_chunk_ids at line 1517?

(and if you are going to build a new set, could it have a different name?)

richvdh · 2018-05-30T18:46:27Z

lgtm modulo comments above

erikjohnston · 2018-05-31T08:37:49Z

Conclusion is to come back and look at performance in a separate PR

erikjohnston assigned richvdh May 18, 2018

erikjohnston force-pushed the erikj/events_chunks branch from b035b79 to f9a5e36 Compare May 18, 2018 14:47

erikjohnston commented May 21, 2018

erikjohnston force-pushed the erikj/events_chunks branch 2 times, most recently from f9a5e36 to f533a9e Compare May 23, 2018 09:56

Compute new chunks for new events

13dbcaf

We also calculate a consistent topological ordering within a chunk, but it isn't used yet.

erikjohnston force-pushed the erikj/events_chunks branch from 2c492a3 to 13dbcaf Compare May 25, 2018 09:54

richvdh suggested changes May 30, 2018

richvdh assigned erikjohnston and unassigned richvdh May 30, 2018

erikjohnston added 7 commits May 30, 2018 11:30

Correctly loop over events_and_contexts

6c1d13a

Remove unnecessary set

1810cc3

Remove redundant conditions

1cdd0d3

Just iterate once rather than create a new set

ecd4931

Comments

f687d8f

Remove unnecessary COALESCE

9e1d3f1

Rename func to _insert_into_chunk_txn

3847313

erikjohnston assigned richvdh and unassigned erikjohnston May 30, 2018

richvdh approved these changes May 30, 2018

erikjohnston merged commit 867132f into erikj/room_chunks May 31, 2018

hawkowl deleted the erikj/events_chunks branch September 20, 2018 14:01

ara4n mentioned this pull request Jun 3, 2019

Proposal: mitigate extremities accumulation using lazy-transmitted dummy events #5319

Closed
