-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Remove database differentiation in _get_state_groups_from_groups_txn
- SQLite supports recursive queries
#14531
Conversation
# should be the same. | ||
sql = """ | ||
WITH RECURSIVE sgs(state_group) AS ( | ||
VALUES(CAST(? AS bigint)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only change to this code here is changing the cast to be compatible between SQLite and Postgres.
SQLite chokes on the previous ?::bigint
syntax
typ, state_key, event_id = row | ||
key = (intern_string(typ), intern_string(state_key)) | ||
results[group][key] = event_id | ||
overall_select_clause = " UNION ".join(select_clause_list) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The diff is a bit weird to follow but we're just removing the whole SQLite specific else
and unindenting the Postgres one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW https://github.com/matrix-org/synapse/pull/14531/files?w=1 shows it pretty clearly!
@@ -98,153 +98,100 @@ def _get_state_groups_from_groups_txn( | |||
# a temporary hack until we can add the right indices in | |||
txn.execute("SET LOCAL enable_seqscan=off") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This setting is Postgres specific so it's the only thing left that is gated.
Do we have to worry about this in SQLite? I don't see a similar setting for SQLite
I think we have the correct indexes in place and this is just a hint to use those indexes.
Is this worth resurrecting? |
else: | ||
max_entries_returned = state_filter.max_entries_returned() | ||
|
||
where_clause, where_args = state_filter.make_sql_filter_clause() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this worth resurrecting?
@DMRobertson It seems possible to accomplish but there are a few more hurdles. If you're keen to take it over, feel free. I won't be looking at it until I get more backend facing again.
…get_state_groups_from_groups_txn
See #14531 (comment) This does mean that I had to leave one SQLite specific query in place. But at least we're using the same recursive query pattern now.
# Small helper function to wrap the union clause in parenthesis if we're | ||
# using postgres. This is because SQLite doesn't allow `LIMIT`/`ORDER` | ||
# clauses in the union subquery but postgres does as long as they are | ||
# wrapped in parenthesis which this function handles the complexity of | ||
# handling. | ||
def wrap_union_if_postgres( | ||
union_clause: str, extra_order_or_limit_clause: str = "" | ||
) -> str: | ||
if isinstance(self.database_engine, PostgresEngine): | ||
return f"""({union_clause} {extra_order_or_limit_clause})""" | ||
|
||
return union_clause |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better just to define this function somewhere above once but thought it lost the specific usage context when we put it somewhere else. Feel free to poke ⏩
# SQLite doesn't support `SELECT DISTINCT ON`, so we have to just get | ||
# some potential duplicate (type, state_key) pairs and then only use the | ||
# first of each kind we see. | ||
overall_select_clause = f""" | ||
SELECT type, state_key, event_id | ||
FROM state_groups_state | ||
WHERE state_group IN ( | ||
SELECT state_group FROM sgs | ||
) {where_clause} | ||
ORDER BY type, state_key, state_group DESC | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This query is exactly the same as the Postgres version above except for the DISTINCT ON (type, state_key)
part at the beginning.
Any interest in sharing the bulk of the query here? Not sure it would be more clear
# If the number of entries in the (type,state_key)->event_id dict | ||
# matches the number of (type,state_keys) types we were searching | ||
# for, then we must have found them all, so no need to go walk | ||
# further down the tree... UNLESS our types filter contained | ||
# wildcards (i.e. Nones) in which case we have to do an exhaustive | ||
# search | ||
if ( | ||
max_entries_returned is not None | ||
and len(results[group]) == max_entries_returned | ||
): | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the current consolidation, we're losing this shortcut when running with SQLite (and the shortcut is probably pretty useful) 😬
And as mentioned in another comment, also conscious that some of these subtle hacks to consolidate might not be better than the big if-else
distinction between Postgres and SQLite we had before 😬
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Closing as I'm not convinced this PR is better now that we've reached this point where things work. It was looking very promising before these latest changes necessitated by SQLite not supporting SELECT DISTINCT ON
syntax.
It is less code especially considering comments added but wouldn't consider it easier to reason about and we lose the SQLite optimization mentioned above.
( | ||
SELECT DISTINCT ON (type, state_key) | ||
type, state_key, event_id | ||
FROM state_groups_state | ||
INNER JOIN sgs USING (state_group) | ||
WHERE {where_clause} | ||
ORDER BY type, state_key, state_group DESC | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One improvement that I thought I could take out of this was simplifying these conditions like the following but it seems to perform about the same (at least in my one example I use against matrix.org
) -> https://explain.depesz.com/s/pWH9
( | |
SELECT DISTINCT ON (type, state_key) | |
type, state_key, event_id | |
FROM state_groups_state | |
INNER JOIN sgs USING (state_group) | |
WHERE {where_clause} | |
ORDER BY type, state_key, state_group DESC | |
) | |
( | |
SELECT type, state_key, event_id | |
FROM state_groups_state | |
INNER JOIN sgs USING (state_group) | |
WHERE {where_clause} | |
ORDER BY type, state_key, state_group DESC | |
LIMIT 1 | |
) |
Raw query and EXPLAIN ANALYZE
EXPLAIN ANALYZE WITH RECURSIVE sgs(state_group) AS (
VALUES(739988088::bigint)
UNION ALL
SELECT prev_state_group FROM state_group_edges e, sgs s
WHERE s.state_group = e.state_group
)
(
SELECT type, state_key, event_id, state_group
FROM state_groups_state
INNER JOIN sgs USING (state_group)
WHERE (type = 'm.room.history_visibility' AND state_key = '')
ORDER BY type, state_key, state_group DESC
LIMIT 1
)
UNION
(
SELECT type, state_key, event_id, state_group
FROM state_groups_state
INNER JOIN sgs USING (state_group)
WHERE (type = 'm.room.member' AND state_key = '@madlittlemods:matrix.org')
ORDER BY type, state_key, state_group DESC
LIMIT 1
);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=1272.13..1272.15 rows=2 width=104) (actual time=1.707..1.717 rows=2 loops=1)
CTE sgs
-> Recursive Union (cost=0.00..284.28 rows=101 width=8) (actual time=0.004..0.504 rows=59 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1)
-> Nested Loop (cost=0.57..28.23 rows=10 width=8) (actual time=0.007..0.007 rows=1 loops=59)
-> WorkTable Scan on sgs s (cost=0.00..0.20 rows=10 width=8) (actual time=0.000..0.000 rows=1 loops=59)
-> Index Only Scan using state_group_edges_unique_idx on state_group_edges e (cost=0.57..2.79 rows=1 width=16) (actual time=0.006..0.006 rows=1 loops=59)
Index Cond: (state_group = s.state_group)
Heap Fetches: 0
-> Sort (cost=987.85..987.85 rows=2 width=104) (actual time=1.705..1.708 rows=2 loops=1)
Sort Key: state_groups_state.type, state_groups_state.state_key, state_groups_state.event_id, state_groups_state.state_group
Sort Method: quicksort Memory: 25kB
-> Append (cost=493.90..987.84 rows=2 width=104) (actual time=1.136..1.545 rows=2 loops=1)
-> Limit (cost=493.90..493.90 rows=1 width=91) (actual time=1.134..1.136 rows=1 loops=1)
-> Sort (cost=493.90..493.90 rows=1 width=91) (actual time=1.133..1.134 rows=1 loops=1)
Sort Key: state_groups_state.state_group DESC
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=0.84..493.89 rows=1 width=91) (actual time=1.111..1.122 rows=1 loops=1)
-> CTE Scan on sgs (cost=0.00..2.02 rows=101 width=8) (actual time=0.007..0.532 rows=59 loops=1)
-> Index Scan using state_groups_state_type_idx on state_groups_state (cost=0.84..4.86 rows=1 width=91) (actual time=0.009..0.009 rows=0 loops=59)
Index Cond: ((state_group = sgs.state_group) AND (type = 'm.room.history_visibility'::text) AND (state_key = ''::text))
-> Limit (cost=493.90..493.90 rows=1 width=91) (actual time=0.405..0.406 rows=1 loops=1)
-> Sort (cost=493.90..493.90 rows=1 width=91) (actual time=0.404..0.405 rows=1 loops=1)
Sort Key: state_groups_state_1.state_group DESC
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=0.84..493.89 rows=1 width=91) (actual time=0.398..0.399 rows=1 loops=1)
-> CTE Scan on sgs sgs_1 (cost=0.00..2.02 rows=101 width=8) (actual time=0.000..0.013 rows=59 loops=1)
-> Index Scan using state_groups_state_type_idx on state_groups_state state_groups_state_1 (cost=0.84..4.86 rows=1 width=91) (actual time=0.006..0.006 rows=0 loops=59)
Index Cond: ((state_group = sgs_1.state_group) AND (type = 'm.room.member'::text) AND (state_key = '@madlittlemods:matrix.org'::text))
Planning Time: 7.594 ms
Execution Time: 3.023 ms
(31 rows)
Compare to the after query performance from the previous PR, #14527 -> https://explain.depesz.com/s/Qi4A
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I already explored this in #14494 (comment) (see Other exploration section) and came to the same conclusion
Remove database differentiation (SQLite vs Postgres specific code) in
_get_state_groups_from_groups_txn
(_get_state_groups_from_groups
). We can use recursive queries in our supported SQLite version rangeFollow-up to #14527
Dev notes
Testing queries against the
matrix.org
database: https://gitlab.matrix.org/new-vector/internal/-/wikis/Matrix.org-Synapse-ops#access-matrixorg-synapses-databaseQuery test dummy data
Postgres vs SQLite:
Recursive queries work just fine
Recursive queries are supported in "SQLite 3.8.3 or higher". Our minimum supported SQLite version is 3.27.0
SQLite doesn't support
?::bigint
Solution: Use
cast(? as bigint)
✔️SQLite doesn't support
SELECT DISTINCT ON
These nice
SELECT DISTINCT ON (a, b) a, b, c
queries that work in postgres are not supported in SQLite which make our lives difficult.There are many context-specific ways to workaround/ignore these problems. One way is just to not get the distinct results in favor of sorting them and then only care about the first row result. But this means you're transporting a lot of duplicate pairs from the database back to your app just to ignore.
If you're lucky, maybe you can use
GROUP BY
if the columns you group by are the same ones you want to select (doesn't work in our case since we want to group by(type, state_key)
but selecttype, state_key, event_id
.Or maybe you want to complicate things with a bunch of sub-queries.
Other references:
SELECT DISTINCT ON
was WONTFIXed in SQLite: https://code.djangoproject.com/ticket/22696" (https://stackoverflow.com/a/71924314/796832)SQLite doesn't support parenthesis around
UNION
subqueries which also means no per-subqueryORDER
/LIMIT
SQLite doesn't like parenthesis around each clause like
(select_clause_A) UNION (select_clause_B)
->sqlite3.OperationalError: near "(": syntax error
If you're just doing a
SELECT
withoutORDER
/LIMIT
, then you can easily just remove the parenthesis grouping. But you're kinda stuck if you wanted the per-subqueryORDER
/LIMIT
. You canORDER
/LIMIT
at the end after theUNION
but it's not the same.Pull Request Checklist
EventStore
toEventWorkerStore
.".code blocks
.Pull request includes a sign off(run the linters)