Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Remove database differentiation in _get_state_groups_from_groups_txn - SQLite supports recursive queries #14531

1 change: 1 addition & 0 deletions changelog.d/14531.misc
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Remove database differentiation (SQLite vs Postgres specific code) in `_get_state_groups_from_groups_txn`.
256 changes: 128 additions & 128 deletions synapse/storage/databases/state/bg_updates.py
Original file line number Diff line number Diff line change
Expand Up @@ -98,81 +98,115 @@ def _get_state_groups_from_groups_txn(
# a temporary hack until we can add the right indices in
txn.execute("SET LOCAL enable_seqscan=off")
Copy link
Contributor Author

@MadLittleMods MadLittleMods Nov 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This setting is Postgres specific so it's the only thing left that is gated.

Do we have to worry about this in SQLite? I don't see a similar setting for SQLite

I think we have the correct indexes in place and this is just a hint to use those indexes.


# The below query walks the state_group tree so that the "state"
# table includes all state_groups in the tree. It then joins
# against `state_groups_state` to fetch the latest state.
# It assumes that previous state groups are always numerically
# lesser.
# This may return multiple rows per (type, state_key), but last_value
# should be the same.
sql = """
WITH RECURSIVE sgs(state_group) AS (
VALUES(?::bigint)
UNION ALL
SELECT prev_state_group FROM state_group_edges e, sgs s
WHERE s.state_group = e.state_group
)
%s
"""
# The below query walks the state_group tree so that the "state"
# table includes all state_groups in the tree. It then joins
# against `state_groups_state` to fetch the latest state.
# It assumes that previous state groups are always numerically
# lesser.
sql = """
WITH RECURSIVE sgs(state_group) AS (
VALUES(CAST(? AS bigint))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only change to this code here is changing the cast to be compatible between SQLite and Postgres.

SQLite chokes on the previous ?::bigint syntax

UNION ALL
SELECT prev_state_group FROM state_group_edges e, sgs s
WHERE s.state_group = e.state_group
)
%s
"""

overall_select_query_args: List[Union[int, str]] = []
overall_select_query_args: List[Union[int, str]] = []

# This is an optimization to create a select clause per-condition. This
# makes the query planner a lot smarter on what rows should pull out in the
# first place and we end up with something that takes 10x less time to get a
# result.
use_condition_optimization = (
not state_filter.include_others and not state_filter.is_full()
)
state_filter_condition_combos: List[Tuple[str, Optional[str]]] = []
# We don't need to caclculate this list if we're not using the condition
# optimization
if use_condition_optimization:
for etype, state_keys in state_filter.types.items():
if state_keys is None:
state_filter_condition_combos.append((etype, None))
else:
for state_key in state_keys:
state_filter_condition_combos.append((etype, state_key))
# And here is the optimization itself. We don't want to do the optimization
# if there are too many individual conditions. 10 is an arbitrary number
# with no testing behind it but we do know that we specifically made this
# optimization for when we grab the necessary state out for
# `filter_events_for_client` which just uses 2 conditions
# (`EventTypes.RoomHistoryVisibility` and `EventTypes.Member`).
if use_condition_optimization and len(state_filter_condition_combos) < 10:
select_clause_list: List[str] = []
for etype, skey in state_filter_condition_combos:
if skey is None:
where_clause = "(type = ?)"
overall_select_query_args.extend([etype])
else:
where_clause = "(type = ? AND state_key = ?)"
overall_select_query_args.extend([etype, skey])

select_clause_list.append(
# This is an optimization to create a select clause per-condition. This
# makes the query planner a lot smarter on what rows should pull out in the
# first place and we end up with something that takes 10x less time to get a
# result.
use_condition_optimization = (
not state_filter.include_others and not state_filter.is_full()
)
state_filter_condition_combos: List[Tuple[str, Optional[str]]] = []
# We only need to caclculate this list if we're using the condition optimization
if use_condition_optimization:
for etype, state_keys in state_filter.types.items():
if state_keys is None:
state_filter_condition_combos.append((etype, None))
else:
for state_key in state_keys:
state_filter_condition_combos.append((etype, state_key))
# And here is the optimization itself. We don't want to do the optimization
# if there are too many individual conditions. 10 is an arbitrary number
# with no testing behind it but we do know that we specifically made this
# optimization for when we grab the necessary state out for
# `filter_events_for_client` which just uses 2 conditions
# (`EventTypes.RoomHistoryVisibility` and `EventTypes.Member`).
if use_condition_optimization and len(state_filter_condition_combos) < 10:
select_clause_list: List[str] = []
for etype, skey in state_filter_condition_combos:
if skey is None:
where_clause = "(type = ?)"
overall_select_query_args.extend([etype])
else:
where_clause = "(type = ? AND state_key = ?)"
overall_select_query_args.extend([etype, skey])

# Small helper function to wrap the union clause in parenthesis if we're
# using postgres. This is because SQLite doesn't allow `LIMIT`/`ORDER`
# clauses in the union subquery but postgres does as long as they are
# wrapped in parenthesis which this function handles the complexity of
# handling.
def wrap_union_if_postgres(
union_clause: str, extra_order_or_limit_clause: str = ""
) -> str:
if isinstance(self.database_engine, PostgresEngine):
return f"""({union_clause} {extra_order_or_limit_clause})"""

return union_clause
Comment on lines +150 to +161
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better just to define this function somewhere above once but thought it lost the specific usage context when we put it somewhere else. Feel free to poke ⏩


# We could use `SELECT DISTINCT ON` here to align with the query below
# but that isn't compatible with SQLite and we can get away with `LIMIT
# 1` here instead because the `WHERE` clause will only ever match and
# target one event; and is simpler anyway. And it's better to use
# something that's simpler and compatible with both Database engines.
select_clause_list.append(
wrap_union_if_postgres(
# We only select `state_group` here for use in the `ORDER`
# clause later after the `UNION`
f"""
(
SELECT DISTINCT ON (type, state_key)
type, state_key, event_id
FROM state_groups_state
INNER JOIN sgs USING (state_group)
WHERE {where_clause}
ORDER BY type, state_key, state_group DESC
)
Comment on lines -155 to -162
Copy link
Contributor Author

@MadLittleMods MadLittleMods May 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One improvement that I thought I could take out of this was simplifying these conditions like the following but it seems to perform about the same (at least in my one example I use against matrix.org) -> https://explain.depesz.com/s/pWH9

Suggested change
(
SELECT DISTINCT ON (type, state_key)
type, state_key, event_id
FROM state_groups_state
INNER JOIN sgs USING (state_group)
WHERE {where_clause}
ORDER BY type, state_key, state_group DESC
)
(
SELECT type, state_key, event_id
FROM state_groups_state
INNER JOIN sgs USING (state_group)
WHERE {where_clause}
ORDER BY type, state_key, state_group DESC
LIMIT 1
)
Raw query and EXPLAIN ANALYZE
EXPLAIN ANALYZE WITH RECURSIVE sgs(state_group) AS (
    VALUES(739988088::bigint)
    UNION ALL
    SELECT prev_state_group FROM state_group_edges e, sgs s
    WHERE s.state_group = e.state_group
)
(
    SELECT type, state_key, event_id, state_group
    FROM state_groups_state
    INNER JOIN sgs USING (state_group)
    WHERE (type = 'm.room.history_visibility' AND state_key = '')
    ORDER BY type, state_key, state_group DESC
    LIMIT 1
)
 UNION 
(
    SELECT type, state_key, event_id, state_group
    FROM state_groups_state
    INNER JOIN sgs USING (state_group)
    WHERE (type = 'm.room.member' AND state_key = '@madlittlemods:matrix.org')
    ORDER BY type, state_key, state_group DESC
    LIMIT 1
);
                                                                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=1272.13..1272.15 rows=2 width=104) (actual time=1.707..1.717 rows=2 loops=1)
   CTE sgs
     ->  Recursive Union  (cost=0.00..284.28 rows=101 width=8) (actual time=0.004..0.504 rows=59 loops=1)
           ->  Result  (cost=0.00..0.01 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1)
           ->  Nested Loop  (cost=0.57..28.23 rows=10 width=8) (actual time=0.007..0.007 rows=1 loops=59)
                 ->  WorkTable Scan on sgs s  (cost=0.00..0.20 rows=10 width=8) (actual time=0.000..0.000 rows=1 loops=59)
                 ->  Index Only Scan using state_group_edges_unique_idx on state_group_edges e  (cost=0.57..2.79 rows=1 width=16) (actual time=0.006..0.006 rows=1 loops=59)
                       Index Cond: (state_group = s.state_group)
                       Heap Fetches: 0
   ->  Sort  (cost=987.85..987.85 rows=2 width=104) (actual time=1.705..1.708 rows=2 loops=1)
         Sort Key: state_groups_state.type, state_groups_state.state_key, state_groups_state.event_id, state_groups_state.state_group
         Sort Method: quicksort  Memory: 25kB
         ->  Append  (cost=493.90..987.84 rows=2 width=104) (actual time=1.136..1.545 rows=2 loops=1)
               ->  Limit  (cost=493.90..493.90 rows=1 width=91) (actual time=1.134..1.136 rows=1 loops=1)
                     ->  Sort  (cost=493.90..493.90 rows=1 width=91) (actual time=1.133..1.134 rows=1 loops=1)
                           Sort Key: state_groups_state.state_group DESC
                           Sort Method: quicksort  Memory: 25kB
                           ->  Nested Loop  (cost=0.84..493.89 rows=1 width=91) (actual time=1.111..1.122 rows=1 loops=1)
                                 ->  CTE Scan on sgs  (cost=0.00..2.02 rows=101 width=8) (actual time=0.007..0.532 rows=59 loops=1)
                                 ->  Index Scan using state_groups_state_type_idx on state_groups_state  (cost=0.84..4.86 rows=1 width=91) (actual time=0.009..0.009 rows=0 loops=59)
                                       Index Cond: ((state_group = sgs.state_group) AND (type = 'm.room.history_visibility'::text) AND (state_key = ''::text))
               ->  Limit  (cost=493.90..493.90 rows=1 width=91) (actual time=0.405..0.406 rows=1 loops=1)
                     ->  Sort  (cost=493.90..493.90 rows=1 width=91) (actual time=0.404..0.405 rows=1 loops=1)
                           Sort Key: state_groups_state_1.state_group DESC
                           Sort Method: quicksort  Memory: 25kB
                           ->  Nested Loop  (cost=0.84..493.89 rows=1 width=91) (actual time=0.398..0.399 rows=1 loops=1)
                                 ->  CTE Scan on sgs sgs_1  (cost=0.00..2.02 rows=101 width=8) (actual time=0.000..0.013 rows=59 loops=1)
                                 ->  Index Scan using state_groups_state_type_idx on state_groups_state state_groups_state_1  (cost=0.84..4.86 rows=1 width=91) (actual time=0.006..0.006 rows=0 loops=59)
                                       Index Cond: ((state_group = sgs_1.state_group) AND (type = 'm.room.member'::text) AND (state_key = '@madlittlemods:matrix.org'::text))
 Planning Time: 7.594 ms
 Execution Time: 3.023 ms
(31 rows)

Compare to the after query performance from the previous PR, #14527 -> https://explain.depesz.com/s/Qi4A

Copy link
Contributor Author

@MadLittleMods MadLittleMods May 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I already explored this in #14494 (comment) (see Other exploration section) and came to the same conclusion

SELECT type, state_key, event_id, state_group
FROM state_groups_state
INNER JOIN sgs USING (state_group)
WHERE {where_clause}
""",
# The `ORDER BY`/`LIMIT` is an extra nicety that saves us from
# having to ferry a bunch of duplicate state pairs back from the
# database since we only need the one with the greatest
# state_group (most recent). Since this only applies to
# postgres, we do have to be careful to take care of the
# duplicate pairs in the downstream code when running with
# SQLite.
"""
ORDER BY type, state_key, state_group DESC
LIMIT 1
""",
)
)

overall_select_clause = " UNION ".join(select_clause_list)
else:
where_clause, where_args = state_filter.make_sql_filter_clause()
# Unless the filter clause is empty, we're going to append it after an
# existing where clause
if where_clause:
where_clause = " AND (%s)" % (where_clause,)
overall_select_clause = (
" UNION ".join(select_clause_list)
# We `ORDER` after the union results because it's compatible with both
# Postgres and SQLite. And we need the rows to by ordered by
# `state_group` in both cases so the greatest state_group pairs are
# first and we only care about the first distinct (type, state_key) pair later on.
+ " ORDER BY type, state_key, state_group DESC"
)
else:
Copy link
Contributor Author

@MadLittleMods MadLittleMods Nov 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diff is a bit weird to follow but we're just removing the whole SQLite specific else and unindenting the Postgres one

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where_clause, where_args = state_filter.make_sql_filter_clause()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this worth resurrecting?

@DMRobertson It seems possible to accomplish but there are a few more hurdles. If you're keen to take it over, feel free. I won't be looking at it until I get more backend facing again.

# Unless the filter clause is empty, we're going to append it after an
# existing where clause
if where_clause:
where_clause = " AND (%s)" % (where_clause,)

overall_select_query_args.extend(where_args)
overall_select_query_args.extend(where_args)

if isinstance(self.database_engine, PostgresEngine):
overall_select_clause = f"""
SELECT DISTINCT ON (type, state_key)
type, state_key, event_id
Expand All @@ -182,69 +216,35 @@ def _get_state_groups_from_groups_txn(
) {where_clause}
ORDER BY type, state_key, state_group DESC
"""
else:
# SQLite doesn't support `SELECT DISTINCT ON`, so we have to just get
# some potential duplicate (type, state_key) pairs and then only use the
# first of each kind we see.
overall_select_clause = f"""
SELECT type, state_key, event_id
FROM state_groups_state
WHERE state_group IN (
SELECT state_group FROM sgs
) {where_clause}
ORDER BY type, state_key, state_group DESC
"""
Comment on lines +220 to +230
Copy link
Contributor Author

@MadLittleMods MadLittleMods May 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This query is exactly the same as the Postgres version above except for the DISTINCT ON (type, state_key) part at the beginning.

Any interest in sharing the bulk of the query here? Not sure it would be more clear


for group in groups:
args: List[Union[int, str]] = [group]
args.extend(overall_select_query_args)

txn.execute(sql % (overall_select_clause,), args)
for row in txn:
typ, state_key, event_id = row
key = (intern_string(typ), intern_string(state_key))
for group in groups:
args: List[Union[int, str]] = [group]
args.extend(overall_select_query_args)

txn.execute(sql % (overall_select_clause,), args)
for row in txn:
# The `*_` rest syntax is to ignore the `state_group` column which we
# only select in the optimized case
typ, state_key, event_id, *_ = row
key = (intern_string(typ), intern_string(state_key))
# Deal with the potential duplicate (type, state_key) pairs from the
# SQLite specific query above. We only want to use the first row which
# is from the greatest state group (most-recent) because that is that
# applicable state in that state group.
if key not in results[group]:
results[group][key] = event_id
else:
max_entries_returned = state_filter.max_entries_returned()

where_clause, where_args = state_filter.make_sql_filter_clause()
# Unless the filter clause is empty, we're going to append it after an
# existing where clause
if where_clause:
where_clause = " AND (%s)" % (where_clause,)

# We don't use WITH RECURSIVE on sqlite3 as there are distributions
# that ship with an sqlite3 version that doesn't support it (e.g. wheezy)
for group in groups:
next_group: Optional[int] = group

while next_group:
# We did this before by getting the list of group ids, and
# then passing that list to sqlite to get latest event for
# each (type, state_key). However, that was terribly slow
# without the right indices (which we can't add until
# after we finish deduping state, which requires this func)
args = [next_group]
args.extend(where_args)

txn.execute(
"SELECT type, state_key, event_id FROM state_groups_state"
" WHERE state_group = ? " + where_clause,
args,
)
results[group].update(
((typ, state_key), event_id)
for typ, state_key, event_id in txn
if (typ, state_key) not in results[group]
)

# If the number of entries in the (type,state_key)->event_id dict
# matches the number of (type,state_keys) types we were searching
# for, then we must have found them all, so no need to go walk
# further down the tree... UNLESS our types filter contained
# wildcards (i.e. Nones) in which case we have to do an exhaustive
# search
if (
max_entries_returned is not None
and len(results[group]) == max_entries_returned
):
break
Comment on lines -229 to -239
Copy link
Contributor Author

@MadLittleMods MadLittleMods May 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the current consolidation, we're losing this shortcut when running with SQLite (and the shortcut is probably pretty useful) 😬

And as mentioned in another comment, also conscious that some of these subtle hacks to consolidate might not be better than the big if-else distinction between Postgres and SQLite we had before 😬

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Closing as I'm not convinced this PR is better now that we've reached this point where things work. It was looking very promising before these latest changes necessitated by SQLite not supporting SELECT DISTINCT ON syntax.

It is less code especially considering comments added but wouldn't consider it easier to reason about and we lose the SQLite optimization mentioned above.


next_group = self.db_pool.simple_select_one_onecol_txn(
txn,
table="state_group_edges",
keyvalues={"state_group": next_group},
retcol="prev_state_group",
allow_none=True,
)

# The results shouldn't be considered mutable.
return results
Expand Down