Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Update SQL statements in docs for Synapse Admins #12536

Merged
merged 9 commits into from
May 9, 2022
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/12536.doc
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Update SQL statements and replace use of old table `user_stats_historical` in docs for Synapse Admins.
234 changes: 180 additions & 54 deletions docs/usage/administration/useful_sql_for_admins.md
Original file line number Diff line number Diff line change
@@ -1,47 +1,78 @@
## Some useful SQL queries for Synapse Admins
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit scary that we document these SQL statements instead of adding APIs! Can any of these be removed and use the admin API instead?

I wonder if this should at least point to the admin API...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, I think all the queries here are SELECTs. Though as you say elsewhere, an expensive SELECT can still cause pain!


## Size of full matrix db
`SELECT pg_size_pretty( pg_database_size( 'matrix' ) );`
```sql
SELECT
pg_size_pretty( pg_database_size( 'matrix' ) );
```
### Result example:
```
pg_size_pretty
----------------
6420 MB
(1 row)
```

## Show top 20 larger rooms by state events count
```sql
SELECT r.name, s.room_id, s.current_state_events
FROM room_stats_current s
LEFT JOIN room_stats_state r USING (room_id)
ORDER BY current_state_events DESC
LIMIT 20;
SELECT
r.name,
s.room_id,
s.current_state_events
FROM
room_stats_current s
LEFT JOIN
room_stats_state r USING (room_id)
ORDER BY
current_state_events DESC LIMIT 20;
```

and by state_group_events count:
```sql
SELECT rss.name, s.room_id, count(s.room_id) FROM state_groups_state s
LEFT JOIN room_stats_state rss USING (room_id)
GROUP BY s.room_id, rss.name
ORDER BY count(s.room_id) DESC
LIMIT 20;
SELECT
rss.name,
s.room_id,
COUNT(s.room_id)
FROM
state_groups_state s
LEFT JOIN
room_stats_state rss USING (room_id)
GROUP BY
s.room_id,
rss.name
ORDER BY
COUNT(s.room_id) DESC LIMIT 20;
```
plus same, but with join removed for performance reasons:
```sql
SELECT s.room_id, count(s.room_id) FROM state_groups_state s
GROUP BY s.room_id
ORDER BY count(s.room_id) DESC
LIMIT 20;
SELECT
s.room_id,
COUNT(s.room_id)
FROM
state_groups_state s
GROUP BY
s.room_id
ORDER BY
COUNT(s.room_id) DESC LIMIT 20;
```

## Show top 20 larger tables by row count
```sql
SELECT relname, n_live_tup as rows
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC
LIMIT 20;
SELECT
relname,
n_live_tup AS "rows"
FROM
pg_stat_user_tables
ORDER BY
n_live_tup DESC LIMIT 20;
```
This query is quick, but may be very approximate, for exact number of rows use:
```sql
SELECT
COUNT(*)
FROM
<table_name>;
```
This query is quick, but may be very approximate, for exact number of rows use `SELECT COUNT(*) FROM <table_name>`.
### Result example:
```
state_groups_state - 161687170
Expand All @@ -68,43 +99,84 @@ user_directory_search - 316433

## Show top 20 rooms by new events count in last 1 day:
```sql
SELECT e.room_id, r.name, COUNT(e.event_id) cnt FROM events e
LEFT JOIN room_stats_state r USING (room_id)
WHERE e.origin_server_ts >= DATE_PART('epoch', NOW() - INTERVAL '1 day') * 1000 GROUP BY e.room_id, r.name ORDER BY cnt DESC LIMIT 20;
SELECT
e.room_id,
r.name,
COUNT(e.event_id) cnt
FROM
events e
LEFT JOIN
room_stats_state r USING (room_id)
WHERE
e.origin_server_ts >= DATE_PART('epoch', NOW() - INTERVAL '1 day') * 1000
GROUP BY
e.room_id,
r.name
ORDER BY
cnt DESC LIMIT 20;
```

## Show top 20 users on homeserver by sent events (messages) at last month:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has the potential to be very slow, I think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a chance for another solution / query? Or should I am add a warning?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just adding a warning about it would be enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

synapse=> explain SELECT
synapse->   COUNT(*),
synapse->   sender 
synapse-> FROM
synapse->   events 
synapse-> WHERE
synapse->   (
synapse(>     type = 'm.room.encrypted' 
synapse(>     OR type = 'm.room.message'
synapse(>   )
synapse->   AND origin_server_ts >= DATE_PART('epoch', NOW() - INTERVAL '1 month') * 1000 
synapse-> GROUP BY
synapse->   sender 
synapse-> ORDER BY
synapse->   COUNT(*) DESC LIMIT 20;
                                                                                                              QUERY PLAN                                                                                                               
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2077.77..2077.82 rows=20 width=64)
   ->  Sort  (cost=2077.77..2083.53 rows=2303 width=64)
         Sort Key: (count(*)) DESC
         ->  HashAggregate  (cost=1993.46..2016.49 rows=2303 width=64)
               Group Key: sender
               ->  Seq Scan on events  (cost=0.00..1975.47 rows=3598 width=56)
                     Filter: (((type = 'm.room.encrypted'::text) OR (type = 'm.room.message'::text)) AND ((origin_server_ts)::double precision >= (date_part('epoch'::text, (now() - '1 mon'::interval)) * '1000'::double precision)))
(7 rows)

```sql
SELECT user_id, SUM(total_events)
FROM user_stats_historical
WHERE TO_TIMESTAMP(end_ts/1000) AT TIME ZONE 'UTC' > date_trunc('day', now() - interval '1 month')
GROUP BY user_id
ORDER BY SUM(total_events) DESC
LIMIT 20;
SELECT
COUNT(*),
sender
FROM
events
WHERE
(
type = 'm.room.encrypted'
OR type = 'm.room.message'
)
AND origin_server_ts >= DATE_PART('epoch', NOW() - INTERVAL '1 month') * 1000
GROUP BY
sender
ORDER BY
COUNT(*) DESC LIMIT 20;
```

## Show last 100 messages from needed user, with room names:
```sql
SELECT e.room_id, r.name, e.event_id, e.type, e.content, j.json FROM events e
LEFT JOIN event_json j USING (room_id)
LEFT JOIN room_stats_state r USING (room_id)
WHERE sender = '@LOGIN:example.com'
AND e.type = 'm.room.message'
ORDER BY stream_ordering DESC
LIMIT 100;
SELECT
e.room_id,
r.name,
e.event_id,
e.type,
e.content,
j.json
FROM
events e
LEFT JOIN
event_json j USING (room_id)
LEFT JOIN
room_stats_state r USING (room_id)
WHERE
sender = '@LOGIN:example.com'
AND e.type = 'm.room.message'
ORDER BY
stream_ordering DESC LIMIT 100;
```

## Show top 20 larger tables by storage size
```sql
SELECT nspname || '.' || relname AS "relation",
pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
AND C.relkind <> 'i'
AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 20;
SELECT
nspname || '.' || relname AS "relation",
pg_size_pretty(pg_total_relation_size(c.oid)) AS "total_size"
FROM
pg_class c
LEFT JOIN
pg_namespace n
ON (n.oid = c.relnamespace)
WHERE
nspname NOT IN
(
'pg_catalog',
'information_schema'
)
AND c.relkind <> 'i'
AND nspname !~ '^pg_toast'
ORDER BY
pg_total_relation_size(c.oid) DESC LIMIT 20;
```
### Result example:
```
Expand All @@ -131,7 +203,33 @@ public.state_group_edges - 122 MB
```

## Show rooms with names, sorted by events in this rooms
`echo "select event_json.room_id,room_stats_state.name from event_json,room_stats_state where room_stats_state.room_id=event_json.room_id" | psql synapse | sort | uniq -c | sort -n`

**Sort and order with bash**
```bash
echo "SELECT event_json.room_id, room_stats_state.name FROM event_json, room_stats_state \
WHERE room_stats_state.room_id = event_json.room_id" | psql -d synapse -h localhost -U synapse_user -t \
| sort | uniq -c | sort -n
```
Documentation for `psql` command line parameters: https://www.postgresql.org/docs/current/app-psql.html

**Sort and order with SQL**
```sql
SELECT
COUNT(*),
event_json.room_id,
room_stats_state.name
FROM
event_json,
room_stats_state
WHERE
room_stats_state.room_id = event_json.room_id
GROUP BY
event_json.room_id,
room_stats_state.name
ORDER BY
COUNT(*) DESC LIMIT 50;
```

### Result example:
```
9459 !FPUfgzXYWTKgIrwKxW:matrix.org | This Week in Matrix
Expand All @@ -146,11 +244,39 @@ public.state_group_edges - 122 MB

## Lookup room state info by list of room_id
```sql
SELECT rss.room_id, rss.name, rss.canonical_alias, rss.topic, rss.encryption, rsc.joined_members, rsc.local_users_in_room, rss.join_rules
FROM room_stats_state rss
LEFT JOIN room_stats_current rsc USING (room_id)
WHERE room_id IN (WHERE room_id IN (
'!OGEhHVWSdvArJzumhm:matrix.org',
'!YTvKGNlinIzlkMTVRl:matrix.org'
)
```
SELECT
rss.room_id,
rss.name,
rss.canonical_alias,
rss.topic,
rss.encryption,
rsc.joined_members,
rsc.local_users_in_room,
rss.join_rules
FROM
room_stats_state rss
LEFT JOIN
room_stats_current rsc USING (room_id)
WHERE
room_id IN
(
WHERE
room_id IN
(
'!OGEhHVWSdvArJzumhm:matrix.org',
'!YTvKGNlinIzlkMTVRl:matrix.org'
);
```

## Show users and devices that have not been online for a while
```sql
SELECT
user_id,
device_id,
user_agent,
TO_TIMESTAMP(last_seen / 1000) AS "last_seen"
FROM
devices
WHERE
last_seen < DATE_PART('epoch', NOW() - INTERVAL '3 month') * 1000;
```