Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Add documentation about the user directory search algorithm #16320

Merged
merged 9 commits into from
Sep 26, 2023

Conversation

clokep
Copy link
Member

@clokep clokep commented Sep 14, 2023

Cleans-up the current user directory documentation and expands it a bit. It documents the algorithm (which is kind of readable from the code, but kind of not).

@clokep clokep marked this pull request as ready for review September 14, 2023 14:36
@clokep clokep requested a review from a team as a code owner September 14, 2023 14:36
Copy link
Contributor

@DMRobertson DMRobertson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this (and for fixing up my prose from a couple of years ago).

I think this looks good. I've left some thoughts and a few suggestions, but don't consider them blockers.

docs/user_directory.md Outdated Show resolved Hide resolved
docs/user_directory.md Show resolved Hide resolved
docs/user_directory.md Outdated Show resolved Hide resolved
docs/user_directory.md Show resolved Hide resolved
docs/user_directory.md Outdated Show resolved Hide resolved

Results are sorted by a rank derived by:

* 4x if a user ID exists.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Err, when would this not be the case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same question. I think it is not-nullable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

=> \d user_directory_search
        Table "user_directory_search"
 Column  |   Type   | Collation | Nullable | Default 
---------+----------+-----------+----------+---------
 user_id | text     |           | not null | 
 vector  | tsvector |           |          | 
Indexes:
    "user_directory_search_fts_idx" gin (vector)
    "user_directory_search_user_idx" UNIQUE, btree (user_id)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this actually means the initial weight is 4, not 1? 🤷

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Effectively, though it won't make much difference: sorting by val and 4 * val should give the same results1!

Footnotes

  1. except that the former is less prone to overflow, I guess

docs/user_directory.md Show resolved Hide resolved
docs/user_directory.md Outdated Show resolved Hide resolved
docs/user_directory.md Outdated Show resolved Hide resolved
@clokep
Copy link
Member Author

clokep commented Sep 26, 2023

@DMRobertson I tried to take your feedback into account. I suspect it is worth a re-read and not looking at the diff?

Copy link
Contributor

@DMRobertson DMRobertson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments left to your discretion. 🚢 🇮🇹 !

docs/user_directory.md Outdated Show resolved Hide resolved
docs/user_directory.md Show resolved Hide resolved
Comment on lines 34 to 37
* `user_directory_search`. To be joined to `user_directory`. It contains an extra
column that enables full text search based on user ids and display names.
Different schemas for SQLite and Postgres with different code paths to match.
column that enables full text search based on user IDs and display names.
Different schemas for SQLite and Postgres are used.
- Indexed on the full text search data. Indexed on users.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside: I always found it strange that this is a separate table. I guess it's to limit the sqlite vs postgres differences to a small table rather than a larger one?

Copy link
Member Author

@clokep clokep Sep 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is because you enable "full text search" on a table, but the database engines then backs that by multiple tables that are hidden from you.

Comment on lines 39 to 42
* `user_directory_stream_pos`. When the initial background update to populate
the directory is complete, we record a stream position here. This indicates
that synapse should now listen for room changes and incrementally update
the directory where necessary.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs/user_directory.md Outdated Show resolved Hide resolved
Co-authored-by: David Robertson <davidr@element.io>
@clokep clokep enabled auto-merge (squash) September 26, 2023 13:41
@clokep clokep merged commit ec1c709 into develop Sep 26, 2023
31 checks passed
@clokep clokep deleted the clokep/user-dir-docs branch September 26, 2023 13:44
Fizzadar added a commit to beeper/synapse-legacy-fork that referenced this pull request Oct 30, 2023
No significant changes since 1.94.0rc1.

- Render plain, CSS, CSV, JSON and common image formats in the browser (inline) when requested through the /download endpoint. ([\matrix-org#15988](matrix-org#15988))
- Add experimental support for [MSC4028](matrix-org/matrix-spec-proposals#4028) to push all encrypted events to clients. ([\matrix-org#16361](matrix-org#16361))
- Minor performance improvement when sending presence to federated servers. ([\matrix-org#16385](matrix-org#16385))
- Minor performance improvement by caching server ACL checking. ([\matrix-org#16360](matrix-org#16360))

- Add developer documentation concerning gradual schema migrations with column alterations. ([\matrix-org#15691](matrix-org#15691))
- Improve documentation of the user directory search algorithm. ([\matrix-org#16320](matrix-org#16320))
- Fix rendering of user admin API documentation around deactivation. This was broken in Synapse 1.91.0. ([\matrix-org#16355](matrix-org#16355))
- Update documentation around message retention policies. ([\matrix-org#16382](matrix-org#16382))
- Add note to `federation_domain_whitelist` config option to clarify its usage. ([\matrix-org#16416](matrix-org#16416))
- Improve legacy release notes. ([\matrix-org#16418](matrix-org#16418))

- Remove Python version from `/_synapse/admin/v1/server_version`. ([\matrix-org#16380](matrix-org#16380))

- Avoid running CI steps when the files they check have not been changed. ([\matrix-org#14745](matrix-org#14745), [\matrix-org#16387](matrix-org#16387))
- Improve type hints. ([\matrix-org#14911](matrix-org#14911), [\matrix-org#16350](matrix-org#16350), [\matrix-org#16356](matrix-org#16356), [\matrix-org#16395](matrix-org#16395))
- Added support for pydantic v2 in addition to pydantic v1. Contributed by Maxwell G (@gotmax23). ([\matrix-org#16332](matrix-org#16332))
- Get CI to check PRs have been signed-off. ([\matrix-org#16348](matrix-org#16348))
- Add missing licence header. ([\matrix-org#16359](matrix-org#16359))
- Improve type hints, and bump types-psycopg2 from 2.9.21.11 to 2.9.21.14. ([\matrix-org#16381](matrix-org#16381))
- Improve comments in `StateGroupBackgroundUpdateStore`. ([\matrix-org#16383](matrix-org#16383))
- Update maturin configuration. ([\matrix-org#16394](matrix-org#16394))
- Downgrade replication stream time out error log lines to warning. ([\matrix-org#16401](matrix-org#16401))

* Bump actions/checkout from 3 to 4. ([\matrix-org#16250](matrix-org#16250))
* Bump cryptography from 41.0.3 to 41.0.4. ([\matrix-org#16362](matrix-org#16362))
* Bump dawidd6/action-download-artifact from 2.27.0 to 2.28.0. ([\matrix-org#16374](matrix-org#16374))
* Bump docker/setup-buildx-action from 2 to 3. ([\matrix-org#16375](matrix-org#16375))
* Bump gitpython from 3.1.35 to 3.1.37. ([\matrix-org#16376](matrix-org#16376))
* Bump msgpack from 1.0.5 to 1.0.6. ([\matrix-org#16377](matrix-org#16377))
* Bump msgpack from 1.0.6 to 1.0.7. ([\matrix-org#16412](matrix-org#16412))
* Bump phonenumbers from 8.13.19 to 8.13.22. ([\matrix-org#16413](matrix-org#16413))
* Bump psycopg2 from 2.9.7 to 2.9.8. ([\matrix-org#16409](matrix-org#16409))
* Bump pydantic from 2.3.0 to 2.4.2. ([\matrix-org#16410](matrix-org#16410))
* Bump regex from 1.9.5 to 1.9.6. ([\matrix-org#16408](matrix-org#16408))
* Bump sentry-sdk from 1.30.0 to 1.31.0. ([\matrix-org#16378](matrix-org#16378))
* Bump types-netaddr from 0.8.0.9 to 0.9.0.1. ([\matrix-org#16411](matrix-org#16411))
* Bump types-psycopg2 from 2.9.21.11 to 2.9.21.14. ([\matrix-org#16381](matrix-org#16381))
* Bump urllib3 from 1.26.15 to 1.26.17. ([\matrix-org#16422](matrix-org#16422))

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEE8SRSDO7gYkSP4chELS76LzL74EcFAmUlINwACgkQLS76LzL7
# 4EdvExAAgjk6+/Fu45MRG7u5kFmFzoZWLOPD10XROANaSeqW1l/pBhFh+XvwR4TZ
# l/FdkSfS9YpHnw3aof13TclLu6IVWDM+vqYFuY2HSY/yzbcGvJFHqr26kOccpTTd
# 2r9m/AkguyHEBECDW8qJLXb8M7dqNa2SydTBu1+IrKfj6nq+fRxVyQhyAJXrI1Ta
# Dnz0XJ4TcwTrMPVk4MYrAcYjID6IV89dtp7ttH4DwXKDeSjMtxM/46EIg4u+VXDz
# fzK25JHVFYJA5+/rOn/RslmxjJHQfEIEB6NYxQwLeMeZuGSZooTebKn1odwogvhI
# Srtfsytum+twgSHD1s+7KldM+EjTiu7ouKi8VcfOlFuLnuBiROEc5WUljcL5K63F
# kVx2bXGU/eNkPp6ntNhYfgswx+yk2rXFqkTjz+xZQIZcOBqehHBDy8VhtwlRkTUw
# bzocdKkLMA4nfSlq5fFOAErMqJKsPS8aN9yYPShqEUiSUOKle8eHfA1cTXJuK0MS
# K2/YcDDZmJBrwVADyNDk5GKaDx39rR752OSuJb57Sp/edwUg6+H1I6lIN6YTeoJw
# FzJwGMzuMCktOQRW2enxQiA6RZjXFCwvD1LoWMjyO4YTXQwXxNCXsb0kLKUqfwsy
# qMGphWEl3rdzVSuFapNAgOLF0RfFNYZdhQnk+3fNEwxumxoqgho=
# =hx8G
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue Oct 10 11:01:00 2023 BST
# gpg:                using RSA key F124520CEEE062448FE1C8442D2EFA2F32FBE047
# gpg: Can't check signature: No public key

# Conflicts:
#	.github/workflows/docker.yml
#	.github/workflows/docs-pr-netlify.yaml
#	.github/workflows/docs-pr.yaml
#	.github/workflows/docs.yaml
#	.github/workflows/latest_deps.yml
#	.github/workflows/poetry_lockfile.yaml
#	.github/workflows/push_complement_image.yml
#	.github/workflows/release-artifacts.yml
#	.github/workflows/tests.yml
#	.github/workflows/twisted_trunk.yml
#	poetry.lock
#	rust/src/push/base_rules.rs
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants