-
Notifications
You must be signed in to change notification settings - Fork 378
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MSC3700: Deprecate plaintext sender key #3700
Changes from 6 commits
3908f07
b594a0b
81b38c8
e97461a
2eb74bf
c080d7e
5e760bb
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,72 @@ | ||
# MSC3700: Deprecate plaintext sender_key | ||
|
||
This MSC proposes to deprecate superfluous fields from the plaintext event | ||
content of encrypted events, increasing privacy and enhancing security. | ||
|
||
An encrypted message that uses an algorithm of | ||
[`m.megolm.v1.aes-sha2`](https://spec.matrix.org/v1.2/client-server-api/#mmegolmv1aes-sha2) | ||
(such as an `m.room.encrypted` event) contains the following plain text keys in | ||
its contents: `algorithm`, `session_id`, `sender_key` and `device_id`. Both the | ||
`algorithm` and `session_id` are required for clients to be able to decrypt the | ||
ciphertext: the algorithm explains how to decrypt and the session ID says which | ||
session to use to decrypt the ciphertext. | ||
|
||
The `sender_key` and `device_id` are currently used by clients to store and | ||
lookup sessions in addition to the `session_id`, however the `session_id` is | ||
globally unique and so no disambiguation using `sender_key` or `device_id` is | ||
needed. | ||
|
||
Session IDs are encoded ed25519 public keys. In particular, the session ID is | ||
the public part of the key used to sign the session when it is shared. | ||
|
||
## Proposal | ||
|
||
The `sender_key` and `device_id` in `m.megolm.v1.aes-sha2` message contents are | ||
deprecated. Clients must ignore those fields when processing events, but should | ||
still include the fields when generating events to maintain backwards | ||
compatibility. At a future time the fields will stop being included. | ||
erikjohnston marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
Similarly, the `sender_key` field in `m.room_key_request` to-device messages is | ||
deprecated. Clients must ignore the field when processing the request, but | ||
should still include it when generating *if* there is a `sender_key` field in | ||
the event we're requesting keys for. | ||
|
||
Clients must store and lookup sessions based purely on the session ID. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This makes it sound like clients are not allowed to store sessions based on room ID or sender user ID, which I don't think is what is intended. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, that is an interesting question actually. I think we don't want clients to ever lookup inbound sessions based on room ID or sender user ID, as they can be faked in exactly the same way as There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. At least for room ID, it's hard to fake since the event itself is tied to the room. (Also, your rust SDK PR stores by room ID and session ID.) Maybe it could say something like:
? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The server could easily lie about the room ID for an event though. If we don't check the room ID matches the session room ID then it allows servers to forward encrypted events into other rooms. I've updated the language to:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
It can't, the room ID is part of the encrypted payload as well[1]. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, so there are three places where we have a room ID? The event, the session and the encrypted payload? Presumably we still want to check that they all match? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well, the event room id is a hint to find the right bucket, the session room id is just a hint into which bucket to put the session. The room id on the session also can't be replaced by the server. The encrypted payload room id ensures that the server doesn't lie about the room id. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Sure, but that relies on the client actually verifying the room IDs match. This can be done as you say by storing the sessions under the room ID they correspond to and then looking the session up via the room ID. However, there is a risk here that clients will do the following:
At that point the client is vulnerable to the server lying about the event's room ID and making the client display the verified encrypted content into the wrong room. A similar attack can happen with the sender. The key issue here is that clients need to handle both encrypted and unecrypted events, so they're very likely to use the event's room ID and sender when rendering things in the UI, rather than any data in the encrypted payload or session. This makes it very important for clients to actually check that the user ID/senders and room IDs match. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Agreed. But what can we do to make this more prominent? The spec has some vague wording on this for room IDs:
But this is IMO neither clear nor noticeable enough. There is also this bogus paragraph:
This seems like a description of exactly what not to do. Clients should not look at I guess we need clear, actionable steps of what to do upon an encrypted room message receipt. Something like:
Ideally these rules would be in a differently coloured box to make it stand out from the rest of the text. |
||
|
||
When updating an existing session key, clients must ensure: | ||
1. that the updated session data comes from a trusted source, e.g. either the | ||
session data has a) a valid signature, or b) comes from the user’s session | ||
key backup; and | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should spell out what we mean by "trusted source" here |
||
2. that the new session key has a lower message index than the existing session | ||
key. | ||
|
||
When clients receive an encrypted event with an unknown session they will need | ||
to send a key request to all clients, rather than the device specified by | ||
`sender_key` and `device_id`. This is the current behaviour used by Element | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The current behavior is to send to all your own devices with wildcard, but to the specific device_id of the sender. So this will need to change as it's not the current behavior. Notice that it will produce new to_device traffic |
||
clients. | ||
|
||
### Benefits | ||
|
||
There are two main benefits of removing the `sender_key` and `device_id`: | ||
enhanced privacy and better security. | ||
|
||
Including these extra fields leaks which device was used to send the message, | ||
and so removing them has an obvious privacy benefit. | ||
|
||
On the security side, these fields are untrusted as: a malicious server (or | ||
other man-in-the-middle (MITM) attacker) can change these values; and other | ||
devices/users can simply lie about these values. | ||
|
||
Currently, clients therefore need to take care to only use these values to look | ||
up the session. If the client needs to know the associated `sender_key` they | ||
must use the identity key of the Olm session that was used to send them the | ||
Megolm session data, and not the `sender_key` in the event contents. | ||
|
||
This is an obvious footgun, and therefore removing/ignoring these untrusted | ||
fields reduces the risk of security bugs being introduced. | ||
|
||
## Potential issues | ||
Removing the `sender_key` and `device_id` means that clients don’t know which | ||
remote device to ask for the session key if they don’t already have it. Instead, | ||
clients will need to send a key request to all devices of the event sender. This | ||
will also reduce the information available when debugging encryption issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, the key backup API already only indexes by room ID and session ID, so we already have an assumption that sender key/device ID are not necessary to ensure uniqueness.