Skip to content
This repository has been archived by the owner on Sep 10, 2024. It is now read-only.

Accepted format of Device ID is too restrictive #2654

Closed
hughns opened this issue Apr 25, 2024 · 2 comments
Closed

Accepted format of Device ID is too restrictive #2654

hughns opened this issue Apr 25, 2024 · 2 comments

Comments

@hughns
Copy link
Member

hughns commented Apr 25, 2024

The authorization_grant policy requires the device ID to be 10+ alphanumeric characters:

regex.match("urn:matrix:org.matrix.msc2967.client:device:[A-Za-z0-9-]{10,}", scope)

The Device::try_from() implementation allows any length of alphanumeric or - characters:

if !id.chars().all(|c| c.is_ascii_alphanumeric() || c == '-') {
return Err(InvalidDeviceID::InvalidCharacters);
}

So, there is an inconsistency there.

MSC2967 notes:

For the purpose of this MSC we are assuming that device IDs are as per MSC1597 and, as such, are already URL safe and so can be represented as a scope without modification.

What is completely unhelpful though as MSC1597 only says that the max length is 31 characters (which isn't necessarily sensible anyway).

So, if we were to take the MSC as it reads today we should allow all URL safe characters and % with a max length of 31. This isn't what MAS is doing but probably isn't sensible either.

This issue was highlighted by generating a device ID by base64 encoding a Curve25519 public part which would give alphanumeric and +, /, =

Two things:

  1. Given that device IDs are still not well constrained after 20 months (since that part of MSC2967 was written), I think we should choose a proper encoding for arbitrary device IDs in the scope. e.g. URL encode
  2. We should make MAS use that format.
@hughns
Copy link
Member Author

hughns commented Apr 26, 2024

MSC2967 notes:

For the purpose of this MSC we are assuming that device IDs are as per MSC1597 and, as such, are already URL safe and so can be represented as a scope without modification.

What is completely unhelpful though as MSC1597 only says that the max length is 31 characters (which isn't necessarily sensible anyway).

On re-reading this morning, I actually see it does say more:

They also appear in key IDs and must therefore be a subset of that grammar.

Which I think means that they can appear in the tok part of the key_id grammar:

tok            = 1*31 tok_chars

tok_chars      = ALPHA / DIGIT / "." / "~" / "_" / "-"
                   ; A-Z a-z 0-9 . ~ _ -

So, allowed: A-Z a-z 0-9 . ~ _ -

But the is a note to say:

Note that enforcing this grammar will mean:

  • Making libolm not put + and / characters in key IDs (easy enough, but there will be a bunch of malformed unique keys out there in the wild. Possibly they would just get thrown away. Servers may need to continue to tolerate + and / in e2e keys for a while.)

So, it probably makes sense to extend to allow device IDs: A-Z a-z 0-9 . ~ _ - + /

We chose to express the device ID as a URN urn:matrix:org.matrix.msc2967.client:device:* so the URN format from RFC3986:

Which (assuming I've interpreted correctly) allows a-z A-Z 0-9 -._~!$&'()*+,;=:@/ and other characters need to be percent encoded.

As a sanity check, OAuth2 RFC6749 says:

The authorization and token endpoints allow the client to specify the
scope of the access request using the "scope" request parameter.
...
The value of the scope parameter is expressed as a list of space-
delimited, case-sensitive strings. The strings are defined by the
authorization server. If the value contains multiple space-delimited
strings, their order does not matter, and each string adds an
additional access range to the requested scope.

scope       = scope-token *( SP scope-token )
scope-token = 1*( %x21 / %x23-5B / %x5D-7E )

Allowed: !#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_``abcdefghijklmnopqrstuvwxyz{|}~ which again is superset of the URN format so we are good here.

Given that MSC1597 has not progressed in 20 months, I suggest:

  • updating MSC2967 to say that the URN representing the device ID should be properly encoded
  • MAS should be decoding/encoding the URN syntax correctly (it might already be doing this, I haven't checked) for when it stores it and communicates it
  • For compatibility, MAS should allow arbitrary device IDs (or at least those in MSC1597)
  • Synapse should be decoding the URN syntax correctly in the token introspection responses that it gets from MAS (it might already be doing this, I haven't checked)

@hughns
Copy link
Member Author

hughns commented May 10, 2024

Following up on my previous suggestions:

Given that MSC1597 has not progressed in 20 months, I suggest:

  • updating MSC2967 to say that the URN representing the device ID should be properly encoded

Done in matrix-org/matrix-spec-proposals@8539ab2

    • MAS should be decoding/encoding the URN syntax correctly (it might already be doing this, I haven't checked) for when it stores it and communicates it

Done in #2718.

    • For compatibility, MAS should allow arbitrary device IDs (or at least those in MSC1597)

I believe @sandhose has confirmed that device IDs constrained to MSC1597 are okay in MAS.

    • Synapse should be decoding the URN syntax correctly in the token introspection responses that it gets from MAS (it might already be doing this, I haven't checked)

I believe @sandhose has confirmed that device IDs constrained to MSC1597 are okay in Synapse.

As such I'm closing this issue.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant