Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add identifier syntax #296

Open
wants to merge 9 commits into
base: persisted-documents
Choose a base branch
from

Conversation

martinbonnin
Copy link
Contributor

Copy link
Contributor

@Shane32 Shane32 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps only use quotes around strings and characters?

spec/Appendix A -- Persisted Documents.md Outdated Show resolved Hide resolved
spec/Appendix A -- Persisted Documents.md Outdated Show resolved Hide resolved
spec/Appendix A -- Persisted Documents.md Outdated Show resolved Hide resolved
martinbonnin and others added 2 commits July 21, 2024 15:52
@benjie
Copy link
Member

benjie commented Jul 22, 2024

It's interesting you should raise this now; I was working on Friday on a reworking of the persisted documents spec that relates to using the /graphql/<hash>/<operationId> URLs and so I also felt this need to define the identifiers. I think we should encourage these URLs as the default going forward, but that's a topic for a different PR; I'm not ready to open that PR just yet but here's the commit where I added the syntax (note that I also narrowed the allowed characters to make them URL-safe):

a3d9ecb

I present this only for comparison, in case it's helpful.

spec/Appendix A -- Persisted Documents.md Outdated Show resolved Hide resolved

IdentifierCharacter :: SourceCharacter but not `:`

SourceCharacter :: Any Unicode scalar value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incredibly broad; I know that's effectively what we have currently but I feel like we should lock it down a bit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we could allow all Unicode characters, the main spec defines a specific set of characters for use in that document. Something similar could be done here.

SourceCharacter
U+0009
U+000A
U+000D
U+0020–U+FFFF

On a side note, the spec is defining these characters as UTF-16 codes, not Unicode code points.

I’m fine with allowing any Unicode character in this context, or any non control character would be fine too.

Copy link
Contributor Author

@martinbonnin martinbonnin Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had only alphanumericals initially and then backtracked to "any unicode" because "why not". But I think your point about having the identifier in url path is the important one. I'll move to alphanumerical + a bunch of unreserved chars like you did here. Questions:

  • do we even need ~, ., _, -?
  • do we need uppercase letters? This might cause issues on case-insensitive filesystems (hello macOS 👋 )

Edit: updated in 5d48e0e

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly recommend a wide range of characters. URLs can escape any Unicode character so there need be no limit on what characters can be used for document ids. Also, url paths are by definition case sensitive; but they may be treated with no case sensitivity. For instance, if the document id was a guid it would need dashes. If it was base64 it would need lowercase and uppercase and a couple extra characters. If it was readable names, it would need to support Unicode for foreign languages. And a custom implementation may decide to treat the document id with no case sensitivity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the document id was a guid it would need dashes

Excellent point.

If it was base64 it would need lowercase and uppercase and a couple extra characters

Another excellent point. +, / and = would be needed too. Which would make them a bit awkard to use as url path segments... Interesting!

If it was readable names, it would need to support Unicode for foreign languages

I think I disagree on this one. I don't really see anyone using readable names as identifiers? Sounds like a footgun to me that if we can, we should discourage. Just like the relay cursors are encouraged to be opaque, I would encourage the document ids to be similarly opaque.

Tangential: I have a GraphQL-over-HTTP meeting in my calendar on Thursday. Is that still happening? Want to discuss over there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My current mental model is:

  • OS filenames are user interface. They are visible to the user and should be human readable.
  • POs identifiers are technical and should be machine readable.

I don't really see myself using french diacritics in my POs ids (although that could make for a fun API!). But again all the POs I have ever used are sha256 so maybe there are other use cases out there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another excellent point. +, / and = would be needed too. Which would make them a bit awkard to use as url path segments... Interesting!

There's standard URL-safe base64 encoding that uses - and _: https://datatracker.ietf.org/doc/html/rfc4648#section-5 (the padding can be omitted).

I think I disagree on this one. I don't really see anyone using readable names as identifiers? Sounds like a footgun to me that if we can, we should discourage.

Very strong agree that persisted document IDs should not be hand-crafted. They should be hashes or counters or similar - computer assigned.

I have a GraphQL-over-HTTP meeting in my calendar on Thursday. Is that still happening? Want to discuss over there?

I've just created agenda files for them so feel free to add yourself if you want 👍

Personally, the most I would exclude is control characters, colon, and the two slashes. Or otherwise similar to most OS filename conventions.

I don't see much value in this. In general it's best to be strict first and loosen up when a need arises, since you cannot become stricter later without it being a breaking change. What use case do you have for being defined loosely? Our intended use cases are GUIDs/UUIDs, hashes and counters, and those are typically numeric, hexadecimal, or base64 encoded. I can imagine namespacing them too (e.g. MyApp.f9db3fbb3cfe...) but other than the a I don't think they should be much wider than a GraphQL Name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I added my name for Thursday's meeting

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What use case do you have for being defined loosely?

Personally, none at all. Just nowhere else are there similar restrictions on identifiers. Filenames have a very few restrictions. And here we can’t use colon or the string would be confused with a prefixed identifier. So I would wonder why we are further restricting these identifiers which are otherwise opaque to the application?

Comparing to GraphQL names, there it’s part of a formatted document where a parser must be able to read the identifier separate from other tokens. But here all these identifiers are already escaped by JSON just like any other string. As part of a url path, again there is a well defined encoding scheme available.

Personally, I’d probably use url-safe base64 and period or fewer characters.

Copy link
Contributor

@Shane32 Shane32 Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I disagree on this one. I don't really see anyone using readable names as identifiers? Sounds like a footgun to me that if we can, we should discourage.

Very strong agree that persisted document IDs should not be hand-crafted. They should be hashes or counters or similar - computer assigned.

I just want to point out that part of the discussion during the 7-25-2024 working group was whether we should add a recommendation to include the document identifier and operation name within the url, specifically to make it easier to be human-readable in logs and during debugging. Given that many programmers do not utilize operation names, I think it is reasonable to guess that a valid document identifier naming scheme may be a combination of a description and hash/id, similar to machine-generated human-readable SEO product identifiers.

For instance, the algorithm may take the filename of the tsx file containing the query along with 8 base64 characters of the hash (48 bits-worth) for something like edit_product~3KS-A2J- as the document identifier, forgoing the prefix, for urls like https://example.com/graphql/edit_product~3KS-A2J-.

With a 48-bit hash, each unique filename could have 10,000 queries over the files' lifetime with a chance of collision of approximately 0.0000178%.

spec/Appendix A -- Persisted Documents.md Outdated Show resolved Hide resolved
- move `x-` and `sha256` to validation
- allow `:` in prefixed identifier payloads
- add link to RFC3986
Comment on lines +100 to +104
A _document identifier_ must only contain colons (`:`) and characters that are
defined as
[`unreserved` in RFC3986](https://datatracker.ietf.org/doc/html/rfc3986#section-2.3)
(alphanumeric characters (`A-Z`, `a-z`, `0-9`), dashes (`-`), periods (`.`),
underscores (`_`), and tildes (`~`)).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add / to the list of allowed characters?

For GraphQL Hive, we already use "<app_name>/<app_version>/<document_hash>" as the document id for namespacing documents. (https://the-guild.dev/graphql/hive/docs/features/app-deployments#sending-persisted-document-requests-from-your-app)

From a persisted document store point of view, it does not make sense for us to share the hashes between all the different GraphQL clients.

You might not own all the teams that write to the persisted document store and dictating the hash algorithm to be used for everyone might not make sense.

In addition, we use the extracted app_name and app_version for analytics of the GraphQL API client usage, but that is irrelevant for this request, but might be relevant for the motivations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'd works for me 👍 .

IIRC, there was a desire that the identifier should be "url-compatible" to use it as part of the url with GET request and improve the debugging/monitoring experience. And / might have a specific meaning there. I'm personally fine with letting implementers deal with that.

I'll ultimately defer to @benjie on this one as he has put a lot of thought into this.

Copy link
Contributor

@n1ru4l n1ru4l Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For GET requests, this would require simply encoding the URI parameters, which I assume is already a best-practise? 🤔

Copy link
Contributor

@Shane32 Shane32 Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, already part of the spec:

https://github.com/graphql/graphql-over-http/blob/main/spec/GraphQLOverHTTP.md#get

For HTTP GET requests, the GraphQL-over-HTTP request parameters MUST be provided in the query component of the request URL, encoded in the application/x-www-form-urlencoded format as specified by the WhatWG URLSearchParams class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, the GET requests being recommended as part of #305 are /graphql/<hash>/<operationname>, not using query strings. Technically you can have hash here have %2F in it to escape the /, but I think it would be better to stick to URL-safe characters if possible.

Happy to change a MUST to a SHOULD such that / is allowed. E.g. we RECOMMEND that you only use characters in the given set, but you MAY use others if there's a good reason to do so.

Before doing so, though, I'd ask whether using ~ might be a more acceptable character substitute - suitably uncommon, but also URL-safe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to change a MUST to a SHOULD such that / is allowed

Personally, I'd do this, along with a note about encoding the hash alongside the /graphql/<hash>/<operationname> recommendation in #305 . I probably would not add / to the safe list of characters when we are simultaneously recommending / be used to separate the hash from the operation name.

So the additional note might read something like this:

The document id / hash MUST be encoded using percent-encoding as specified by the WhatWG URLSearchParams class.

We should not need to describe how to encode the document id as a query parameter (e.g. /graphql?documentId=<hash>) as this is already specified in the spec.

Copy link
Contributor

@n1ru4l n1ru4l Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This made me aware of something else, doesnt /graphql/<sha> already conflict with /graphql/stream used by the GraphQL over SSE protocol?

Would in that case it make sense to use a different pattern such as /graphql/persisted/sha?

That would still not solve the /operation-name appendix though..

Using tilde is okay, but certainly looks weird.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been looking at the GraphQL over SSE specification. I can't find that the url is defined anywhere, besides being used in some samples - ?? Seems to me that when the server receives Accept: text/event-stream then it may (not must, in case of validation errors) respond with a SSE stream...

Copy link
Contributor

@n1ru4l n1ru4l Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Shane32 You are right it is not part of the specification. But the reference implementation graphql-sse kind of promotes using /graphql/stream: https://the-guild.dev/graphql/sse/recipes

It is out of topic here though, and I am fine with changing / to ~. It might be worth to also ask other vendors that provide a persisted query feature though, since it can imply breaking changes.

@benjie ✅ I am fine with having this be a MUST. We can adjust!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PostGraphile also uses /graphql/stream as the endpoint to subscribe to schema change notifications as specified in #48

I agree with using a prefix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants