Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Editorial: Add note for IsStructurallyValidLanguageTag #431

Closed
wants to merge 1 commit into from

Conversation

ryzokuken
Copy link
Member

Add a note for IsStructurallyValidLanguageTag, clarifying the exact
variant of locale indentifer that's expected and point to a resource
that clarifies the differences between the two.

Fixes: #425

/cc @littledan

Add a note for IsStructurallyValidLanguageTag, clarifying the exact
variant of locale indentifer that's expected and point to a resource
that clarifies the differences between the two.

Fixes: tc39#425
@littledan
Copy link
Member

Rereading @iamstolis's comments, I guess the change we need to make is in the immediately preceding paragraphs, adding something right after

can be generated from the EBNF grammar in section 3.2 of the Unicode Technical Standard 35

to explain that we're taking into account the restrictions for section 3.3, and not accepting everything within the grammar.

I'm kinda wondering why we have that paragraph at all. It's really redundant. Maybe that whole thing should be a note? It's not great to have normative text that's duplicated like that. @anba do you know?

@@ -52,6 +52,10 @@ <h1>IsStructurallyValidLanguageTag ( _locale_ )</h1>
<p>
The abstract operation returns true if _locale_ can be generated from the EBNF grammar in section 3.2 of the Unicode Technical Standard 35, starting with `unicode_locale_id`, and does not contain duplicate variant or singleton subtags (other than as a private use subtag). It returns false otherwise. Terminal value characters in the grammar are interpreted as the Unicode equivalents of the ASCII octet values given.
</p>

<emu-note>
This function <em>specifically</em> accepts a "Unicode BCP 47 locale identifier", i.e. the backwards compatible syntax from "Unicode CLDR locale identifier" is not accepted. The difference between the two syntaxes is specified in <a href="https://unicode.org/reports/tr35/#BCP_47_Conformance">Unicode Technical Standard 35 section 3.3</a>.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this gets the job done, sort of. But -- it seems like the crux of the confusion is that the bullet points give one explanation for when the function returns true; then the paragraph just above this gives a second, and inconsistent, explanation for when the function returns true. The bullets describe a narrower syntax; the paragraph describes a broader syntax. Why do we have (arguably) two explanations?

Or, stepping back from those nitty-gritty details: if the bullets already say this only recognizes a "Unicode BCP 47 locale identifier", adding a note that has to repeat that claim indicates something has already gone awry.

Or to put it the way I previously noted, #429 would address this concern by listing the requirements of the algorithm, then leaving the implications of those requirements to a note that neither reiterates what those requirements say, nor (even implicitly) contradicts them.

@anba
Copy link
Contributor

anba commented May 27, 2020

I'm kinda wondering why we have that paragraph at all. It's really redundant. Maybe that whole thing should be a note? It's not great to have normative text that's duplicated like that. @anba do you know?

The duplicate information was already present in the first edition. Except that at that time there wasn't any (semantic) difference between the bullet points and the text.

If I had to guess, I'd say the text was only added to explicitly define the return values of the function. And possibly to state that the duplicate singleton x in und-x-x isn't an error.

It probably doesn't hurt to keep:

Terminal value characters in the grammar are interpreted as the Unicode equivalents of the ASCII octet values given.

but even without that sentence the intent should be clear.

@ryzokuken
Copy link
Member Author

@anba @jswalden do we still need this?

@jswalden
Copy link
Collaborator

I don't think we do, but then most of the IsStructurallyValidLanguageTag spec text is my fault at this point, so I'm a little biased. :-)

The current text is clear that backwards compatibility syntax is not accepted.

The current text does clarify that any string it accepts must be a "Unicode BCP 47 locale identifier", in a manner that -- if you work through it -- also restricts beyond what that term presently means.

And the current text links to relevant UTS35 text.

This covers everything the proposed patch did (except for it not including #429's additional restrictions).

@ryzokuken
Copy link
Member Author

@jswalden that sounds good to me, I'll close this PR in that case.

@ryzokuken ryzokuken closed this Jun 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Confusing description of IsStructurallyValidLanguageTag() operation
4 participants