Editorial: Add note for IsStructurallyValidLanguageTag #431

ryzokuken · 2020-05-02T17:32:20Z

Add a note for IsStructurallyValidLanguageTag, clarifying the exact
variant of locale indentifer that's expected and point to a resource
that clarifies the differences between the two.

Fixes: #425

/cc @littledan

Add a note for IsStructurallyValidLanguageTag, clarifying the exact variant of locale indentifer that's expected and point to a resource that clarifies the differences between the two. Fixes: tc39#425

littledan · 2020-05-04T21:18:08Z

Rereading @iamstolis's comments, I guess the change we need to make is in the immediately preceding paragraphs, adding something right after

can be generated from the EBNF grammar in section 3.2 of the Unicode Technical Standard 35

to explain that we're taking into account the restrictions for section 3.3, and not accepting everything within the grammar.

I'm kinda wondering why we have that paragraph at all. It's really redundant. Maybe that whole thing should be a note? It's not great to have normative text that's duplicated like that. @anba do you know?

jswalden · 2020-05-04T21:22:50Z

spec/locales-currencies-tz.html

@@ -52,6 +52,10 @@ <h1>IsStructurallyValidLanguageTag ( _locale_ )</h1>
      <p>
        The abstract operation returns true if _locale_ can be generated from the EBNF grammar in section 3.2 of the Unicode Technical Standard 35, starting with `unicode_locale_id`, and does not contain duplicate variant or singleton subtags (other than as a private use subtag). It returns false otherwise. Terminal value characters in the grammar are interpreted as the Unicode equivalents of the ASCII octet values given.
      </p>
+
+      <emu-note>
+        This function <em>specifically</em> accepts a "Unicode BCP 47 locale identifier", i.e. the backwards compatible syntax from "Unicode CLDR locale identifier" is not accepted. The difference between the two syntaxes is specified in <a href="https://unicode.org/reports/tr35/#BCP_47_Conformance">Unicode Technical Standard 35 section 3.3</a>.


I suppose this gets the job done, sort of. But -- it seems like the crux of the confusion is that the bullet points give one explanation for when the function returns true; then the paragraph just above this gives a second, and inconsistent, explanation for when the function returns true. The bullets describe a narrower syntax; the paragraph describes a broader syntax. Why do we have (arguably) two explanations?

Or, stepping back from those nitty-gritty details: if the bullets already say this only recognizes a "Unicode BCP 47 locale identifier", adding a note that has to repeat that claim indicates something has already gone awry.

Or to put it the way I previously noted, #429 would address this concern by listing the requirements of the algorithm, then leaving the implications of those requirements to a note that neither reiterates what those requirements say, nor (even implicitly) contradicts them.

anba · 2020-05-27T14:44:27Z

I'm kinda wondering why we have that paragraph at all. It's really redundant. Maybe that whole thing should be a note? It's not great to have normative text that's duplicated like that. @anba do you know?

The duplicate information was already present in the first edition. Except that at that time there wasn't any (semantic) difference between the bullet points and the text.

If I had to guess, I'd say the text was only added to explicitly define the return values of the function. And possibly to state that the duplicate singleton x in und-x-x isn't an error.

It probably doesn't hurt to keep:

Terminal value characters in the grammar are interpreted as the Unicode equivalents of the ASCII octet values given.

but even without that sentence the intent should be clear.

ryzokuken · 2021-05-24T05:39:23Z

@anba @jswalden do we still need this?

jswalden · 2021-05-24T22:54:30Z

I don't think we do, but then most of the IsStructurallyValidLanguageTag spec text is my fault at this point, so I'm a little biased. :-)

The current text is clear that backwards compatibility syntax is not accepted.

The current text does clarify that any string it accepts must be a "Unicode BCP 47 locale identifier", in a manner that -- if you work through it -- also restricts beyond what that term presently means.

And the current text links to relevant UTS35 text.

This covers everything the proposed patch did (except for it not including #429's additional restrictions).

ryzokuken · 2021-06-07T05:21:34Z

@jswalden that sounds good to me, I'll close this PR in that case.

Editorial: Add note for IsStructurallyValidLanguageTag

d0c6a4a

Add a note for IsStructurallyValidLanguageTag, clarifying the exact variant of locale indentifer that's expected and point to a resource that clarifies the differences between the two. Fixes: tc39#425

ryzokuken mentioned this pull request May 3, 2020

Confusing description of IsStructurallyValidLanguageTag() operation #425

Closed

jswalden reviewed May 4, 2020

View reviewed changes

ryzokuken closed this Jun 7, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Editorial: Add note for IsStructurallyValidLanguageTag #431

Editorial: Add note for IsStructurallyValidLanguageTag #431

ryzokuken commented May 2, 2020

littledan commented May 4, 2020

jswalden May 4, 2020

anba commented May 27, 2020

ryzokuken commented May 24, 2021

jswalden commented May 24, 2021

ryzokuken commented Jun 7, 2021

Editorial: Add note for IsStructurallyValidLanguageTag #431

Editorial: Add note for IsStructurallyValidLanguageTag #431

Conversation

ryzokuken commented May 2, 2020

littledan commented May 4, 2020

jswalden May 4, 2020

Choose a reason for hiding this comment

anba commented May 27, 2020

ryzokuken commented May 24, 2021

jswalden commented May 24, 2021

ryzokuken commented Jun 7, 2021