Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normative: Do not allow duplicate variants within the tlang component of a transformed_extensions #429

Merged
merged 1 commit into from
Feb 18, 2021

Conversation

jswalden
Copy link
Collaborator

That is, because en-emodeng-emodeng with duplicated emodeng variant is not structurally valid, neither ought de-t-en-emodeng-emodeng be valid. This fixes one of the issues raised in #330.

This patch's wording (and how it talks about restrictions on subtag sequences) is somewhat messy. Suggestions obviously appreciated for how to do this more clearly or concisely.

@leobalter
Copy link
Member

Another one to bring up in the next meeting.

@jswalden jswalden force-pushed the tlang-no-duplicate-variants branch from dbde17b to b5ea6fe Compare May 18, 2020 23:21
<li>does not include duplicate singleton subtags.</li>
<li>_locale_ can be generated from the EBNF grammar for `unicode_locale_id` in <a href="https://unicode.org/reports/tr35/#Unicode_locale_identifier">Unicode Technical Standard #35 LDML § 3.2 Unicode Locale Identifier</a>;</li>
<li>_locale_ does not use any of the backwards compatibility syntax described in <a href="https://unicode.org/reports/tr35/#BCP_47_Conformance">Unicode Technical Standard #35 LDML § 3.3 BCP 47 Conformance</a>;</li>
<li>the `unicode_language_id` within _locale_ does not contain ASCII case-insensitively equivalent `unicode_variant_subtag` subtags; and</li>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line should be changed so reader won't misread it as there should be no variant tag at all.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I keep staring at this and can't see this meaning in it...but also I wrote it. ^_^ I updated this with different wording that puts "duplicate" closer to the start; let met know if there's still a problem with that.

@jswalden
Copy link
Collaborator Author

@macchiati @zbraniecki Do either of you have comments on this structural-validity adjustment, and particularly how this intersects with BCP 47/UTS 35 validity requirements? The no-duplicates restriction at top level is already in specs; the one for tlang within a t extension is not yet specified, but it seems like a pretty natural fit if a top-level language tag is invalid for duplicate variants, it ought also be invalid inside a t extension's tlang.

@jswalden jswalden force-pushed the tlang-no-duplicate-variants branch from b5ea6fe to 7da6855 Compare May 21, 2020 23:30
@macchiati
Copy link

macchiati commented May 22, 2020 via email

@jswalden jswalden force-pushed the tlang-no-duplicate-variants branch from 7da6855 to 66ba7c4 Compare May 25, 2020 00:52
@zbraniecki
Copy link
Member

The no-duplicates restriction at top level is already in specs; the one for tlang within a t extension is not yet specified, but it seems like a pretty natural fit if a top-level language tag is invalid for duplicate variants, it ought also be invalid inside a t extension's tlang.

Is the de-duplication a canonicalization operation or validity check? I think in ICU4X we treat it as a canonicalization operation in accordance with the "Be liberal in what you accept, and conservative in what you send" principle.

I'd suggest doing that for both, language identifier and tlang along with case normalization and sorting.

@jswalden
Copy link
Collaborator Author

Is the de-duplication a canonicalization operation or validity check? I think in ICU4X we treat it as a canonicalization operation in accordance with the "Be liberal in what you accept, and conservative in what you send" principle.

It's a bad principle. :-) If you don't impose discipline in what you accept, senders will not self-discipline in what they send.

I'd suggest doing that for both, language identifier and tlang along with case normalization and sorting.

Duplicates in the language tag are excluded as a validity check -- and have been ever since BCP47. In principle that could be relaxed...but then you may need to drag along the rest of the ecosystem to also accept duplicates. That seems much harder than simply being more restrictive about what we accept, in the much-narrower area of tlang syntax. (Note that we currently are "conservative in what we accept" by excluding compatibility syntax, irregular tags BCP47 and ECMA-402 previously allowed, and extlang syntax, just off the top of my head.) To my knowledge, we have never loosened beyond what BCP47 describes, and coalescing duplicate variants in en-emodeng-Emodeng would be a loosening.

As usual @anba figured this out before both of us. :-)

@zbraniecki
Copy link
Member

It's a bad principle. :-)

I don't agree.

That seems much harder than simply being more restrictive about what we accept, in the much-narrower area of tlang syntax.

Yes. that makes sense.

@sffc
Copy link
Contributor

sffc commented Aug 13, 2020

What is the status of this PR? Are we still waiting for feedback?

@jswalden
Copy link
Collaborator Author

I don't think we are any more -- the request for feedback was answered with "don't repeat the ASCII duplication language multiple times", the latest iteration states it only once, so this should be good to go, as far as I can tell.

@jswalden
Copy link
Collaborator Author

jswalden commented Oct 8, 2020

@sffc At this point, assuming you agree with my last comment, I think this can be merged. Unless adding test262 data at the same time is prerequisite for doing so, in which case I can file the issue and do the necessary work on it, or there's some other blocking reason I'm not aware of...

@sffc
Copy link
Contributor

sffc commented Oct 8, 2020

Is it correct to say that this is technically a normative PR but it reflects web reality and therefore is a no-op change from the implementation's point of view?

@jswalden
Copy link
Collaborator Author

jswalden commented Oct 8, 2020

@sffc It's a normative change. I don't know if it reflects web reality, but it seems doubtful to me there are users out there who knowingly pass duplicate variants in tlang. Chrome beta allows new Intl.Locale("de-t-en-emodeng-emodeng").toString() right now, as does Firefox. This would change that to instead throw RangeError.

@sffc sffc changed the title Do not allow duplicate variants within the tlang component of a transformed_extensions Norminative: Do not allow duplicate variants within the tlang component of a transformed_extensions Oct 8, 2020
@sffc sffc changed the title Norminative: Do not allow duplicate variants within the tlang component of a transformed_extensions Normative: Do not allow duplicate variants within the tlang component of a transformed_extensions Oct 8, 2020
@sffc
Copy link
Contributor

sffc commented Oct 8, 2020

Oh ok, so this is a normative PR. That means we need to do the dance of getting implementations, adding tests, MDN, etc. I opened tc39/test262#2857 for tests. If you're aware of implementation bug numbers, can you add them to the wiki? Thanks.

@Constellation
Copy link
Member

FYI, JavaScriptCore rejects duplicate variants and throws RangeError from https://trac.webkit.org/changeset/266039/webkit

@jswalden
Copy link
Collaborator Author

jswalden commented Oct 9, 2020

SpiderMonkey bug for this is https://bugzilla.mozilla.org/show_bug.cgi?id=1670165 with patch posted. The test in it ought fairly readily be massaged into a test262 test -- I'll see about doing that, likely tomorrow.

@jswalden
Copy link
Collaborator Author

jswalden commented Oct 9, 2020

tc39/test262#2858 adds a test for this to test262.

@sffc
Copy link
Contributor

sffc commented Jan 25, 2021

This achieved TC39 consensus today.

@leobalter leobalter merged commit 378ba6f into tc39:master Feb 18, 2021
@jswalden jswalden deleted the tlang-no-duplicate-variants branch March 4, 2021 01:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
has consensus (TG1) Has consensus from TC39-TG1 has consensus Has consensus from TC39-TG2 has tests
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

7 participants