# 2024-05-23 ECMA-402 Meeting

## Logistics

### Attendees

- Ben Allen - Igalia (BAN)
- Eemeli Aro - Mozilla (EAO)
- Chris de Almeida - IBM (CDA)
- Henri Sivonen - Mozilla (HJS)
- Louis-Aimé de Fouquières - Invited Expert (LAF)
- Richard Gibson - OpenJS Foundation (RGN)
- Sean Burke - Mozilla (SBE)
- Shane Carr - Google i18n (SFC), Co-Moderator
- Yusuke Suzuki - Apple (YSZ)

## Standing items

### Status Updates

#### Updates from the Editors

BAN: Several editorial PRs fixing long-standing issues have been merged.

RGN: I'm happy with the editorial cleanup mode we're in.

BAN: We have two PRs from Anba from last summer, #826 and #827. They are really good but really big. I would like to put up a new PR with those changes.

#### Updates from the MessageFormat Working Group

EAO: We're thinking about changing the verbiage around error handling. A strict reading of the spec is that APIs must always return lists of errors, but we're considering making that requirement a bit looser. One option is having .format() and .formatToParts() return an Object/tuple of some sort that carries errors; alternately, four AOs, adding .formatFallback() and .formatToPartsFallback(). Discussion ongoing.
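
A toy contrast of the two shapes under discussion – the function and method names below are illustrative only, not MF2 spec or proposal API:

```js
// Illustrative toy, not a MessageFormat API: contrasting the two shapes.

// (a) Strict reading: formatting always returns the result together with the
//     list of errors, e.g. bundled in an object.
function formatStrict(values) {
  const errors = [];
  if (!("user" in values)) errors.push(new ReferenceError("missing variable: user"));
  const value = "user" in values ? `Hello ${values.user}!` : "Hello {$user}!";
  return { value, errors };
}

// (b) Looser split: format() returns just the (possibly fallback) string,
//     while a separate formatFallback() also reports what went wrong.
const format = (values) => formatStrict(values).value;
const formatFallback = (values) => formatStrict(values);

console.log(format({}));         // "Hello {$user}!"
console.log(formatFallback({})); // { value: "Hello {$user}!", errors: [ReferenceError] }
```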

#### Updates from the W3C i18n Group

BAN: I was hoping that we could bring the MF2 discussion to the W3C i18n WG.

## Proposals and Discussion Topics

https://github.com/tc39/ecma402/projects/2

### I18N objections to reducing accept-language #10

explainers-by-googlers/reduce-accept-language#10

YSZ: I discussed this with Apple i18n and privacy folks. We don't have a formal definitive position, but right now we're thinking that reducing Accept-Language to a single item is not the right approach. For example, I myself probably want to have 2 items in the list. Web sites that have Han/Kanji need to know whether I prefer Japanese or Chinese for rendering to be correct. We favor limiting fingerprinting, but this is a big tradeoff. Limiting to only 1 is probably too problematic in real-world use cases. We are continually discussing these things; it's possible that we could have a proposal with ideas, but right now we're not agreeing with the direct proposal.

HJS: Previously when this has come up, it's been noted that Apple can take a longer list of prefs but make it shorter. It's not public in WebKit but something in the OS. It would be useful for this discussion if that mechanism could be publicly described.

YSZ: Right now, my understanding is that this Accept-Language handling is done in a CXNetwork framework. Basically we are sending up to two languages right now, the most preferred language and the system display language. It’s possible that we could change something, but at least my understanding is that this is the current behaviour.

HJS: I was previously under the impression that there were Apple internal experts that developed an allowlist of combinations.

YSZ: This is much more about a definition of locale itself, this is Accept-Language sending what language we’d like to send.

For example, the system itself allows us to specify Arabic or Latin numerals. But this is not strictly tied to the Arabic locale itself, because we need to pass that as a Unicode extension. So we are bucketing these sets of configurations. That is different than Accept-Language.

HJS: It would be great on these threads to have a description of what's going on in Safari.

SFC: Thank you. We’ll continue having these discussions. If there are major updates, I’ll bring them back to the agenda. Mike wanted the feedback from the l10n groups, and I think we've done that now.

### Smart Unit Preferences

https://notes.igalia.com/p/5wv2lK6Fp#/

BAN: (presents slides)

EAO: I have two separate thoughts. First, I think unit conversion should be a 262 matter… I can imagine cases where the converted value would be valuable outside of formatting, and baking this capability into NumberFormat would make working with it clumsy. The other aspect: through working with MessageFormat and other contexts, and quite influenced by things that SFC has mentioned, I have come to understand and agree that the unit of a number is not a formatting option but a quantity of a value that's being formatted, and a more general solution that could help us with building an appropriate API is bringing in some sort of support for a "measured unit" as a compound value that could be acceptable to NumberFormat.

HJS: Do you have a quantification of how much data this is, when the data is arranged in the way a reasonable programmer would arrange it?

SFC: There are several types of data that go into this, but most of the data is included in units.xml in CLDR.

HJS: If you wrote an ICU4X transform for this, how big would the ICU4X data be?

SFC: Younis has been working on the ICU4X implementation, he already has a data structure for the conversion factors. We can look up exactly how big that is.

HJS: From what you've shown, it looks like a bad case of the US region overriding all of these.

SFC: It looks like the unit conversion factors are about 8 kB; see here.

SFC: We got two types of feedback here and at TG1: is 402 the right place, and is the 'en-US' problem unsolvable? Having a way to pull some variable other than the browser locale would be ideal. There are fingerprinting challenges.

BAN: I wouldn't expect this to split users into too many buckets.

HJS: When TG1 was asking if 402 was the right place, was it presupposed that there's some place in JS that's appropriate for this, or is it an HTML/CSS issue?

SFC: If I understand correctly, there is generally a feeling that a unit conversion API should be put in a library / blessed library, but this is also definitely an l10n concern. That requires deep integration with the Intl API, which is not something a userland library can provide. They can do conversions, but not formatting.

HJS: Was the topic of doing it somewhere other than JavaScript discussed at TG1?

BAN: It wasn't a topic I saw in the TG1 notes.

SFC: One thing that we should discuss as a group, and get a position on, spans two axes: the user preferences thing, and the unit conversion thing. We might find it useful to have a clearer message to send regarding this topic. The formatting is a 402 concern, but conversion is not, and it's also not a 262 concern. Resolving this can let us make a clearer recommendation.

EAO: I'd say that one way in which we could assist a solution would be to define something like a CompoundValue API that would be supported by NumberFormat and others, so you could have for example an object with a value property that's a number and a unit property, or a currency property, and this value could be directly formatted with NumberFormat accordingly, as defined by the options bag and the input value. By defining the API, we'd be giving libraries doing conversions an output target to produce. We could get the whole experience, where one part of it – the localization aspect – would be done in 402, and the rest in userland.
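
A rough sketch of the kind of compound value described above; the object shape and the idea of NumberFormat accepting it directly are hypothetical, not an existing or proposed API:

```js
// Hypothetical compound / "measured unit" value: the unit travels with the
// number instead of living only in the formatter's options bag.
const distance = { value: 5, unit: "kilometer" };

// Today this has to be unpacked into NumberFormat options by hand:
const nf = new Intl.NumberFormat("en-US", { style: "unit", unit: distance.unit });
console.log(nf.format(distance.value)); // "5 km"

// Under the sketched design, a formatter would accept the compound value
// directly, and a userland conversion library could produce such objects as
// its output target (purely hypothetical):
// new Intl.NumberFormat("de-DE").format(distance);
```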

SFC: My thoughts on unit conversion: particularly with the lack of a decimal type, we're not in the business of giving scientific conversions in ECMAScript, because we don't have a scientific type. The only risk is that when we add this feature, we in effect do have unit conversion in the platform, it's just not wrapped in a unit conversion API. So what people will do is what people do with time zones, where people have all these weird hacks that parse output from DateTimeFormat. This causes a lot of problems, and it's what stable formatting tries to solve. A principle is that if there's any functionality embedded in an Intl API that's useful outside of Intl, we should expose it outside of Intl. Another solution we should apply in more places is more expressivity around the formatted value. We currently have formatToParts, which solves a lot of these problems, but we could do better. In ICU4X/C things return formatted values with additional getters. Perhaps we could eventually return an object that keeps value and unit in a machine-readable form, or we could just rely on StableFormatting to do that – if you StableFormat using the null locale, this could give you a machine-readable form that you can utilize. When formatting for human display, we're not doing a highly accurate conversion. Example: when we were talking about social distancing during COVID, the CDC said you need to have six feet of separation to be safe. When you display that in Germany, there are sources that say "you need 1.83 meters of separation", which doesn't really get the spirit; they probably should have said 2 meters. But this is the type of thing where these localized outputs aren't designed to be accurate scientific conversions. I think that's okay. If we do provide APIs to get at these formatted values, that's the disclaimer that comes with them: it's not meant to be a scientific unit conversion, but if you want the outputs you can get them this way, since we don't want you to parse strings.
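
For reference, the existing formatToParts output already exposes the value and unit as labelled parts, which is the kind of machine-readable structure being discussed (current Intl.NumberFormat API):

```js
// Current API: formatToParts() labels the digits and the unit, so callers can
// get at structured pieces without parsing the formatted string.
const nf = new Intl.NumberFormat("en-US", { style: "unit", unit: "foot" });
console.log(nf.formatToParts(6));
// e.g. [ { type: "integer", value: "6" },
//        { type: "literal", value: " " },
//        { type: "unit", value: "ft" } ]
```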

HJS: So the assumption is that a conversion API would need to be very scientific, and double-precision floating point would not be good enough. The contexts here that CLDR has are things like person-height and floor-area; surely double-precision floating point numbers are precise enough for that. It seems weird to rule out an outright conversion API out of fear of not having a decimal type. On that point, what about currency? Money stuff seems higher-stakes than person-height. I assumed that currency formatting was high enough stakes that it wouldn't be used, but then I looked at the HTTP Archive and it seems to be one of the most popular things that people do with Intl APIs. When there is currency formatting in the API, it seems overprudent not to have a unit conversion API with lengths and areas. I'm not saying we should have that API, but the Decimal reason doesn't seem like a good reason not to have it.

SFC: It's a matter of, when people think of measurement units, they come with this presumption that things are going to be accurate. By keeping everything you can do inside Intl, we indicate that this is l10n rather than precision – not actually scientific. By doing it there we're saying this is an approximation.

LAF: The accuracy is itself a characteristic of the output.

SFC: The only place currency could come in is something like “dollars per mile”, that would be a legal conversion, but we’re not going to ever convert currencies.

EAO: Just reiterating that I don't think unit conversion belongs in 402. What could belong in 402 is getting information about which measurement system is used in this locale for which object. Actually doing the conversion doesn't belong in 402.

SFC: Definitely a valid perspective. In general there are two directions: one is to provide a black box that does everything; the other is to provide building blocks – the building blocks that are needed. One is getting the unit preferences for a locale and a preference. One is encapsulating mixed unit lists. And then defer to a 3rd-party library for conversions. That can be done, but it isn't necessarily the most important thing. CLDR has the conversion data and the conversion utilities. There's a lot more than just unit preference lists. If you say "here's a region, here's the usage" and look at the XML file, the answer depends on how big the unit is: you get a list of units for big, medium, and small. You also get data around how to round units. Road distance example: different rounding for different distances. There's a lot of detail in unit preferences that needs to be exposed, and a lot of glue code. If it's a black box, it's easier to do the right thing and harder to do the wrong thing. By giving the building blocks, we're setting up programmers to do the wrong thing. My position is that it seems intuitive that we should give the building blocks, but it's better to give the black box.
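
A rough sketch of the glue code described above, with hypothetical data and function names; the thresholds, units, and rounding values are illustrative, not CLDR's actual unitPreferenceData:

```js
// Hypothetical unit-preference data, loosely shaped like CLDR's
// unitPreferenceData: for a usage and locale, an ordered list of
// (threshold, unit, rounding) entries.
const roadDistancePrefs = {
  "en-US": [
    { geq: 0.5, unit: "mile" },           // larger distances in miles
    { geq: 0, unit: "foot", roundTo: 10 } // short distances in feet, rounded
  ],
};

// Hypothetical glue code: pick the preferred unit for a value in meters,
// convert, round, and format. The conversion factors and rounding rules are
// exactly the "building blocks" a black-box API would hide.
function formatRoadDistance(meters, locale) {
  const METERS_PER_MILE = 1609.344;
  const miles = meters / METERS_PER_MILE;
  const prefs = roadDistancePrefs[locale] ?? [{ geq: 0, unit: "meter" }];
  const pref = prefs.find((p) => miles >= p.geq) ?? prefs[prefs.length - 1];
  let value = pref.unit === "mile" ? miles
            : pref.unit === "foot" ? meters * 3.28084
            : meters;
  if (pref.roundTo) value = Math.round(value / pref.roundTo) * pref.roundTo;
  return new Intl.NumberFormat(locale, {
    style: "unit", unit: pref.unit, maximumFractionDigits: 1,
  }).format(value);
}

console.log(formatRoadDistance(120, "en-US"));  // e.g. "390 ft"
console.log(formatRoadDistance(5000, "en-US")); // e.g. "3.1 mi"
```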

EAO: My main concern with a black box is that there are real user desires for using the value of a unit conversion, rather than just the formatted result. What it sounds like you're proposing is either an API that could expand the scope of how people misuse Intl interfaces to get at the value, or something novel in 402, which is providing a value that could and should be used as such. It would be interesting to see the shape of what that looks like. Potentially this sounds like a thing that is maybe a 402 thing, but not an Intl object thing. I don't know where it'd go or what it would look like, but it would be interesting to see.

BAN: The reason the black box idea works for me is that all the data is already in CLDR. The idea of it being a 402 thing but not an Intl object is intriguing.

SFC: We're not solving the problem of adding unit conversion to the web platform, but we have learned the lesson that people will use Intl APIs for things that they shouldn't be used for. People will definitely use this for unit conversion, even though it's not what it's for. What we can do is acknowledge that they will do it, and provide the data in a way that won't break, without fully endorsing this as a unit conversion API.

EAO: If we ever provide an API that does unit conversion and provides output, that will be a unit conversion API for the web platform, even if that’s not how it’s intended.

HJS: I agree, we should take into account what programmers will do and not just say that it’s not designed for that.

SFC: Sounds like we’re in agreement – that’s something we’ve learned the hard way. Maybe we could take these points, make slides, and get ahead of questions, so that TG1 discussion isn’t a rehash of earlier ones.

SBE: One takeaway is that actual unit conversion… there are too many different ways that people might want to use that. Some of those ways are things that we really don’t want them to be doing with an i18n API, which suggests what EAO proposed – notating what units are, not providing the conversion. Actually including conversions opens up several cans of worms that we don’t necessarily want to address in Intl.

SFC: It's safe to say that we don't have a TG2 recommendation for the building blocks vs black box approach. Gathering the pros and cons is acceptable at this point. When we get to the point where we're asking for stage 2, we'll need a TG2 recommendation, but we don't need it right now.

SFC: One sub-proposal is formatting of mixed units, which is something that's motivated on its own. You can already do it by gluing NumberFormats and ListFormats together, but do we want to support that in NumberFormat itself? Another, less controversial sub-proposal would be the thing that CDA proposed before: notation: compact with a unit. See how that fits into the picture. That's not the smart units proposal, but it's one we could maybe move on. It would be nice to align on what we want the end goal to be, since there are lots of sub-proposals to work towards that end goal. We should ask for 45 to 60 minutes for a discussion of this at TG1. The goal of that meeting is to get clearer guidance on what direction we want to take – black box vs building blocks – and the shape in general of unit formatting. If we say we want the black box / to do things the right way, we could also do something to fix the problem that CDA found with the compact notation bytes formatting.
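
For context, the glue pattern mentioned above – combining NumberFormat and ListFormat to format a mixed unit such as feet-and-inches – looks roughly like this with today's APIs:

```js
// Mixed unit formatting today (5 ft 11 in): format each piece with
// NumberFormat, then join the pieces with a unit-type ListFormat.
const feet = new Intl.NumberFormat("en-US", { style: "unit", unit: "foot" });
const inches = new Intl.NumberFormat("en-US", { style: "unit", unit: "inch" });
const list = new Intl.ListFormat("en-US", { type: "unit", style: "short" });

console.log(list.format([feet.format(5), inches.format(11)]));
// e.g. "5 ft, 11 in"
```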

EAO: I think putting building blocks and black box in opposition to each other is not necessarily the way this can go. I think the concerns about how the value results of conversion can get used will lead us to a situation where we provide some solution – something like building blocks will need to be provided – and the question is whether we end up with both the black box and the building blocks. If we provide just the black box, it's going to get abused.

### Please remove onError parameter from format/formatToParts

tc39/proposal-intl-messageformat#58

EAO: Discussion about this is happening in the MessageFormatV2 working group and the ICU TC. The question we're asking is what is an appropriate set of requirements or suggestions that that spec should offer for how MessageFormat2 messages should be formatted when they fail, either a little bit or a lot: if they're calling a custom function that fails, if they're using invalid data, if they're formatting a message that requires a variable that's not passed in, etc. The text that has been in the MessageFormat2 spec is quite strict about how an API should behave, and it is possible that the text of the spec we end up with will be looser about requiring the formatting behaviour to be the same across implementations. The question that's interesting for us to consider here is what we think the most appropriate thing for JS is, and whether that appropriate thing is enabled/allowed by the MessageFormat2 spec, so that we can take that feedback to the MessageFormat2 working group. Or should we find something that's more appropriate than the current way? My personal take is that I very much value the strictness that is currently in the spec, and I do think that the current proposal for the format methods with their own error callbacks is the most JavaScriptey way of handling this, and a valid way of fulfilling all the requirements we have here.

SFC: I think that it's an important behaviour to have. Not only is it required by the current specification, but it's also useful for clients to access and iterate over these errors. However, I do think that there's room to think about what's the most JavaScriptey way to give this information to users, and I'm not convinced that onError is the most JavaScriptey approach. This is not something I've necessarily seen done in JS. JavaScript supports exceptions – not every language does, but JavaScript does – and try/catch is a pattern I've seen. We can also populate fields – it's very JavaScriptey to return bags of data. Possibly return an upgraded type like FormattedMessage, with getters on it to get the errors out. A regex match returns an object that has getters on it – it's consistent in JS to do something like that here. Having an error callback – maybe EAO can describe why this is the most JavaScriptey approach.

EAO: My background in this API design is working with Fluent, which has a similar sort of approach, in that the Fluent JavaScript formatting call returns a tuple with the value and a list of errors, for the same set of purposes as in MessageFormat2. The sources of the errors are various, and the user will have a use both for a message that is possibly formatted with some fallback values and for the errors themselves, which they may deal with as appropriate. My practical experience as a user of the Fluent library in JavaScript is that this pattern is strange and clunky to use. It also requires the construction of the list of errors within the formatting, its population, and then its inclusion in the output. Whereas the pattern that is here – a sort of pattern which I think USA linked to examples of – seems to me, from a userland point of view, a much more common pattern in interfaces, and it makes for a much cleaner UX when the formatted result is always the formatted result, and the errors that you might need to deal with (but only rarely can, with this sort of API) are handled by a side channel. It's the least-worst option here, in particular because it avoids the wrapper object creation that would be required by any sort of compound value, given that the feedback this proposal has already received is about the number of objects we come up with here.
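
A runnable toy of the side-channel pattern described above – the function is purely illustrative, not the Intl.MessageFormat API:

```js
// Toy formatter illustrating the onError side-channel pattern: the call always
// returns a (possibly fallback) string, and errors go to a callback instead of
// being thrown or bundled into the return value.
function formatGreeting(values, onError = () => {}) {
  if (!("user" in values)) {
    onError(new ReferenceError("Missing formatting variable: user"));
    return "Hello {$user}!"; // fallback representation
  }
  return `Hello ${values.user}!`;
}

const errors = [];
const msg = formatGreeting({}, (err) => errors.push(err));
console.log(msg);           // "Hello {$user}!"
console.log(errors.length); // 1
```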

SFC: To return to the object count concern, we're not introducing too many objects. We've been hearing a lot about object cost in Temporal, but Temporal is basically two orders of magnitude more than this. I think that there's potentially a very big benefit to having MessageFormat return a formatted message – not only for error handling, but also for the expressiveness of .formatToParts(), something like that. If it turns out that error handling is the only reason to return an upgraded type, we can have that discussion about adding more types, and this use case may not be enough justification for adding a new type, but my instinct is that it's not the only use case.

EAO: That's where the API started from. The call for that was called resultMessage, since I figured all the format calls currently return a string, so returning something other than a string would seem weird and not fit with the rest of the API design. Earlier in the life of this proposal the way to do it was to call [?], which would have toString and toParts to get the string value and the parts value, and which provided you access to some of the more internal values of the message. That allowed you to take one formatted message and more cleanly and clearly include it as an input value to another MessageFormatter, in cases where we needed to include one message in another. There is a sense from TG1 that this is not the right thing to do. The sense I got when presenting that API was a strong desire for this to be more normal, more aligned with the format and formatToParts methods on other Intl objects, so that's the direction we've taken. It sounds like this complex message value return could be explored, but there are also concerns that have been previously raised about that, and there was an explicit move away from that to our current design. I am not sure that error handling was specifically discussed in a clear way during all of that.

SFC: It sounds to me that this is probably a topic that’s worthwhile discussing again with TG1. I know that the previous discussion we had with TG1 was a little while ago, I don’t have a clear recollection. If the question is just about API shape, it is worthwhile to put this on the agenda so that we can discuss it with TG1.

EAO: I don’t have the bandwidth for doing that for Helsinki.

SFC: It would be an appropriate venue, if we have the resources to put such a thing on the agenda.

EAO: I don’t mind other people putting MessageFormat on the TG1 agenda and talking about it, but since I’m effectively hosting the meeting I don’t have the bandwidth for doing it myself.

SFC: I may put this on the agenda for myself.

### Add locale sensitive substring matching functions to Intl.Collator

#506

HJS: There exists, not in 402 but in ICU4C, a notion of collator-based search: a notion of case-insensitive search that takes into account accents and (for example) the Turkish i. There are characters that from the English-world point of view are accents, but from the Finnish point of view (for example) are considered a base letter in their own right. If you have a checkbox to allow/disallow accents: if you have an é you would ignore it, but you wouldn't ignore ä [a with two dots].

So we need to know which accents are ignored and which accents are not ignored. It seems like we can get this information "for free" by using a collator, but "for free" may not be free. For example, if you want to search instead of just sort, there's a need to correlate the collation units back to the text, which is complexity. The mapping is complexity, and then it turns out the whole reason collation data is arranged the way it is is to have culturally relevant less-than / greater-than relationships; but for search you just need equals, and less-than/greater-than is wasted complexity for search. Implementing search, since there is this need to correlate the collation units back to the text being searched, is a non-trivial effort on top of a collator that does sorting. On the other hand, if you have Unicode database data plus special cases for (for example) the Turkish i, and the kind of pre-baked data where you can take a character and have it mapped to its base character without accents, even without going to normalization, you could do accent-insensitive search with some kind of exception data that's much simpler than collation data. What this issue is asking for is accent- and case-insensitive search, but it's asking for a particular mechanism, and that mechanism is excessively complex. Even if we think the use case is reasonable, should there be a different mechanism than collation-based search for it? There are additional complications required to tie in Arabic script and Hangul in CLDR. Another reason why this doesn't come for free: you might think that since there is collation data for Finnish or Swedish, you can use that for search, but you can't – the implementation only allows the root and one tailoring layer. We end up with as many copies of the search tailoring to the root as there are locales that have a locale-specific tailoring, and we duplicate the locale-specific tailoring. It ends up being very much not for free. So why do we have this search root layer? Well, the equals sign should not match not-equals. We have one character that's special everywhere. Then it takes the position that ignoring the tie should for some reason be analogous to ignoring accents or case. That's fine, but it could be special-cased with much less complexity than this. Similarly, there is data saying that ignoring the Arabic hamza is considered equivalent to ignoring accents in search, but there's no user story of why you'd want this for Arabic, other than "we need to do this, let's do this" without any background explanation of whether this is what users want. I'm not saying they don't want it; I'm saying it's problematic when the motivation isn't visible. And for Hangul, it seems what the search collations try to accomplish is: say the text contains archaic symbols and you want to approximate them with a modern IME – is that necessary? Firefox doesn't do that – we don't get complaints, but I don't know if that's a good signal. There's a lot of complexity, and it would be good to see to what extent users actually care.

What Firefox does is based on this precomputed case-fold and precomputed base-letter thing. I think if we want accent-insensitive search in a way that's locale-sensitive about which accents not to ignore, we should first think about whether it's a problem that really needs solving, and we should have a special-purpose solution for it instead of supposing that collation data gets it for free, when what it costs to get the collation data to do search is a lot.
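
For reference, the locale dependence described above is already visible in the existing Intl.Collator compare API (a small illustration, not the proposed search API; the outputs shown are what typical implementations produce):

```js
// With sensitivity "base", English treats "ä" as an accented "a", while
// Finnish collation treats it as a distinct base letter.
const en = new Intl.Collator("en", { sensitivity: "base" });
const fi = new Intl.Collator("fi", { sensitivity: "base" });

console.log(en.compare("a", "ä")); // 0 – equal for an English-style search
console.log(fi.compare("a", "ä")); // non-zero – distinct letters in Finnish
```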

SFC: I think the use case here is definitely something that a lot of developers want, and as with all these Intl APIs there are a lot of gnarly corners in everything we do – date/time, number formatting, regular collation, there are a lot of weird corners – but this seems like a use case that comes up a lot. I have a string, I want to search for substrings and get the boundaries; that seems like a useful operation. I do agree, though, that there are a lot of people in this thread who are saying in one way or another that they want to get at the building blocks. This is a good example where the building blocks are so messy and easy to use incorrectly that providing the building blocks is not the right solution. What's the very basic fundamental that we might be able to agree on? Intl.Collator has the compare function that returns equal, less than, or greater than, so one fundamental is: let's say we took a string of text, took out every substring, and ran Collator.compare(). We could have a function that returns the ranges in this text that are Collator.compare() equal. Is that a thing we think could be useful and motivated if we shape it in that form, separate from the idea of exposing Intl.Collator search?
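
A naive sketch of the quadratic approach described above, built only on the existing compare() primitive (purely illustrative – the following replies explain why this doesn't actually work in general):

```js
// Naive, quadratic substring search using only Collator.prototype.compare:
// returns [start, end] ranges in `haystack` whose substring compares equal to
// `needle`. Among other problems discussed below, it misses matches whose
// length in code units differs from the needle's.
function collatorFindRanges(haystack, needle, locale) {
  const collator = new Intl.Collator(locale, { sensitivity: "base" });
  const ranges = [];
  for (let start = 0; start + needle.length <= haystack.length; start++) {
    const end = start + needle.length;
    if (collator.compare(haystack.slice(start, end), needle) === 0) {
      ranges.push([start, end]);
    }
  }
  return ranges;
}

console.log(collatorFindRanges("Crème brûlée", "creme", "en"));
// e.g. [[0, 5]] – "Crème" matches "creme" at base sensitivity
```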

HJS: Markus is noting in the thread that it's more complex than that. Even setting aside that it'd be quadratic to look at each of the substrings – even if you accept how extremely inefficient it is to take the current primitive and build search out of it – there are cases where it wouldn't work. The way this stuff works is even more difficult than it seems. What could go wrong? There's the issue that in Thai certain characters are stored not in logical order, and the collation layer, when it's used for sorting, undoes that presentational ordering in the process of mapping to collation elements. Then the search root suppresses those contractions, so that search matches substrings the way a user who's used to how the Thai keyboard and Thai storage work would expect.

Multiple characters in the storage can form one collation unit, but there can be issues if a prefix search lands in the middle of that sort of thing. Even if you do the silly quadratic approach, you still can't get reliable results. And then how do you correlate that back to the input? Once you've run a match on the collation units, the collation units are 64 bits. If a browser has UTF-16, you could run a search directly on UTF-16 with a handful of special cases for case- and accent-sensitivity and mappings from UTF-16 units to UTF-16 units; with collation you instead get this other set of 64-bit units that don't correspond to one thing. There are simpler ways to accomplish the use case, but they need data that CLDR doesn't currently have. Opening the collation data up in one window and having [] for accent-sensitive search – looking through the collation definitions, picking out the interesting things, and developing this other data – would be much less of an undertaking than implementing a search function on Intl.Collator. Doing the work of reviewing collations in CLDR and adopting the current Firefox approach based on new exception data is something I'd rather have than the software project of adding collation-based search to the ICU4X collator. It's better to look at the data and put it in a form that's optimized for search, rather than taking the collation data and using Intl.Collator for search.

SFC: Maybe this isn't a 402 issue – maybe if it came from Unicode or CLDR, we could then look at it? – but it seems hard to implement in 402. This seems to be a really simple problem where people have found a solution – Firefox has its own, which it uses internally – but that can't be the only solution for this problem.

HJS: For non-locale-sensitive search – the search that would correspond to the root locale without the Thai and Arabic things, which works on modern Korean and the sorts of scripts whose accents are really accents in the sense that Latin and Greek scripts have them, not hamzas – Firefox has an implementation, but it doesn't have locale-specific exceptions. The only complaint about that is from a Finnish colleague. We're not getting reports from throughout the world about not having collator-based search. If you have a large enough web page, like the HTML spec, even on a modern computer that's super fast with an optimized build of a browser, you can see the difference in performance between collator-based search and what Firefox does.

SFC: In order to do this correctly, this is not a small thing. We want to collect the data to make sure it’s implementable in ICU4X, and it sounds like the solution that’s currently implemented in ICU4C we don’t want to implement in ICU4X. Could you clarify – if Safari and Chrome are already using the ICU4C approach – why…

HJS: It’s a needlessly complex approach. Firefox takes 16 bit units and converts them to 16 bit units on a character-by-character basis, so that you don’t get lost in the haystack.

SFC: So you’re saying that the Firefox approach is technically superior to the ICU4C implementation?

HJS: It's faster, less complex, has fewer edge cases, and doesn't need to correlate back to the original representation. It runs directly on the UTF that you have, instead of first transforming into this many-to-many 64-bit space and then having to search on that and correlate back, so it doesn't have that issue. And as a matter of efficiency, because the collator data is designed so that > and < are culturally sensitive, we have a needlessly complex data design for what we need. For collation for sorting purposes, we need only the root and one level of tailoring, and that's currently the way it is in both ICU4C and ICU4X. For search we logically need three layers: the sorting root, the search root overlaid, and then the locale-specific overlay. Unless we do three layers of lookup in the collator itself, we end up with the data situation you see in ICU4C, where we have duplication of data that's supposed to be for free: the search root layer gets duplicated as many times as there are search locales, and the locale-specific data is duplicated for every locale that has locale-specific search. And what the data is designed for is inefficient for our purposes.

SFC: One of the reasons I put this on here is because there's a large number of upvotes. It does seem like a use case that has uses – I can even see Google Docs wanting to use this sort of API, and I can see Google clients wanting to use this API if it's available – so it seems intuitive that this is the kind of thing we want to work toward. I am convinced that there's a use case, but I see what HJS is saying about the problems with using a collator for it. Since Firefox has its own implementation, I wonder if someone from Mozilla can upstream this into ICU4X, and then eventually 402.

HJS: The thing is, we don’t have the locale-specific rules, we have what correlates to the root locale, so we don’t have the locale-specific exceptions. Maybe there are users who are unhappy about it, but the only one who has filed a bug is one of my colleagues in Finland.

SFC: Adding locale overrides is part of the picture – if we can get agreement from the different committees about whether this is the direction we want to go in, we could ask that question at that time.

HJS: We have [missed], and a logically equivalent operation computed from the Unicode database for removing accents. We could have a codepoint trie for ICU4X; that seems like the sensible thing.

SFC: Next month: sequence unit formatting – how do we support formatting of foot-and-inch, meter-and-centimeter? We can have a discussion about whether we want people to rely on ListFormat or whether we want people to have a more cohesive solution.

The other item is the 'und' locale – it turns out we don't support this in the Web platform. I thought we did, but I looked more closely and we don't. Do we want to add the 'und' locale? Is this a solution we can employ for the StableFormatting proposal? That's a discussion we can also have next month. I'll post my conclusion on #506 based on these notes. And again, my conclusion is that this is a thing we can't solve in ECMAScript until we solve it in Unicode, and solving it in Unicode requires, over the next few years, seeing if we can take pieces from the Mozilla solution and bring them into ICU4X, and then after that bring it into 402.