-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support using ICU when running on Windows #826
Comments
I believe this also tracks potentially carrying a private version of ICU, whether on Linux or Windows. (We can split that out of course) |
Right. This issue intended to track all related ICU work we need to do. we can add all information here. |
You mention supporting app-local ICU in Windows and Linux. Will we support it in macOS? Are there any special caveats for Alpine here (or containers)? I know Alpine originally motivated the "no ICU" mode -- I assume ICU can be deployed on Alpine manually -- will it be possible to use an app-local ICU there as well? |
Yes, macOS should be covered too. I'll edit the text to clarify one thing for supporting ICU as local app on Linux/macOS which is, the ICU NuGet packages which going to be produced by Windows team is going to support Windows only. So, on Linux/macOS, the app will need manually include the ICU libs if want to have local support. That till we'll have some source producing ICU NuGet packages for Linux/macOS.
If the globalization invariant mode is enabled on Alpine, then we'll not load ICU at all (as we do today). |
ICU has different performance characteristics, so we are going to see performance changes as side-effect of this change (related: https://github.com/dotnet/coreclr/issues/27525). |
@jkotas that is why I was inclining to make ICU as opt-in in the beginning. I am not sure if we can address all perf concerns in 5.0. |
It is ok to have performance regressions if they are understood and done intentionally for a good reason. We have done this trade-off many times in the history of .NET. |
We also have a few options available to address any performance losses if they show up in hot paths. For example, we already have a few places where we special-case |
From my experiments with Simple Case Folding a two-level table is as fast as ASCII workarounds. I believe MSFT experts can make it even faster. |
I think that we should revisit all known platform differences and group them into 3 categories:
An example of ICU bug that we've already fixed: unicode-org/icu#840 Examples of things that can be bugs on our side or ICU side: Console.WriteLine(CultureInfo.InvariantCulture.CompareInfo.IsPrefix("''b", "b", CompareOptions.IgnoreSymbols)); // false
Console.WriteLine(CultureInfo.InvariantCulture.CompareInfo.IsPrefix("a''b", "ab", CompareOptions.IgnoreSymbols)); // true CompareInfo frenchCompare = new CultureInfo("fr-FR").CompareInfo;
Console.WriteLine(frenchCompare.Compare("\u0153", "oe")); // 1, should be 0 |
Personally I am not a big fan of such optimizations. It adds a LOT of complexity on our side.
I am not sure if I understand correctly, but we already have a cache:
I had the "pleasure" to get more familiar with ICU code and I can say that there is definitely a place for improvement. The lack of possibility of breaking a 20 years old API makes it a little bit harder, but not impossible. An example: before we call the equivalent of
@tarekgh are there other parties at Microsoft interested in using ICU in performance-sensitive code? Maybe we should just identify the perf bottlenecks, submit proposals and somehow divide the work amongst teams. Also, if we really want to use ICU as our default we should ask for expanding the |
We got some parties asking for using ICU on Windows. They already used it on Linux and never heard any complain from them about the perf there. So far, we didn't get anyone talked about the expected performance at all. |
It is super rare to see globalization APIs used on performance critical paths. If somebody uses globalization API on a performance critical path, it is most likely a bug - a wrong API being called. |
I agree. The problem is that I agree with @kevingosse who wrote in https://github.com/dotnet/corefx/issues/40674#issuecomment-526319388
I believe that if we really want to switch to ICU by default without optimizing it first, we should consider adding a |
They will hit the "slow path" only if the strings do contain globalization-sensitive characters.
Part of the plan is to have a switch to use NLS. So if somebody hits a problem with ICU (whether it is functionality or performance), they can go back to the legacy NLS to workaround it. |
Using StringComparer.Ordinal properly is not just a performance micro optimization. It is often required for security. |
This is true only for |
I'm a big supporter of this. It is really weird to say on one hand "people should use Ordinal most of the time anyway" and yet have APIs use culture-sensitive comparison per default (thus requiring extra knowledge to do the right thing). |
Thanks @tarekgh -- also pasting the official docs site for this since those might suffer updates: https://docs.microsoft.com/en-us/dotnet/standard/globalization-localization/globalization-icu |
Windows 10 now carrying a version of ICU. It will be good to allow .NET Core take advantage of that.
Goals
There are 2 goals which if we do will get us in the much better position regarding the globalization support.
Doing these 2 goals will give the flexibility to the apps to decide if want to use whatever ICU version shipping with Windows or decide to carry its own version of ICU.
How we can support ICU on Windows
Loading Default installed ICU on Windows
Windows RS2 started carrying ICU. The ICU DLLs are installed on %WINDIR%\System32 folder.
On RS2, there are 2 DLLs installed there named icuuc.dll and icuin.dll. On 19H1, Windows combined these 2 DLLs into a single DLL named icu.dll.
We'll support the loading ICU on Windows starting from 19H1 version and up. The reason is the previous versions require COM initialization which can cause some issues.
NLS or ICU as a default?
We'll use ICU by default as this is the future direction and we want to get users switch to that. Of course this will be kind of breaking change especially for sorting behavior and locale data. We are going to provide a config switch to allow using NLS if any app wanted to switch back to the old behavior.
Loading non-ICU default libraries
As the second goal is to allow using a specific version of ICU which not necessary the same version shipped with Windows. We are working with Windows team to have them produce ICU NuGet package which apps can get and install with their app (under the app folder). The apps will install it under the app folder for the reasons:
The app can opt-in to the app version using the config switch. We’ll try to find the libraries under the app folder according to the layout decided by NuGet package.
The ICU NuGet package will support running on down level Windows versions (e.g. Windows 7). That can give advantage of using ICU there too.
Config Switch
We need the config switch for 2 reasons:
To opt-in/out to or from using ICU/NLS.
To opt-in use the local app ICU version instead of the global version.
Note, we can have this switch work on Linux/macOS too. so, we’ll allow apps to carry their own ICU version if they need to. The issue here is, so far no one producing NuGet packages for ICU on Linux and macOS. The apps on Linux/macOS will need manually to carry the ICU libs to allow local ICU support.
The suggesting is to have a config switch named System.Globalization.ICU and will have the possible values:
We’ll support the switch through environment variable too called DOTNET_SYSTEM_GLOBALIZATION_ICU.
Falling back
Invariant Mode
If the globalization invariant mode is enabled, we honor it and will not try to load ICU at all.
The text was updated successfully, but these errors were encountered: