Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initializing the "ja" culture takes 200ms when using ICU #31273

Open
adamsitnik opened this issue Oct 23, 2019 · 16 comments
Open

Initializing the "ja" culture takes 200ms when using ICU #31273

adamsitnik opened this issue Oct 23, 2019 · 16 comments
Labels
area-System.Globalization help wanted [up-for-grabs] Good issue for external contributors os-linux Linux OS (any supported distro) tenet-performance Performance related issue
Milestone

Comments

@adamsitnik
Copy link
Member

DateTime.Parse for "ja" culture (ONLY) is 10x times slower on Linux compared to Windows

Slower diff/base Windows Median (ns) Linux Median (ns)
System.Globalization.Tests.Perf_DateTimeCultureInfo.Parse(culturestring: ja) 8.87 454.24 4029.00
System.Globalization.Tests.Perf_DateTimeCultureInfo.ToString(culturestring: da) 1.08 236.39 254.59
System.Globalization.Tests.Perf_DateTimeCultureInfo.ToString(culturestring: ) 1.05 245.69 258.02
System.Globalization.Tests.Perf_DateTimeCultureInfo.ToString(culturestring: ja) 1.05 245.61 257.00
System.Globalization.Tests.Perf_DateTimeCultureInfo.ToString(culturestring: fr) 1.05 245.56 256.82
Faster Win/Lin Windows Median (ns) Linux Median (ns)
System.Globalization.Tests.Perf_DateTimeCultureInfo.Parse(culturestring: da) 1.24 531.21 428.73
System.Globalization.Tests.Perf_DateTimeCultureInfo.Parse(culturestring: fr) 1.05 450.44 428.87
System.Globalization.Tests.Perf_DateTimeCultureInfo.ToStringHebrewIsrael 1.04 509.37 488.81

The contributor who wants to work on this issue should:

  1. Run this simple benchmark from dotnet/performance repository and confirm the problem
git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f netcoreapp5.0 --filter System.Globalization.Tests.Perf_DateTimeCultureInfo.Parse
  1. Build CoreCLR locally: https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-corefx.md#Build
  2. Create a small repro app: https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-corefx.md#repro
  3. Use PerfCollect to identify issue https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-corefx.md#PerfCollect
  4. Solve the issue
@adamsitnik adamsitnik changed the title DateTime.Parse for "ja" culture 10x times slower on Linux DateTime.Parse for "ja" culture 9x times slower on Linux Oct 23, 2019
@tarekgh
Copy link
Member

tarekgh commented Nov 13, 2019

@adamsitnik does the results change if we change the initialization

yield return new CultureInfo("ja");

to

yield return CultureInfo.GetCultureInfo("ja");

Sorry I didn't try it myself yet. I am asking because the parsing code should be exact the same between Windows and Linux, the only difference is the calendar data we pick from the OS which should be cached if you are not recreating the culture.

@adamsitnik
Copy link
Member Author

does the results change if we change the initialization

I've just tried and it does not change. I've also tested ja-JP and it does not help as well

@tarekgh
Copy link
Member

tarekgh commented Nov 19, 2019

@adamsitnik thanks for your quick try.

@tarekgh
Copy link
Member

tarekgh commented Jan 10, 2020

I looked at that and I am seeing the problem is the parse initialization time is counted as part of the measurements. The initialization should occur once for the culture. The test should really measure the parsing perf excluding the one time parse initialization. when I change the test code to:

        private static CultureInfo s_jaCulture = CultureInfo.GetCultureInfo("ja");
        private static CultureInfo s_jaJPCulture = CultureInfo.GetCultureInfo("ja-JP");

        private CultureInfo JA { get { DateTime.Parse("10/10/2010 12:00:00 AM", s_jaCulture); return s_jaCulture; }} // call parsing one time to force initialization 
        private CultureInfo JAJP { get { DateTime.Parse("10/10/2010 12:00:00 AM", s_jaJPCulture); return s_jaJPCulture; }} // call parsing one time to force initialization 

        public IEnumerable<object> Cultures()
        {
            // yield return new CultureInfo("ja");
            yield return JA;
            yield return JAJP;
            yield return new CultureInfo("fr");
            yield return new CultureInfo("da");
            yield return new CultureInfo("");
        }

I am getting the results:

Method culturestring Mean Error StdDev Median Min Max Gen 0 Gen 1 Gen 2 Allocated
Parse 637.8 ns 38.69 ns 43.00 ns 644.6 ns 570.8 ns 717.3 ns - - - -
Parse da 654.0 ns 33.33 ns 38.38 ns 652.4 ns 578.8 ns 720.9 ns - - - -
Parse fr 640.6 ns 30.69 ns 35.34 ns 646.5 ns 562.4 ns 685.8 ns - - - -
Parse ja 639.9 ns 34.20 ns 39.38 ns 640.7 ns 570.5 ns 727.8 ns - - - -
Parse ja-JP 631.3 ns 46.60 ns 51.80 ns 623.0 ns 546.3 ns 742.7 ns - - - -

It is expected the Japanese culture parsing initialization be slower than other cultures because Japanese cultures have multiple eras which required to initialize more stuff.

@adamsitnik let me know if you want me to update the test as I did here.

@adamsitnik
Copy link
Member Author

adamsitnik commented Jan 13, 2020

I've investigated this issue and it turned out to be more complex ;)

BenchmarkDotNet JITs the code by just running the benchmark once. When it takes a lot of time, it skips further warmups and just runs the benchmark once per iteration. It does so to avoid situations where we have a benchmark that takes a lot of time to run and we run it many times just for a warmup (for example for 30s per run and six warmups = 3 minutes spend for a warmup).

The problem are benchmarks where the initialization time is high, but every next run is short. dotnet/BenchmarkDotNet#837

In this case, the first call to following code:

var culture = new CultureInfo("ja");
var result = DateTime.Parse("10/10/2010 12:00:00 AM", culture);

takes 200 ms to execute and this is not the cost of JITting the code (I've run it with CoreRT for comparison). With the performance repo settings (IterationTime = 250ms) it's considered to be long. That's why BDN runs the benchmark once per iteration. 4 us is reported due to Stopwatch resolution (this operation is to short to be just executed once before we stop the stopwatch).

The initialization time can be easily measured using the time command on Linux:

using System;
using System.Globalization;

namespace culture
{
    class Program
    {
        static void Main(string[] args)
        {
            var culture = new CultureInfo("ja");
            var result = DateTime.Parse("10/10/2010 12:00:00 AM", culture);
        }
    }
}
time dotnet run -c Release --no-build --no-restore
real	0m0,612s
user	0m0,953s
sys	0m0,097s

While for a Hello World it's just:

real	0m0,382s
user	0m0,642s
sys	0m0,070s

I am going to rename this issue and investigate the initialization time. 200ms is way too long.

@adamsitnik adamsitnik changed the title DateTime.Parse for "ja" culture 9x times slower on Linux Initializing the "ja" culture takes 200ms on Linux Jan 13, 2020
@adamsitnik
Copy link
Member Author

ucol_openRules takes 150ms... I <3 ICU

image

@adamsitnik
Copy link
Member Author

FWIW I was wondering why we create a new collator based on custom text comparison rules while parsing the DateTime and debugged it to the following code:

image

@adamsitnik
Copy link
Member Author

I've studied the ICU UCollator APIs and I cant' see any alternatives to ucol_openRules for creating an UCollator based on a list of custom rules.

Currently the only idea I have is to try to call this method with normalization disabled (UCOL_OFF instead of UCOL_DEFAULT)

@sharwell
Copy link
Member

sharwell commented Jan 13, 2020

@adamsitnik would it help to have the ability to specify the number of warmup iterations as a manual override?

@tarekgh
Copy link
Member

tarekgh commented Jan 13, 2020

@adamsitnik I am not sure it is worth it to try optimizing the initialization time of the parsing. The reason is the main scenarios always using a single culture (e.g. CurrentCulture or InvariantCulture) which have the initialization done once and we never got any complaint about that. Also, any code using CultureInfo.GetCultureInfo will always use cached culture with the cached parsing data. I suggest closing this one until we get a real complaint.

@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@msftgits msftgits added this to the Future milestone Feb 1, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@joperezr joperezr removed the untriaged New issue has not been triaged by the area owner label Jul 3, 2020
@adamsitnik adamsitnik modified the milestones: Future, 5.0.0 Aug 17, 2020
@tarekgh
Copy link
Member

tarekgh commented Aug 17, 2020

Thanks @adamsitnik. I don't think this should be tagged for 5.0 though if you agree this is not a major issue we should target now. moving to 6.0 would be reasonable I guess.

@adamsitnik
Copy link
Member Author

I agree, moving to 6.0

@ghost
Copy link

ghost commented Aug 17, 2020

Tagging subscribers to this area: @tarekgh, @safern, @krwq
See info in area-owners.md if you want to be subscribed.

@tarekgh tarekgh removed this from the 6.0.0 milestone Apr 19, 2021
@asmichi
Copy link
Contributor

asmichi commented Nov 22, 2021

Could the issue title be updated so that this issue be more discoverable? For example, "when using ICU" instead of "on Linux". This issue affects also Windows and macOS now that .NET uses ICU on Windows by default and supports macOS.

Regarding the issue itself... This issue has visible impact on desktop apps. For desktop apps, startup time is an important factor, especially for short-lived ones. Since .NET Core 3.1 is reaching its EOL, I expect more Windows desktop apps will encounter this issue. (On the other hand, server apps should be less affected because I believe they tend to use "en" or the invariant culture after all.)

My actual apps, which are short-lived build tools utilizing Microsoft.Build.Construction APIs, encountered this issue after migrating from .NET Core 3.1 to .NET 6. Its execution time increased from ~1s to ~1.3s on average. As build tools they were executed thousands of times during a build, so the increase added up to hundreds of seconds. (Fortunately Globalization Invariant Mode was acceptable.)

dotnet msbuild (including dotnet build and dotnet run) also is affected by this issue as I have mentioned in #61887. The following PowerShell script run on English and Japanese Windows yields the result below.

dotnet new console
dotnet run
1..10 | % { Measure-Command { dotnet run } } | Measure-Object -Property TotalMilliseconds -Average
Windows Language Average Time
English 1315.6ms
Japanese 1619.0ms

Configuration:

  • Host
    • Windows 10 Pro 21H2 x64 Japanese (10.0.19043.1348)
    • AMD Ryzen 7 2700X
  • Hyper-V Guest
    • Windows 10 Pro 21H2 x64 Japanese (10.0.19044.1288)
    • Windows 10 Pro 21H2 x64 English (10.0.19044.1288)
    • .NET SDK 6.0.100 x64

@ghost ghost added the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Nov 22, 2021
@tarekgh tarekgh changed the title Initializing the "ja" culture takes 200ms on Linux Initializing the "ja" culture takes 200ms when using ICU Nov 22, 2021
@tarekgh tarekgh removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Nov 22, 2021
@tarekgh
Copy link
Member

tarekgh commented Nov 22, 2021

Could the issue title be updated so that this issue be more discoverable?

Yes, I have updated the title. Thanks for the suggestion.

@iSazonov
Copy link
Contributor

If look PowerShell startup scenario (гsers often complain that it starts up slowly) it would be great to have fast ICU initialization.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Globalization help wanted [up-for-grabs] Good issue for external contributors os-linux Linux OS (any supported distro) tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

8 participants