-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[browser][HybridGlobalization] Improve speed performance of IndexOf and LastIndexOf text APIs with HybridGlobalization mode #95583
Conversation
Tagging subscribers to this area: @dotnet/area-system-globalization Issue DetailsDescriptionIn this PR, we improve the time performance of This is done by picking up necessary parts from the previously used This improvement introduces a regression in raw size of ~21kB due to grapheme segmentation rules presented in src/mono/wasm/runtime/hybrid-globalization/segmentation-rules.ts. IdeasBy introducing our grapheme segmentation on the TypeScript side, we are duplicating code already present in .NET under the name
Contributes to: #94473 Note: I don't have much experience with TypeScript, so feel free to change any parts that don't obey style conventions.
|
Tagging subscribers to 'arch-wasm': @lewing Issue DetailsDescriptionIn this PR, we improve the time performance of This is done by picking up necessary parts from the previously used This improvement introduces a regression in raw size of ~21kB due to grapheme segmentation rules presented in src/mono/wasm/runtime/hybrid-globalization/segmentation-rules.ts. IdeasBy introducing our grapheme segmentation on the TypeScript side, we are duplicating code already present in .NET under the name
Contributes to: #94473 Note: I don't have much experience with TypeScript, so feel free to change any parts that don't obey style conventions.
|
2 more ideas:
|
Update: 21kB before brotli compression. After compression, 5kB (that is ~1.5% of the original |
That sounds good: |
src/mono/wasm/runtime/hybrid-globalization/grapheme-segmenter.ts
Outdated
Show resolved
Hide resolved
src/mono/wasm/runtime/hybrid-globalization/grapheme-segmenter.ts
Outdated
Show resolved
Hide resolved
/azp run runtime-wasm |
Azure Pipelines successfully started running 1 pipeline(s). |
@@ -380,6 +381,8 @@ public void AssertIcuAssets(AssertBundleOptionsBase assertOptions) | |||
} | |||
|
|||
IEnumerable<string> actual = Directory.EnumerateFiles(assertOptions.BinFrameworkDir, "icudt*dat"); | |||
if (assertOptions.GlobalizationMode == GlobalizationMode.Hybrid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, why actual
is not all files ? Like *.*
? @radical
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because are only looking for the icu related files here, and we do throw if the number of expected!=actual .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But now that we have a differently named file too, it would indeed make sense to enumerate all the files ignoring subdirectories, and AssertFilesOnDisk
can skip the number of files check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@radical @pavelsavara Should I make the change as part of this PR? If I understand @radical comment correctly, it would require removing the number of files check and replacing Assert.Equal(expected, actualFileNames)
with Assert.Contains(expected, actualFileNames)
in the AssertFilesOnDisk
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow up PR would be fine.
/azp run runtime-wasm |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, @ilonatommy please approve for graphemes
/azp run runtime-wasm-libtests |
Azure Pipelines successfully started running 1 pipeline(s). |
Failures are unrelated and tracked. Merging. |
…nd LastIndexOf text APIs with HybridGlobalization mode (dotnet#95583) * Re-implement the grapheme segmenter from Intl. * Load segmentation rules as static json asset
I realized that in order to make this MT friendly, we need to load the JSON on all web-workers, not just in the UI thread. |
What changes would be necessary to accommodate this? |
Description
In this PR, we improve the time performance of
IndexOf
andLastIndexOf
text APIs when running in HybridGlobalization (HG) mode.This is done by picking up necessary parts from the previously used
Intl.Segmenter
class. By picking up only essential parts for grapheme segmentation, we could reduce the overhead and improve performance from ~2362ms down to ~38ms (measured using browser-bench)This improvement introduces a regression in raw size of ~21kB due to grapheme segmentation rules presented in src/mono/browser/runtime/hybrid-globalization/segmentation-rules.json. These rules are loaded separately as static JSON asset when HG is enabled.
Ideas
By introducing grapheme segmentation algorithm on the TypeScript side, we are duplicating code already present in .NET under the name
TextSegmentationUtility.GetLengthOfFirstUtf16ExtendedGraphemeCluster
. I have several propositions for how we could leverage this:JsCompareString
). By core part, I mean grapheme segmentation of source and target strings and iteration over strings to find a match. With this approach, we wouldn't introduce any additional rule files and use the already present and tested grapheme segmentation algorithm. However, doing interop for every string comparison introduces additional overhead, which would slow the APIs. I tried a basic version of this approach locally and measured ~200ms forIndexOf
andLastIndexOf
APIs (slower than this PR, but still significantly faster than usingIntl.Segmenter
). We could choose this approach if we consider the size increase by this PR undesirable.TextSegmentationUtility
for grapheme segmentation, but we would have to segment the entire source/target string in advance. This could significantly hinder performance for cases when the target string can be found in the first parts of the source string. Additionally, we would have to pass an array of grapheme breaks to the interop call.runtime/src/libraries/System.Private.CoreLib/src/System/Globalization/CharUnicodeInfoData.cs
Lines 1441 to 2295 in b69f79c
GetNumericGraphemeTableOffsetNoBoundsChecks
function in Typescript. Edit based on @pavelsavara comment: Ideally, we would preprocess data from CharUnicodeInfoData.cs only once and used them in subsequent calls.cc: @ilonatommy
Contributes to: #94473
Note: I don't have much experience with TypeScript and WASM in general, so feel free to change any parts that don't obey style conventions.