Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize CountWordInText #10050

Merged
merged 1 commit into from
Jul 1, 2024
Merged

Conversation

lahma
Copy link
Contributor

@lahma lahma commented Jun 29, 2024

  • use SearchValues for special chars lookup on NET 8
  • only allocate delimiter chars once
  • ensure no closure allocations with static lambdas

image

* use SearchValues for special chars lookup on NET 8
* only allocate delimiter chars once
* ensure no closure allocations with static lambdas
@filzrev
Copy link
Contributor

filzrev commented Jun 30, 2024

It seems that it can reduce extra memory allocation that caused by string.Split and string.Trim()
by using Span-based API.

Currently it seems there is no SplitAny API that can be used for this purpose. (See: https://github.com/dotnet/runtime/issues/934)
Instead. it can be implemented by simple loop logics.

Example of Benchmark results

Method Mean Error StdDev Ratio RatioSD Gen0 Gen1 Allocated Alloc Ratio
Default 163.41 ms 1.960 ms 1.833 ms 1.00 0.00 101250.0000 32250.0000 430256464 B 1.000
Optimized1(string) 132.51 ms 2.489 ms 2.329 ms 0.81 0.02 61500.0000 500.0000 257720518 B 0.599
Optimized2(SearchValue) 133.73 ms 2.538 ms 2.607 ms 0.82 0.02 61500.0000 500.0000 257720518 B 0.599
Optimized3(Span-based) 80.74 ms 1.560 ms 1.602 ms 0.49 0.01 - - 313 B 0.000

Benchmark code (LINQPad)

CountwordBenchmark.linq
<Query Kind="Program">
  <Namespace>BenchmarkDotNet.Attributes</Namespace>
  <Namespace>System.Globalization</Namespace>
  <Namespace>System.Runtime.CompilerServices</Namespace>
  <Namespace>System.Threading.Tasks</Namespace>
</Query>

#load "BenchmarkDotNet"

using System.Diagnostics;
using System.Globalization;
using System.Runtime.CompilerServices;

void Main()
{
    RunBenchmark();
}

private const int IterationCount = 1000;
private string text = "";

[GlobalSetup]
public void BenchmarkSetup()
{
    text = File.ReadAllText(@"C:\temp\docfx\docs\_site\docs\markdown.html");
}

[Benchmark(Baseline = true)]
public void Default()
{
    foreach (var i in Enumerable.Range(1, IterationCount))
        WordCounter_Default.CountWordInText(text);
}

[Benchmark]
public void Optimized1()
{
    foreach (var i in Enumerable.Range(1, IterationCount))
        WordCounter_Optimized2.CountWordInText(text);
}

[Benchmark]
public void Optimized2()
{
    foreach (var i in Enumerable.Range(1, IterationCount))
        WordCounter_Optimized2.CountWordInText(text);
}

[Benchmark]
public void Optimized3()
{
    foreach (var i in Enumerable.Range(1, IterationCount))
        WordCounter_Optimized3.CountWordInText(text);
}

public static class WordCounter_Default
{
    public static int CountWordInText(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            return 0;
        }

        string specialChars = ".?!;:,()[]";
        char[] delimiterChars = [' ', '\t', '\n'];

        string[] wordList = text.Split(delimiterChars, StringSplitOptions.RemoveEmptyEntries);
        return wordList.Count(s => !s.Trim().All(specialChars.Contains));
    }
}

public static class WordCounter_Optimized1
{
    private static readonly string SpecialChars = ".?!;:,()[]";
    private static readonly char[] DelimiterChars = [' ', '\t', '\n'];

    public static int CountWordInText(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            return 0;
        }

        string[] wordList = text.Split(DelimiterChars, StringSplitOptions.RemoveEmptyEntries);
        return wordList.Count(s => !s.Trim().All(static c => SpecialChars.Contains(c)));
    }
}

public static class WordCounter_Optimized2
{
    private static readonly System.Buffers.SearchValues<char> SpecialChars = System.Buffers.SearchValues.Create(".?!;:,()[]");
    private static readonly char[] DelimiterChars = [' ', '\t', '\n'];

    public static int CountWordInText(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            return 0;
        }

        string[] wordList = text.Split(DelimiterChars, StringSplitOptions.RemoveEmptyEntries);
        return wordList.Count(static s => !s.Trim().All(static c => SpecialChars.Contains(c)));
    }
}

public static class WordCounter_Optimized3
{
    public static int CountWordInText(ReadOnlySpan<char> text)
    {
        var count = 0;
        var isWordProcessing = false;

        for (int i = 0; i < text.Length; ++i)
        {
            var ch = text[i];
            switch (ch)
            {
                // Word separator chars
                case '\t': // \u0009
                case '\n': // \u000A
                case '\r': // \u000D
                case ' ':  // \u0020
                    IncrementCountIfProcessingWord(ref count, ref isWordProcessing);
                    continue;

                // Skip special chars
                case '!': // \u0021
                case '(': // \u0028
                case ')': // \u0029
                case ',': // \u002C
                case '.': // \u002E
                case ':': // \u003A
                case ';': // \u003B
                case '?': // \u003F
                case '[': // \u005B
                case ']': // \u005D
                    continue;

                default:
                    isWordProcessing = true;
                    continue;
            }
        }

        // Handle remaining processing word.
        IncrementCountIfProcessingWord(ref count, ref isWordProcessing);

        return count;

        // Increment word count if processing word.
        [MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
        static void IncrementCountIfProcessingWord(ref int count, ref bool isProcessingWord)
        {
            if (isProcessingWord)
            {
                isProcessingWord = false;
                ++count;
            }
        }
    }
}

Copy link
Contributor

@yufeih yufeih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, we could leave other optimizations in the future.

@yufeih yufeih merged commit bd00e2b into dotnet:main Jul 1, 2024
7 checks passed
@yufeih yufeih added the performance Makes the pull request appear in "Performance" section of the next release note label Jul 1, 2024
@lahma lahma deleted the optimize-CountWordInText branch July 1, 2024 03:45
@lahma
Copy link
Contributor Author

lahma commented Jul 1, 2024

That implementation @filzrev did is probably worth a PR, great work with benchmarks and the implementation! I was afraid to change the "readable" behavior of the current implementation as I wasn't sure about test coverage.

A separate benchmark project could also be a good addition to the docfx solution to host experiments.

renovate bot referenced this pull request in buehler/dotnet-operator-sdk Jul 1, 2024
[![Mend
Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [docfx](https://github.com/dotnet/docfx) | `2.76.0` -> `2.77.0` |
[![age](https://developer.mend.io/api/mc/badges/age/nuget/docfx/2.77.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![adoption](https://developer.mend.io/api/mc/badges/adoption/nuget/docfx/2.77.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![passing](https://developer.mend.io/api/mc/badges/compatibility/nuget/docfx/2.76.0/2.77.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![confidence](https://developer.mend.io/api/mc/badges/confidence/nuget/docfx/2.76.0/2.77.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|

---

### Release Notes

<details>
<summary>dotnet/docfx (docfx)</summary>

### [`v2.77.0`](https://github.com/dotnet/docfx/releases/tag/v2.77.0)

<!-- Release notes generated using configuration in .github/release.yml
at main -->

#### What's Changed

##### 💥 Breaking Changes

- chore: Drop .NET 7 SDK supports by
[@&#8203;filzrev](https://github.com/filzrev) in
[https://github.com/dotnet/docfx/pull/9911](https://github.com/dotnet/docfx/pull/9911)

##### 🎉 New Features

- feat: Enable PreferCSSPageSize option for PDF generation by
[@&#8203;filzrev](https://github.com/filzrev) in
[https://github.com/dotnet/docfx/pull/9850](https://github.com/dotnet/docfx/pull/9850)
- feat: Add docfx JSON Schema files and related tests by
[@&#8203;filzrev](https://github.com/filzrev) in
[https://github.com/dotnet/docfx/pull/9852](https://github.com/dotnet/docfx/pull/9852)
- feat: Add `toc.json` transform logics using `toc.extension.js` by
[@&#8203;filzrev](https://github.com/filzrev) in
[https://github.com/dotnet/docfx/pull/9953](https://github.com/dotnet/docfx/pull/9953)
- feat: Better error when link not found by
[@&#8203;Patrick8639](https://github.com/Patrick8639) in
[https://github.com/dotnet/docfx/pull/9957](https://github.com/dotnet/docfx/pull/9957)
- feat: Add `categoryLayout` option for metadata generation by
[@&#8203;filzrev](https://github.com/filzrev) in
[https://github.com/dotnet/docfx/pull/9965](https://github.com/dotnet/docfx/pull/9965)
- feat: Permits to specify the placement of overwrites by
[@&#8203;Patrick8639](https://github.com/Patrick8639) in
[https://github.com/dotnet/docfx/pull/9937](https://github.com/dotnet/docfx/pull/9937)
- feat: Support private symbols by
[@&#8203;Patrick8639](https://github.com/Patrick8639) in
[https://github.com/dotnet/docfx/pull/9972](https://github.com/dotnet/docfx/pull/9972)
- feat: Add support for gzipped xrefmap that stored as local file by
[@&#8203;filzrev](https://github.com/filzrev) in
[https://github.com/dotnet/docfx/pull/9966](https://github.com/dotnet/docfx/pull/9966)
- feat: Enable view transitions feature by
[@&#8203;filzrev](https://github.com/filzrev) in
[https://github.com/dotnet/docfx/pull/9909](https://github.com/dotnet/docfx/pull/9909)

##### 🐞 Bug Fixes

- fix: PDF `Producer` document information by
[@&#8203;filzrev](https://github.com/filzrev) in
[https://github.com/dotnet/docfx/pull/9851](https://github.com/dotnet/docfx/pull/9851)
- fix: Xrefmap baseUrl problem reported at
[#&#8203;9866](https://github.com/dotnet/docfx/issues/9866) by
[@&#8203;filzrev](https://github.com/filzrev) in
[https://github.com/dotnet/docfx/pull/9869](https://github.com/dotnet/docfx/pull/9869)
- fix: Xref links are not resolved on docs site by
[@&#8203;filzrev](https://github.com/filzrev) in
[https://github.com/dotnet/docfx/pull/9880](https://github.com/dotnet/docfx/pull/9880)
- fix: Change same URL check logic to case invariant by
[@&#8203;filzrev](https://github.com/filzrev) in
[https://github.com/dotnet/docfx/pull/9883](https://github.com/dotnet/docfx/pull/9883)
- fix: Improve unresolved xref messages by
[@&#8203;filzrev](https://github.com/filzrev) in
[https://github.com/dotnet/docfx/pull/9884](https://github.com/dotnet/docfx/pull/9884)
- fix: Fix nightly build errors by
[@&#8203;filzrev](https://github.com/filzrev) in
[https://github.com/dotnet/docfx/pull/9913](https://github.com/dotnet/docfx/pull/9913)
- fix: TOC filter value is not shared between pages by
[@&#8203;filzrev](https://github.com/filzrev) in
[https://github.com/dotnet/docfx/pull/9912](https://github.com/dotnet/docfx/pull/9912)
- fix: Build problems when running .NET 6 version of docfx on some
environment by [@&#8203;filzrev](https://github.com/filzrev) in
[https://github.com/dotnet/docfx/pull/9922](https://github.com/dotnet/docfx/pull/9922)
- fix: `docfx metadata` command throw `ArgumentException` when
referencing empty namespace by doc comment by
[@&#8203;filzrev](https://github.com/filzrev) in
[https://github.com/dotnet/docfx/pull/10023](https://github.com/dotnet/docfx/pull/10023)
- fix: serve url link by
[@&#8203;WeihanLi](https://github.com/WeihanLi) in
[https://github.com/dotnet/docfx/pull/10035](https://github.com/dotnet/docfx/pull/10035)

##### 🚀 Performance Improvements

- perf: Change serializer for XrefMap from NewtonsoftJson to
System.Text.Json by [@&#8203;filzrev](https://github.com/filzrev) in
[https://github.com/dotnet/docfx/pull/9872](https://github.com/dotnet/docfx/pull/9872)
- perf: Remove some enum boxing in GlobMatcher by
[@&#8203;lahma](https://github.com/lahma) in
[https://github.com/dotnet/docfx/pull/10051](https://github.com/dotnet/docfx/pull/10051)
- perf: Optimize CountWordInText by
[@&#8203;lahma](https://github.com/lahma) in
[https://github.com/dotnet/docfx/pull/10050](https://github.com/dotnet/docfx/pull/10050)

##### 🔧 Engineering

- chore: Skip unstable SVG content check that returned from PlantUML
Online Server by [@&#8203;filzrev](https://github.com/filzrev) in
[https://github.com/dotnet/docfx/pull/9858](https://github.com/dotnet/docfx/pull/9858)
- deps: Update Spectre.Console package versions by
[@&#8203;filzrev](https://github.com/filzrev) in
[https://github.com/dotnet/docfx/pull/9894](https://github.com/dotnet/docfx/pull/9894)
- chore: fix NU5129 warning on `dotnet pack` command by
[@&#8203;filzrev](https://github.com/filzrev) in
[https://github.com/dotnet/docfx/pull/9942](https://github.com/dotnet/docfx/pull/9942)
- chore: Add PolySharp libarary to use latest C# syntax by
[@&#8203;filzrev](https://github.com/filzrev) in
[https://github.com/dotnet/docfx/pull/9960](https://github.com/dotnet/docfx/pull/9960)
- chore: Add snapshot update workflow by
[@&#8203;filzrev](https://github.com/filzrev) in
[https://github.com/dotnet/docfx/pull/9969](https://github.com/dotnet/docfx/pull/9969)
- chore: Remove unused workflow settings by
[@&#8203;filzrev](https://github.com/filzrev) in
[https://github.com/dotnet/docfx/pull/10030](https://github.com/dotnet/docfx/pull/10030)

##### 📄 Documentation

- docs: Fix typo by
[@&#8203;carlos-regis](https://github.com/carlos-regis) in
[https://github.com/dotnet/docfx/pull/9871](https://github.com/dotnet/docfx/pull/9871)
- docs: Fix URL in markdown and match to html example by
[@&#8203;si618](https://github.com/si618) in
[https://github.com/dotnet/docfx/pull/9881](https://github.com/dotnet/docfx/pull/9881)
- docs: Fix documentation site build warnings by
[@&#8203;filzrev](https://github.com/filzrev) in
[https://github.com/dotnet/docfx/pull/9895](https://github.com/dotnet/docfx/pull/9895)
- docs: Fix missing docfx.json config docs by
[@&#8203;filzrev](https://github.com/filzrev) in
[https://github.com/dotnet/docfx/pull/9935](https://github.com/dotnet/docfx/pull/9935)
- docs: Fix Docfx.App nuget package usage document by
[@&#8203;filzrev](https://github.com/filzrev) in
[https://github.com/dotnet/docfx/pull/9994](https://github.com/dotnet/docfx/pull/9994)
- docs: Fix plugin related docs and logs by
[@&#8203;filzrev](https://github.com/filzrev) in
[https://github.com/dotnet/docfx/pull/10029](https://github.com/dotnet/docfx/pull/10029)

#### New Contributors

- [@&#8203;carlos-regis](https://github.com/carlos-regis) made their
first contribution in
[https://github.com/dotnet/docfx/pull/9871](https://github.com/dotnet/docfx/pull/9871)
- [@&#8203;si618](https://github.com/si618) made their first
contribution in
[https://github.com/dotnet/docfx/pull/9881](https://github.com/dotnet/docfx/pull/9881)
- [@&#8203;Patrick8639](https://github.com/Patrick8639) made their
first contribution in
[https://github.com/dotnet/docfx/pull/9957](https://github.com/dotnet/docfx/pull/9957)

**Full Changelog**:
dotnet/docfx@v2.76.0...v2.77.0

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "after 9pm,before 6am" in timezone
Europe/Zurich, Automerge - At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Mend
Renovate](https://www.mend.io/free-developer-tools/renovate/). View
repository job log
[here](https://developer.mend.io/github/buehler/dotnet-operator-sdk).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy40MjEuMCIsInVwZGF0ZWRJblZlciI6IjM3LjQyMS4wIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJkZXBlbmRlbmNpZXMiXX0=-->

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Makes the pull request appear in "Performance" section of the next release note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants