[Perf] Linux/arm64: 4 Regressions on 5/3/2024 3:54:35 PM #102044

Closed
performanceautofiler bot opened this issue May 9, 2024 · 5 comments · Fixed by #102084
Labels: arch-arm64, area-System.Text.Json, os-linux, runtime-coreclr


performanceautofiler bot commented May 9, 2024

Run Information

| Name | Value |
| --- | --- |
| Architecture | arm64 |
| OS | ubuntu 22.04 |
| Queue | AmpereUbuntu |
| Baseline | fc76b1cac3f02cc9729f6682d6850fd7982e9fe5 |
| Compare | 358b0a4d350a6c72ccc825a0ac668620f849365a |
| Diff | Diff |
| Configs | CompilationMode:tiered, RunKind:micro |

Regressions in System.Text.Json.Tests.Utf8JsonReaderCommentsTests

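| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | 78.93 ns | 100.11 ns | 1.27 | 0.13 | False | | | |
| | 80.90 ns | 93.24 ns | 1.15 | 0.34 | False | | | |
| | 94.71 ns | 125.39 ns | 1.32 | 0.48 | False | | | |
| | 81.94 ns | 89.40 ns | 1.09 | 0.07 | False | | | |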
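
As a quick sanity check, the Test/Base column can be recomputed from the Baseline and Test timings in the table. The snippet below is only an illustrative sketch; the variable names are made up for this example and are not part of the autofiler tooling.

```python
# (baseline_ns, test_ns) pairs copied from the regression table, in row order.
rows = [
    (78.93, 100.11),
    (80.90, 93.24),
    (94.71, 125.39),
    (81.94, 89.40),
]

# Test/Base is the test timing divided by the baseline timing,
# rounded to two decimals as in the report.
ratios = [round(test / base, 2) for base, test in rows]

for (base, test), ratio in zip(rows, ratios):
    print(f"{base:.2f} ns -> {test:.2f} ns  ratio {ratio:.2f}")
```

A ratio above 1.00 means the compare build is slower than the baseline, so all four rows are regressions of roughly 9% to 32%.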

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Text.Json.Tests.Utf8JsonReaderCommentsTests*'

System.Text.Json.Tests.Utf8JsonReaderCommentsTests.Utf8JsonReaderCommentParsing(CommentHandling: Skip, SegmentSize: 0, TestCase: ShortSingleLine)

System.Text.Json.Tests.Utf8JsonReaderCommentsTests.Utf8JsonReaderCommentParsing(CommentHandling: Allow, SegmentSize: 0, TestCase: ShortSingleLine)

System.Text.Json.Tests.Utf8JsonReaderCommentsTests.Utf8JsonReaderCommentParsing(CommentHandling: Allow, SegmentSize: 100, TestCase: ShortMultiLine)

System.Text.Json.Tests.Utf8JsonReaderCommentsTests.Utf8JsonReaderCommentParsing(CommentHandling: Skip, SegmentSize: 0, TestCase: ShortMultiLine)

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

performanceautofiler bot added the arch-arm64, os-linux, runtime-coreclr, and untriaged labels May 9, 2024
DrewScoggins removed the untriaged label May 9, 2024
DrewScoggins transferred this issue from dotnet/perf-autofiling-issues May 9, 2024
DrewScoggins self-assigned this May 9, 2024
dotnet-policy-service bot added the untriaged label May 9, 2024

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

DrewScoggins removed their assignment May 9, 2024
EgorBo added this to the 9.0.0 milestone May 9, 2024
DrewScoggins added the tenet-performance and tenet-performance-benchmarks labels May 9, 2024
EgorBo removed the tenet-performance, tenet-performance-benchmarks, and untriaged labels May 9, 2024
@DrewScoggins

Looks related to #101761

EgorBo commented May 10, 2024

The problem here is that System.Text.Json.Utf8JsonReader is a very large struct (size=192) with 9 GC pointers, so 120 of its bytes are non-GC data.
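
The 120-byte figure follows from simple arithmetic, assuming 8-byte GC references on a 64-bit target such as arm64 (the pointer size is an assumption here, not something stated in the comment). A minimal sketch, with a hypothetical helper name:

```python
POINTER_SIZE = 8  # assumed bytes per GC reference on a 64-bit target


def non_gc_bytes(struct_size: int, gc_pointers: int,
                 pointer_size: int = POINTER_SIZE) -> int:
    """Return how many bytes of a struct are not GC references.

    Hypothetical helper for illustration only; it is not a runtime API.
    """
    gc_bytes = gc_pointers * pointer_size
    if gc_bytes > struct_size:
        raise ValueError("GC pointers cannot occupy more than the whole struct")
    return struct_size - gc_bytes


# Utf8JsonReader as described in the comment: size=192 with 9 GC pointers.
print(non_gc_bytes(192, 9))  # 192 - 9 * 8 = 120
```

In other words, only 72 of the 192 bytes (37.5%) actually hold GC references.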

EgorBo commented May 10, 2024

A big chunk of time is spent inside the explicit memory barrier here.

image

image

EgorBo commented May 10, 2024

cc @jkotas @MichalStrehovsky so it looks like the unconditional full memory barrier (a no-op on x64) might make the bulk move slower than the individual helpers (where dmb ish might be skipped, e.g. if the dest address is not in the GC heap). It eats 15% of the perf numbers.

I found a few tricks to speed up the copy algorithm, just not sure that will be enough.

Maybe we should adjust the logic that decides when to emit the bulk helper, e.g. to:

if (gcPointers >= 4 && gcSize >= fullSize/2)
    // insert the barrier
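
To see what the proposed condition would do for this benchmark, here is a rough model of it. It assumes gcSize means the bytes occupied by GC pointers (gcPointers * 8 on a 64-bit target) and fullSize means the total struct size. Both readings, and the function name, are assumptions for illustration and not the actual JIT code.

```python
def passes_bulk_heuristic(gc_pointers: int, full_size: int,
                          pointer_size: int = 8) -> bool:
    """Model of: gcPointers >= 4 && gcSize >= fullSize/2.

    Assumption: gcSize is gc_pointers * pointer_size. Hypothetical helper,
    not taken from the runtime sources.
    """
    gc_size = gc_pointers * pointer_size
    # Integer division mirrors fullSize/2 on integer operands in C-like code.
    return gc_pointers >= 4 and gc_size >= full_size // 2


# Utf8JsonReader from this issue: 9 GC pointers, 192 bytes in total.
print(passes_bulk_heuristic(9, 192))   # 72 >= 96 is false

# Hypothetical structs for comparison (not from the issue).
print(passes_bulk_heuristic(4, 64))    # 32 >= 32 is true
print(passes_bulk_heuristic(3, 24))    # fewer than 4 GC pointers
```

Under these assumptions Utf8JsonReader fails the gcSize >= fullSize/2 test (72 GC bytes out of 192), so the condition would not select the bulk path for it, while a struct made up mostly of GC references still would.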
