System.Numerics.Tests.Perf_VectorConvert.Convert_double_long has regressed #65189

Closed
Tracked by #79004
adamsitnik opened this issue Feb 11, 2022 · 6 comments
@adamsitnik
Member

System.Numerics.Tests.Perf_VectorConvert.Convert_double_long has regressed on 64-bit systems and improved on 32-bit systems.

Repro:

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 net7.0 --filter System.Numerics.Tests.Perf_VectorConvert.Convert_double_long

It seems that @performanceautofiler missed the regression (cc @DrewScoggins): https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_x64_Windows%2010.0.18362%2fSystem.Numerics.Tests.Perf_VectorConvert.Convert_double_long.html

System.Numerics.Tests.Perf_VectorConvert.Convert_double_long

| Result | Base | Diff | Ratio | Operating System | Bit | Processor Name |
| --- | --- | --- | --- | --- | --- | --- |
| Slower | 1641.19 | 5656.11 | 0.29 | Windows 11 | X64 | AMD Ryzen Threadripper PRO 3945WX 12-Cores |
| Slower | 1568.26 | 4675.84 | 0.34 | Windows 11 | X64 | AMD Ryzen 9 5900X |
| Slower | 2917.81 | 6127.01 | 0.48 | Windows 10 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz |
| Slower | 3950.44 | 8210.00 | 0.48 | Windows 11 | X64 | Intel Core i5-4300U CPU 1.90GHz (Haswell) |
| Slower | 2511.54 | 5407.97 | 0.46 | Windows 10 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) |
| Slower | 2391.06 | 4968.04 | 0.48 | Windows 11 | X64 | Intel Core i7-8700 CPU 3.20GHz (Coffee Lake) |
| Slower | 2467.05 | 5512.31 | 0.45 | Windows 11 | X64 | Intel Core i9-9900T CPU 2.10GHz |
| Slower | 3604.17 | 8129.05 | 0.44 | Windows 11 | X64 | Unknown processor |
| Slower | 3086.74 | 6799.54 | 0.45 | Windows 11 | X64 | Unknown processor |
| Slower | 1630.15 | 6176.46 | 0.26 | ubuntu 20.04 | X64 | AMD Ryzen 9 5900X |
| Slower | 2881.27 | 7115.21 | 0.40 | ubuntu 18.04 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz |
| Slower | 2282.50 | 11087.64 | 0.21 | centos 7 | X64 | Intel Xeon CPU E5530 2.40GHz |
| Slower | 1626.48 | 9297.52 | 0.17 | ubuntu 18.04 | X64 | Intel Core i7-2720QM CPU 2.20GHz (Sandy Bridge) |
| Slower | 2449.92 | 5405.11 | 0.45 | alpine 3.13 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) |
| Slower | 982.59 | 6879.18 | 0.14 | ubuntu 18.04 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) |
| Slower | 2599.96 | 5395.83 | 0.48 | ubuntu 20.04 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) |
| Same | 679.37 | 675.39 | 1.01 | Windows 10 | Arm64 | Microsoft SQ1 3.0 GHz |
| Faster | 20996.44 | 12145.39 | 1.73 | Windows 10 | X86 | Intel Xeon CPU E5-1650 v4 3.60GHz |
| Faster | 54109.68 | 30804.49 | 1.76 | Windows 10 | Arm | Microsoft SQ1 3.0 GHz |
| Slower | 3566.29 | 7145.91 | 0.50 | macOS Big Sur 11.6.3 | X64 | Intel Core i5-4278U CPU 2.60GHz (Haswell) |
| Slower | 3369.04 | 9991.33 | 0.34 | macOS Big Sur 11.4 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) |
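
For readers checking the numbers: the Ratio column appears to be Base divided by Diff, so values below 1 mean the Diff run is slower. The thread does not state this definition; it is inferred from the rows. A minimal Python sketch that recomputes a few rows copied from the results (the `ratio` helper is a hypothetical name, not part of the benchmark tooling):

```python
# Rows copied from the results above: (base, diff, reported ratio).
rows = [
    (1641.19, 5656.11, 0.29),    # Windows 11, X64, Threadripper PRO 3945WX
    (982.59, 6879.18, 0.14),     # ubuntu 18.04, X64, i7-7700
    (679.37, 675.39, 1.01),      # Windows 10, Arm64, SQ1
    (54109.68, 30804.49, 1.76),  # Windows 10, Arm, SQ1
]


def ratio(base: float, diff: float) -> float:
    """Assumed definition: Base / Diff. Below 1 means Diff is slower."""
    return base / diff


for base, diff, reported in rows:
    computed = ratio(base, diff)
    print(f"{base:>9.2f} / {diff:>9.2f} = {computed:.2f} (reported {reported:.2f})")
```

Under this reading, a ratio of 0.29 corresponds to the Diff run taking roughly 3.4 times as long as the Base run.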

cc @tannergooding

@dotnet-issue-labeler bot added the untriaged label Feb 11, 2022
@ghost

ghost commented Feb 11, 2022

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: adamsitnik
Assignees: -
Labels:

area-System.Numerics, tenet-performance

Milestone: -

@tannergooding
Member

This one is somewhat by design.

The previous logic was incorrect, since x86/x64 doesn't have a single vectorized instruction to convert double <-> long/ulong; the new logic is correct.

There is a pending work item to go back and accelerate the logic using SIMD intrinsics.
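
To illustrate what "no single vectorized instruction" implies, here is a conceptual Python sketch of a lane-by-lane conversion, where each double in the vector is converted on its own. This is not the runtime's implementation: the function name is hypothetical, truncation toward zero is assumed, and the thread does not say how out-of-range or NaN inputs are handled, so this sketch simply rejects them.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def convert_lanes_to_int64(lanes):
    """Convert each double lane to a 64-bit integer, one lane at a time.

    Hypothetical helper for illustration only. Truncates toward zero and
    raises on values that do not fit in a signed 64-bit integer.
    """
    out = []
    for x in lanes:
        n = int(x)  # truncates toward zero; raises for NaN and infinity
        if not INT64_MIN <= n <= INT64_MAX:
            raise OverflowError(f"{x!r} does not fit in a signed 64-bit integer")
        out.append(n)
    return out


print(convert_lanes_to_int64([1.9, -1.9, 4.0e15, -0.5]))
```

The per-lane loop is the point: the work grows with the number of lanes instead of being handled by one instruction over the whole vector.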

@adamsitnik
Member Author

> This one is somewhat by design.
> There is a pending work item to go back and accelerate the logic using SIMD intrinsics

Should we close the issue then, or would you prefer to wait until the pending work is finished?

@tannergooding
Member

We can leave it open until the pending work is finished.

@dakersnar
Contributor

@tannergooding We spotted this in the RC2 vs 6.0 perf report. Is the pending work you referenced still ongoing?

@tannergooding
Member

This was by design: the previous logic was incorrect. AVX-512 hardware has a dedicated instruction that helps with the codegen on such hardware.

@ghost locked as resolved and limited conversation to collaborators Aug 24, 2023