Some System.Decimal performance improvements #99212

Daniel-Svensson · 2024-03-03T22:06:35Z

Speed up decimal Multiplication and Division (mostly on x64, but some minor improvements on x86).

It does mostly 3 tricks:

Use new X86 unstrincts for DivRem to reduce number of divisions, an to avoid doing expensive 64bit divisions (via helper call) in 32bit mode
Add a helper for 64 * 32 bit multiplication (which on 64bit platforms becomes a single call to BigMul)
Changes ulong Math.BigMul(uint, uint) to not use BMI2.MultiplyNoFlags for better generated code (it gets rid of the memory write)

This is small port of the code from #7778 now that DivRem is accessible.
There is some more fixes that can be done at a later time (such as using 64 by 32 multiply in more places), but many of the original changes are not relevant due either missing low lewel primitives (such as carry) or worse code generated code (big mul)

Questions

I saw code currently uses both checks on IntPtr.Size and #if TARGET_32BIT for target conditional code, what is the preferred style ?
How do you prioritize performance of 64bit vs 32bit runtime ?
- I could make a go and try and rewrite multiply to the fully optimized C++ version, but such a C# version would probably have some loss of performance for the 32bit version

Remarks for better performance

Add/Subtract with Carry would be very useful to further simplify the code and improve performance Provide a way to perform Add with carry and Subtract with borrow #48247 (comment) or Utilizing flags set by various instructions, not just cmp and test. #76502 or (ADX instrincts)
Instrincts or helpers for shld / ShiftLeft128 would be useful in a few places
The current code for BigMul could use some improvements (but there seems to be a couple of issues open for MultiplyNoFlags and an old abandoned PR)

Benchmarks

Full benchmark code can be found at https://github.com/Daniel-Svensson/ClrExperiments/tree/master/ClrDecimal/Benchmarks

Overview

x64
- Division time is 60-80% of old time (~25-66% speedup)
- Multiplication worst case down to 65% (~53% speedup)
x86
- Division time is 70-86% of old time (~16-42% speedup)
- Multiplication - might be a few % slower in worst case (not more than 8%)

Divide x64


BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3155/23H2/2023Update/SunValley3)
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.200
  [Host] : .NET 8.0.2 (8.0.224.6711), X64 RyuJIT AVX2

Job=InProcess  Toolchain=InProcessEmitToolchain

Method	a	b	descr	Mean	Error	StdDev	Ratio
New	1	3	1/3 32bit division	26.42 ns	0.115 ns	0.102 ns	0.71
Main	1	3	1/3 32bit division	37.40 ns	0.038 ns	0.032 ns	1.00

New	107374182.39	3.3	34bit / 32bit	24.36 ns	0.488 ns	0.407 ns	0.78
Main	107374182.39	3.3	34bit / 32bit	31.26 ns	0.469 ns	0.439 ns	1.00

New	10145(...)50239 [21]	3	96bit / 32bit	18.11 ns	0.126 ns	0.112 ns	0.61
Main	10145(...)50239 [21]	3	96bit / 32bit	29.63 ns	0.118 ns	0.098 ns	1.00

New	39291(...)21183 [22]	12884901920	96bit / 64bit	30.19 ns	0.157 ns	0.147 ns	0.86
Main	39291(...)21183 [22]	12884901920	96bit / 64bit	35.06 ns	0.135 ns	0.126 ns	1.00

New	39291(...)21183 [22]	18446744211148505120	96bit / 96bit	45.50 ns	0.184 ns	0.172 ns	0.78
Main	39291(...)21183 [22]	18446744211148505120	96bit / 96bit	58.17 ns	0.249 ns	0.233 ns	1.00

64bit Multiply


BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3155/23H2/2023Update/SunValley3)
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.200
  [Host] : .NET 8.0.2 (8.0.224.6711), X64 RyuJIT AVX2

Job=InProcess  Toolchain=InProcessEmitToolchain

Method	a	b	descr	Mean	Error	StdDev	Ratio
New	15564.65093	0.00000003	*32bit 32bit**	5.537 ns	0.0281 ns	0.0263 ns	1.01
Main	15564.65093	0.00000003	32bit * 32bit	5.503 ns	0.0311 ns	0.0275 ns	1.00

New	18446(...)19935 [21]	38738(...)51235 [22]	*96bit 96bit**	22.299 ns	0.1154 ns	0.0964 ns	0.94
Main	18446(...)19935 [21]	38738(...)51235 [22]	96bit * 96bit	23.860 ns	0.1578 ns	0.1399 ns	1.00

New	21702051861934.75013	51240935.53816662	*64it 64bit**	13.066 ns	0.0910 ns	0.0851 ns	0.89
Main	21702051861934.75013	51240935.53816662	64it * 64bit	14.759 ns	0.1160 ns	0.1085 ns	1.00

New	57510(...)29861 [21]	0.01193046	*96bit 32bit**	6.228 ns	0.0496 ns	0.0464 ns	0.65
Main	57510(...)29861 [21]	0.01193046	96bit * 32bit	9.584 ns	0.0786 ns	0.0735 ns	1.00

x86 (32-bit)

32bit benchmarks results


BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3155/23H2/2023Update/SunValley3)
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.200
  [Host] : .NET 8.0.2 (8.0.224.6711), X86 RyuJIT AVX2

Job=InProcess  Toolchain=InProcessEmitToolchain

Method	a	b	descr	Mean	Error	StdDev	Ratio
Ole32	1	3	1/3 32bit division	31.68 ns	0.052 ns	0.049 ns	0.51
New	1	3	1/3 32bit division	44.65 ns	0.094 ns	0.083 ns	0.72
Main	1	3	1/3 32bit division	62.36 ns	0.533 ns	0.472 ns	1.00

Ole32	107374182.39	3.3	34bit / 32bit	27.64 ns	0.032 ns	0.027 ns	0.49
New	107374182.39	3.3	34bit / 32bit	39.30 ns	0.090 ns	0.084 ns	0.70
Main	107374182.39	3.3	34bit / 32bit	56.35 ns	0.057 ns	0.048 ns	1.00

Ole32	10145(...)50239 [21]	3	96bit / 32bit	24.04 ns	0.018 ns	0.016 ns	0.70
New	10145(...)50239 [21]	3	96bit / 32bit	29.66 ns	0.041 ns	0.039 ns	0.86
Main	10145(...)50239 [21]	3	96bit / 32bit	34.52 ns	0.077 ns	0.068 ns	1.00

Ole32	39291(...)21183 [22]	12884901920	96bit / 64bit	43.04 ns	0.036 ns	0.028 ns	0.50
New	39291(...)21183 [22]	12884901920	96bit / 64bit	65.67 ns	0.187 ns	0.166 ns	0.77
Main	39291(...)21183 [22]	12884901920	96bit / 64bit	85.67 ns	0.112 ns	0.105 ns	1.00

Ole32	39291(...)21183 [22]	18446744211148505120	96bit / 96bit	48.31 ns	0.045 ns	0.038 ns	0.43
New	39291(...)21183 [22]	18446744211148505120	96bit / 96bit	79.02 ns	0.514 ns	0.456 ns	0.71
Main	39291(...)21183 [22]	18446744211148505120	96bit / 96bit	112.06 ns	0.549 ns	0.486 ns	1.00


BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3155/23H2/2023Update/SunValley3)
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.200
  [Host] : .NET 8.0.2 (8.0.224.6711), X86 RyuJIT AVX2

Job=InProcess  Toolchain=InProcessEmitToolchain

Method	a	b	descr	Mean	Error	StdDev	Ratio	RatioSD
New	15564.65093	0.00000003	*32bit 32bit**	12.28 ns	0.040 ns	0.037 ns	1.03	0.00
Main	15564.65093	0.00000003	32bit * 32bit	11.92 ns	0.037 ns	0.033 ns	1.00	0.00

New	18446(...)19935 [21]	38738(...)51235 [22]	*96bit 96bit**	64.89 ns	0.160 ns	0.142 ns	1.03	0.00
Main	18446(...)19935 [21]	38738(...)51235 [22]	96bit * 96bit	63.14 ns	0.161 ns	0.134 ns	1.00	0.00

New	21702051861934.75013	51240935.53816662	*64it 64bit**	40.29 ns	0.807 ns	0.863 ns	0.98	0.02
Main	21702051861934.75013	51240935.53816662	64it * 64bit	41.18 ns	0.293 ns	0.274 ns	1.00	0.00

New	57510(...)29861 [21]	0.01193046	*96bit 32bit**	18.50 ns	0.079 ns	0.070 ns	1.08	0.01
Main	57510(...)29861 [21]	0.01193046	96bit * 32bit	17.08 ns	0.087 ns	0.082 ns	1.00	0.00

…se code than Job=ShortRun IterationCount=3 LaunchCount=1 WarmupCount=3 | Method | a | b | Mean | Error | StdDev | Allocated | |--------------------------- |-- |----------- |---------:|----------:|----------:|----------:| | Mul64By32_New | 3 | 4294967295 | 2.068 ns | 0.0459 ns | 0.0383 ns | - | | Mul64By32_Ori | 3 | 4294967295 | 2.916 ns | 0.0231 ns | 0.0193 ns | - |

- Add comment to BigMul64By32 and make it return nunit to avoid clearing upper 32 bits - Simplify IncreaseScale

src/libraries/System.Private.CoreLib/src/System/Decimal.DecCalc.cs

danmoseley · 2024-03-04T00:42:59Z

Cc @jkotas for the questions about relative prioritization of 32bit performance.

huoyaoyuan · 2024-03-04T02:56:48Z

It's similar to #99196 (and what I did in https://github.com/huoyaoyuan/runtime/tree/decimal-divrem)

ghost · 2024-03-04T04:56:02Z

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Issue Details

Speed up decimal Multiplication and Division (mostly on x64, but some minor improvements on x86).

It does mostly 3 tricks:

Use new X86 unstrincts for DivRem to reduce number of divisions, an to avoid doing expensive 64bit divisions (via helper call) in 32bit mode
Add a helper for 64 * 32 bit multiplication (which on 64bit platforms becomes a single call to BigMul)
Changes ulong Math.BigMul(uint, uint) to not use BMI2.MultiplyNoFlags for better generated code (it gets rid of the memory write)

This is small port of the code from #7778 now that DivRem is accessible.
There is some more fixes that can be done at a later time (such as using 64 by 32 multiply in more places), but many of the original changes are not relevant due either missing low lewel primitives (such as carry) or worse code generated code (big mul)

Questions

I saw code currently uses both checks on IntPtr.Size and #if TARGET_32BIT for target conditional code, what is the preferred style ?
How do you prioritize performance of 64bit vs 32bit runtime ?
- I could make a go and try and rewrite multiply to the fully optimized C++ version, but such a C# version would probably have some loss of performance for the 32bit version

Remarks for better performance

Add/Subtract with Carry would be very useful to further simplify the code and improve performance Provide a way to perform Add with carry and Subtract with borrow #48247 (comment) or Utilizing flags set by various instructions, not just cmp and test. #76502 or (ADX instrincts)
Instrincts or helpers for shld / ShiftLeft128 would be useful in a few places
The current code for BigMul could use some improvements (but there seems to be a couple of issues open for MultiplyNoFlags and an old abandoned PR)

Benchmarks

Full benchmark code can be found at https://github.com/Daniel-Svensson/ClrExperiments/tree/master/ClrDecimal/Benchmarks

64bit Divide


BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3155/23H2/2023Update/SunValley3)
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.101
  [Host] : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT AVX2

Job=InProcess  Toolchain=InProcessEmitToolchain

Method	a	b	descr	Mean	Error	StdDev	Ratio
New	92233(...)63.19 [21]	3.3	34bit / 32bit	23.09 ns	0.145 ns	0.121 ns	0.54
Main	92233(...)63.19 [21]	3.3	34bit / 32bit	42.75 ns	0.195 ns	0.183 ns	1.00

New	10145(...)02.39 [22]	0.3	96bit / 32bit	14.21 ns	0.086 ns	0.072 ns	0.48
Main	10145(...)02.39 [22]	0.3	96bit / 32bit	29.83 ns	0.074 ns	0.066 ns	1.00

New	39291(...)11.83 [23]	12884901.920	96bit / 64bit	30.57 ns	0.045 ns	0.040 ns	0.86
Main	39291(...)11.83 [23]	12884901.920	96bit / 64bit	35.45 ns	0.190 ns	0.159 ns	1.00

New	39291(...)21183 [22]	18446744211148505120	96bit / 96bit	47.87 ns	0.129 ns	0.114 ns	0.82
Main	39291(...)21183 [22]	18446744211148505120	96bit / 96bit	58.72 ns	0.106 ns	0.094 ns	1.00

64bit Multiply


BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3155/23H2/2023Update/SunValley3)
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.101
  [Host]     : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT AVX2

Method	a	b	descr	Mean	Error	StdDev	Ratio
New	15564.65093	0.00000003	*32bit 32bit**	5.580 ns	0.0320 ns	0.0300 ns	1.00
Main	15564.65093	0.00000003	32bit * 32bit	5.578 ns	0.0096 ns	0.0085 ns	1.00

New	18446(...)19935 [21]	38738(...)51235 [22]	*96bit 96bit**	22.684 ns	0.2357 ns	0.2089 ns	0.94
Main	18446(...)19935 [21]	38738(...)51235 [22]	96bit * 96bit	24.079 ns	0.0712 ns	0.0666 ns	1.00

New	21702051861934.75013	51240935.53816662	*64it 64bit**	12.748 ns	0.0281 ns	0.0249 ns	0.91
Main	21702051861934.75013	51240935.53816662	64it * 64bit	13.954 ns	0.0561 ns	0.0525 ns	1.00

New	57510(...)29861 [21]	0.01193046	*96bit 32bit**	6.270 ns	0.0156 ns	0.0130 ns	0.65
Main	57510(...)29861 [21]	0.01193046	96bit * 32bit	9.707 ns	0.1240 ns	0.1160 ns	1.00

32bit

32bit benchmarks results


BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3155/23H2/2023Update/SunValley3)
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.101
  [Host] : .NET 8.0.1 (8.0.123.58001), X86 RyuJIT AVX2

Job=InProcess  Toolchain=InProcessEmitToolchain

Method	a	b	descr	Mean	Error	StdDev	Ratio	RatioSD
New	92233(...)63.19 [21]	3.3	34bit / 32bit	60.96 ns	0.172 ns	0.161 ns	1.01	0.01
Main	92233(...)63.19 [21]	3.3	34bit / 32bit	60.42 ns	0.564 ns	0.500 ns	1.00	0.00

New	10145(...)02.39 [22]	0.3	96bit / 32bit	36.85 ns	0.072 ns	0.063 ns	1.01	0.00
Main	10145(...)02.39 [22]	0.3	96bit / 32bit	36.49 ns	0.090 ns	0.085 ns	1.00	0.00

New	39291(...)11.83 [23]	12884901.920	96bit / 64bit	78.43 ns	0.244 ns	0.217 ns	0.82	0.01
Main	39291(...)11.83 [23]	12884901.920	96bit / 64bit	95.56 ns	0.663 ns	0.620 ns	1.00	0.00

New	39291(...)21183 [22]	18446744211148505120	96bit / 96bit	81.67 ns	1.663 ns	2.489 ns	0.62	0.03
Main	39291(...)21183 [22]	18446744211148505120	96bit / 96bit	132.38 ns	2.643 ns	4.559 ns	1.00	0.00


BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3155/23H2/2023Update/SunValley3)
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.101
  [Host] : .NET 8.0.1 (8.0.123.58001), X86 RyuJIT AVX2

Job=InProcess  Toolchain=InProcessEmitToolchain

Method	a	b	descr	Mean	Error	StdDev	Ratio
New	15564.65093	0.00000003	*32bit 32bit**	10.65 ns	0.047 ns	0.042 ns	0.86
Main	15564.65093	0.00000003	32bit * 32bit	12.36 ns	0.062 ns	0.052 ns	1.00

New	18446(...)19935 [21]	38738(...)51235 [22]	*96bit 96bit**	65.15 ns	0.159 ns	0.141 ns	1.00
Main	18446(...)19935 [21]	38738(...)51235 [22]	96bit * 96bit	64.84 ns	0.178 ns	0.149 ns	1.00

New	21702051861934.75013	51240935.53816662	*64it 64bit**	40.23 ns	0.119 ns	0.112 ns	0.97
Main	21702051861934.75013	51240935.53816662	64it * 64bit	41.45 ns	0.534 ns	0.500 ns	1.00

New	57510(...)29861 [21]	0.01193046	*96bit 32bit**	18.69 ns	0.082 ns	0.077 ns	1.10
Main	57510(...)29861 [21]	0.01193046	96bit * 32bit	16.96 ns	0.051 ns	0.048 ns	1.00

Author:	Daniel-Svensson
Assignees:	-
Labels:	`area-System.Numerics`, `needs-area-label`
Milestone:	-

jkotas · 2024-03-04T05:22:10Z

relative prioritization of 32bit performance.

We do not actively invest into improving 32bit performance specifically. At the same, we avoid regressing 32bit performance unless there is a very good reason.

tannergooding · 2024-03-04T05:31:13Z

src/libraries/System.Private.CoreLib/src/System/Decimal.DecCalc.cs

-                // TODO: https://github.com/dotnet/runtime/issues/5213
-                ulong tmp, div;
-                if (bufNum.U2 != 0)
+                if (X86.X86Base.X64.IsSupported)


This is a lot of extra code for a relatively small performance increase on modern CPUs.

We currently use Skylake as the baseline for a lot of our perf score numbers and 3ns savings for 13 lines of new code (+23 more for x86) doesn't really seem worth it.

Ideally any improvements would be shared across all 3 platforms or be significant enough to make the additional complexity worthwhile.

Ok I removed the x64 specific code and just kept the x86 since it was just a few ns faster than the x86 specific code (as long as there are no branch misspredictions) and the x86 should be faster on Skylake even with a misspredicted branch.

It stills gives around ~10ns faster division for 96/32 case

I can change it to x64 only code insted (2 if statements and 2 DivRem calls) if you rather like it.

tannergooding · 2024-03-04T05:34:50Z

src/libraries/System.Private.CoreLib/src/System/Decimal.DecCalc.cs

+            /// <param name="den">64-bit divisor</param>
+            /// <returns>Returns quotient. Remainder overwrites lower 64-bits of dividend.</returns>
+            [MethodImpl(MethodImplOptions.AggressiveInlining)]
+            private static unsafe ulong Div128By64(Buf16* bufNum, ulong den)


Same general comment, but this is notably a case where we'd ideally just use UInt128.operator / and have the JIT recognize the general pattern instead.

Doing general purpose optimizations is almost always a better option than doing one offs. Ideally the JIT recognizes cases like ulong / uint or uint128 / ulong and does the optimal thing on platforms that support it.

In general that would be god, but in this specific case there we know several things that the compiler cannot know and optimize for.

denominator has already been shifted left so it has highest bit set

The metod will probably be called several times in a row so it makes sense to do bit shifting etc before it is called

den > bufNum->High64

However since both types lives in CoreLib, several helpers here could probably be made internal and useable by both UInt128 and Decimal division code. For example I think both this metod and maybe division by 32bit could be usefull from within UInt128

I opened #99747 with a similar fix for UInt128.

I did not make this internal and call it from UInt128 at the moment, but if the code works as fast without the divisor having the highest bit set then it might make sense to do so.

tannergooding · 2024-03-04T05:37:15Z

src/libraries/System.Private.CoreLib/src/System/Decimal.DecCalc.cs

+            /// <returns>hi bits of the result</returns>
+            /// <remarks>returns nuint instead of uint to skip clearing upper 32bits on 64bit platforms</remarks>
+            [MethodImpl(MethodImplOptions.AggressiveInlining)]
+            private static nuint BigMul64By32(ulong a, uint b, out ulong low)


Same general comment. This is another case where this should really just be a general purpose opt in the JIT, recognizing ulong * uint and optimizing it accordingly, rather than requiring a full ulong * ulong, just opting to do whichever is most efficient.

Sounds good if the JIT could recognice it and optimize the code.
Since the currently most popular approach to the get more than 64bit result is to call BigMul I hope that that method can be optimized in the future.

I created an internal overload of BigMul in the time beeing, which can easily be removed when the JIT can optimize 32*64 bit => 128bit multiplications

tannergooding · 2024-03-04T05:43:06Z

src/libraries/System.Private.CoreLib/src/System/Math.cs

-#if TARGET_32BIT
-            if (Bmi2.IsSupported)
-            {
-                uint low;
-                uint high = Bmi2.MultiplyNoFlags(a, b, &low);
-                return ((ulong)high << 32) | low;
-            }
-#endif


AFAIR, the intent was for us to have an overload that returns a tuple and uses the JIT multi-reg return hookups, but we haven't exposed that yet.

We ideally aren't removing opts like this, especially when even with the memory access it is potentially faster or close enough in most cases.

Ok I added it back, but it contributed to more than 20% slower division (more than 10ns for 1/3) so I removed a of calls and replaced it with the fallback path (ulong)(uint)a * (uint)b which the JIT recognize and generates optmized code for.

Since the JIT already has specialized codegen for 32 * 32 => 64bit division would it not make sense to just emit mulx there (or for normal multiplication as well) if it could lead to more efficient code (maybe when writing to memory or when the register usage gets better).

It seems clang uses mulx for 32 * 32 => 64bit in 32bit mode and 64 * 64 => 128bit in 64bit mode.

tannergooding · 2024-03-04T05:46:27Z

I'm, personally, not a huge fan of a lot of the changes here. It's a lot of new code for what appears to be some relatively minor perf increases overall.

Where the perf increase is more measurable (specifically n-bit / 32-bit), it looks like places where we're missing some more general purpose JIT optimizations which would likely be a better overall investment, working across multiple hardware configurations and for more than just System.Decimal.

…it can easily be removed once JIT recognize and optimize "ulong * uint"

… code

tannergooding · 2024-06-05T19:38:57Z

This has build failures that need to be addressed. They come from using DivRem which requires opting into preview features currently as there are some known codegen pessimizations that need to be handled still (#82194).

Daniel-Svensson · 2024-06-27T19:30:18Z

@tannergooding I've opted into preview features for the file, hopefully it should compile again.

Hope it is ok to opt into preview features for the whole file instead of doing 3 pairs of suppress/restore

tannergooding · 2024-07-10T18:46:12Z

Rerunning CI. Going to finish reviewing this today after CI finishes

xtqqczze · 2024-07-15T15:41:35Z

@MihuBot

…ental]

Daniel-Svensson added 7 commits February 18, 2024 11:39

Improve performance of decimal division

03b5226

update part of Multiply

218b373

- Add comment to BigMul64By32 and make it return nunit to avoid clearing upper 32 bits - Simplify IncreaseScale

remove unintentional file

f19ef92

Improve division by 64bit value on x64

e984fc5

Remove some more 64bit divides for x86

f0d62fd

Call IncreaseScale in one more place

792c3eb

dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Mar 3, 2024

add extra paranthesis

6dab5f9

Daniel-Svensson commented Mar 3, 2024

View reviewed changes

src/libraries/System.Private.CoreLib/src/System/Decimal.DecCalc.cs Outdated Show resolved Hide resolved

src/libraries/System.Private.CoreLib/src/System/Decimal.DecCalc.cs Show resolved Hide resolved

Daniel-Svensson changed the title ~~Some Decimal performance improvements~~ Some System.Decimal performance improvements Mar 3, 2024

EgorBo reviewed Mar 3, 2024

View reviewed changes

src/libraries/System.Private.CoreLib/src/System/Decimal.DecCalc.cs Outdated Show resolved Hide resolved

jkotas added area-System.Numerics and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Mar 4, 2024

tannergooding reviewed Mar 4, 2024

View reviewed changes

tannergooding mentioned this pull request Mar 4, 2024

DivRem 64by32 and 32by16 refactor in Decimal.DecCalc #99196

Closed

Daniel-Svensson added 4 commits March 4, 2024 13:22

review: remove X86.X86Base.X64.DivRem

83efb42

Remove BigMul64By32 and add overloads of Math.BigMul instead so that …

a20f37f

…it can easily be removed once JIT recognize and optimize "ulong * uint"

Simplify Div96By32

2f6e107

Remove 64 bit path from Div96By32

14a662b

This was referenced Mar 10, 2024

[browser][MT] Unhandled JS error #97546

Closed

[browser] FATAL UNHANDLED EXCEPTION #98842

Closed

Daniel-Svensson marked this pull request as ready for review March 13, 2024 20:05

Daniel-Svensson mentioned this pull request Mar 14, 2024

UInt128 improve division by 64 bit on x64 #99747

Open

Call IncreaseScale(ref Buf12) from IncreaseScale(ref Buf16) for 32bit…

0087c61

… code

build-analysis bot mentioned this pull request Mar 16, 2024

STATUS_UNSUCCESSFUL in RsaCryptRoundtrip_OaepSHA1 #29683

Open

Merge branch 'main' into deciaml_calc

f2a291e

build-analysis bot mentioned this pull request Apr 19, 2024

Timeout in System.Text.Json.Tests.Utf8JsonWriterTests.WriteNumbers #101193

Closed

tannergooding closed this Jun 5, 2024

tannergooding reopened this Jun 5, 2024

tannergooding added the needs-author-action An issue or pull request that requires more info or actions from the author. label Jun 12, 2024

dotnet-policy-service bot added the no-recent-activity label Jun 26, 2024

Add #pragma warning disable CA2252

baae42d

dotnet-policy-service bot removed needs-author-action An issue or pull request that requires more info or actions from the author. no-recent-activity labels Jun 27, 2024

This was referenced Jun 28, 2024

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

Nuget: Central Directory corrupt dotnet/dnceng#3099

Open

The job running on agent NetCore-Public ran longer than the maximum time #104044

Closed

Merge branch 'dotnet:main' into deciaml_calc

7fe82f4

Merge branch 'main' into deciaml_calc

233700b

build-analysis bot mentioned this pull request Jul 11, 2024

[Test Failure] System.Net.Security.Tests.SslStreamNetworkStreamTest.SslStream_RandomSizeWrites_OK #104605

Closed

MihuBot mentioned this pull request Jul 15, 2024

[JitDiff X64] [Daniel-Svensson] Some System.Decimal performance improvements MihuBot/runtime-utils#528

Open

Daniel-Svensson added 2 commits September 12, 2024 17:06

Merge remote-tracking branch 'upstream/main' into deciaml_calc

8baedf7

Change supression to SYSLIB5004 now that DivRem is marked as [Experim…

3a97891

…ental]

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some System.Decimal performance improvements #99212

Some System.Decimal performance improvements #99212

Daniel-Svensson commented Mar 3, 2024 •

edited

Loading

danmoseley commented Mar 4, 2024

huoyaoyuan commented Mar 4, 2024

ghost commented Mar 4, 2024

Questions

Remarks for better performance

Benchmarks

64bit Divide

64bit Multiply

32bit

jkotas commented Mar 4, 2024

tannergooding Mar 4, 2024

Daniel-Svensson Mar 13, 2024 •

edited

Loading

tannergooding Mar 4, 2024

Daniel-Svensson Mar 13, 2024

Daniel-Svensson Mar 14, 2024

tannergooding Mar 4, 2024

Daniel-Svensson Mar 13, 2024

tannergooding Mar 4, 2024

Daniel-Svensson Mar 13, 2024

tannergooding commented Mar 4, 2024

tannergooding commented Jun 5, 2024

Daniel-Svensson commented Jun 27, 2024

tannergooding commented Jul 10, 2024

xtqqczze commented Jul 15, 2024

Some System.Decimal performance improvements #99212

Are you sure you want to change the base?

Some System.Decimal performance improvements #99212

Conversation

Daniel-Svensson commented Mar 3, 2024 • edited Loading

Questions

Remarks for better performance

Benchmarks

Overview

Divide x64

64bit Multiply

x86 (32-bit)

danmoseley commented Mar 4, 2024

huoyaoyuan commented Mar 4, 2024

ghost commented Mar 4, 2024

Questions

Remarks for better performance

Benchmarks

64bit Divide

64bit Multiply

32bit

jkotas commented Mar 4, 2024

Choose a reason for hiding this comment

Daniel-Svensson Mar 13, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tannergooding commented Mar 4, 2024

tannergooding commented Jun 5, 2024

Daniel-Svensson commented Jun 27, 2024

tannergooding commented Jul 10, 2024

xtqqczze commented Jul 15, 2024

Daniel-Svensson commented Mar 3, 2024 •

edited

Loading

Daniel-Svensson Mar 13, 2024 •

edited

Loading