-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor CpuMathUtils #1229
Refactor CpuMathUtils #1229
Conversation
- Allow it to take Spans instead of arrays. - Remove redundant overloads - When multiple spans are accepted, always use an explicit count parameter instead of one being chosen as having the right length. Working towards dotnet#608
682b4cc
to
9c7dc9c
Compare
|
||
public static float SumSq(float[] src, int offset, int count) => SseUtils.SumSq(src, offset, count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SumSq [](start = 28, length = 5)
did these end up not necessary?
I wonder why were they added in the first place then. Maybe some TLC code will rely on them? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are no longer necessary because the Span
parameter takes care of needing to pass (and verify) offset
and count
. (Span is a pointer and a count). If you have an array, and need to offset into it, or limit the count, you can call array.AsSpan(start, length)
and now you can call the one overload that takes a Span.
This helps make the CpuMathUtils class consistent. Previously, some methods had overloads that take an offset
, some didn't. Some had multiple arrays and only one of those arrays had a corresponding offset
parameter, etc.
I wonder why were they added in the first place then
We never had Span before. 😉
Maybe some TLC code will rely on them?
If that's the case, they can call SumSq(array.AsSpan(offset, count))
instead of SumSq(array, offset, count)
. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed some of the AVX intrinsics were not used when I re-wrote them in pure C# when creating an OSX build about 2 yrs ago. I assumed at the time either they were for future use (completeness of the intrinsics set) or I simply wasn't calling code where they were used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've noticed the same. In fact, the whole "AVX" C++ code is not being called at all in ML.NET.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#780 we have issue for that.
@@ -1474,44 +1320,43 @@ public static void ZeroMatrixItems(AlignedArray dst, int ccol, int cfltRow, int[ | |||
} | |||
} | |||
|
|||
public static void SdcaL1UpdateDense(float primalUpdate, int length, float[] src, float threshold, float[] v, float[] w) | |||
public static void SdcaL1UpdateDense(float primalUpdate, int count, ReadOnlySpan<float> src, float threshold, Span<float> v, Span<float> w) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
count [](start = 69, length = 5)
this is a somewhat conspicuous rename, because length
and count
are both used in a very specific way in our codebase.
Maybe it's worthwhile to add a summary comment to this method to denote exactly what is the expectation of count
? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the ML.CpuMath
assembly, all the code uses the term count
to mean "how many elements to work over". This was the only externally visible usage of length
, so I decided to rename it to be consistent with the rest of the classes in ML.CpuMath
. It also fixes the issue with SdcaL1UpdateSparse
that length
wasn't used at all, but the other parameter count
WAS used. Now both SdcaL1UpdateDense
and SdcaL1UpdateSparse
have a consistent signature, with the only difference that Sparse takes in the indices
as well. #Closed
@@ -566,12 +566,12 @@ public static unsafe void Scale(float scale, Span<float> dst) | |||
} | |||
} | |||
|
|||
public static unsafe void ScaleSrcU(float scale, Span<float> src, Span<float> dst) | |||
public static unsafe void ScaleSrcU(float scale, ReadOnlySpan<float> src, Span<float> dst, int count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
count [](start = 103, length = 5)
this addition also probably warrants a summary comment #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting summary comments on SseIntrinsics is a waste because only CpuMathUtils will be calling SseIntrinsics. It is an internal class to ML.CpuMath
. The summary comments should go on the external facting CpuMathUtils
class.
I can add them to CpuMathUtils - do you want full summary comments for all functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose yes. Can we make these methods non-public then?
In reply to: 224636550 [](ancestors = 224636550)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an internal class, so this whole class is internal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've opened #1265 to add xml comments and arg validation since CpuMathUtils
is public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Do you have benchmark results? |
Here are some results from my machine: Before:
After:
Without the calls to
As you can see, the |
@eerhardt wont it be nice to also check the performance directly using cpumath.performaceTests machinelearning/test/Microsoft.ML.CpuMath.PerformanceTests/AvxPerformanceTests.cs Line 15 in 0b84350
|
@@ -448,7 +449,7 @@ public static unsafe void MatMulTranX(bool add, AlignedArray mat, AlignedArray s | |||
// dst[i] += scale | |||
public static unsafe void AddScalarU(float scalar, Span<float> dst) | |||
{ | |||
fixed (float* pdst = dst) | |||
fixed (float* pdst = &MemoryMarshal.GetReference(dst)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be good to have an analyzer for this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily. You have to ensure that the span is non-empty, because if you call this on an empty span, you'll get back a garbage pointer.
Also, there are some concerns that the .NET Core team has that these calls shouldn't be necessary. See the conversation at dotnet/corefx#32669 (comment), so it may not be worth it to write the analyzer if the recommendation isn't to use this method everywhere.
@@ -606,12 +607,12 @@ public static unsafe void Scale(float scale, Span<float> dst) | |||
} | |||
} | |||
|
|||
public static unsafe void ScaleSrcU(float scale, Span<float> src, Span<float> dst) | |||
public static unsafe void ScaleSrcU(float scale, ReadOnlySpan<float> src, Span<float> dst, int count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have a Contract.Assert
that count is less than or equal to the length of both dst and src?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These already happen in the calling CpuMathUtils
methods:
@@ -28,6 +28,7 @@ | |||
<Compile Remove="CpuMathUtils.netcoreapp.cs" /> | |||
<Compile Remove="SseIntrinsics.cs" /> | |||
<Compile Remove="AvxIntrinsics.cs" /> | |||
<PackageReference Include="System.Memory" Version="$(SystemMemoryVersion)" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Having different items in their own ItemGroups makes the file easier to manually parse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a blank line above this line to segregate the Compile and PackageReference groups. I added this to the same ItemGroup to limit the number of TargetFramework
Conditions.
I only ran the Before:
After:
|
Working towards #608