Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor CpuMathUtils #1229

Merged
merged 3 commits into from
Oct 15, 2018
Merged

Refactor CpuMathUtils #1229

merged 3 commits into from
Oct 15, 2018

Conversation

eerhardt
Copy link
Member

  • Allow it to take Spans instead of arrays.
  • Remove redundant overloads
  • When multiple spans are accepted, always use an explicit count parameter instead of one being chosen as having the right length.

Working towards #608

- Allow it to take Spans instead of arrays.
- Remove redundant overloads
- When multiple spans are accepted, always use an explicit count parameter instead of one being chosen as having the right length.

Working towards dotnet#608

public static float SumSq(float[] src, int offset, int count) => SseUtils.SumSq(src, offset, count);
Copy link
Contributor

@Zruty0 Zruty0 Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SumSq [](start = 28, length = 5)

did these end up not necessary?

I wonder why were they added in the first place then. Maybe some TLC code will rely on them? #Closed

Copy link
Member Author

@eerhardt eerhardt Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are no longer necessary because the Span parameter takes care of needing to pass (and verify) offset and count. (Span is a pointer and a count). If you have an array, and need to offset into it, or limit the count, you can call array.AsSpan(start, length) and now you can call the one overload that takes a Span.

This helps make the CpuMathUtils class consistent. Previously, some methods had overloads that take an offset, some didn't. Some had multiple arrays and only one of those arrays had a corresponding offset parameter, etc.

I wonder why were they added in the first place then

We never had Span before. 😉

Maybe some TLC code will rely on them?

If that's the case, they can call SumSq(array.AsSpan(offset, count)) instead of SumSq(array, offset, count). #Closed

Copy link
Contributor

@justinormont justinormont Oct 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed some of the AVX intrinsics were not used when I re-wrote them in pure C# when creating an OSX build about 2 yrs ago. I assumed at the time either they were for future use (completeness of the intrinsics set) or I simply wasn't calling code where they were used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've noticed the same. In fact, the whole "AVX" C++ code is not being called at all in ML.NET.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#780 we have issue for that.

@@ -1474,44 +1320,43 @@ public static void ZeroMatrixItems(AlignedArray dst, int ccol, int cfltRow, int[
}
}

public static void SdcaL1UpdateDense(float primalUpdate, int length, float[] src, float threshold, float[] v, float[] w)
public static void SdcaL1UpdateDense(float primalUpdate, int count, ReadOnlySpan<float> src, float threshold, Span<float> v, Span<float> w)
Copy link
Contributor

@Zruty0 Zruty0 Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

count [](start = 69, length = 5)

this is a somewhat conspicuous rename, because length and count are both used in a very specific way in our codebase.

Maybe it's worthwhile to add a summary comment to this method to denote exactly what is the expectation of count ? #Closed

Copy link
Member Author

@eerhardt eerhardt Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the ML.CpuMath assembly, all the code uses the term count to mean "how many elements to work over". This was the only externally visible usage of length, so I decided to rename it to be consistent with the rest of the classes in ML.CpuMath. It also fixes the issue with SdcaL1UpdateSparse that length wasn't used at all, but the other parameter count WAS used. Now both SdcaL1UpdateDense and SdcaL1UpdateSparse have a consistent signature, with the only difference that Sparse takes in the indices as well. #Closed

@@ -566,12 +566,12 @@ public static unsafe void Scale(float scale, Span<float> dst)
}
}

public static unsafe void ScaleSrcU(float scale, Span<float> src, Span<float> dst)
public static unsafe void ScaleSrcU(float scale, ReadOnlySpan<float> src, Span<float> dst, int count)
Copy link
Contributor

@Zruty0 Zruty0 Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

count [](start = 103, length = 5)

this addition also probably warrants a summary comment #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all of them I mean


In reply to: 224550873 [](ancestors = 224550873)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting summary comments on SseIntrinsics is a waste because only CpuMathUtils will be calling SseIntrinsics. It is an internal class to ML.CpuMath. The summary comments should go on the external facting CpuMathUtils class.

I can add them to CpuMathUtils - do you want full summary comments for all functions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose yes. Can we make these methods non-public then?


In reply to: 224636550 [](ancestors = 224636550)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an internal class, so this whole class is internal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened #1265 to add xml comments and arg validation since CpuMathUtils is public.

Copy link
Contributor

@Zruty0 Zruty0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@Anipik Anipik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@danmoseley
Copy link
Member

danmoseley commented Oct 13, 2018

Do you have benchmark results?

Cc @tannergooding

@eerhardt
Copy link
Member Author

Here are some results from my machine:

Before:

              Method |         Mean |      Error |     StdDev |        Extra Metric |
-------------------- |-------------:|-----------:|-----------:|--------------------:|
    TrainKMeansAndLR |     709.1 ms |   44.86 ms |   49.86 ms |                   - |
           TrainIris |     262.9 ms |   12.43 ms |   14.31 ms |                   - |
      TrainSentiment | 1,763.720 ms | 22.2524 ms | 20.8149 ms |                   - |
         PredictIris |     1.648 ms |  0.0104 ms |  0.0087 ms | AccuracyMacro: 0.98 |
 PredictIrisBatchOf1 |     1.623 ms |  0.0361 ms |  0.0416 ms | AccuracyMacro: 0.98 |
 PredictIrisBatchOf2 |     1.631 ms |  0.0391 ms |  0.0450 ms | AccuracyMacro: 0.98 |
 PredictIrisBatchOf5 |     1.592 ms |  0.0326 ms |  0.0375 ms | AccuracyMacro: 0.98 |

After:

              Method |         Mean |      Error |     StdDev |        Extra Metric |
-------------------- |-------------:|-----------:|-----------:|--------------------:|
    TrainKMeansAndLR |     712.8 ms |   48.01 ms |   53.37 ms |                   - |
           TrainIris |     266.7 ms |   9.616 ms |   10.69 ms |                   - |
      TrainSentiment | 1,765.497 ms | 37.6166 ms | 41.8108 ms |                   - |
         PredictIris |     1.656 ms |  0.0191 ms |  0.0170 ms | AccuracyMacro: 0.98 |
 PredictIrisBatchOf1 |     1.546 ms |  0.0276 ms |  0.0258 ms | AccuracyMacro: 0.98 |
 PredictIrisBatchOf2 |     1.645 ms |  0.0366 ms |  0.0421 ms | AccuracyMacro: 0.98 |
 PredictIrisBatchOf5 |     1.574 ms |  0.0407 ms |  0.0468 ms | AccuracyMacro: 0.98 |

Without the calls to MemoryMarshal.GetReference when pinning the Span, I am seeing:

              Method |         Mean |      Error |     StdDev |        Extra Metric |
-------------------- |-------------:|-----------:|-----------:|--------------------:|
    TrainKMeansAndLR |     810.2 ms |   71.12 ms |   79.05 ms |                   - |
           TrainIris |   260.214 ms |  9.7503 ms | 11.2285 ms |                   - |
      TrainSentiment | 1,752.501 ms | 24.0648 ms | 22.5103 ms |                   - |
         PredictIris |     1.650 ms |  0.0151 ms |  0.0141 ms | AccuracyMacro: 0.98 |
 PredictIrisBatchOf1 |     1.599 ms |  0.0362 ms |  0.0339 ms | AccuracyMacro: 0.98 |
 PredictIrisBatchOf2 |     1.646 ms |  0.0179 ms |  0.0167 ms | AccuracyMacro: 0.98 |
 PredictIrisBatchOf5 |     1.635 ms |  0.0192 ms |  0.0179 ms | AccuracyMacro: 0.98 |

As you can see, the TrainKMeansAndLR benchmark was really affected by the change to MemoryMarshal.GetReference.

@Anipik
Copy link
Contributor

Anipik commented Oct 15, 2018

@eerhardt wont it be nice to also check the performance directly using cpumath.performaceTests
like

@@ -448,7 +449,7 @@ public static unsafe void MatMulTranX(bool add, AlignedArray mat, AlignedArray s
// dst[i] += scale
public static unsafe void AddScalarU(float scalar, Span<float> dst)
{
fixed (float* pdst = dst)
fixed (float* pdst = &MemoryMarshal.GetReference(dst))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to have an analyzer for this...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily. You have to ensure that the span is non-empty, because if you call this on an empty span, you'll get back a garbage pointer.

Also, there are some concerns that the .NET Core team has that these calls shouldn't be necessary. See the conversation at dotnet/corefx#32669 (comment), so it may not be worth it to write the analyzer if the recommendation isn't to use this method everywhere.

@@ -606,12 +607,12 @@ public static unsafe void Scale(float scale, Span<float> dst)
}
}

public static unsafe void ScaleSrcU(float scale, Span<float> src, Span<float> dst)
public static unsafe void ScaleSrcU(float scale, ReadOnlySpan<float> src, Span<float> dst, int count)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a Contract.Assert that count is less than or equal to the length of both dst and src?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -28,6 +28,7 @@
<Compile Remove="CpuMathUtils.netcoreapp.cs" />
<Compile Remove="SseIntrinsics.cs" />
<Compile Remove="AvxIntrinsics.cs" />
<PackageReference Include="System.Memory" Version="$(SystemMemoryVersion)" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Having different items in their own ItemGroups makes the file easier to manually parse.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a blank line above this line to segregate the Compile and PackageReference groups. I added this to the same ItemGroup to limit the number of TargetFramework Conditions.

@eerhardt
Copy link
Member Author

wont it be nice to also check the performance directly using cpumath.performaceTests
like

I only ran the Avx.Sum* perf tests because these take so long.

Before:

      Method |     Mean |    Error |   StdDev |
------------ |---------:|---------:|---------:|
        SumU | 205.2 us | 4.053 us | 4.668 us |
      SumSqU | 220.6 us | 4.096 us | 4.023 us |
  SumSqDiffU | 230.4 us | 4.471 us | 5.814 us |
     SumAbsU | 233.6 us | 4.612 us | 8.317 us |
 SumAbsDiffU | 240.9 us | 4.775 us | 8.237 us |

After:

      Method |     Mean |    Error |   StdDev |
------------ |---------:|---------:|---------:|
        SumU | 197.5 us | 3.030 us | 2.686 us |
      SumSqU | 213.3 us | 4.000 us | 3.546 us |
  SumSqDiffU | 218.7 us | 2.469 us | 2.309 us |
     SumAbsU | 240.8 us | 2.979 us | 2.641 us |
 SumAbsDiffU | 233.8 us | 8.379 us | 8.966 us |

@eerhardt eerhardt merged commit bee7f17 into dotnet:master Oct 15, 2018
@eerhardt eerhardt deleted the UpdateCpuMathUtils branch October 15, 2018 19:31
@ghost ghost locked as resolved and limited conversation to collaborators Mar 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants