Refactor CpuMathUtils #1229

eerhardt · 2018-10-11T17:38:55Z

- Allow it to take Spans instead of arrays.
- Remove redundant overloads.
- When multiple spans are accepted, always use an explicit count parameter instead of treating one span's length as the count.

Working towards #608

Zruty0 · 2018-10-11T18:10:54Z

src/Microsoft.ML.CpuMath/CpuMathUtils.netstandard.cs


-        public static float SumSq(float[] src, int offset, int count) => SseUtils.SumSq(src, offset, count);


Did these end up not being necessary?

I wonder why they were added in the first place, then. Maybe some TLC code relies on them?

They are no longer necessary because the Span parameter takes care of needing to pass (and verify) offset and count. (Span is a pointer and a count). If you have an array, and need to offset into it, or limit the count, you can call array.AsSpan(start, length) and now you can call the one overload that takes a Span.

This helps make the CpuMathUtils class consistent. Previously, some methods had overloads that take an offset, some didn't. Some had multiple arrays and only one of those arrays had a corresponding offset parameter, etc.

I wonder why they were added in the first place, then.

We never had Span before. 😉

Maybe some TLC code will rely on them?

If that's the case, they can call SumSq(array.AsSpan(offset, count)) instead of SumSq(array, offset, count).
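As a rough illustration of that migration, here is a minimal sketch. The class and the scalar SumSq body below are hypothetical stand-ins, not the actual CpuMathUtils implementation; only the array.AsSpan(start, length) call is the real API being discussed.

```csharp
using System;

public static class SpanOverloadSketch
{
    // Hypothetical stand-in for a single span-based overload.
    // The span already carries the start position and the element count.
    public static float SumSq(ReadOnlySpan<float> src)
    {
        float sum = 0;
        for (int i = 0; i < src.Length; i++)
            sum += src[i] * src[i];
        return sum;
    }

    // Shows how a caller of the old (array, offset, count) shape would migrate.
    public static float Example()
    {
        float[] values = { 1f, 2f, 3f, 4f, 5f };

        // Old call shape: SumSq(values, 1, 3)
        // New call shape: slice first, then call the one overload.
        return SumSq(values.AsSpan(1, 3)); // 2*2 + 3*3 + 4*4 = 29
    }
}
```

AsSpan does the offset and length bounds check once when the slice is created, so the callee does not need separate offset validation.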

I noticed some of the AVX intrinsics were not used when I re-wrote them in pure C# when creating an OSX build about 2 yrs ago. I assumed at the time either they were for future use (completeness of the intrinsics set) or I simply wasn't calling code where they were used.

I've noticed the same. In fact, the whole "AVX" C++ code is not being called at all in ML.NET.

We have an issue for that: #780.

Zruty0 · 2018-10-11T18:12:25Z

src/Microsoft.ML.CpuMath/Sse.cs

@@ -1474,44 +1320,43 @@ public static void ZeroMatrixItems(AlignedArray dst, int ccol, int cfltRow, int[
            }
        }

-        public static void SdcaL1UpdateDense(float primalUpdate, int length, float[] src, float threshold, float[] v, float[] w)
+        public static void SdcaL1UpdateDense(float primalUpdate, int count, ReadOnlySpan<float> src, float threshold, Span<float> v, Span<float> w)


This is a somewhat conspicuous rename, because length and count are both used in a very specific way in our codebase.

Maybe it's worthwhile to add a summary comment to this method that states exactly what is expected of count?

In the ML.CpuMath assembly, all the code uses the term count to mean "how many elements to work over". This was the only externally visible usage of length, so I decided to rename it to be consistent with the rest of the classes in ML.CpuMath. It also fixes the issue in SdcaL1UpdateSparse where length wasn't used at all, but the other parameter, count, WAS used. Now both SdcaL1UpdateDense and SdcaL1UpdateSparse have a consistent signature, with the only difference being that Sparse takes in the indices as well.

Zruty0 · 2018-10-11T18:12:56Z

src/Microsoft.ML.CpuMath/SseIntrinsics.cs

@@ -566,12 +566,12 @@ public static unsafe void Scale(float scale, Span<float> dst)
            }
        }

-        public static unsafe void ScaleSrcU(float scale, Span<float> src, Span<float> dst)
+        public static unsafe void ScaleSrcU(float scale, ReadOnlySpan<float> src, Span<float> dst, int count)


This addition also probably warrants a summary comment.

all of them I mean

Putting summary comments on SseIntrinsics is a waste because only CpuMathUtils will be calling SseIntrinsics. It is an internal class to ML.CpuMath. The summary comments should go on the externally facing CpuMathUtils class.

I can add them to CpuMathUtils - do you want full summary comments for all functions?

I suppose yes. Can we make these methods non-public then?

It's an internal class, so this whole class is internal.

I've opened #1265 to add xml comments and arg validation since CpuMathUtils is public.

Zruty0

Anipik

LGTM

danmoseley · 2018-10-13T14:27:21Z

Do you have benchmark results?

Cc @tannergooding

eerhardt · 2018-10-15T15:09:48Z

Here are some results from my machine:

Before:

              Method |         Mean |      Error |     StdDev |        Extra Metric |
-------------------- |-------------:|-----------:|-----------:|--------------------:|
    TrainKMeansAndLR |     709.1 ms |   44.86 ms |   49.86 ms |                   - |
           TrainIris |     262.9 ms |   12.43 ms |   14.31 ms |                   - |
      TrainSentiment | 1,763.720 ms | 22.2524 ms | 20.8149 ms |                   - |
         PredictIris |     1.648 ms |  0.0104 ms |  0.0087 ms | AccuracyMacro: 0.98 |
 PredictIrisBatchOf1 |     1.623 ms |  0.0361 ms |  0.0416 ms | AccuracyMacro: 0.98 |
 PredictIrisBatchOf2 |     1.631 ms |  0.0391 ms |  0.0450 ms | AccuracyMacro: 0.98 |
 PredictIrisBatchOf5 |     1.592 ms |  0.0326 ms |  0.0375 ms | AccuracyMacro: 0.98 |

After:

              Method |         Mean |      Error |     StdDev |        Extra Metric |
-------------------- |-------------:|-----------:|-----------:|--------------------:|
    TrainKMeansAndLR |     712.8 ms |   48.01 ms |   53.37 ms |                   - |
           TrainIris |     266.7 ms |   9.616 ms |   10.69 ms |                   - |
      TrainSentiment | 1,765.497 ms | 37.6166 ms | 41.8108 ms |                   - |
         PredictIris |     1.656 ms |  0.0191 ms |  0.0170 ms | AccuracyMacro: 0.98 |
 PredictIrisBatchOf1 |     1.546 ms |  0.0276 ms |  0.0258 ms | AccuracyMacro: 0.98 |
 PredictIrisBatchOf2 |     1.645 ms |  0.0366 ms |  0.0421 ms | AccuracyMacro: 0.98 |
 PredictIrisBatchOf5 |     1.574 ms |  0.0407 ms |  0.0468 ms | AccuracyMacro: 0.98 |

Without the calls to MemoryMarshal.GetReference when pinning the Span, I am seeing:

              Method |         Mean |      Error |     StdDev |        Extra Metric |
-------------------- |-------------:|-----------:|-----------:|--------------------:|
    TrainKMeansAndLR |     810.2 ms |   71.12 ms |   79.05 ms |                   - |
           TrainIris |   260.214 ms |  9.7503 ms | 11.2285 ms |                   - |
      TrainSentiment | 1,752.501 ms | 24.0648 ms | 22.5103 ms |                   - |
         PredictIris |     1.650 ms |  0.0151 ms |  0.0141 ms | AccuracyMacro: 0.98 |
 PredictIrisBatchOf1 |     1.599 ms |  0.0362 ms |  0.0339 ms | AccuracyMacro: 0.98 |
 PredictIrisBatchOf2 |     1.646 ms |  0.0179 ms |  0.0167 ms | AccuracyMacro: 0.98 |
 PredictIrisBatchOf5 |     1.635 ms |  0.0192 ms |  0.0179 ms | AccuracyMacro: 0.98 |

As you can see, the TrainKMeansAndLR benchmark was really affected by the change to MemoryMarshal.GetReference.

Anipik · 2018-10-15T17:00:12Z

@eerhardt Wouldn't it be nice to also check the performance directly using the CpuMath performance tests, like:

machinelearning/test/Microsoft.ML.CpuMath.PerformanceTests/AvxPerformanceTests.cs

Line 15 in 0b84350

public void AddScalarU()

tannergooding · 2018-10-15T17:24:43Z

src/Microsoft.ML.CpuMath/AvxIntrinsics.cs

@@ -448,7 +449,7 @@ public static unsafe void MatMulTranX(bool add, AlignedArray mat, AlignedArray s
        // dst[i] += scale
        public static unsafe void AddScalarU(float scalar, Span<float> dst)
        {
-            fixed (float* pdst = dst)
+            fixed (float* pdst = &MemoryMarshal.GetReference(dst))


It might be good to have an analyzer for this...

Not necessarily. You have to ensure that the span is non-empty, because if you call this on an empty span, you'll get back a garbage pointer.

Also, there are some concerns that the .NET Core team has that these calls shouldn't be necessary. See the conversation at dotnet/corefx#32669 (comment), so it may not be worth it to write the analyzer if the recommendation isn't to use this method everywhere.
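For reference, a minimal sketch of the pinning pattern under discussion, with the empty-span guard mentioned above. The class name and the scalar loop are hypothetical and stand in for the vectorized body; the block needs to be compiled with unsafe code enabled.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinningSketch
{
    // dst[i] += scalar, written as a plain scalar loop for illustration only.
    public static unsafe void AddScalar(float scalar, Span<float> dst)
    {
        // Guard: the reference returned for an empty span must not be dereferenced.
        if (dst.IsEmpty)
            return;

        // Pin via the reference to element 0 instead of `fixed (float* p = dst)`.
        fixed (float* pdst = &MemoryMarshal.GetReference(dst))
        {
            float* p = pdst;
            float* end = pdst + dst.Length;
            while (p < end)
            {
                *p += scalar;
                p++;
            }
        }
    }
}
```

Whether this form is still worth using depends on the runtime and compiler version, per the corefx conversation linked above, so treat it as a pattern to measure rather than a rule.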

tannergooding · 2018-10-15T17:25:47Z

src/Microsoft.ML.CpuMath/AvxIntrinsics.cs

@@ -606,12 +607,12 @@ public static unsafe void Scale(float scale, Span<float> dst)
            }
        }

-        public static unsafe void ScaleSrcU(float scale, Span<float> src, Span<float> dst)
+        public static unsafe void ScaleSrcU(float scale, ReadOnlySpan<float> src, Span<float> dst, int count)


Should we have a Contract.Assert that count is less than or equal to the length of both dst and src?

These already happen in the calling CpuMathUtils methods:

https://github.com/dotnet/machinelearning/pull/1229/files/ec4c943cedf3ab535d69bac83062255c542827da#diff-aa5d85893df98249c289efe91e0c0ac1R248
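A minimal sketch of that split, assuming a public wrapper that validates count and an internal worker that trusts it. The names and the exception-based check are hypothetical; the actual code uses the project's own contract assertions, and its exact conditions may differ from the ones assumed here.

```csharp
using System;

public static class CountValidationSketch
{
    // Public entry point (hypothetical): validates count once against both spans.
    public static void Scale(float scale, ReadOnlySpan<float> src, Span<float> dst, int count)
    {
        // Assumed rule: count must fit inside both src and dst.
        if (count < 0 || count > src.Length || count > dst.Length)
            throw new ArgumentOutOfRangeException(nameof(count));

        ScaleCore(scale, src, dst, count);
    }

    // Internal worker (hypothetical): relies on the caller's validation.
    private static void ScaleCore(float scale, ReadOnlySpan<float> src, Span<float> dst, int count)
    {
        for (int i = 0; i < count; i++)
            dst[i] = scale * src[i];
    }
}
```

Keeping the check in the public layer means the internal intrinsics methods do not repeat it on every call.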

tannergooding · 2018-10-15T17:30:17Z

src/Microsoft.ML.CpuMath/Microsoft.ML.CpuMath.csproj

@@ -28,6 +28,7 @@
    <Compile Remove="CpuMathUtils.netcoreapp.cs" />
    <Compile Remove="SseIntrinsics.cs" />
    <Compile Remove="AvxIntrinsics.cs" />
+    <PackageReference Include="System.Memory" Version="$(SystemMemoryVersion)" />


nit: Having different items in their own ItemGroups makes the file easier to manually parse.

I'll add a blank line above this line to segregate the Compile and PackageReference groups. I added this to the same ItemGroup to limit the number of TargetFramework Conditions.

eerhardt · 2018-10-15T18:07:31Z

Wouldn't it be nice to also check the performance directly using the CpuMath performance tests

I only ran the Avx.Sum* perf tests because these take so long.

Before:

      Method |     Mean |    Error |   StdDev |
------------ |---------:|---------:|---------:|
        SumU | 205.2 us | 4.053 us | 4.668 us |
      SumSqU | 220.6 us | 4.096 us | 4.023 us |
  SumSqDiffU | 230.4 us | 4.471 us | 5.814 us |
     SumAbsU | 233.6 us | 4.612 us | 8.317 us |
 SumAbsDiffU | 240.9 us | 4.775 us | 8.237 us |

After:

      Method |     Mean |    Error |   StdDev |
------------ |---------:|---------:|---------:|
        SumU | 197.5 us | 3.030 us | 2.686 us |
      SumSqU | 213.3 us | 4.000 us | 3.546 us |
  SumSqDiffU | 218.7 us | 2.469 us | 2.309 us |
     SumAbsU | 240.8 us | 2.979 us | 2.641 us |
 SumAbsDiffU | 233.8 us | 8.379 us | 8.966 us |

eerhardt requested review from Ivanidzo4ka, danmoseley, tannergooding and Anipik October 11, 2018 17:38

Refactor CpuMathUtils

9c7dc9c

eerhardt force-pushed the UpdateCpuMathUtils branch from 682b4cc to 9c7dc9c Compare October 11, 2018 17:57

Zruty0 reviewed Oct 11, 2018

Zruty0 approved these changes Oct 11, 2018

Anipik reviewed Oct 11, 2018

Use MemoryMarshal.GetReference to avoid perf hit when pinning Span.

ec4c943

tannergooding reviewed Oct 15, 2018

tannergooding approved these changes Oct 15, 2018

PR feedback

1d1d758

eerhardt mentioned this pull request Oct 15, 2018

Add xml summary comments and argument validation to CpuMathUtils #1265

Closed

eerhardt merged commit bee7f17 into dotnet:master Oct 15, 2018

eerhardt deleted the UpdateCpuMathUtils branch October 15, 2018 19:31

justinormont mentioned this pull request Oct 15, 2018

Micro-accuracy for Multiclass Classification tests #1268

Closed

eerhardt mentioned this pull request Oct 16, 2018

Update input arguments in CpuMath files #1021

Merged

eerhardt mentioned this pull request Oct 24, 2018

Need to add PackageReference to System.Memory from ML.CpuMath nuget #1359

Closed

ghost locked as resolved and limited conversation to collaborators Mar 28, 2022
