-
-
Notifications
You must be signed in to change notification settings - Fork 851
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize Crc32 for arm #1987
Optimize Crc32 for arm #1987
Conversation
} | ||
#else | ||
return ~CalculateScalar(~crc, buffer); | ||
else if (ArmCrc32.IsSupported) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be if (ArmCrc32.IsSupported)
instead?
Hmmm .. this is difficult. On one hand I really appreciate and value the contribution, and would be happy to see us getting better on ARM. On the other hand I'm afraid we lack some prerequisites that are necessary to take it in:
As such I'm leaning towards not merging this to @SixLabors/core thoughts? |
@antonfirsov Can we use this to emulate the environment with QEMU? @Bond-009 I'd love to know how you manage ARM testing with Jellyfin. |
For this to work we would need also a docker image with ImageMagick build for ARM, otherwise most of the test wont work. While this can work, it will be very slow. My very rough estimate would be, that running all test with QEMU would take something like 40 minutes (just a wild guess, but i doubt it will be much faster). Also what I am not sure about is, if QEMU will emulate hardware intrinsics, too. In my opinion we should keep the changes from this PR in a separate branch until we figure out how to run the tests properly on the CI. |
@dlemstra got ARM builds working I think. By my understanding QEMU does emulate hardware intrinsics but I don’t know if there’s limitations. Alternatively we can look at purchasing a device and wiring up CI to talk to that (I’ve read that’s possible but no idea how). Re branching… yeah. Very wise. |
Yes a selfhosted CI runner would work, but this device then must run all the time somewhere and always be reachable to the internet. Not sure how feasible that would be. |
I have a static IP so possible… I’d like to explore QEMU first though |
I have tried out https://github.com/uraimo/run-on-arch-action with a simple example.
Besides that, I have tried running a simple unit test which uses ARM intrinsics with docker (which also uses QEMU when the host is not ARM) and it seems to work. So in theory QEMU should work. Not sure why edit:
If @dlemstra got ARM builds working for imagemagick, I could try those in the next steps. |
Adds a fast path in Crc32 for arm and arm64
@JimBobSquarePants We don't have automated arm tests for jellyfin. (We also don't have any arm specific code) Automated tests would only help with correctness regressions, not performance regressions, so you'd still have to check that manually. |
We've actually developed a technique here that allows us to use automated testing against intrinsics features. We're able to switch off intrinsics features to ensure correctness for all implementations and fallbacks. ImageSharp/tests/ImageSharp.Tests/TestUtilities/FeatureTesting/FeatureTestRunner.cs Line 52 in 46b0bf0
|
Maybe we should adjust the |
Good idea. I need to check the runtime also to see if there’s additional flags now for arm intrinsics |
Codecov Report
@@ Coverage Diff @@
## arm #1987 +/- ##
=====================================
- Coverage 88% 88% -1%
=====================================
Files 979 979
Lines 51939 51935 -4
Branches 6499 6499
=====================================
- Hits 45755 45740 -15
- Misses 5112 5119 +7
- Partials 1072 1076 +4
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
Found the new collection of constants. We'll have to figure out which ones are most relevant. |
Not sure howto decide which once are relevant yet. Should we add all of them or should we wait with that until we have more ARM specific code in the repo? We can disable intrinsics complete for now. |
@Bond-009 could you please change the target branch from |
@tannergooding Would it be possible for you to advise which constants we should add to our ImageSharp/tests/ImageSharp.Tests/TestUtilities/FeatureTesting/FeatureTestRunner.cs Lines 388 to 410 in 92da254
|
@JimBobSquarePants, sure thing! The full list of available flags is here: https://github.com/dotnet/runtime/blob/baae8fa837198fbc1cd42dfee951579e877745f1/src/coreclr/inc/clrconfigvalues.h#L736-L769 Since the list for x86/x64 and Arm64 is large, I've given a more "in depth" explanation of what is likely meaningful to target below. Feel free to ignore if its not important or irrelevant here. The recommendation for x86/x64 would be to target the following configurations:
There is a little bit of nuance and leeway here, and of course you're free to target whatever actual instruction sets fit you and your customer's needs. However, the above provides the biggest major differences in functionality and what actual hardware has shipped with up until this point so you'll likely end up with the broadest "meaningful" coverage and the least amount of code paths to support by targeting these. For Arm64, the consideration would be:
Since Arm licenses the spec and there can be many vendors producing their own chips and deciding what ISAs are meaningful to them or not, the coverage of what's actually supported beyond this is a bit hazy and a couple of them (like A summary is:
An annotation on the above is that something required in |
@tannergooding Thanks so much for all the info, this is perfect! |
I think this is ready to be merged into the |
@@ -198,6 +208,103 @@ private static unsafe uint CalculateSse(uint crc, ReadOnlySpan<byte> buffer) | |||
} | |||
} | |||
} | |||
|
|||
#if NET5_0_OR_GREATER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need to actually target a .NET 5+ runtime to take advantage of this. I want to add an explicit target for .NET 6 for v3.0.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks everyone 👍
Prerequisites
Description
Adds a fast path in Crc32 for arm and arm64
Some benchmarks for the arm64 path:
Current (LUT):
arm crc32 instructions without loop unrolling:
This PR: