Add vectorized implementation of hex encoding/decoding #39702

tkp1n · 2020-07-21T14:00:47Z

History

#37546 introduced a public API to encode/decode bytes from/to hexadecimal strings. It also consolidated a lot of internal implementations of such methods to the common class HexConverter.

The implementation in the PR mentioned above was kept simple to ensure the API gets merged in time for .NET 5 with optimizations left for further PRs.

Summary

This PR now introduces vectorized methods for the encoding/decoding of hex data as a follow-up to the above-mentioned PR.
With relatively little added complexity we achieve more than an order of magnitude in performance improvements for larger data sets (e.g. 2048 bytes).

Perf-Numbers

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.388 (2004/?/20H1)
Intel Core i7-9750H CPU 2.60GHz, 1 CPU, 12 logical and 6 physical cores

Method	Job	EnvironmentVariables	NofBytes	Mean	Error	StdDev
Decode	AVX2	COMPlus_EnableAVX2=1	2048	194.46 ns	1.101 ns	0.976 ns
Decode	Scalar	COMPlus_EnableAVX2=0	2048	4,036.43 ns	21.000 ns	17.536 ns
Encode	AVX2	COMPlus_EnableAVX2=1	2048	305.11 ns	2.013 ns	1.785 ns
Encode	Scalar	COMPlus_EnableAVX2=0	2048	4,584.03 ns	32.259 ns	30.175 ns

Review Input / Open Issues

@tannergooding and @gfoidl you may be interested

This is marked as a draft PR for several reasons:

It is unclear whether the optimizations will be deemed worthy to be merged (perf vs. complexity trade-off).
The implementation is not yet polished and only good for initial feedback.
The build is currently failing, as we would have to exclude (#if) the vectorized methods from builds not supporting intrinsics. I didn't find any code based on intrinsics in the Common directory to base this kind of build restrictions on.

gfoidl

Did just a quick pass over the code (will have a deeper look a bit later).

Do you plan to provide a SSE2 / SSE41 (for blend) path too?
I think it should be possible to do so...

src/libraries/Common/src/System/HexConverter.cs

gfoidl · 2020-07-21T15:41:03Z

performance improvements for larger data sets (e.g. 2048 bytes).

When the PR isn't a draft anymore, we should collect numbers for differnt input-size to verify that small inputs (common case?) don't regress (too much).

tkp1n · 2020-07-21T15:47:36Z

Did just a quick pass over the code (will have a deeper look a bit later).

Thanks for the initial review. I'll make your suggested adjustments.

Do you plan to provide a SSE2 / SSE41 (for blend) path too?
I think it should be possible to do so...

I have the following algorithms ready but decided to start here with AVX2 (biggest bang for the buck).

Encode:

AVX2
SSSE3
SSE2

Decode:

AVX2
SSE41
SSSE3
SSE2

Feel free to browse through my HexMate repository to check out those implementations as well.

When the PR isn't a draft anymore, we should collect numbers for differnt input-size to verify that small inputs (common case?) don't regress (too much).

Absolutely! I'll make the effort of creating those numbers, once I've seen some interest in actually adding a vectorized implementation from the team.

gfoidl · 2020-07-21T16:27:38Z

src/libraries/Common/src/System/HexConverter.cs

+
+        private static ReadOnlySpan<byte> LowerHexLookupTable => new byte[]
+        {
+            0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66,


** not relevant yet -- only when SSE2 comes into play **

Had a quick look at HexMate, and saw that (as expected) the constants for AVX2 just doubles SSE2 in the upper lane.

So in order to save some (disk) space for the assembly, one could use this outline:

Vector128<byte> sseConstant = ReadVector(Sse2ContantData); Vector256<byte> avxConstant = Vector256.Create(sseConstant, sseConstant);

In essence it's about avoiding to have duplicated data in the "text-section" of the DLL.

Good point, I added your suggestion to the HexMate repo (here) 👍

I let this open as a reminder in case we get to SSE2 in this PR.

tkp1n · 2020-07-21T19:51:22Z

Tagging @bartonjs, @stephentoub and @GrabYourPitchforks to get a sense of the teams interest in this as you all contributed to the original issue (#17837) and/or the PR that recently added the public API (#37546).

danmoseley · 2020-08-06T06:32:58Z

The build errors are because some projects using HexConverter.cs do not reference S.R.Intrinsics and S.N.Vectors. @Anipik from a layering POV is it OK for those other projects to have references to those?

Anipik · 2020-08-10T18:25:32Z

@Anipik from a layering POV is it OK for those other projects to have references to those?

We should be fine with using S.R.Intrinsics.
But we will have to do case by case for S.N vectors as it depends on a system.Memory package reference, which i dont think we want to reference from the shared framework.

ghost · 2020-08-19T07:11:56Z

Tagging subscribers to this area: @tarekgh, @krwq
See info in area-owners.md if you want to be subscribed.

GrabYourPitchforks · 2020-12-01T20:29:35Z

Going through old inbox notifications and came across this. Does #44111 render this moot?

GrabYourPitchforks · 2021-01-25T17:07:00Z

Now that #44111 is in we should consider how to morph this PR to provide additional benefit. Since this PR focuses on AVX/AVX2, does it provide a significant performance boost over the SSSE3 code paths for the common cases of processing 16/32/64 bytes?

jeffhandley · 2021-04-18T05:36:42Z

@tkp1n Would you be able to take a look at the changes from #44111 and update this PR with what you think would be the best updated approach? Thanks!

tkp1n · 2021-04-18T07:31:10Z

@jeffhandley I‘ll look into it 👍

tkp1n · 2021-04-18T13:53:33Z

I've updated my encoding method locally and ran some benchmarks for 32 and 64 byte inputs. (The algorithm I'm proposing works with input buffers of >= 32 bytes for encoding and output buffers of >= 32 bytes for decoding hex characters)

It looks like we could achieve a > 2x improvement using AVX2.

Method	Size	Mean	Error	StdDev	Ratio
Ssse3	32	12.693 ns	0.0595 ns	0.0528 ns	1.00
Avx2	32	5.108 ns	0.1262 ns	0.1119 ns	0.40

Ssse3	64	16.871 ns	0.1416 ns	0.1105 ns	1.00
Avx2	64	7.450 ns	0.0756 ns	0.0670 ns	0.44

@GrabYourPitchforks, @jeffhandley Let me know if you think the speedup justifies maintaining yet another code path for encoding/decoding hex chars. If you think so, I'll gladly put in the work to update the PR accordingly.

tkp1n · 2021-04-18T20:30:42Z

"Converting" the current encoding algorithm to AVX2 allows us to encode input buffers of >= 8 bytes but at a slower speed:

Method	Size	Mean	Error	StdDev	Ratio	RatioSD
Ssse3	16	7.074 ns	0.0693 ns	0.0648 ns	1.00	0.00
Avx2ChunksOf8	16	5.122 ns	0.0371 ns	0.0310 ns	0.72	0.01

Ssse3	32	10.915 ns	0.1368 ns	0.1213 ns	1.00	0.00
Avx2ChunksOf8	32	7.222 ns	0.1629 ns	0.1811 ns	0.66	0.02

Ssse3	64	16.922 ns	0.1938 ns	0.1718 ns	1.00	0.00
Avx2ChunksOf8	64	12.028 ns	0.0490 ns	0.0434 ns	0.71	0.01

rcollette · 2021-04-28T00:42:58Z

It is a shame that the internal class HexConverter allows for specifying casing but but the public method Converter.ToHexString() does not.

One real-world example of where lower case must be used is:
https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-header-based-auth.html

I know it's not directly related to this optimization, but hopeful that the optimization will handle lowercase and expose it.

danmoseley · 2021-04-28T01:09:01Z

@rcollette as far as I can tell, these perf changes do not break the contract of HexConverter.ToString that accepts the desired casing. If you believe that Converter.ToHexString(..) should have an overload to select casing, which seems a quite reasonable desire (given your example use case), it seems that can happen independently. The first step would be for you to create an API proposal here. If approved, we'd then need a volunteer to implement it.

GrabYourPitchforks · 2021-04-28T01:59:18Z

Thanks for the investigation! TBH the thing I'm struggling with is that this seems like a lot of code (+ extra review, maintenance, unit tests, etc.) for something that's on the expected high end going to save us around 10 ns per invocation. We don't tend to have frequent use of AVX2 intrinsics from within CoreLib, with the notable exceptions of two really hot code families: (a) strchr / memmov / memset; and (b) base64encode/decode, since these buffers tend to be arbitrarily long. If there were evidence that this savings would manifest in real-world applications, that'd help assuage my concerns.

But let's try for some extra perspectives here.

@tannergooding @EgorBo, since you both have experience in this code, do you think there are scenarios that could benefit from this being AVX2-enlightened? Please feel free to tell me that I'm being too grumpy and that some apps would see clear benefits from this. :)

I'm operating under the assumption that most apps which perform hex processing are doing so with fixed-length payloads, generally no more than 64 bytes (512 bits), and only occasionally with higher lengths. I'm also assuming that any workload which involves hex conversion is dominated by whatever processing takes place over the raw decoded binary buffer.

tannergooding · 2021-04-28T02:02:01Z

AVX2 (or more specifically Vector256<T>) support likely depends entirely on the average payload size. If we believe that is small, then V256<T> support is likely not beneficial and may actually be harmful due to additional branches or masking required to handle small payloads.

danmoseley · 2021-05-17T15:51:46Z

@EgorBo thoughts about @GrabYourPitchforks question above?

jeffhandley · 2021-06-11T23:29:25Z

@EgorBo it looks like we're still waiting for your comments on this.

jeffhandley · 2021-07-23T20:55:14Z

@GrabYourPitchforks This PR is assigned to you for follow-up/decision before the RC1 snap, but I've also assigned @EgorBo since we're awaiting their feedback.

jeffhandley · 2021-08-19T17:35:41Z

I'm updating this PR to target the 7.0.0 milestone as we were not able to finish the review and validation of this change before .NET 6.0 RC1 snapped. I will resolve the conflicts and push to the branch so that we can review as soon as time permits.

jeffhandley · 2021-10-09T06:17:44Z

I was finally able to take another look at this PR, the comment history, and another review of the code. I've decided, largely based on the comments from @GrabYourPitchforks (here) and @tannergooding (here), that we will close this PR without merging it, leaving the changes from #44111 in place.

@tkp1n Thanks for your effort on this and your patience with this PR being open as long as it was before we came to this conclusion.

Add vectorized implementation of hex encoding/decoding

9c30e8a

Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 21, 2020

gfoidl reviewed Jul 21, 2020

View reviewed changes

Address initial review feedback

3ed41b5

gfoidl reviewed Jul 21, 2020

View reviewed changes

tkp1n marked this pull request as ready for review July 27, 2020 09:58

tkp1n mentioned this pull request Aug 6, 2020

[Perf -19%] System.Net.NetworkInformation.Tests.PhysicalAddressTests.PAShort #39720

Closed

danmoseley requested a review from tannergooding August 6, 2020 06:28

tkp1n mentioned this pull request Aug 8, 2020

Fix perf regression in System.Net.NetworkInformation.PhysicalAddress #40571

Closed

sandreenko added area-System.Text.Encoding and removed area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Aug 19, 2020

tarekgh added area-System.Runtime and removed area-System.Text.Encoding labels Aug 19, 2020

danmoseley mentioned this pull request Sep 3, 2020

[Perf -14%] System.Tests.Perf_UInt32.TryParseHex #39119

Closed

jeffhandley assigned GrabYourPitchforks Dec 7, 2020

Base automatically changed from master to main March 1, 2021 09:06

terrajobst added the community-contribution Indicates that the PR has been added by a community member label Jul 19, 2021

jeffhandley assigned EgorBo Jul 23, 2021

jeffhandley added this to the 7.0.0 milestone Aug 19, 2021

jeffhandley closed this Oct 9, 2021

determ1ne mentioned this pull request Oct 14, 2021

[API Proposal]: Add lowercase support for Convert.ToHexString #60393

Closed

ghost locked as resolved and limited conversation to collaborators Nov 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add vectorized implementation of hex encoding/decoding #39702

Add vectorized implementation of hex encoding/decoding #39702

tkp1n commented Jul 21, 2020

gfoidl left a comment

gfoidl commented Jul 21, 2020

tkp1n commented Jul 21, 2020

gfoidl Jul 21, 2020

tkp1n Jul 21, 2020

tkp1n commented Jul 21, 2020

danmoseley commented Aug 6, 2020

Anipik commented Aug 10, 2020

ghost commented Aug 19, 2020

GrabYourPitchforks commented Dec 1, 2020

GrabYourPitchforks commented Jan 25, 2021

jeffhandley commented Apr 18, 2021

tkp1n commented Apr 18, 2021

tkp1n commented Apr 18, 2021

tkp1n commented Apr 18, 2021

rcollette commented Apr 28, 2021

danmoseley commented Apr 28, 2021

GrabYourPitchforks commented Apr 28, 2021

tannergooding commented Apr 28, 2021

danmoseley commented May 17, 2021

jeffhandley commented Jun 11, 2021

jeffhandley commented Jul 23, 2021

jeffhandley commented Aug 19, 2021

jeffhandley commented Oct 9, 2021

Add vectorized implementation of hex encoding/decoding #39702

Add vectorized implementation of hex encoding/decoding #39702

Conversation

tkp1n commented Jul 21, 2020

History

Summary

Perf-Numbers

Review Input / Open Issues

gfoidl left a comment

Choose a reason for hiding this comment

gfoidl commented Jul 21, 2020

tkp1n commented Jul 21, 2020

gfoidl Jul 21, 2020

Choose a reason for hiding this comment

tkp1n Jul 21, 2020

Choose a reason for hiding this comment

tkp1n commented Jul 21, 2020

danmoseley commented Aug 6, 2020

Anipik commented Aug 10, 2020

ghost commented Aug 19, 2020

GrabYourPitchforks commented Dec 1, 2020

GrabYourPitchforks commented Jan 25, 2021

jeffhandley commented Apr 18, 2021

tkp1n commented Apr 18, 2021

tkp1n commented Apr 18, 2021

tkp1n commented Apr 18, 2021

rcollette commented Apr 28, 2021

danmoseley commented Apr 28, 2021

GrabYourPitchforks commented Apr 28, 2021

tannergooding commented Apr 28, 2021

danmoseley commented May 17, 2021

jeffhandley commented Jun 11, 2021

jeffhandley commented Jul 23, 2021

jeffhandley commented Aug 19, 2021

jeffhandley commented Oct 9, 2021