Add sse2 version of select #1804

Merged
merged 3 commits into from
Nov 2, 2021

Collaborator

@brianpopow brianpopow commented Nov 1, 2021

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

This adds an SSE2 version of the Select() method, which is used during lossless encoding. It's a bit faster, but the gain depends heavily on the image being encoded, since Select() is only used in Predictor11.
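
For context, here is a minimal sketch of what a scalar Select() typically looks like for the WebP lossless "select" predictor. It follows the usual formulation (sum over the four channels of |b - c| - |a - c|, then pick a or b); the class name, parameter names, and exact shape are assumptions for illustration and are not taken from this PR's diff.

```csharp
using System;

// Hypothetical illustration class, not the ImageSharp source.
public static class SelectSketch
{
    // Returns |b - c| - |a - c| for a single 8-bit channel.
    private static int Sub3(int a, int b, int c)
    {
        int pb = b - c;
        int pa = a - c;
        return Math.Abs(pb) - Math.Abs(pa);
    }

    // a, b and c are packed 32-bit pixels (four 8-bit channels each).
    public static uint Select(uint a, uint b, uint c)
    {
        int paMinusPb =
            Sub3((int)(a >> 24), (int)(b >> 24), (int)(c >> 24)) +
            Sub3((int)((a >> 16) & 0xff), (int)((b >> 16) & 0xff), (int)((c >> 16) & 0xff)) +
            Sub3((int)((a >> 8) & 0xff), (int)((b >> 8) & 0xff), (int)((c >> 8) & 0xff)) +
            Sub3((int)(a & 0xff), (int)(b & 0xff), (int)(c & 0xff));

        // A non-positive sum selects a, otherwise b.
        return paMinusPb <= 0 ? a : b;
    }
}
```

The per-channel work is four independent byte operations, which is why it maps naturally onto a single 128-bit register.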

TODO:

  • Add tests

Before: [image: select]

After with sse2: [image: select_sse]

@brianpopow brianpopow linked an issue Nov 1, 2021 that may be closed by this pull request
codecov bot commented Nov 1, 2021

Codecov Report

Merging #1804 (143de22) into master (d021222) will decrease coverage by 0.00%.
The diff coverage is 87.50%.

@@            Coverage Diff             @@
##           master    #1804      +/-   ##
==========================================
- Coverage   87.10%   87.09%   -0.01%     
==========================================
  Files         936      936              
  Lines       47832    47855      +23     
  Branches     6009     6011       +2     
==========================================
+ Hits        41662    41681      +19     
- Misses       5178     5180       +2     
- Partials      992      994       +2     
Flag Coverage Δ
unittests 87.09% <87.50%> (-0.01%) ⬇️

Impacted Files Coverage Δ
...ageSharp/Formats/Webp/Lossless/PredictorEncoder.cs 89.10% <72.72%> (+0.27%) ⬆️
.../ImageSharp/Formats/Webp/Lossless/LosslessUtils.cs 88.43% <93.10%> (+0.21%) ⬆️
...mageSharp/Formats/Webp/Lossless/NearLosslessEnc.cs 88.67% <0.00%> (-7.55%) ⬇️
...rc/ImageSharp/Formats/Webp/Lossless/Vp8LEncoder.cs 97.37% <0.00%> (-0.13%) ⬇️
...ImageSharp/Formats/Webp/Lossless/Vp8LBitEntropy.cs 100.00% <0.00%> (+1.19%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d021222...143de22.

Vector128<byte> cb0 = Sse2.SubtractSaturate(c0, b0);
Vector128<byte> ac = Sse2.Or(ac0, ca0);
Vector128<byte> bc = Sse2.Or(bc0, cb0);
Vector128<byte> pa = Sse2.UnpackLow(ac, Zero); // |a - c|
Contributor

@br3aker br3aker Nov 1, 2021

Vector creation before .NET 6 compiles to a really bad sequence of scalar sets, whereas an explicit Vector128<T>.Zero compiles to a single xor instruction, which would be a little clearer imo. It might also be a tiny bit faster than reading a static variable.

Collaborator Author

thanks @br3aker

Contributor

FYI: using default for the vector generates the same code as Vector128<T>.Zero.

(I prefer Vector128.Zero).
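
The fragment under review computes per-channel absolute differences with saturating subtraction and then widens them by unpacking against a zero vector. The following is a hedged sketch of that technique, not the merged code: the class and method names are hypothetical, it uses Vector128<byte>.Zero as suggested in this thread, and it reads lanes with GetElement instead of storing through a fixed pointer so that it compiles without unsafe code.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical illustration class, not the ImageSharp source.
public static class SelectSse2Sketch
{
    // Scalar fallback used when SSE2 is unavailable.
    public static uint SelectScalar(uint a, uint b, uint c)
    {
        int sum = 0;
        for (int shift = 0; shift < 32; shift += 8)
        {
            int ca = (int)((a >> shift) & 0xff);
            int cb = (int)((b >> shift) & 0xff);
            int cc = (int)((c >> shift) & 0xff);
            sum += Math.Abs(cb - cc) - Math.Abs(ca - cc);
        }

        return sum <= 0 ? a : b;
    }

    public static uint Select(uint a, uint b, uint c)
    {
        if (!Sse2.IsSupported)
        {
            return SelectScalar(a, b, c);
        }

        Vector128<byte> a0 = Sse2.ConvertScalarToVector128UInt32(a).AsByte();
        Vector128<byte> b0 = Sse2.ConvertScalarToVector128UInt32(b).AsByte();
        Vector128<byte> c0 = Sse2.ConvertScalarToVector128UInt32(c).AsByte();

        // Saturating subtraction clamps negative results to 0, so OR-ing
        // both directions yields |x - y| for every byte.
        Vector128<byte> ac = Sse2.Or(Sse2.SubtractSaturate(a0, c0), Sse2.SubtractSaturate(c0, a0));
        Vector128<byte> bc = Sse2.Or(Sse2.SubtractSaturate(b0, c0), Sse2.SubtractSaturate(c0, b0));

        // Interleaving with zero widens each byte to a 16-bit lane.
        Vector128<byte> zero = Vector128<byte>.Zero;
        Vector128<short> pa = Sse2.UnpackLow(ac, zero).AsInt16(); // |a - c|
        Vector128<short> pb = Sse2.UnpackLow(bc, zero).AsInt16(); // |b - c|
        Vector128<short> diff = Sse2.Subtract(pb, pa);

        // Only the low four lanes hold the four channels of the pixel.
        int paMinusPb = diff.GetElement(0) + diff.GetElement(1) + diff.GetElement(2) + diff.GetElement(3);
        return paMinusPb <= 0 ? a : b;
    }
}
```

Whether the zero vector is written as Vector128<byte>.Zero, default, or a cached static field does not change the result; the thread above is only about readability and the generated instructions.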

Member

@JimBobSquarePants JimBobSquarePants left a comment

LGTM 👍

@brianpopow brianpopow changed the title WIP: Add sse2 version of select Add sse2 version of select Nov 2, 2021
@brianpopow brianpopow merged commit 49bd35c into master Nov 2, 2021
@brianpopow brianpopow deleted the bp/selectsse2 branch November 2, 2021 10:24
Sse2.Store((ushort*)p, diff);
}

int paMinusPb = output[0] + output[1] + output[2] + output[3];
Contributor

You can put this into the fixed block and access it via the pointer to avoid bounds checks*.
If output were too small, there would be a bug somewhere else 😉 (fortunately there's none).

* or reverse the order to read output[3] first, then [2], ... so that there's only one bounds check.

Contributor

Ahhh, too late....

Collaborator Author

or reverse the order to read output[3] first, then [2], ... so that there's only one bounds check.

Ah yeah, I always forget about that trick, thx. Will do in a follow-up PR.

Contributor

I'll put it (incl. the return) within the fixed block here.
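
The reverse-order trick from this thread can be sketched as follows. This is an illustrative helper under assumptions, not code from the PR: the class and method names are hypothetical, and whether the JIT actually removes the later checks depends on the runtime version.

```csharp
using System;

// Hypothetical illustration class, not the ImageSharp source.
public static class BoundsCheckSketch
{
    public static int SumFirstFour(ReadOnlySpan<short> output)
    {
        // Reading the highest index first means a successful access proves
        // the span has at least four elements, so the JIT can usually skip
        // the range checks for indices 2, 1 and 0.
        int sum = output[3];
        sum += output[2];
        sum += output[1];
        sum += output[0];
        return sum;
    }
}
```

Reading through a pointer inside a fixed block avoids the checks entirely, at the cost of unsafe code; the reordered version keeps one check and stays safe.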

Successfully merging this pull request may close these issues.

WebP - Improve performance
4 participants