PNG: Add SSE/AVX version of Sub, Up, Average and Paeth filters #2028

Merged: 23 commits merged on Feb 28, 2022

Collaborator

@brianpopow brianpopow commented Feb 23, 2022

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict StyleCop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

This PR adds an SSE2 version of the Average filter (4 bytes per pixel only), which is used when decoding PNGs.
I also tried an SSE2 version for 3 bytes per pixel, but the benchmarks did not show good results, probably due to misaligned reads/writes.
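
For readers unfamiliar with the Average filter, here is a minimal scalar sketch of the reconstruction step as the PNG specification defines it. It is written in Python for illustration only and is not the code from this PR; the function name `unfilter_average` and its parameters are hypothetical. Each output byte depends on the already reconstructed byte one pixel to the left, so with 4 bytes per pixel a whole pixel fits exactly one 32-bit lane, which is presumably what makes that case a convenient SIMD target.

```python
def unfilter_average(scanline: bytes, previous: bytes, bytes_per_pixel: int = 4) -> bytes:
    """Reverse the PNG Average filter for one scanline (scalar reference sketch).

    scanline: filtered bytes of the current row (without the filter-type byte).
    previous: reconstructed bytes of the row above (all zeros for the first row).
    """
    out = bytearray(len(scanline))
    for i, filtered in enumerate(scanline):
        # Byte of the pixel to the left; treated as 0 for the first pixel.
        left = out[i - bytes_per_pixel] if i >= bytes_per_pixel else 0
        up = previous[i]
        # The average is computed without overflow and rounded down,
        # then added modulo 256.
        out[i] = (filtered + ((left + up) >> 1)) & 0xFF
    return bytes(out)
```

Calling `unfilter_average(row, row_above)` returns the reconstructed row, which then serves as `previous` for the next scanline.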

Edit:
The PR now also includes SSE/AVX versions of the Paeth, Sub and Up filters.
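
As a point of reference for the three additional filters, the following Python sketch shows their scalar reconstruction rules as given in the PNG specification. It is an illustration only, not the SIMD code from this PR, and all function names are hypothetical. Up has no dependency within a row, which fits its large vectorization gain in the benchmarks, while Sub and Paeth depend on the previously reconstructed pixel.

```python
def paeth_predictor(a: int, b: int, c: int) -> int:
    """Pick whichever of left (a), up (b), upper-left (c) is closest to a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def unfilter_sub(scanline: bytes, bytes_per_pixel: int) -> bytes:
    """Sub: add the reconstructed byte one pixel to the left."""
    out = bytearray(len(scanline))
    for i, filtered in enumerate(scanline):
        left = out[i - bytes_per_pixel] if i >= bytes_per_pixel else 0
        out[i] = (filtered + left) & 0xFF
    return bytes(out)


def unfilter_up(scanline: bytes, previous: bytes) -> bytes:
    """Up: add the byte directly above; every byte is independent."""
    return bytes((f + u) & 0xFF for f, u in zip(scanline, previous))


def unfilter_paeth(scanline: bytes, previous: bytes, bytes_per_pixel: int) -> bytes:
    """Paeth: add the predictor built from left, up and upper-left bytes."""
    out = bytearray(len(scanline))
    for i, filtered in enumerate(scanline):
        has_left = i >= bytes_per_pixel
        a = out[i - bytes_per_pixel] if has_left else 0
        b = previous[i]
        c = previous[i - bytes_per_pixel] if has_left else 0
        out[i] = (filtered + paeth_predictor(a, b, c)) & 0xFF
    return bytes(out)
```

In each case `previous` is the reconstructed row above, taken as all zeros for the first row of the image.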

Benchmark results:

master:

BenchmarkDotNet=v0.13.0, OS=Windows 10.0.19043.1526 (21H1/May2021Update)
Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.200
  [Host]     : .NET 5.0.14 (5.0.1422.5710), X64 RyuJIT
  Job-OGQEMG : .NET 5.0.14 (5.0.1422.5710), X64 RyuJIT
|                      Method |     Mean |     Error |    StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---------------------------- |---------:|----------:|----------:|------:|------:|------:|----------:|
| 'Average-filtered PNG file' | 2.626 ms | 0.0179 ms | 0.0046 ms |     - |     - |     - |      3 KB |

PR:

BenchmarkDotNet=v0.13.0, OS=Windows 10.0.19043.1526 (21H1/May2021Update)
Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.200
  [Host]     : .NET 5.0.14 (5.0.1422.5710), X64 RyuJIT
  Job-OGQEMG : .NET 5.0.14 (5.0.1422.5710), X64 RyuJIT

Runtime=.NET 5.0  Arguments=/p:DebugType=portable  IterationCount=5
LaunchCount=1  WarmupCount=3

|                             Method |        Mean |     Error |   StdDev | Ratio | RatioSD |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|----------------------------------- |------------:|----------:|---------:|------:|--------:|-------:|------:|------:|----------:|
| 'Average-filtered PNG file (4bpp)' | 2,212.37 us | 14.979 us | 3.890 us | 55.63 |    5.27 |      - |     - |     - |      3 KB |

Test image was:
AverageFilter4Bpp

@brianpopow brianpopow changed the title PNG: Add SSE2 version of average filter WIP: PNG: Add SSE2 version of average filter Feb 23, 2022
@brianpopow brianpopow changed the title WIP: PNG: Add SSE2 version of average filter PNG: Add SSE2 version of average filter Feb 23, 2022
@brianpopow brianpopow changed the title PNG: Add SSE2 version of average filter WIP: PNG: Add SSE2 version of average filter Feb 23, 2022
@brianpopow brianpopow changed the title WIP: PNG: Add SSE2 version of average filter PNG: Add SSE2 version of average filter Feb 23, 2022
@brianpopow
Collaborator Author

@gfoidl thanks for the suggestions!

@brianpopow
Collaborator Author

I have added SSE/AVX versions of the other filters, too.

Benchmark with random data of size 1024:

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1526 (21H1/May2021Update)
Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.200
  [Host]     : .NET 5.0.14 (5.0.1422.5710), X64 RyuJIT
  DefaultJob : .NET 5.0.14 (5.0.1422.5710), X64 RyuJIT


|      Method |        Mean |     Error |    StdDev |
|------------ |------------:|----------:|----------:|
|    UpScalar |   525.50 ns |  2.733 ns |  2.557 ns |
|      UpSse2 |    36.66 ns |  0.160 ns |  0.150 ns |
|      UpAvx2 |    19.11 ns |  0.074 ns |  0.069 ns |
|   SubScalar |   654.48 ns |  3.637 ns |  3.224 ns |
|     SubSse2 |   172.30 ns |  2.738 ns |  2.561 ns |
| PaethScalar | 6,643.09 ns | 23.076 ns | 21.585 ns |
|   PaethSse2 |   674.23 ns |  2.468 ns |  2.309 ns |
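
To put the table in relative terms, this small Python sketch derives the speedup of each vectorized variant over its scalar baseline from the mean times above. The dictionary simply copies the Mean column; the helper name `speedup` is made up for this example.

```python
# Mean times in nanoseconds, copied from the benchmark table above.
means_ns = {
    "UpScalar": 525.50,
    "UpSse2": 36.66,
    "UpAvx2": 19.11,
    "SubScalar": 654.48,
    "SubSse2": 172.30,
    "PaethScalar": 6643.09,
    "PaethSse2": 674.23,
}


def speedup(baseline: str, candidate: str) -> float:
    """Return how many times faster the candidate is than the baseline."""
    return means_ns[baseline] / means_ns[candidate]


if __name__ == "__main__":
    pairs = [
        ("UpScalar", "UpSse2"),
        ("UpScalar", "UpAvx2"),
        ("SubScalar", "SubSse2"),
        ("PaethScalar", "PaethSse2"),
    ]
    for baseline, candidate in pairs:
        print(f"{candidate}: {speedup(baseline, candidate):.1f}x faster than {baseline}")
```

On these numbers the gains come out at roughly 14x for UpSse2, 27x for UpAvx2, 4x for SubSse2 and 10x for PaethSse2.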

@brianpopow brianpopow changed the title PNG: Add SSE2 version of average filter PNG: Add SSE/AVX version of Sub, Up, Average and Paethfilters Feb 25, 2022
@brianpopow brianpopow requested a review from a team February 25, 2022 14:28
Member

@JimBobSquarePants JimBobSquarePants left a comment

Nice improvement!

@JimBobSquarePants JimBobSquarePants added this to the 2.*.* milestone Feb 28, 2022
@JimBobSquarePants JimBobSquarePants merged commit 2f75823 into main Feb 28, 2022
@JimBobSquarePants JimBobSquarePants deleted the bp/pngavgsse2 branch February 28, 2022 11:29
3 participants