Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decode Jpeg into Rgb24 when pixel type not specified #1773

Merged
merged 29 commits into from
Oct 5, 2021

Conversation

antonfirsov
Copy link
Member

@antonfirsov antonfirsov commented Oct 3, 2021

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

Finish the plan defined in #1410:

  • Remove packing from JpegColorConverters and utilize Implement PackFromRgbPlanes for Rgba32 and Rgb24 #1462 PixelOperations.PackFromRgbPlanes instead
  • Return Image<Rgb24> from non-generic Decode
  • ResizeProcessor<T>: omit alpha premultiplication if the pixel type has no alpha channel

Fixes #1410, fixes #1121.

Performance

Did usual run of LoadResizeSaveStressBenchmarks on 10 core desktop. The overall effect on speed is within noise range, but there might be some 1%-ish improvement. I assume the reason is that #1462's packing actually doesn't make big difference to the packing code introduced in #1411.

But memory usage dropped significantly, which is the main goal here.

Before

|        Method | maxDegreeOfParallelism |       Mean |       Error |   StdDev | Ratio | RatioSD |     Gen 0 |     Gen 1 |     Gen 2 |    Allocated |
|-------------- |----------------------- |-----------:|------------:|---------:|------:|--------:|----------:|----------:|----------:|-------------:|
| SystemDrawing |                      1 | 8,711.9 ms |   862.64 ms | 47.28 ms |  1.00 |    0.00 |         - |         - |         - |        25 KB |
|    ImageSharp |                      1 | 5,218.4 ms |   135.62 ms |  7.43 ms |  0.60 |    0.00 | 3000.0000 | 3000.0000 | 3000.0000 | 1,226,257 KB |
|               |                        |            |             |          |       |         |           |           |           |              |
| SystemDrawing |                      5 | 3,128.0 ms | 1,188.61 ms | 65.15 ms |  1.00 |    0.00 |         - |         - |         - |        26 KB |
|    ImageSharp |                      5 | 1,516.9 ms |   782.02 ms | 42.86 ms |  0.49 |    0.02 | 2000.0000 | 2000.0000 | 2000.0000 | 1,226,402 KB |
|               |                        |            |             |          |       |         |           |           |           |              |
| SystemDrawing |                     10 | 2,828.7 ms |   786.61 ms | 43.12 ms |  1.00 |    0.00 |         - |         - |         - |        29 KB |
|    ImageSharp |                     10 |   823.4 ms |    80.49 ms |  4.41 ms |  0.29 |    0.01 | 1000.0000 | 1000.0000 | 1000.0000 | 1,229,766 KB |
|               |                        |            |             |          |       |         |           |           |           |              |
| SystemDrawing |                     20 | 2,793.0 ms |   735.65 ms | 40.32 ms |  1.00 |    0.00 |         - |         - |         - |        30 KB |
|    ImageSharp |                     20 |   686.0 ms |   570.78 ms | 31.29 ms |  0.25 |    0.01 | 1000.0000 | 1000.0000 | 1000.0000 | 1,237,201 KB |

After

|        Method | maxDegreeOfParallelism |       Mean |      Error |   StdDev | Ratio | RatioSD |     Gen 0 |     Gen 1 |     Gen 2 |  Allocated |
|-------------- |----------------------- |-----------:|-----------:|---------:|------:|--------:|----------:|----------:|----------:|-----------:|
| SystemDrawing |                      1 | 8,757.5 ms | 1,109.5 ms | 60.81 ms |  1.00 |    0.00 |         - |         - |         - |      25 KB |
|    ImageSharp |                      1 | 4,868.9 ms |   258.8 ms | 14.18 ms |  0.56 |    0.00 | 2000.0000 | 2000.0000 | 2000.0000 | 776,094 KB |
|               |                        |            |            |          |       |         |           |           |           |            |
| SystemDrawing |                      5 | 3,222.5 ms |   913.8 ms | 50.09 ms |  1.00 |    0.00 |         - |         - |         - |      27 KB |
|    ImageSharp |                      5 | 1,397.8 ms |   624.2 ms | 34.21 ms |  0.43 |    0.02 |         - |         - |         - | 776,093 KB |
|               |                        |            |            |          |       |         |           |           |           |            |
| SystemDrawing |                     10 | 2,849.4 ms | 1,590.1 ms | 87.16 ms |  1.00 |    0.00 |         - |         - |         - |      29 KB |
|    ImageSharp |                     10 |   852.6 ms |   576.0 ms | 31.57 ms |  0.30 |    0.02 | 1000.0000 | 1000.0000 | 1000.0000 | 843,968 KB |
|               |                        |            |            |          |       |         |           |           |           |            |
| SystemDrawing |                     20 | 2,764.5 ms | 1,152.1 ms | 63.15 ms |  1.00 |    0.00 |         - |         - |         - |      30 KB |
|    ImageSharp |                     20 |   622.7 ms |   475.5 ms | 26.07 ms |  0.23 |    0.01 | 1000.0000 | 1000.0000 | 1000.0000 | 949,029 KB |

/cc @br3aker @JimBobSquarePants @saucecontrol

@antonfirsov antonfirsov added breaking Signifies a binary breaking change. formats:jpeg labels Oct 3, 2021
@@ -40,23 +43,20 @@ public SpectralConverter(Configuration configuration, CancellationToken cancella

private bool Converted => this.pixelRowCounter >= this.pixelBuffer.Height;

public Buffer2D<TPixel> PixelBuffer
public Buffer2D<TPixel> GetPixelBuffer()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is heavy lazy initialization, with side effects like CancellationToken firing. It's better to expose such stuff from a method than a property.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't like this property approach with a lot of lazy stuff behind from the start but couldn't come up with anything better. There's a lot of confusing bits of code here and there from my pr, I will work on decoder performance/refactoring next as I'm done with the encoder for now.

@codecov
Copy link

codecov bot commented Oct 3, 2021

Codecov Report

Merging #1773 (4323c8d) into master (2410a37) will increase coverage by 0.00%.
The diff coverage is 87.87%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master    #1773    +/-   ##
========================================
  Coverage   84.49%   84.49%            
========================================
  Files         849      849            
  Lines       37412    37252   -160     
  Branches     4376     4385     +9     
========================================
- Hits        31611    31477   -134     
+ Misses       4947     4916    -31     
- Partials      854      859     +5     
Flag Coverage Δ
unittests 84.49% <87.87%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...mageSharp/Common/Helpers/SimdUtils.HwIntrinsics.cs 99.13% <ø> (ø)
...rConverters/JpegColorConverter.FromYCbCrVector4.cs 7.31% <0.00%> (+0.17%) ⬆️
...rmats/Jpeg/Components/Decoder/HuffmanScanBuffer.cs 94.73% <ø> (ø)
...ImageSharp/PixelFormats/Utils/Vector4Converters.cs 100.00% <ø> (ø)
...ssors/Transforms/Resize/ResizeProcessor{TPixel}.cs 98.38% <33.33%> (-1.62%) ⬇️
...mory/DiscontiguousBuffers/MemoryGroupExtensions.cs 89.81% <70.00%> (+0.70%) ⬆️
...JpegColorConverter.VectorizedJpegColorConverter.cs 66.66% <71.42%> (-8.34%) ⬇️
...ents/Decoder/ColorConverters/JpegColorConverter.cs 94.02% <71.42%> (+19.20%) ⬆️
...onverters/JpegColorConverter.FromGrayScaleBasic.cs 87.50% <84.61%> (-12.50%) ⬇️
...ColorConverters/JpegColorConverter.FromCmykAvx2.cs 100.00% <100.00%> (ø)
... and 20 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2410a37...4323c8d. Read the comment docs.

@JimBobSquarePants
Copy link
Member

A lot of reading here. Will try to review tomorrow night

Copy link
Contributor

@gfoidl gfoidl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a quick look over the code, not detailed review.

Vector256<int> vcontrol = Unsafe.As<byte, Vector256<int>>(ref control);

int n = result.Length / 8;
int n = values.Component0.Length / 8;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int n = values.Component0.Length / 8;
int n = (int)((uint)values.Component0.Length / 8);

So the JIT will emit a bit-hack division instead of the (signed) integer division by constant.


var scale = new Vector<float>(1 / this.MaximumValue);

// Walking 8 elements at one step:
int n = result.Length / 8;
int n = values.Component0.Length / 8;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int n = values.Component0.Length / 8;
int n = (int)((uint)values.Component0.Length / 8);

Copy link
Member Author

@antonfirsov antonfirsov Oct 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this trick is only worth it when we are inside a tight loop or, or in a method that is expected to be called from a tight loop. In other cases I prefer readability.

Copy link
Member Author

@antonfirsov antonfirsov Oct 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or is it possible to achieve the same in a simpler way nint? I changed my code everywhere to nint n = values.Component0.Length / 8; because n is only used for the i < n comparison, and I like it better when the types match explicitly.

Copy link
Contributor

@gfoidl gfoidl Oct 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should go with readability, as you did now with nint. Codegen isn't ideal, but this pattern {array, span}.Length / ConstantOfPowerOfTwo is so common, that the JIT should be able to optimize this, so we don't need any hacks.
I'll have a look if there's a JIT-issue, otherwise create one.

Edit: filed dotnet/runtime#59922

ref Vector<float> ggRefAsVector = ref Unsafe.As<Vector4Pair, Vector<float>>(ref gg);
ref Vector<float> bbRefAsVector = ref Unsafe.As<Vector4Pair, Vector<float>>(ref bb);

int n = values.Component0.Length / 8;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int n = values.Component0.Length / 8;
int n = (int)((uint)values.Component0.Length / 8);

or maybe create a helper for this? The casts aren't very intuitive, but codegen is quite a difference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it's worth it, see #1773 (comment)

Comment on lines 47 to 48
int n = values.Component0.Length / 8;
for (int i = 0; i < n; i++)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int n = values.Component0.Length / 8;
for (int i = 0; i < n; i++)
int n = (int)((uint)values.Component0.Length / 8);
for (nint i = 0; i < n; i++)

Now I'll stop commenting on these -- otherwise it's too much noise generated by me 😉

Copy link
Member

@JimBobSquarePants JimBobSquarePants left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't believe how much easier the code is to read now! I did a few readthroughs just to make sure I wasn't missing something 😄

Just @gfoidl last suggestion and it's good to go. 👍

@JimBobSquarePants JimBobSquarePants merged commit d81c511 into master Oct 5, 2021
@JimBobSquarePants JimBobSquarePants deleted the af/jpeg-pack-sandbox branch October 5, 2021 12:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Signifies a binary breaking change. formats:jpeg
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Non-generic Image.Load should decode Jpeg into Image<Rgb24> Speed up JPEG Decoder color conversion
4 participants