Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Implement Tiff codec #119

Merged
merged 77 commits into from
Dec 10, 2018

Conversation

Andy-Wilkinson
Copy link
Contributor

@Andy-Wilkinson Andy-Wilkinson commented Mar 2, 2017

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following matches the existing coding patterns and practise as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

This pull request implements (or will implement when complete) a TIFF decoder and encoder for ImageSharp. A new library and NuGet package will be introduced (ImageSharp.Formats.Tiff) in line with the existing formats.

Feature Checklist

  • Reading TIFF file structure
  • Parsing of relevant metadata
  • Extraction of image data blocks
  • Construction of image (from strip data)
  • Construction of image (from tile data)

Compression Types

  • None
  • Ccitt1D
  • PackBits
  • CcittGroup3Fax
  • CcittGroup4Fax
  • Lzw
  • Old Jpeg
  • Jpeg (Technote 2)
  • Deflate (Technote 2)
  • Old Deflate (Technote 2)

Photometric Interpretation Formats

  • WhiteIsZero
  • BlackIsZero
  • Rgb (Chunky)
  • Rgb (Planar)
  • PaletteColor
  • TransparencyMask
  • Separated (TIFF Extension color spaces)
  • YCbCr (TIFF Extension color spaces)
  • CieLab (TIFF Extension color spaces)
  • IccLab (TechNote 1)

@codecov-io
Copy link

codecov-io commented Mar 2, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@dce781d). Click here to learn what that means.
The diff coverage is 93.94%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master     #119   +/-   ##
=========================================
  Coverage          ?   89.59%           
=========================================
  Files             ?     1038           
  Lines             ?    45854           
  Branches          ?     3255           
=========================================
  Hits              ?    41081           
  Misses            ?     4058           
  Partials          ?      715
Impacted Files Coverage Δ
...ts/PixelBlenders/DefaultPixelBlenders.Generated.cs 77.77% <ø> (ø)
src/ImageSharp/Formats/Tiff/TiffEncoder.cs 0% <0%> (ø)
...ImageSharp/Formats/Tiff/TiffConfigurationModule.cs 0% <0%> (ø)
src/ImageSharp/Formats/Tiff/ImageExtensions.cs 0% <0%> (ø)
...PhotometricInterpretation/BlackIsZero8TiffColor.cs 100% <100%> (ø)
...PhotometricInterpretation/WhiteIsZero1TiffColor.cs 100% <100%> (ø)
...PhotometricInterpretation/BlackIsZero4TiffColor.cs 100% <100%> (ø)
src/ImageSharp/Formats/Tiff/TiffIfd/TiffIfd.cs 100% <100%> (ø)
...rc/ImageSharp/PixelFormats/PixelBlender{TPixel}.cs 100% <100%> (ø)
...s/Tiff/Compression/PackBitsTiffCompressionTests.cs 100% <100%> (ø)
... and 61 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dce781d...a8d25ef. Read the comment docs.

@Andy-Wilkinson
Copy link
Contributor Author

Just seeding this [WIP] PR with the initial project structure. A bit of a mutant at the moment as I've only got the dotnet-preview4 tooling (.csproj based) on my dev PC at the moment, but this should all converge well before this gets merged. I've also introduced a separate unit testing project for rapid TDD, but this can be combined into the main test project before merging.

Quick comment on the TiffGen code in the unit testing project - this is a set of classes for generating in-memory TIFF files for unit testing purposes. They are designed to be simple rather than performant (and therefore hopefully correct!), easily generate varied TIFF structures (including invalid TIFF files to show that the decoder correctly handles/throws on these) and allow checking that the decoder handles padding/oddly ordered blocks/etc.

@JimBobSquarePants
Copy link
Member

Great to see this opened!


public TiffIfd ReadIfd(uint offset)
{
InputStream.Seek(offset, SeekOrigin.Begin);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can't assume that the TIFF image starts at the beginning of the stream. (Or at least we don't do this in other codecs.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No we can't assume that. Someone could have multiple images in a stream or other information. Best we can do is look for the identifier at the position we are currently at.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay... The easy fix is to store the offset that we start at, and do all Seeks relative to this, i.e. InputStream.Seek(StartOffset + offset, SeekOrigin.Begin).

I'd really love to get rid of the Seeks entirely, but unfortunately the TIFF file-format relies heavily on referencing offsets within the file. There is no way you can tell what the next bit of data is otherwise. As an example a file layout could be,

0-8 -> TIFF file header (8 bytes) - references first IFD at offset 30000
8-10000 -> Raw image data here
30000-... -> IFD (basically an image header) - references image data at offset 8

Until you read the IFD at offset 30000, you have no way of telling what data is at offset 8 - it could be image data in an unknown compression/format, another IFD, strings from the metadata. The format doesn't have any identifiers to let you know until you have read the IFD itself.

Any clever ideas to avoid Seeks would be great though! 😄

throw new ImageFormatException($"A value of type '{entry.Type}' cannot be converted to a SignedRational.");

byte[] bytes = ReadBytes(ref entry);
SignedRational[] result = new SignedRational[entry.Count];
Copy link
Member

@antonfirsov antonfirsov Mar 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it possible to do buffered read without allocating new arrays?
You can store them as pre-allocated members, if the arrays are expected to have a known upper limit. If the limit is not known and/or or the arrays are too big, we should introduce the following utility class to reuse buffers:

class AutoGrowBuffer<T> : IDisposable
{
    private PinnedBuffer<T> currentBuffer;
    public AutoGrowBuffer(int initialSize)
    {
        this.currentBuffer = new PinnedBuffer<T>(initialSize);
    }

    // BufferSpan<T> should be very similar to BufferPointer<T> (or a replacement!), having a Length property.  I will implement it, if we agree on the design.
public BufferSpan<T> GetBufferSpan(int length)
    {
        if (this.currentBuffer.Array.Length < length)
        {
            this.currentBuffer.Dispose();
            this.currentBuffer = new PinnedBuffer<T>(length);
        }
        return new BufferSpan<T>(this.currentBuffer, length);
    } 
  
   public void Dispose()
   {
       this.currentBuffer .Dispose();
   }
}

// USAGE:
private AutoGrowBuffer<SignedRational> signedRationalBuffer;

// BufferSpan<T> should be very similar to BufferPointer<T> (or a replacement!), having a Length property.  I will implement it, if we agree on the design.
public BufferSpan<SignedRational> ReadSignedRationalSpan(ref TiffIfdEntry entry)
{
    ....
    BufferSpan<SignedRational> result = this.signedRationalBuffer.GetBufferSpan(entry.Count);
    ....
}

/CC @JimBobSquarePants

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've used known max values in the past. That would be my favored approach

https://github.com/JimBobSquarePants/ImageSharp/blob/0b5396d0df399dc0eb7968d1479a3e93378de1aa/src/ImageSharp.Formats.Png/PngDecoderCore.cs#L50

For additional larger values I've rented/returned the buffer and made sure I was careful reading them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like that class though. Clever idea!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely a place for Span-like stuff. Better encapsulation for data-chunk operations without unecessary allocations! I think I need to turn my 'BufferPointer' into 'BufferSpan'. It will be almost System.Span compatible then!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be great! 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting idea... There's definitely a place for sharing buffers in managing the performance of the existing code. I'm not sure how much benefit there will be for SignedRational arrays - they are pretty rare in TIFF files and I included them mainly for completeness. I'll get more of a feel for this as I try out a number of sample files.

On the other hand - I'm allocating a new byte[] in the GetBytes(...) method that should really be allocated once and reused. The returned byte[] doesn't even need to be the correct length, so I can make a good guess of the maximum likely size, and revert to a new byte[] only if there is a need to exceed this... I'll look into it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! We dont need to bother with uncommon cases, I just wanted to give some general hints based on the code :)


private Int32 ToInt32(byte[] bytes, int offset)
{
if (IsLittleEndian)
Copy link
Member

@antonfirsov antonfirsov Mar 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JimBobSquarePants has introduced several classes to deal with endianness in streams. Shouldn't we use them in Tiff decoder too?
Their implementations definitely need some optimization by eliminating super-frequent virtual calls, but we could follow an "optimize once, win everywhere" approach by using them!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like this to be a thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a definitely a good rationale for centralising this logic. I'll see if I can do this in the future once I've got the main features working.


[Theory]
[MemberData(nameof(IsLittleEndianValues))]
public void ReadHeader_ReadsEndianness(bool isLittleEndian)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely <3 this approach in testing!
I wish all our codecs were under this level of unit testing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, great stuff. The others really should be. We should add a chore tag for issues so we can do this 😝

@antonfirsov
Copy link
Member

@Andy-Wilkinson I wanted to try the code, but I couln't find a working solution.
@JimBobSquarePants will finish switching all our stuff in main branch to VS2017 soon. You can integrate your work better then :)

@Andy-Wilkinson
Copy link
Contributor Author

Yeah - Sorry about that @antonfirsov . It is a bit of a mutant project at the with a mix of the project.json/.csproj approaches. I see that @JimBobSquarePants has merged the VS2017 branch, so I'll try to get everything consistent soon!

@JimBobSquarePants
Copy link
Member

@Andy-Wilkinson Hoping the merge makes things a lot easier for you!

@Andy-Wilkinson
Copy link
Contributor Author

Thanks @JimBobSquarePants . The VS2017 merge has allowed me to bring everything together into the main ImageSharp solution. Just wishing now that I'd enabled StyleCop and Doc comments from the start! 😉 Almost there with fixing the warnings, and everything will be right going forward.

@JimBobSquarePants JimBobSquarePants mentioned this pull request Aug 3, 2017
25 tasks
ajryan referenced this pull request in ajryan/tesseract Dec 30, 2017
@JimBobSquarePants
Copy link
Member

JimBobSquarePants commented Feb 8, 2018

@Andy-Wilkinson I've had a go at fixing those merge conflicts for you but I could have made a mistake. Please let me know I've I've broken things for you.

@CLAassistant
Copy link

CLAassistant commented Feb 8, 2018

CLA assistant check
All committers have signed the CLA.

@JimBobSquarePants
Copy link
Member

Hi @Andy-Wilkinson

I managed to update your fork with the latest code from our master with all test passing.

I had a look at the effort, so far.. It's absolutely incredible! 👍

The API is solidifying now so we should be able to make the effort to help you out a lot more to complete this (We should have helped more earlier). There's a lot of new performance goodies on offer within the codebase that we can apply in some places (memory management for example).

I'd love to get the ball rolling again so let me know when you have some free time.

@JimBobSquarePants JimBobSquarePants added this to the Future milestone Mar 21, 2018
@JimBobSquarePants
Copy link
Member

Just trying to keep this so that it builds and doesn't get too out of date.

@JimBobSquarePants
Copy link
Member

Hi @Andy-Wilkinson

Just dropping you a message to see whether this PR is something that that you will be able to continue with or whether we should merge it now into our own separate branch to continue working on? Ideally we would love to have your help but if that's not possible that's totally cool also.

Additionally would it be possible to re-sign the CLA? We lost a lot of signatures by accident when we globalized the tool across all our repositories.

Cheers

James

@Andy-Wilkinson
Copy link
Contributor Author

Hi @JimBobSquarePants

Apologies that there's not been much progress recently - to be honest, I've not had time for much dev work so I've tended to drop in and out of my own experimental mini-projects when I get chance. The TIFF codec is going to require more of a concerted effort so it is probably best if you merge into a separate branch. It will open it up to multiple contributors to provide smaller PRs (I might get time for small chunks too).

Cheers,
Andy

PS. I've signed the CLA.

@JimBobSquarePants JimBobSquarePants changed the base branch from master to tiff-codec December 10, 2018 05:09
@JimBobSquarePants
Copy link
Member

Hi @Andy-Wilkinson

Thanks for the update, I appreciate it and all your amazing efforts so far.

I'll merge this in as-is then into a new tiff-codec branch which will then allow other contributors to help out.

@JimBobSquarePants JimBobSquarePants merged commit a61e83a into SixLabors:tiff-codec Dec 10, 2018
JimBobSquarePants added a commit that referenced this pull request Feb 17, 2021
JimBobSquarePants added a commit that referenced this pull request Feb 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants