Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Identify provides the wrong ResolutionUnit for pyramid Tiff Files #1276

Merged
merged 4 commits into from
Sep 30, 2024

Conversation

mexthecat
Copy link
Contributor

Added space after %U in ImageMagick identify format prameter.

We work with pyramid Tiffs in our project. Since several images are combined in one file, identify provides all resolutions and resolution units one after the other. However, there is no space between the first and second image and the resolution unit is therefore read incorrectly.

ret = run(['identify', '-format', r'%[resolution.x] %[resolution.y] %U', img.filename], check=False, stderr=PIPE, stdout=PIPE)
...
tokens = ret.stdout.decode('utf-8').split(' ', 3)

['118.11', '118.11', 'PixelsPerCentimeter118.11', '118.11 PixelsPerCentimeter118.11 118.11 PixelsPerCentimeter118.11 118.11 PixelsPerCentimeter118.11 118.11 PixelsPerCentimeter']

A simple space after the %U should solve the problem

ret = run(['identify', '-format', r'%[resolution.x] %[resolution.y] %U ', img.filename], check=False, stderr=PIPE, stdout=PIPE)
...
tokens = ret.stdout.decode('utf-8').split(' ', 3)

['118.11', '118.11', 'PixelsPerCentimeter', '118.11 118.11 PixelsPerCentimeter 118.11 118.11 PixelsPerCentimeter 118.11 118.11 PixelsPerCentimeter 118.11 118.11 PixelsPerCentimeter ']

@bertsky
Copy link
Collaborator

bertsky commented Sep 27, 2024

Thanks a lot for the fix!

Do you happen to have an example image handy that we can use in the repo as a test case?

@mexthecat
Copy link
Contributor Author

Yes - have some images but can not upload them here -> tif not allowed.

@bertsky
Copy link
Collaborator

bertsky commented Sep 27, 2024

Could you try to zip them and attach that?

(Also, is the image unproblematic for publication, i.e. no licensing or other legal problems?)

@mexthecat
Copy link
Contributor Author

from https://archive.org/details/albumofindianfer00baynrich/albumofindianfer00baynrich/
0004.zip

@bertsky
Copy link
Collaborator

bertsky commented Sep 27, 2024

Turns out we already had an multi-frame TIFF example in the test assets – see 44deb80.

@kba what do you think, should we add 0004.tif as well? (It has cm instead of inches and should yield photometricInterpretation="YCBCR" but instead gives "RGB" because Pillow does not catch it and ImageMagick never gets asked about it.)

@kba kba changed the title Identify provides the worng ResolutionUnit for pyramid Tiff Files Identify provides the wrong ResolutionUnit for pyramid Tiff Files Sep 30, 2024
@kba kba merged commit c744dc8 into OCR-D:master Sep 30, 2024
22 checks passed
@@ -27,7 +27,10 @@
1457, 2084, 1, 1, 1, 'inches', 'RGB', None),
# tolerate multi-frame TIFF:
('gutachten/data/IMG/IMG_1.tif',
2088, 2634, 300, 300, 300, 'inches', 'RGB', 'raw')
2088, 2634, 300, 300, 300, 'inches', 'RGB', 'raw'),
# multi-frame TIFF with metric pixel density (is actually YCBCR not RGB but Pillow thinks otherwise...)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, Pillow just does not say anything. We could ask ImageMagick about it, though.

(However, %[EXIF:PhotometricInterpretation] does not yield anything... Not sure how to get that info.)

So IMO this test should state YCbCr and fail until we fix OcrdExif.run_identify to include the results from the CLI, or we find a way to make Pillow give us the info.

BTW, using Pillow's TiffTags we can actually get the right value:

# val = img.tag_v2.named()['PhotometricInterpretation']
val = img.tag_v2.get(262)
interpretation = dict((v, k) for k, v in TiffTags.TAGS_V2.get(262).enum.items())
val = interpretation[val]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants