Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workspace.image_from_*: allow filtering by filename #845

Merged
merged 1 commit into from
May 3, 2022

Conversation

bertsky
Copy link
Collaborator

@bertsky bertsky commented Apr 28, 2022

Motivation for this: since we have deprecated Workspace.resolve_image_as_pil (because it cannot calculate the coordinate transform that would belong to the image; and we don't want processors to search for AlternativeImages from scratch), we lost a way to deal with the use-case that the caller does not want to create new images dynamically, but instead already knows the desired AlternativeImage (and therefore its path).

Here we therefore (re)introduce that possibility with a kwarg filename for the image retrieval functions. (So the difference is that now you have to pass an image path that belongs to an existing AlternativeImage, but then you do get the coordinate transform along with the image.)

@kba I can see no harm here and have already tested locally.

Copy link
Member

@kba kba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense and since it's just an additional filter, I see no issue with compatibility.

@kba kba merged commit 0822708 into OCR-D:master May 3, 2022
@@ -688,6 +695,10 @@ def image_from_page(self, page, page_id,
fill=fill, transparency=transparency)

# verify constraints again:
if filename and not getattr(page_image, 'filename', '').endswith(filename):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

page_image.filename is always empty for me, so this check always fails

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

page_image.filename is always empty for me, so this check always fails

then that AlternativeImage is not maximal, i.e. there are operations (derotation, cropping) that are possible from the description of the segment but have not been applied to the image yet – so the API does it automatically before returning an image. Any PIL.Image operation will invalidate the filename attribute, though.

Can you try retrieving the image with additional filters like feature_filter=deskewed?

Could you share the snipped of the segment/page that shows the AlternativeImage elements with their comments?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, you're right, I think the auto_feature 'cropped' was applied to the image, so its not an original image with an original filename anymore. Didn't expect that, but it makes perfectly sense

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's somewhat unexpected, but we have no other choice from the API point of view. Other callers would be surprised to not get an image that corresponds with the features of the segment.

But we could clarify that in the docstring...

@bertsky bertsky deleted the workspace-altimg-retrieve-existing branch June 6, 2024 14:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants