Review CLI terminal output #93

Closed
mikegerber opened this issue Oct 12, 2023 · 8 comments
Labels: bug, maintenance

Comments

@mikegerber
Collaborator

Using https://qurator-data.de/examples/actevedef_718448162.first-page+binarization+segmentation.zip:

❯ ocrd-calamari-recognize -I OCR-D-SEG-LINE-SBB -O TEST
Checkpoint version 2 is up-to-date.
None
Checkpoint version 2 is up-to-date.
None
Checkpoint version 2 is up-to-date.
None
Checkpoint version 2 is up-to-date.
None
Checkpoint version 2 is up-to-date.
None
20:30:14.000 INFO ocrd.workspace.download_file - 'local_filename' OCR-D-SEG-LINE-SBB/OCR-D-SEG-LINE-SBB_00000024.xml already within /home/b-mg106/devel/ocrd_calamari/actevedef_718448162.first-page+binarization+segmentation, nothing to do
INFO:ocrd.workspace.download_file:'local_filename' OCR-D-SEG-LINE-SBB/OCR-D-SEG-LINE-SBB_00000024.xml already within /home/b-mg106/devel/ocrd_calamari/actevedef_718448162.first-page+binarization+segmentation, nothing to do
WARNING:processor.CalamariRecognize:Our own line text is not the same as Calamari's: '(nach Veranlaſſung 5. 16. 17. 18. 9.) vor einen Iaimicum angeben muͤße, woferne jedoch annoch ein Pro⸗' != '(nach Veranlaſſung . 16. 17. 18. 9.) vor einen Iaimicum angeben muͤße, woferne jedoch annoch ein Pro⸗'
20:30:43.300 WARNING ocrd.utils.crop_image - crop coordinates ((0, 0, 299, 69)) exceed image (297x65)
WARNING:ocrd.utils.crop_image:crop coordinates ((0, 0, 299, 69)) exceed image (297x65)
20:30:48.671 WARNING ocrd.utils.crop_image - crop coordinates ((55, 0, 1979, 79)) exceed image (1976x2039)
WARNING:ocrd.utils.crop_image:crop coordinates ((55, 0, 1979, 79)) exceed image (1976x2039)
20:30:49.547 WARNING ocrd.utils.crop_image - crop coordinates ((420, 1967, 1437, 2041)) exceed image (1976x2039)
WARNING:ocrd.utils.crop_image:crop coordinates ((420, 1967, 1437, 2041)) exceed image (1976x2039)
WARNING:processor.CalamariRecognize:Our own line text is not the same as Calamari's: 'iL. 2. de concuſlt l. 1. de:L. Cornel. de fall' != 'L. 2. de concuſlt l. 1. de:L. Cornel. de fall'
WARNING:processor.CalamariRecognize:Our own line text is not the same as Calamari's: 'muͤßte beleget werden, welche dann oben (§. 3 ) geſagter maſſen die Straffe der Enthauptung iſt / wie viel⸗' != 'muͤßte beleget werden, welche dann oben (§. 3) geſagter maſſen die Straffe der Enthauptung iſt / wie viel⸗'
20:31:21.391 WARNING ocrd.utils.crop_image - crop coordinates ((0, 0, 98, 86)) exceed image (94x81)
WARNING:ocrd.utils.crop_image:crop coordinates ((0, 0, 98, 86)) exceed image (94x81)
20:31:21.556 WARNING ocrd.utils.crop_image - crop coordinates ((0, 0, 550, 151)) exceed image (545x111)
WARNING:ocrd.utils.crop_image:crop coordinates ((0, 0, 550, 151)) exceed image (545x111)
WARNING:processor.CalamariRecognize:Our own line text is not the same as Calamari's: '8   0)' != '8 0)'
20:31:21.840 INFO ocrd.process.profile - Executing processor 'ocrd-calamari-recognize' took 67.841698s (wall) 155.750000s (CPU)( [--input-file-grp='OCR-D-SEG-LINE-SBB' --output-file-grp='TEST' --parameter='{"checkpoint_dir": "qurator-gt4histocr-1.0", "voter": "confidence_voter_default_ctc", "textequiv_level": "line", "glyph_conf_cutoff": 0.001}' --page-id='']
INFO:ocrd.process.profile:Executing processor 'ocrd-calamari-recognize' took 67.841698s (wall) 155.750000s (CPU)( [--input-file-grp='OCR-D-SEG-LINE-SBB' --output-file-grp='TEST' --parameter='{"checkpoint_dir": "qurator-gt4histocr-1.0", "voter": "confidence_voter_default_ctc", "textequiv_level": "line", "glyph_conf_cutoff": 0.001}' --page-id='']
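Most records above appear twice, once with a timestamp and once in `LEVEL:name:message` form. A minimal sketch of one way such pairs can arise, assuming (this is not confirmed anywhere in this thread) that a logger has its own handler and also propagates the record to an ancestor logger that has a second handler. The logger names `demo` and `demo.workspace` and the helper `emit_once` are hypothetical and only illustrate the mechanism:

```python
import io
import logging


def emit_once(propagate: bool) -> list[str]:
    """Log one record through a child logger and return the lines written."""
    stream = io.StringIO()

    # Hypothetical logger names, used only for this demonstration.
    parent = logging.getLogger("demo")
    child = logging.getLogger("demo.workspace")
    parent.propagate = False  # keep the demo isolated from the real root logger
    child.setLevel(logging.INFO)
    child.propagate = propagate

    # Two handlers with different formats, one on each logger.
    parent_handler = logging.StreamHandler(stream)
    parent_handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    child_handler = logging.StreamHandler(stream)
    child_handler.setFormatter(logging.Formatter("%(levelname)s %(name)s - %(message)s"))
    parent.addHandler(parent_handler)
    child.addHandler(child_handler)
    try:
        child.info("nothing to do")
    finally:
        # Undo the demo configuration so repeated calls start clean.
        parent.removeHandler(parent_handler)
        child.removeHandler(child_handler)
        child.propagate = True
        parent.propagate = True
    return stream.getvalue().splitlines()


if __name__ == "__main__":
    print(emit_once(propagate=True))   # two lines, one per handler
    print(emit_once(propagate=False))  # one line, from the child handler only
```

If this is indeed the mechanism, either removing one of the two handlers or setting `propagate = False` on the logger that owns a handler would leave a single line per record.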
@mikegerber
Collaborator Author

The None lines are definitely unwanted, therefore labelling as bug.

@mikegerber
Collaborator Author

> The None lines are definitely unwanted, therefore labelling as bug.

I've submitted a PR upstream fixing this: Calamari-OCR/calamari#350
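For readers wondering how a stray `None` line can end up in terminal output: a common pattern is printing the return value of a function that already prints its own message and returns nothing. The sketch below only illustrates that pattern; it is an assumption and not a description of the actual Calamari code or of the linked PR. The function names `report_status`, `buggy_caller`, and `fixed_caller` are hypothetical.

```python
import contextlib
import io


def report_status() -> None:
    """Hypothetical helper that prints a status line and returns nothing."""
    print("Checkpoint version 2 is up-to-date.")


def buggy_caller() -> None:
    # Printing the return value adds a second line containing "None".
    print(report_status())


def fixed_caller() -> None:
    # Calling the helper without wrapping it in print() avoids the extra line.
    report_status()


def capture(func) -> str:
    """Run func and return everything it wrote to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func()
    return buffer.getvalue()


if __name__ == "__main__":
    print(repr(capture(buggy_caller)))
    print(repr(capture(fixed_caller)))
```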

@mikegerber
Collaborator Author

The example above doesn't show the other issue I see:

❯ ocrd-calamari-recognize -P checkpoint_dir qurator-gt4histocr-1.0 -I OCR-D-SEG-LINE-SBB -O OCR-D-OCR-CALAMARI --overwrite
/home/b-mg106/.pyenv/versions/3.10.12/envs/tmp.ocrd_calamari.2023-10-25.check-output/lib/python3.10/site-packages/numpy/core/getlimits.py:542: UserWarning: Signature b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf\x00\x00\x00\x00\x00\x00' for <class 'numpy.longdouble'> does not match any known type: falling back to type probe function.
This warnings indicates broken support for the dtype!
  machar = _get_machar(dtype)
Checkpoint version 2 is up-to-date.
Checkpoint version 2 is up-to-date.
[...]

@mikegerber
Collaborator Author

❯ python test-numpy.py
/home/b-mg106/.pyenv/versions/tmp.ocrd_calamari.2023-10-25.check-output/lib/python3.10/site-packages/numpy/core/getlimits.py:542: UserWarning: Signature b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf\x00\x00\x00\x00\x00\x00' for <class 'numpy.longdouble'> does not match any known type: falling back to type probe function.
This warnings indicates broken support for the dtype!
  machar = _get_machar(dtype)
Machine parameters for float128
---------------------------------------------------------------
precision =  15   resolution = 1.0000000000000002749e-15
machep =    -52   eps =        2.2204460492503130808e-16
negep =     -53   epsneg =     1.1102230246251565404e-16
minexp = -16382   tiny =       3.3621031431120935063e-4932
maxexp =  16384   max =        1.189731495357231633e+4932
nexp =       15   min =        -max
smallest_normal = 3.3621031431120935063e-4932   smallest_subnormal = 7.465e-4948
---------------------------------------------------------------
❯ cat test-numpy.py
import numpy as np

print(np.finfo(np.longdouble))

This does not seem to be a Calamari issue, but rather one in NumPy.

@mikegerber
Collaborator Author

(Latest NumPy, 1.26.1 on Python 3.10)

@mikegerber
Collaborator Author

As this seems to be some binary incompatibility or glitch, I'm updating my Linux/GCC first and will then try again.

@mikegerber
Collaborator Author

Upgraded to Debian 12,
compiled a fresh Python 3.11.6,
Using cached numpy-1.26.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)

and still get UserWarning: Signature b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf\x00\x00\x00\x00\x00\x00' for <class 'numpy.longdouble'> does not match any known type: falling back to type probe function.

Ignoring this for now and hoping it's an issue that will get fixed upstream.
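Until then, the warning can be hidden with the standard library's `warnings` filter, matched by message text. A minimal sketch, assuming the filter is installed before the code that triggers the warning runs; the helper name `silence_longdouble_probe_warning` and the constant `LONGDOUBLE_WARNING` are hypothetical, and this only suppresses the message, it does not address the underlying `numpy.longdouble` detection:

```python
import warnings

# Regex matched against the start of the warning message (case-insensitive).
LONGDOUBLE_WARNING = (
    r"Signature .* for <class 'numpy\.longdouble'> does not match any known type"
)


def silence_longdouble_probe_warning() -> None:
    """Ignore only the longdouble signature UserWarning, nothing else."""
    warnings.filterwarnings(
        "ignore",
        message=LONGDOUBLE_WARNING,
        category=UserWarning,
    )


if __name__ == "__main__":
    silence_longdouble_probe_warning()
    # Code that triggers the warning would follow here, for example:
    #   import numpy as np
    #   print(np.finfo(np.longdouble))
```

Matching on the message rather than ignoring every `UserWarning` keeps unrelated warnings visible.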

@mikegerber
Collaborator Author

Closing; I'll take another look when releasing the next version.
