Unable to segment specific images #319
Comments
Hm, I can't reproduce the error. Both commands run through on both CPU and GPU. Could you give me the full dump of installed packages ( |
CPU vs. GPU makes no difference here, either. Requested list for that env:
|
Not sure if it's related: I wanted to recognize some other images and did so with the 3.0.7 that comes with eScript (same as above), which worked fine. Then I got curious and tried again with another, freshly installed, 3.0.7 with its own pyenv, and got this:
That one has a much shorter package list:
|
On 22/02/02 11:11AM, J. R. Schmid wrote:
Not sure if it's related: I wanted to recognize some other images and did so with the 3.0.7 that comes with eScript (same as above), which worked fine. Then I got curious and tried again with another, freshly installed, 3.0.7 with its own pyenv, and got this:
Yeah, that's the scikit-image fix that wasn't in a stable release yet.
I've cherry-picked it from the binary_dataset branch into master and
tagged a new release 3.0.8. That should at least deal with the crash
below.
|
Ah woops, sorry, had forgotten about that one! |
I get different kinds of
All images and the ALTO results are available online. |
This issue is nagging, as it creates an empty result. In mass production, 11 of 10146 ALTO files were empty because of it; in another one, 308 of 43011, so it can occur rather often. In some cases processing the same image with a different model helps. I have now examined it more closely with instrumented / modified kraken code. This is the test case which fails with unmodified kraken git master in a fresh virtual Python environment (see log file):
Now kraken was patched:
With this patch, the log output is
So
|
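For anyone who wants to find the empty results in a large batch (the counts above work out to roughly 0.11% and 0.72% of files), here is a minimal sketch that counts `TextLine` elements in an ALTO document. It assumes that "empty" means "no `TextLine` elements at all", which may not match every failure mode in this thread. The function names `count_text_lines` and `is_empty_alto` are made up for illustration and are not part of kraken. Tags are matched by local name because the ALTO namespace differs between schema versions.

```python
import xml.etree.ElementTree as ET


def count_text_lines(alto_xml):
    """Return the number of TextLine elements in an ALTO document.

    Accepts the document as str or bytes. The namespace part of each
    tag is ignored, so ALTO v2, v3 and v4 files are treated alike.
    """
    root = ET.fromstring(alto_xml)
    count = 0
    for element in root.iter():
        tag = element.tag
        # Tags look like '{namespace}TextLine'; keep only the local name.
        if isinstance(tag, str) and tag.rsplit("}", 1)[-1] == "TextLine":
            count += 1
    return count


def is_empty_alto(alto_xml):
    """Assumption: an ALTO result with zero TextLine elements is 'empty'."""
    return count_text_lines(alto_xml) == 0
```

To scan a directory, read each file in binary mode and pass the bytes to `is_empty_alto`, for example `is_empty_alto(path.read_bytes())` with a `pathlib.Path`.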
Urrrgh another shapely/GEOS bug. I'll look into it. In fact the code just above your instrumentation is there to circumvent |
Could it be that each error also leaks GPU memory? I have three running kraken processes which process different large sets of journal pages. Currently they use 5747 MiB, 5225 MiB and 8151 MiB of GPU memory. The process which uses most memory happens to be the one with most errors. |
It shouldn't really. This is in the serializer. But pytorch/cuda does some weird caching which is most likely the reason for increasing memory usage over time. In my experience it gets released after a while and doesn't cause any trouble. |
For these images segmentation ( |
Sorry, accidental auto-close. I've pushed a fix for the error in your latest message (again geometry weirdness but this time in the region vectorization itself). |
@stweil Could you tell me which model you used to get the serializer errors on |
@mittagessen, please try it with my digitue_best.mlmodel. The above output was produced with an earlier model of the same series. |
Same with that one. No crash. |
I got the error messages with kraken-4.1.3.dev37 and now updated to kraken-4.1.3.dev48. It still fails during the serialization:
|
Did you create text output? That works for me, too, without any error. Try to produce ALTO XML, PAGE XML or hOCR. Those fail for me. |
With the latest master and running:
it works for me. Which shapely version do you have installed (1.7.1 here)? And is it a conda install or pip? |
I use a pip install with Python 3.9. That installed Shapely-1.8.2 by default. |
OK, then I'll see if I can reproduce that environment (and the error). If it is the same error (you can actually get a full stack trace with the |
After a downgrade to Shapely-1.7.1 it works for me, too. |
That's already very helpful to know. |
Latest version is Shapely-1.8.4. I tested that now, and it also fails. |
I can reproduce your bug with 1.8.2 and another unrelated TopologyException for 1.8.4. Awesome. |
* bump up most requirements up to latest releases
* pin shapely to a 1.7.x release as 1.8 causes crashes in the serializer (#319)
I pinned shapely to 1.7.x until we get a handle on things. 1.8/2.0 is a fairly large rewrite on their side so kicking the can down the road until things have stabilized is probably easier than playing whack-a-mole with a moving target. I'll write some additional tests that should trigger these regressions though to make sure it won't happen again. |
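A rough sketch for flagging an environment that falls outside the 1.7.x pin described in this comment. It rests only on the versions reported in this thread (1.7.1 worked; 1.8.2 and 1.8.4 failed) and is not an official compatibility table. The helper names `major_minor`, `outside_pin` and `installed_shapely_version` are hypothetical; `importlib.metadata` is standard library (Python 3.8+). The parser assumes plain numeric major and minor components.

```python
from importlib import metadata


def major_minor(version):
    """Parse '1.8.5.post1' into (1, 8). Assumes numeric major/minor parts."""
    parts = (version.split(".") + ["0"])[:2]
    return int(parts[0]), int(parts[1])


def outside_pin(version, pinned=(1, 7)):
    """True if the version is not in the pinned major.minor series."""
    return major_minor(version) != pinned


def installed_shapely_version():
    """Return the installed Shapely version string, or None if absent."""
    try:
        return metadata.version("shapely")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_shapely_version()
    if found is None:
        print("Shapely is not installed in this environment")
    elif outside_pin(found):
        print(f"Shapely {found} is outside the 1.7.x pin")
    else:
        print(f"Shapely {found} matches the 1.7.x pin")
```

Passing a different `pinned` tuple, such as `(1, 8)`, adapts the check if the pin changes.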
Shapely is pinned to 1.8.x now. When switching to shapely 2.0.1, I do get self-intersections again. |
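To illustrate what a self-intersection means for a polygon boundary, here is a small pure-Python sketch that tests whether any two non-adjacent edges of a closed ring properly cross. It is not how Shapely or kraken detect invalid geometry, and it deliberately ignores touching and collinear-overlap cases. The function names are made up for illustration.

```python
def _orient(a, b, c):
    """Signed area test: >0 if c is left of a->b, <0 if right, 0 if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _segments_cross(p1, p2, p3, p4):
    """True if segment p1-p2 and segment p3-p4 cross at an interior point."""
    d1 = _orient(p3, p4, p1)
    d2 = _orient(p3, p4, p2)
    d3 = _orient(p1, p2, p3)
    d4 = _orient(p1, p2, p4)
    return d1 * d2 < 0 and d3 * d4 < 0


def has_self_intersection(ring):
    """Check a ring given as a list of (x, y) points, closing edge implied."""
    n = len(ring)
    edges = [(ring[i], ring[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Skip edges that share a vertex; they always touch.
            if j == i + 1 or (i == 0 and j == n - 1):
                continue
            if _segments_cross(*edges[i], *edges[j]):
                return True
    return False


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
bowtie = [(0, 0), (2, 2), (2, 0), (0, 2)]  # two edges cross at (1, 1)
print(has_self_intersection(square), has_self_intersection(bowtie))
```

The check is quadratic in the number of vertices, which is fine for a demonstration but slow for long boundary polygons.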
I just got the error with latest kraken, Shapely 1.8.5.post1 and images from https://digi.bib.uni-mannheim.de/fileadmin/digi/517438313/max/:
|
This problem occurs with 11 out of a set of ~646 PNGs, all of which plopped out of the exact same processing pipeline, scanned on exactly the same hardware.
Both models (seg & rec) trained from binary_datasets branch about a week ago.
But others :