
migrate to core v3 #117

Draft – wants to merge 10 commits into master
Conversation

@bertsky (Contributor) commented Sep 16, 2024

Still a draft as long as v3 is in beta/RC, but we can already use the CI and discuss the changes (esp. to the tests).

@kba this closely resembles tests on OCR-D/ocrd_kraken#44 (covering variants with METS Caching and/or METS Server and/or parallel pages).

@bertsky (Contributor, Author) commented Sep 16, 2024

Oh, and this is based on #116, since I often cannot even run Calamari without that.

@bertsky (Contributor, Author) commented Sep 18, 2024

I wrote a simple script to measure and plot the GPU utilisation.

[GPU utilisation plot: ocrd-calamari-cuda-b32]
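The measurement script itself is not included in the PR; a minimal sketch of such a monitor – polling `nvidia-smi` at a fixed interval and logging samples to CSV for later plotting – could look like this (all names are illustrative, not the actual script):

```python
#!/usr/bin/env python3
# Hypothetical GPU utilisation monitor (illustrative, not the script used
# above): poll nvidia-smi periodically and append samples to a CSV file.
import csv
import subprocess
import sys
import time

def parse_sample(line):
    """Parse one nvidia-smi CSV line 'util, mem' into (int, int)."""
    util, mem = line.strip().split(", ")
    return int(util), int(mem)

def sample_gpu():
    """Query GPU utilisation [%] and memory use [MiB] via nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used",
         "--format=csv,noheader,nounits"],
        text=True)
    return parse_sample(out)

def monitor(path, interval=0.5):
    """Append (time, util, mem) samples to a CSV until interrupted."""
    start = time.time()
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["t", "util", "mem"])
        try:
            while True:
                util, mem = sample_gpu()
                writer.writerow([round(time.time() - start, 2), util, mem])
                time.sleep(interval)
        except KeyboardInterrupt:
            pass

if __name__ == "__main__":
    monitor(sys.argv[1] if len(sys.argv) > 1 else "gpu-util.csv")
```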

The rather simple modification 9611e2c (which I will cherry-pick into #116 for core v2) helps in two ways: it utilises the GPU better (because it avoids overly small batches when regions have only a few lines), and it thereby also allows increasing the batch size without causing OOM.
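The gist of 9611e2c can be sketched roughly as follows (a hypothetical rendering – `Image`, `predict_batch` and the function names stand in for the actual Calamari types and calls): instead of predicting region by region, flatten all line images of a page into one batch and regroup afterwards.

```python
# Hypothetical sketch of page-level batching: gather all line images of a
# page into a single batch (avoiding tiny per-region batches), predict
# once, then re-associate results with their regions.
from typing import Callable, Dict, List, Sequence

def recognize_page(regions: Dict[str, Sequence["Image"]],
                   predict_batch: Callable[[List["Image"]], List[str]]
                   ) -> Dict[str, List[str]]:
    # flatten: remember which region each line came from
    flat, owners = [], []
    for region_id, lines in regions.items():
        flat.extend(lines)
        owners.extend([region_id] * len(lines))
    # one large batch keeps the GPU busy and amortises per-call overhead
    texts = predict_batch(flat)
    # regroup the predictions per region, preserving line order
    result: Dict[str, List[str]] = {rid: [] for rid in regions}
    for region_id, text in zip(owners, texts):
        result[region_id].append(text)
    return result
```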

Unfortunately, fb2a680 does not accomplish what I expected – reducing the peaky GPU utilisation caused by the GPU waiting for the CPU and vice versa.

Here's a log for batch_size=64 without parallel page threads (but with METS Server) – i.e. before fb2a680:
[plot: ocrd-calamari-cuda-b64-B64-v3-p1]
And the same with 3 parallel page threads – still before fb2a680:
[plot: ocrd-calamari-cuda-b64-B64-v3-p3]
Now, after adding fb2a680 with ThreadPoolExecutor computing the predict_raw batches concurrently (shared across parallel page threads):
[plot: ocrd-calamari-cuda-b64-B64-v3-p3-bg]

Thus, surprisingly, the timeline still shows low average utilisation with lots of waiting time. This is also reflected by wall time and CPU time measurements (see below).
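For reference, the approach in fb2a680 can be sketched roughly like this (illustrative names, not the actual code): a `ThreadPoolExecutor` shared across the parallel page threads runs `predict_raw` in the background, so a page thread's CPU-side pre/post-processing can overlap with GPU prediction for other pages.

```python
# Hypothetical sketch of a shared background predictor: page threads
# submit their batches to one executor and only block when they need
# the result, overlapping CPU work with GPU prediction.
from concurrent.futures import ThreadPoolExecutor

# one executor for the whole processor, shared by all page threads
GPU_EXECUTOR = ThreadPoolExecutor(max_workers=1)

def predict_in_background(predict_raw, images):
    """Submit a batch to the shared GPU worker; returns a Future."""
    return GPU_EXECUTOR.submit(predict_raw, images)

def process_page(predict_raw, images, postprocess):
    future = predict_in_background(predict_raw, images)
    # ... the page thread could keep doing CPU work here ...
    raw = future.result()   # block only when the result is needed
    return postprocess(raw)
```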

It gets a little better if I split up the batches for the background thread (instead of having Calamari v1 do the batching), though:
[plot: ocrd-calamari-cuda-b64-B64-v3-p3-bgbatched]
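That refinement – pre-splitting the batches before handing them to the background executor, rather than letting Calamari v1 batch internally – might be sketched like this (again purely illustrative): chunks from different page threads can then interleave on the shared GPU worker.

```python
# Hypothetical sketch: split a page's lines into fixed-size chunks and
# submit each chunk separately to the shared executor, so work from
# different page threads can interleave at chunk granularity.
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def chunked(seq, size):
    """Yield successive chunks of `size` items from `seq`."""
    it = iter(seq)
    while chunk := list(islice(it, size)):
        yield chunk

def predict_chunked(executor, predict_raw, images, batch_size=64):
    """Submit each chunk as its own task; gather results in order."""
    futures = [executor.submit(predict_raw, chunk)
               for chunk in chunked(images, batch_size)]
    results = []
    for fut in futures:
        results.extend(fut.result())
    return results
```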

Also, increasing the number of parallel page threads from 3 to 6 helps – but only in relative terms, not with respect to background threading. Here's with 6 threads before fb2a680:
[plot: ocrd-calamari-cuda-b64-B64-v3-p6]
And this is with 6 threads after fb2a680:
[plot: ocrd-calamari-cuda-b64-B64-v3-p6-bg]

I also tried with more than 1 background thread (i.e. the number of workers in the shared ThreadPoolExecutor), but that does not do much better either – here's the above with 2 "GPU" threads:
[plot: ocrd-calamari-cuda-b64-B64-v3-p6-bg2]
And the same with 4 bg threads:
[plot: ocrd-calamari-cuda-b64-B64-v3-p6-bg4]

Increasing the number of parallel page threads further, to 12 or 24, becomes still more inefficient.

Figures for a book with 180 pages of Fraktur:

| commit | OCRD_MAX_PARALLEL_PAGES | wall time | CPU time |
|---|---|---|---|
| bf755a3 (region-level batches) | 1 | 1148 s | 1082 s |
| bf755a3 (region-level batches) | 3 | 744 s | 1188 s |
| 9611e2c (page-level batches) | 1 | 1113 s | 1042 s |
| 9611e2c (page-level batches) | 3 | 698 s | 1105 s |
| fb2a680 (in 1 background thread) | 3 | 709 s | 1122 s |
| fb2a680 (in 1 background thread) | 6 | 665 s | 1178 s |
| fb2a680 (in 1 background thread) | 12 | 693 s | 1205 s |
| fb2a680 (in 2 background threads) | 6 | 660 s | 1169 s |
| fb2a680 (in 4 background threads) | 6 | 653 s | 1160 s |

Perhaps we must go for Calamari 2 with its efficient tfaip pipelining...

@codecov-commenter commented Sep 18, 2024

Codecov Report

Attention: Patch coverage is 85.38462% with 19 lines in your changes missing coverage. Please review.

Project coverage is 68.93%. Comparing base (4adf09f) to head (e68ce5f).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ocrd_calamari/recognize.py | 85.38% | 9 Missing and 10 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #117      +/-   ##
==========================================
- Coverage   71.07%   68.93%   -2.15%     
==========================================
  Files           5        4       -1     
  Lines         204      206       +2     
  Branches       50       55       +5     
==========================================
- Hits          145      142       -3     
- Misses         48       51       +3     
- Partials       11       13       +2     


@bertsky bertsky mentioned this pull request Sep 23, 2024
@mikegerber mikegerber self-assigned this Oct 7, 2024