
Can we properly solve the reason for tf1_disable_interactive_logs existence? #1090

Closed · kba opened this issue Sep 8, 2023 · 4 comments · Fixed by #1091
kba (Member) commented Sep 8, 2023

In ocrd_network/utils we have

def tf_disable_interactive_logs():
    try:
        # This env variable must be set before importing from Keras
        environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
        from tensorflow.keras.utils import disable_interactive_logging
        # Enabled interactive logging throws an exception
        # due to a call of sys.stdout.flush()
        disable_interactive_logging()
    except Exception:
        # Nothing should be handled here if TF is not available
        pass

Why did we do that, and how can we get rid of it? Importing tensorflow is expensive, and the cost is felt particularly strongly with the bashlib processors/tests, because they create new Python sessions (each paying the full penalty of importing tensorflow) many times during a single run.

There are other bottlenecks, like parsing YAML and importing modules globally that are only needed in a single if/else branch, but this is the lowest-hanging fruit.
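For the import-cost side of this, the usual remedy is to defer the heavy import into the code path that actually needs it. A minimal sketch (hypothetical function and model path, not the actual core code):

# Sketch only: the TensorFlow import is paid only when a prediction is
# actually requested, so short-lived invocations (--help, --dump-json,
# bashlib test runs) never trigger it.
def run_prediction(images):
    import tensorflow as tf  # deliberately imported inside the function
    model = tf.keras.models.load_model('model.h5')  # hypothetical model path
    return model.predict(images)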

MehmedGIT (Contributor) commented Sep 8, 2023

Keras thinks the shell is interactive, but in the case of the Processing Worker it is not. Check here as well. Potentially this should be resolved at the processor level, so we do not have to do it manually in ocrd_network.

2023-02-17 15:11:54,788 - ocrd.network.processing_worker - DEBUG - Starting to process the received message: <ocrd.network.rabbitmq_utils.ocrd_messages.OcrdProcessingMessage object at 0x7f6db9a54050>
2023-02-17 15:11:54,789 - ocrd.network.processing_worker - DEBUG - Invoking the pythonic processor: ocrd-calamari-recognize
2023-02-17 15:11:54,789 - ocrd.network.processing_worker - DEBUG - Invoking the processor_class: <class 'ocrd_calamari.recognize.CalamariRecognize'>
2023-02-17 15:11:55,233 - ocrd.network.processing_worker - ERROR - [Errno 5] Input/output error
Traceback (most recent call last):
  File "/home/mm/Desktop/core/ocrd/ocrd/network/processing_worker.py", line 234, in run_processor_from_worker
    instance_caching=False
  File "/home/mm/Desktop/core/ocrd/ocrd/processor/helpers.py", line 95, in run_processor
    instance_caching=instance_caching
  File "/home/mm/Desktop/core/ocrd/ocrd/processor/helpers.py", line 332, in get_processor
    parameter=parameter
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/ocrd_calamari/recognize.py", line 44, in __init__
    self.setup()
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/ocrd_calamari/recognize.py", line 52, in setup
    self.predictor = MultiPredictor(checkpoints=checkpoints)
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/calamari_ocr/ocr/predictor.py", line 228, in __init__
    data_preproc=data_preproc, processes=processes) for cp in checkpoints]
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/calamari_ocr/ocr/predictor.py", line 228, in <listcomp>
    data_preproc=data_preproc, processes=processes) for cp in checkpoints]
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/calamari_ocr/ocr/predictor.py", line 116, in __init__
    graph_type="predict", batch_size=batch_size)
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/calamari_ocr/ocr/backends/tensorflow_backend/tensorflow_backend.py", line 17, in create_net
    processes=self.processes,
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/calamari_ocr/ocr/backends/tensorflow_backend/tensorflow_model.py", line 59, in __init__
    print(self.model.summary())
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/keras/engine/training.py", line 3304, in summary
    layer_range=layer_range,
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/keras/utils/layer_utils.py", line 319, in print_summary
    print_fn(f'Model: "{model.name}"')
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/keras/utils/io_utils.py", line 80, in print_msg
    sys.stdout.flush()
OSError: [Errno 5] Input/output error
2023-02-17 15:11:55,233 - ocrd.network.processing_worker - ERROR - <class 'ocrd_calamari.recognize.CalamariRecognize'> failed with an exception.
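For context: the crash comes from Keras' interactive logging. While it is enabled, keras.utils.io_utils.print_msg() writes via print() and sys.stdout.flush(), which raises the OSError above when the worker has no usable stdout. A minimal sketch of the workaround (assuming a TF/Keras version that provides disable_interactive_logging, i.e. TF >= 2.8):

# Sketch: switch Keras away from print()-based interactive logging so that
# model.summary() and friends no longer touch sys.stdout.
try:
    from tensorflow.keras.utils import disable_interactive_logging
    disable_interactive_logging()
except ImportError:
    # TensorFlow/Keras not installed in this environment: nothing to do
    pass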

kba (Member, Author) commented Sep 11, 2023

We can start by fixing this in ocrd_calamari. I'll drop the actual calls to the method from core and add them to ocrd_calamari.
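Roughly, the ocrd_calamari side (the actual change is OCR-D/ocrd_calamari#90; the following is only a sketch) would run the workaround inside the processor package before any calamari/Keras import happens:

# Sketch: apply the workaround at import time of the recognizer module,
# before the heavy calamari/Keras imports.
from os import environ

def tf_disable_interactive_logs():
    try:
        # Must be set before the first import from Keras
        environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
        from tensorflow.keras.utils import disable_interactive_logging
        disable_interactive_logging()
    except Exception:
        # Nothing to silence if TF is not available
        pass

tf_disable_interactive_logs()

# heavy imports only after the workaround is in place
# (module path as it appears in the traceback above)
from calamari_ocr.ocr.predictor import MultiPredictor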

kba (Member, Author) commented Sep 11, 2023

@MehmedGIT Can you check whether #1091 combined with OCR-D/ocrd_calamari#90 solves the issue? Then I can check which other processors need this.

MehmedGIT (Contributor):
@kba, I have just tested and I see no problems.

kba closed this as completed in 483b9d4 on Sep 11, 2023.