
Can we properly solve the reason for tf1_disable_interactive_logs existence? #1090

Closed · kba opened this issue Sep 8, 2023 · 4 comments · Fixed by #1091
kba (Member) commented Sep 8, 2023

In ocrd_network/utils we have

def tf_disable_interactive_logs():
    try:
        # This env variable must be set before importing from Keras
        environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
        from tensorflow.keras.utils import disable_interactive_logging
        # Enabled interactive logging throws an exception
        # due to a call of sys.stdout.flush()
        disable_interactive_logging()
    except Exception:
        # Nothing should be handled here if TF is not available
        pass

Why did we do that, and how can we get rid of it? Importing tensorflow is expensive, and the cost is felt particularly strongly with the bashlib processors/tests, because they create new Python sessions (each paying the full penalty of importing tensorflow) many times during a single run.

There are other bottlenecks, like parsing YAML and importing modules globally that are only needed in a single if/else branch, but this is the lowest-hanging fruit.
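For the import-cost side of this, the usual remedy is to defer the heavy import into the code path that actually needs it. A minimal sketch (hypothetical function and model path, not the actual core code):

# Sketch only: the TensorFlow import is paid only when a prediction is
# actually requested, so short-lived invocations (--help, --dump-json,
# bashlib test runs) never trigger it.
def run_prediction(images):
    import tensorflow as tf  # deliberately imported inside the function
    model = tf.keras.models.load_model('model.h5')  # hypothetical model path
    return model.predict(images)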

MehmedGIT (Contributor) commented Sep 8, 2023

Keras thinks the shell is interactive, but in the case of the Processing Worker it is not. Check here as well. Potentially this should be resolved at the processor level, so we do not have to do it manually in ocrd_network.

2023-02-17 15:11:54,788 - ocrd.network.processing_worker - DEBUG - Starting to process the received message: <ocrd.network.rabbitmq_utils.ocrd_messages.OcrdProcessingMessage object at 0x7f6db9a54050>
2023-02-17 15:11:54,789 - ocrd.network.processing_worker - DEBUG - Invoking the pythonic processor: ocrd-calamari-recognize
2023-02-17 15:11:54,789 - ocrd.network.processing_worker - DEBUG - Invoking the processor_class: <class 'ocrd_calamari.recognize.CalamariRecognize'>
2023-02-17 15:11:55,233 - ocrd.network.processing_worker - ERROR - [Errno 5] Input/output error
Traceback (most recent call last):
  File "/home/mm/Desktop/core/ocrd/ocrd/network/processing_worker.py", line 234, in run_processor_from_worker
    instance_caching=False
  File "/home/mm/Desktop/core/ocrd/ocrd/processor/helpers.py", line 95, in run_processor
    instance_caching=instance_caching
  File "/home/mm/Desktop/core/ocrd/ocrd/processor/helpers.py", line 332, in get_processor
    parameter=parameter
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/ocrd_calamari/recognize.py", line 44, in __init__
    self.setup()
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/ocrd_calamari/recognize.py", line 52, in setup
    self.predictor = MultiPredictor(checkpoints=checkpoints)
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/calamari_ocr/ocr/predictor.py", line 228, in __init__
    data_preproc=data_preproc, processes=processes) for cp in checkpoints]
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/calamari_ocr/ocr/predictor.py", line 228, in <listcomp>
    data_preproc=data_preproc, processes=processes) for cp in checkpoints]
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/calamari_ocr/ocr/predictor.py", line 116, in __init__
    graph_type="predict", batch_size=batch_size)
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/calamari_ocr/ocr/backends/tensorflow_backend/tensorflow_backend.py", line 17, in create_net
    processes=self.processes,
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/calamari_ocr/ocr/backends/tensorflow_backend/tensorflow_model.py", line 59, in __init__
    print(self.model.summary())
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/keras/engine/training.py", line 3304, in summary
    layer_range=layer_range,
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/keras/utils/layer_utils.py", line 319, in print_summary
    print_fn(f'Model: "{model.name}"')
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/keras/utils/io_utils.py", line 80, in print_msg
    sys.stdout.flush()
OSError: [Errno 5] Input/output error
2023-02-17 15:11:55,233 - ocrd.network.processing_worker - ERROR - <class 'ocrd_calamari.recognize.CalamariRecognize'> failed with an exception.
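For context: the crash comes from Keras' interactive logging. While it is enabled, keras.utils.io_utils.print_msg() writes via print() and sys.stdout.flush(), which raises the OSError above when the worker has no usable stdout. A minimal sketch of the workaround (assuming a TF/Keras version that provides disable_interactive_logging, i.e. TF >= 2.8):

# Sketch: switch Keras away from print()-based interactive logging so that
# model.summary() and friends no longer touch sys.stdout.
try:
    from tensorflow.keras.utils import disable_interactive_logging
    disable_interactive_logging()
except ImportError:
    # TensorFlow/Keras not installed in this environment: nothing to do
    pass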

kba (Member, Author) commented Sep 11, 2023

We can start by fixing this in ocrd_calamari. I'll drop the actual calls to the method from core and add them to ocrd_calamari.
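Roughly, the ocrd_calamari side (the actual change is OCR-D/ocrd_calamari#90; the following is only a sketch) would run the workaround inside the processor package before any calamari/Keras import happens:

# Sketch: apply the workaround at import time of the recognizer module,
# before the heavy calamari/Keras imports.
from os import environ

def tf_disable_interactive_logs():
    try:
        # Must be set before the first import from Keras
        environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
        from tensorflow.keras.utils import disable_interactive_logging
        disable_interactive_logging()
    except Exception:
        # Nothing to silence if TF is not available
        pass

tf_disable_interactive_logs()

# heavy imports only after the workaround is in place
# (module path as it appears in the traceback above)
from calamari_ocr.ocr.predictor import MultiPredictor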

kba (Member, Author) commented Sep 11, 2023

@MehmedGIT Can you check whether #1091 combined with OCR-D/ocrd_calamari#90 solves the issue? Then I can check which other processors need this.

MehmedGIT (Contributor):
@kba, I have just tested and I see no problems.

kba closed this as completed in 483b9d4 on Sep 11, 2023.