Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ocrd_logging.conf: attach fileHandlers #1117

Merged
merged 6 commits into from
Oct 18, 2023
Merged

ocrd_logging.conf: attach fileHandlers #1117

merged 6 commits into from
Oct 18, 2023

Conversation

kba
Copy link
Member

@kba kba commented Oct 18, 2023

  • The ocrd_models logger was already redundant, removed
  • attach fileHandler to both root and ocrd logger
  • reduce log level of the ocrd logger from ERROR to INFO

Background is that we ship the configuration as /etc/ocrd_logging.conf with the docker containers. With the current setup this led to an empty ocrd.log being created and debugging further impeded by the high log level.

Maybe we should go even lower and default to logging everything (DEBUG). It is always possible to provide a custom logging config to fine-tune the behavior, but the main goal should be to offer as much information as possible for debugging- thoughts?

@MehmedGIT
Copy link
Contributor

Maybe we should go even lower and default to logging everything (DEBUG).

I like this idea. Logging as much as possible by default would be ideal.

@MehmedGIT
Copy link
Contributor

MehmedGIT commented Oct 18, 2023

I have just tested running natively. Although I have a logging configuration in my home directory the produced ocrd.log inside the workspace is still empty.

EDIT: With a configuration file the changed part is never executed. I ran afterward without a logging configuration file and no logging file was created at all.

@kba
Copy link
Member Author

kba commented Oct 18, 2023

I have just tested running natively. Although I have a logging configuration in my home directory the produced ocrd.log inside the workspace is still empty.

Can you retry with the current state of the PR and OCRD_LOGGING_DEBUG=1 and post the messages starting with [LOGGING] from STDERR? And make sure that the config file you have has the md5sum 8f5f0bb665e433fdfbb40d28d51e3e28 (i.e. latest version in this PR).

EDIT: With a configuration file the changed part is never executed.

What do you mean with "the changed part is never executed"?

I ran afterward without a logging configuration file and no logging file was created at all.

That is to be expected, the builtin logging configuration only logs to STDERR.

@MehmedGIT
Copy link
Contributor

MehmedGIT commented Oct 18, 2023

Can you retry with the current state of the PR and OCRD_LOGGING_DEBUG=1 and post the messages starting with [LOGGING] from STDERR?

Now it correctly produces and fills the ocrd.log. Mirrors the logs to the shell as well.. Btw, I am just running a single binarization processor without using the ocrd_network architecture.

Output

[LOGGING] Picked up logging config at /home/mm/ocrd_logging.conf

With the ocrd_network architecture there is also an ocrd.log file but it contains only logs that were not redirected to another file, i.e., in the case of separate modules.

What do you mean with "the changed part is never executed"?

I mean the changed source code from the commits in this PR.

@kba
Copy link
Member Author

kba commented Oct 18, 2023

OK, so it looks like the behavior is consistent now. Anything else needs adapting or can I merge?

@MehmedGIT
Copy link
Contributor

MehmedGIT commented Oct 18, 2023

OK, so it looks like the behavior is consistent now. Anything else needs adapting or can I merge?

I just realized that with the ocrd_network module when I run a workflow there is another ocrd.log file produced in my home directory that contains around 6k lines - potentially these are all logs. Is that expected?

EDIT: Running a second workflow just appends to the same log file. Could this potentially lead to logging explosion of the logs to a single file and fill the server space? This is observed only with a logging configuration file available. Should not be a main concern.

@MehmedGIT
Copy link
Contributor

MehmedGIT commented Oct 18, 2023

Also in the ocrd.log file inside the workspace directory, there are very low-level logs from the multipart library. Should we set that by default to INFO?

14:27:17.364 DEBUG ocrd.workspace.add_file - outputfile file_grp=OCR-D-DEWARP local_filename=OCR-D-DEWARP/FILE_0014_OCR-D-DEWARP_region0004_region0004_line0011.IMG-DEWARP.png content=False
14:27:17.380 DEBUG multipart.multipart - Calling on_field_start with no data
14:27:17.380 DEBUG multipart.multipart - Calling on_field_name with data[0:8]
14:27:17.380 DEBUG multipart.multipart - Calling on_field_data with data[9:21]
14:27:17.380 DEBUG multipart.multipart - Calling on_field_end with no data
14:27:17.380 DEBUG multipart.multipart - Calling on_field_start with no data
14:27:17.380 DEBUG multipart.multipart - Calling on_field_name with data[22:29]
14:27:17.380 DEBUG multipart.multipart - Calling on_field_data with data[30:94]
14:27:17.380 DEBUG multipart.multipart - Calling on_field_end with no data
14:27:17.380 DEBUG multipart.multipart - Calling on_field_start with no data
14:27:17.380 DEBUG multipart.multipart - Calling on_field_name with data[95:103]
14:27:17.380 DEBUG multipart.multipart - Calling on_field_data with data[104:115]
14:27:17.380 DEBUG multipart.multipart - Calling on_field_end with no data
14:27:17.380 DEBUG multipart.multipart - Calling on_field_start with no data
14:27:17.380 DEBUG multipart.multipart - Calling on_field_name with data[116:123]
14:27:17.380 DEBUG multipart.multipart - Calling on_field_data with data[124:133]
14:27:17.380 DEBUG multipart.multipart - Calling on_field_end with no data
14:27:17.381 DEBUG multipart.multipart - Calling on_field_start with no data
14:27:17.381 DEBUG multipart.multipart - Calling on_field_name with data[134:148]
14:27:17.381 DEBUG multipart.multipart - Calling on_field_data with data[149:232]
14:27:17.381 DEBUG multipart.multipart - Calling on_field_end with no data
14:27:17.381 DEBUG multipart.multipart - Calling on_end with no data

@kba
Copy link
Member Author

kba commented Oct 18, 2023

Also in the ocrd.log file, there are very low-level logs from the multipart library. Should we set that by default to INFO?

Yeah, that's the consequence of setting the root logger to DEBUG. I'll add an exception for multipart to be logged only INFO+.

I just realized that with the ocrd_network module when I run a workflow there is another ocrd.log file produced in my home directory that contains around 6k lines - potentially these are all logs. Is that expected?

The amount of log messages depend on what is happening exactly, I suspect there will be a lot of debug logging from libraries that we should shut up by default (like multipart). If you spot some patterns to reduce the noise, let know.

EDIT: Running a second workflow just appends to the same log file. Could this potentially lead to logging explosion of the logs to a single file and fill the server space? This is observed only with a logging configuration file available. Should not be a main concern.

Yes, this will become a problem at some point, where you probably want to switch from the simple FileHandler to RotatingFileHandler or similar.

@MehmedGIT
Copy link
Contributor

If you spot some patterns to reduce the noise, let know.

Give me a few minutes before merging this so I can check the produced ocrd.log file and see if there are some other libraries to be suppressed.

@kba
Copy link
Member Author

kba commented Oct 18, 2023

Sure, take your time, will merge in the evening. multipart log noise should be reduced with
e2b0b63

Copy link
Contributor

@MehmedGIT MehmedGIT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@kba kba merged commit 3b75c36 into master Oct 18, 2023
2 checks passed
@kba kba deleted the logging-2023-10-18 branch October 18, 2023 16:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants