Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deployer: Remove any pre-existing socket file before starting the server (again) #1282

Merged
merged 2 commits into from
Oct 2, 2024

Conversation

kba
Copy link
Member

@kba kba commented Oct 1, 2024

Tackling those errors:

17:03:19.064 ERROR ocrd_network.tcp_to_uds_mets_proxy - Uds-Mets-Server gives unexpected error. Response: {'_content': b'"A file with ID==FILE_0015_OCR-D-DEWARP_region0002_region0002_line0011.IMG-DEWARP already exists <OcrdFile fileGrp=OCR-D-DEWARP ID=FILE_0015_OCR-D-DEWARP_region0002_region0002_line0011.IMG-DEWARP, mimetype=image/png, url=---, local_filename=OCR-D-DEWARP/FILE_0015_OCR-D-DEWARP_region0002_region0002_line0011.IMG-DEWARP.png]/> and neither force nor ignore are set"', '_content_consumed': True, '_next': None, 'status_code': 400, 'headers': {'date': 'Tue, 01 Oct 2024 15:03:18 GMT', 'server': 'uvicorn', 'content-length': '364', 'content-type': 'application/json'}, 'raw': <urllib3.response.HTTPResponse object at 0x7fc5ad35cac0>, 'url': 'http+unix://%2Ftmp%2Focrd_network_sockets%2F_home_mm_repos_ocrd_network_tests_ws29_data.sock/file', 'encoding': 'utf-8', 'history': [], 'reason': 'Bad Request', 'cookies': <RequestsCookieJar[]>, 'elapsed': datetime.timedelta(microseconds=5356), 'request': <PreparedRequest [POST]>, 'connection': <requests_unixsocket.adapters.UnixAdapter object at 0x7fc5b00fb8e0>}

But before we merge, we should be absolutely sure that there really is no METS server running anymore at that location at this point. Because otherwise, we still have spurious METS server instances running and just hiding the error.

@kba kba requested a review from MehmedGIT October 1, 2024 16:05
@kba
Copy link
Member Author

kba commented Oct 1, 2024

The plot thickens...

Since the METS servers are started in forked processes, it's easy to find them with ps.

So just before my processing server instance reached the eviction RAM threshold I did:

ps xu --sort=start_date | grep "ocrd workspace -U .* server start"

And found lots of old METS servers idling, 127 in total 😬

After killing them with |head -n100|tr -s ' '|cut -d' ' -f 2|xargs kill this is what the memory did:

grafik

So let's make sure that METS server processes are killed when they are no longer needed under all circumstances. For now I'll implement a quick workaround to periodically kill those old METS servers and remove their socket files externally.

@kba kba changed the title wdeployer: Remove any pre-existing socket file before starting the server (again) deployer: Remove any pre-existing socket file before starting the server (again) Oct 1, 2024
Copy link
Contributor

@MehmedGIT MehmedGIT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main issue of removing the socket file during run-time is that sometimes there may be a request that needs the mets server that has been already removed. There is a concise window between removing the socket file and creating a new one, in which a request could land on the processing server (which happens very rarely). Thus, it is not as easy to just check if there are cached requests for a workspace and delete the socket file if there are none.

If that could be avoided, then Deployer.stop_uds_mets_server() method is the correct place where to remove the socket file. Even better, if separately implemented inside OcrdMetsServer.kill_process() and the DELETE / endpoint of the Mets Server for better coverage if these are called in other places.

@MehmedGIT
Copy link
Contributor

MehmedGIT commented Oct 2, 2024

There is another place where the socket file removal also happens (not sure why). Also not sure, why it does not remove the stale socket files in your case when using the add file endpoint. Something to keep an eye on to see if it may have any side effects (although I think it should not).

@MehmedGIT
Copy link
Contributor

Check 9a71d04. I have changed the log message level of the shutdown method to warning as well. Maybe that is the reason why you got no such logs.

@kba kba marked this pull request as ready for review October 2, 2024 12:43
@MehmedGIT MehmedGIT self-requested a review October 2, 2024 12:45
@kba kba merged commit 8b6a49c into network_client_block_prints Oct 2, 2024
1 check passed
@kba kba deleted the mets-server-rm-socket branch October 2, 2024 12:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants