
shutdown the weight proof process pool #10163

Merged · 9 commits · Feb 15, 2022

Conversation

@altendky (Contributor) commented Feb 9, 2022

Sometimes when we shut down with chia stop -d all, we get 4 chia_full_node processes left floating around, unclosed.

$ ps aux | grep chia_
altendky  591271 84.4  0.1 547924 124968 pts/2   S    09:58   0:20 chia_full_node
altendky  591284 22.2  0.2 634456 140436 pts/2   S    09:58   0:04 chia_full_node
altendky  591285 22.1  0.2 634200 140180 pts/2   S    09:58   0:04 chia_full_node
altendky  591286 22.5  0.2 634200 140180 pts/2   S    09:58   0:04 chia_full_node
altendky  591345  0.0  0.0   9688  2464 pts/2    S+   09:58   0:00 grep --color=auto chia_

Issues:

  • WeightProofHandler.validate_weight_proof had a bunch of sync code with no interleaved awaits, resulting in about 4 seconds of blocking all other activity (as timed on my laptop).
    • Added several await asyncio.sleep(0) calls to take turns. It is called cooperative concurrency after all. (A minimal sketch of the pattern follows this list.)
    • If the sync functions are well isolated it may make sense to push them into threads. This both involves an await that breaks up the sync stretches at those points and also reverts to regular OS/thread/GIL sharing of CPU time while the sync work is being done. But this depends on the functions being thread safe, hopefully just fully isolated functional input/output with no dependence on or modification of global state.
  • The executor was not being closed.
    • Added a context manager to close it (sketched below).
  • Executor subprocesses continue until they are done with their tasks; there is no built-in way to cancel them.
    • In this case the worker code had already been given a way to trigger cancellation via a temp file. It just needed to be leveraged from this second use of the worker functions (also sketched below).
  • The full node sync task was being cancelled late.
    • Moved the cancellation from FullNode._await_closed() to FullNode._close() (see the cancel-then-await sketch below).
  • The full node sync task was not being awaited.
    • Added awaiting of it in FullNode._await_closed(), along with consuming the cancellation at that point since we are already shutting down there.
  • Service start signal handler registration was using the signal.signal() function instead of integrating with asyncio signal handling via asyncio.get_running_loop().add_signal_handler(). It seems this would end up either overriding other signal handlers or being overridden by them, depending on order of execution. Neither seems good.
    • Switched (sketched below).
  • Various other shutdown cleanup will be submitted in separate PRs
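
Not code from this PR, just a minimal sketch of the yield-and-offload pattern described above; expensive_sync_validation and the chunk loop are made-up stand-ins.

```python
import asyncio

async def validate_chunks(chunks):
    results = []
    for chunk in chunks:
        # Each chunk is pure sync work; without the sleep(0) the whole loop
        # would run to completion before any other task gets the event loop.
        results.append(expensive_sync_validation(chunk))
        await asyncio.sleep(0)  # explicitly give other coroutines a turn
    return results

async def validate_chunks_in_threads(chunks):
    # If expensive_sync_validation is thread safe (isolated input/output, no
    # global state), the sync work could instead be pushed onto a thread: the
    # await yields to the event loop while the OS schedules the thread normally.
    loop = asyncio.get_running_loop()
    return [await loop.run_in_executor(None, expensive_sync_validation, chunk) for chunk in chunks]
```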
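
Similarly, a hedged sketch of the executor lifecycle and the file-based cancellation: the context manager guarantees the pool is shut down, and touching a temp file is how already-running workers get told to stop, since ProcessPoolExecutor cannot cancel a task once it has started. worker() and do_the_work() are hypothetical names, not the actual weight proof functions.

```python
import asyncio
import pathlib
import tempfile
from concurrent.futures import ProcessPoolExecutor

def worker(task, shutdown_path: str):
    # Runs in a subprocess; polls for the shutdown file and bails out early
    # if it appears.
    if pathlib.Path(shutdown_path).exists():
        return None
    return do_the_work(task)

async def validate(tasks, num_processes: int):
    loop = asyncio.get_running_loop()
    with tempfile.TemporaryDirectory() as tmp_dir:
        shutdown_path = pathlib.Path(tmp_dir) / "shutdown"
        # The context manager closes the pool even if validation fails, so the
        # worker processes do not outlive the node.
        with ProcessPoolExecutor(num_processes) as executor:
            try:
                futures = [loop.run_in_executor(executor, worker, task, str(shutdown_path)) for task in tasks]
                return await asyncio.gather(*futures)
            except asyncio.CancelledError:
                # Tell running workers to stop; the executor's exit waits for
                # them, so without this the shutdown would block until the
                # in-flight tasks finished on their own.
                shutdown_path.touch()
                raise
```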
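
The cancel-then-await split looks roughly like this; again a sketch rather than the actual FullNode code, with _sync_task as a stand-in attribute name.

```python
import asyncio
from typing import Optional

class FullNode:
    def __init__(self) -> None:
        self._sync_task: Optional[asyncio.Task] = None

    def _close(self) -> None:
        # Request cancellation early so the sync task starts unwinding while
        # the rest of the shutdown proceeds.
        if self._sync_task is not None:
            self._sync_task.cancel()

    async def _await_closed(self) -> None:
        # Actually wait for the task, and consume the CancelledError we caused
        # above since we are already shutting down here.
        if self._sync_task is not None:
            try:
                await self._sync_task
            except asyncio.CancelledError:
                pass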
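```

And the signal handling switch, sketched with a made-up service entry point: loop.add_signal_handler() cooperates with the running loop instead of installing a process-wide handler the way signal.signal() does (note it is only available on Unix event loops).

```python
import asyncio
import signal

async def run_service() -> None:
    stop_requested = asyncio.Event()
    loop = asyncio.get_running_loop()
    for sig in (signal.SIGINT, signal.SIGTERM):
        # Registered through the loop, so it neither clobbers nor gets
        # clobbered by other handlers depending on registration order.
        loop.add_signal_handler(sig, stop_requested.set)
    await stop_requested.wait()
    # ...then proceed with the _close() / _await_closed() style shutdown...
```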

Draft for:

  • Self reflection
    • 🪞 👨‍🚀
  • Testing
    • Left it cycling overnight with no failures. It's not perfect, but a lot better in the test sequence I'm exposing it to.
    • set -vx; while true; do echo "about to start $(date --iso-8601=n)"; which chia; chia start node; sleep 30; echo "about to stop $(date --iso-8601=n)"; chia stop -d all; sleep 5; ps aux | grep chia_; pkill -9 chia_; done
  • We should explicitly try to shut down, but can we also adjust a setting so the processes would naturally die as well?
    • Leveraged the existing file-based shutdown approach already implemented for the wallet weight proof pool shutdown.
  • Consider an attribute and reusing the same pool across multiple calls
    • Mariano said this is used infrequently enough that just recreating the pool is ok. Creating it looks to take about 0.1s, measured from just before entering the context manager to just inside it. There may be other delays induced later that I didn't isolate.
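
For reference, that 0.1s figure comes from a measurement along these lines (a hand-rolled sketch, not code from the PR):

```python
import time
from concurrent.futures import ProcessPoolExecutor

before = time.monotonic()
with ProcessPoolExecutor(4) as executor:
    # Time from just before the context manager to just inside it.
    print(f"pool ready after {time.monotonic() - before:.3f}s")
    # ...submit work here...
```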

@mariano54 (Contributor) commented Feb 9, 2022

This code was recently changed as part of the new wallet, and it did not get much testing, so it makes sense that there might be a bug here.

arvidn previously approved these changes Feb 10, 2022
byte_chunks = []
for vdf_proof, classgroup, vdf_info in chunk:
    byte_chunks.append((bytes(vdf_proof), bytes(classgroup), bytes(vdf_info)))
with ProcessPoolExecutor(self._num_processes) as executor:
arvidn (Contributor) commented:

do we spin up these processes every time we validate a weight proof? It seems like we should just keep this ProcessPoolExecutor around, but maybe we only validate weight proofs very rarely.

Either way, probably not for this PR

@altendky (Contributor, Author) replied:

Yeah, I asked about this and Mariano said it was fine to just recreate the pool.

@altendky (Contributor, Author) commented:

Thanks for looking. There's a 'bunch' more coming on this. It's basically the same situation as with the weight proof process pool in #9050. This will get another temp file to trigger shutdown of the work in the subprocesses since the executor can't handle such cancellation.

@altendky altendky marked this pull request as ready for review February 10, 2022 18:44