
Ms.mempool locking #9050

Merged: 92 commits merged into main from ms.mempool_locking on Nov 4, 2021

Conversation

@mariano54 mariano54 (Contributor) commented Oct 31, 2021

Many improvements with two goals:

  • Improve block propagation and validation times, to make sure everyone can farm at all times and blockchain health does not deteriorate in times of load.
  • Improve how many transactions per second each node can accept, especially on low-end hardware.

Improvements

  1. The process pool for validating transactions has been increased from 1 to 2 workers, to better utilize idle cores.
  2. Validation of signatures for new transactions now happens in a different process (the same process pool where CLVM is validated). This means we can do more TPS and free the main thread for block validation.
  3. For unfinished block validation, CLVM and signatures are validated outside the blockchain lock (with CLVM in the process pool executor).
  4. Mempool validation (which reads the coin store) now uses the same lock as block validation (blockchain.lock). This means that block validation gets full priority over IO and the main thread's CPU.
  5. The blockchain lock has been changed to use a LockQueue, where different clients can use different lock priorities. Block validation gets high priority and the mempool low priority, so block validation is always prioritized (see the first sketch below this list).
  6. A transaction queue has been added so that we don't have to keep many long-running asyncio tasks from the server; transactions can sit in the queue until the node has free time to validate them. Transactions are pulled from the queue 200 at a time (gated by a semaphore) so that we can efficiently validate them in different processes (see the second sketch below this list). Note, however, that once a transaction has been pulled from the queue, it may still need to wait for the blockchain lock later on. Before, with no queue, all transactions would try to get validated at once, even if the node could not handle them.
  7. respond_transaction in the full node protocol now returns instantly and puts the transaction in the queue.
  8. send_transaction in the wallet protocol keeps its behavior, but internally pushes the transaction into the queue and waits for a result before returning.
  9. A CancelledException used to put the node into an invalid state; now it does not, since we make sure to call peak_post_processing when tasks are cancelled.
  10. peak_post_processing has been split into two methods, peak_post_processing and peak_post_processing_2. The first must be called under the blockchain lock, since it modifies full node state. The second can be called outside the lock, and deals with sending blocks and messages to other peers and wallets. Before, both were called inside the lock.
  11. When the mempool receives a new block, it must go through the whole mempool and re-check all transactions to see if they are still valid. With a new optimization, we now only check whether each spend bundle's removals were spent in the block; if they were not, the spend bundle must still be valid, so we skip re-validating it (see the third sketch below this list).
  12. New peaks can also stack up on each other in times of high load, so we limit the number of pending messages and discard the rest. Also, only 2 are processed in parallel instead of 8, leading to faster catch-up times on a Raspberry Pi.
  13. When an API call times out, we no longer close the connection. Closing it was causing slow peers to get disconnected in times of high load.
  14. Reduce the number of peers we re-request a transaction from when the first peer is unresponsive. Since we might drop some transactions due to load, we don't want to keep asking for them.
  15. The pairing cache has been increased from 10k to 50k entries for more cache hits.
  16. The pairing cache's subgroup check is now lazy, saving ~1.3 seconds per block validation on a Raspberry Pi.
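To make item 5 concrete, here is a minimal sketch of a two-priority asyncio lock. PriorityLock and LockPriority are illustrative stand-ins, not the actual LockQueue implementation in this PR:

```python
import asyncio
import heapq
from enum import IntEnum
from typing import List, Tuple

class LockPriority(IntEnum):
    HIGH = 0  # block validation
    LOW = 1   # mempool / transaction validation

class PriorityLock:
    """Hands the lock to the highest-priority waiter first (FIFO within a priority)."""

    def __init__(self) -> None:
        self._locked = False
        self._counter = 0  # unique tie-breaker so equal priorities stay FIFO
        self._waiters: List[Tuple[int, int, asyncio.Future]] = []

    async def acquire(self, priority: LockPriority) -> None:
        if not self._locked and not self._waiters:
            self._locked = True
            return
        fut: asyncio.Future = asyncio.get_running_loop().create_future()
        heapq.heappush(self._waiters, (int(priority), self._counter, fut))
        self._counter += 1
        await fut  # resolved by release(); ownership passes directly to us

    def release(self) -> None:
        if self._waiters:
            _, _, fut = heapq.heappop(self._waiters)  # lowest tuple = highest priority
            fut.set_result(None)
        else:
            self._locked = False
```

Block validation would acquire with HIGH and mempool work with LOW, so a backlog of pending transactions can never starve block validation.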
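Item 6's queue-plus-semaphore pattern, roughly (a hedged sketch; tx_queue_consumer and the validate callback are placeholders, not the PR's actual function names):

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor
from typing import Callable

async def tx_queue_consumer(
    queue: "asyncio.Queue[bytes]",
    pool: ProcessPoolExecutor,
    validate: Callable[[bytes], bool],  # picklable, CPU-bound CLVM + signature check
) -> None:
    semaphore = asyncio.Semaphore(200)  # at most 200 transactions in flight at once
    loop = asyncio.get_running_loop()

    async def handle_one(tx_bytes: bytes) -> None:
        try:
            # heavy validation runs in a worker process, off the main thread
            await loop.run_in_executor(pool, validate, tx_bytes)
            # ... then take the blockchain lock at LOW priority and add to the mempool
        finally:
            semaphore.release()

    while True:
        tx_bytes = await queue.get()
        await semaphore.acquire()  # back-pressure: wait here once 200 are pending
        # a real implementation would keep references to these tasks; omitted here
        asyncio.create_task(handle_one(tx_bytes))
```

Everything beyond the 200 in-flight transactions simply waits in the queue, which is what lets the node absorb bursts instead of spawning one asyncio task per incoming message.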
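And a sketch of item 11's re-check shortcut, assuming we have the set of coin names spent by the new block (MempoolItem here is a simplified stand-in for the real mempool item type):

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class MempoolItem:
    removal_names: Set[bytes] = field(default_factory=set)  # coin IDs this bundle spends

def items_needing_revalidation(
    items: List[MempoolItem], spent_in_block: Set[bytes]
) -> List[MempoolItem]:
    # Only bundles whose removals intersect the block's spent coins can have
    # become double-spends; everything else stays valid without re-running CLVM.
    return [item for item in items if item.removal_names & spent_in_block]
```

This shortcut applies only when the new peak directly extends the previous one (the use_optimization check discussed in the review below); after a sync or reorg, the full re-check still runs.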

@mariano54 mariano54 marked this pull request as draft October 31, 2021 20:08
lgtm-com bot commented Oct 31, 2021

This pull request introduces 3 alerts when merging 01f3688 into 3df0f9e - view on LGTM.com

new alerts:

  • 1 for Unused local variable
  • 1 for Unused import
  • 1 for Incomplete ordering

lgtm-com bot commented Nov 1, 2021

This pull request introduces 1 alert when merging bf4a33d into 3df0f9e - view on LGTM.com

new alerts:

  • 1 for Unused local variable

lgtm-com bot commented Nov 1, 2021

This pull request introduces 1 alert when merging c44e676 into 3df0f9e - view on LGTM.com

new alerts:

  • 1 for Unused local variable

lgtm-com bot commented Nov 1, 2021

This pull request introduces 1 alert when merging e5ee15a into 3df0f9e - view on LGTM.com

new alerts:

  • 1 for Unused local variable

Resolved (outdated) review threads: chia/full_node/full_node.py (×2), chia/full_node/full_node_api.py (×2)
lgtm-com bot commented Nov 3, 2021

This pull request introduces 1 alert when merging 516b7cf into eb62a28 - view on LGTM.com

new alerts:

  • 1 for Unused import

@mariano54 mariano54 marked this pull request as ready for review November 3, 2021 23:09

f = getattr(self.api, message_type, None)
if len(message_types) % 100 == 0:
Contributor:

Are the changes in chia/server/server.py just logging every 100th message per type?

Contributor Author:

They are logging the current state (which API tasks are still running) for debugging purposes.

@@ -493,12 +495,11 @@ def connection_closed(self, connection: WSChiaConnection, ban_time: int):
f"Invalid connection type for connection {connection.peer_host},"
f" while closing. Handshake never finished."
)
self.cancel_tasks_from_peer(connection.peer_node_id)
Contributor:

Was there some race going on here?

Contributor:

(It got moved before on disconnect.)

Contributor Author:

No, I just did that as a precaution; I was worried the code below it might throw an exception.

@@ -484,30 +513,52 @@ async def new_peak(self, new_peak: Optional[BlockRecord]) -> List[Tuple[SpendBun
return []
assert new_peak.timestamp is not None

use_optimization: bool = self.peak is not None and new_peak.prev_transaction_block_hash == self.peak.header_hash
Contributor:

Is this needed? I thought changes are tracked across batch updates.

Contributor Author:

Well, after sync we call this method, but we don't have all of the coin changes since the start of the sync. We could instead clear the mempool after the sync.

@wjblanke wjblanke (Contributor) left a comment:

aok

@wjblanke wjblanke merged commit 8a028c3 into main Nov 4, 2021
@wjblanke wjblanke deleted the ms.mempool_locking branch November 4, 2021 16:29