Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1 / 5] Optimize logic for gossiping assignments #4848

Merged
merged 3 commits into from
Jul 16, 2024

Conversation

alexggh
Copy link
Contributor

@alexggh alexggh commented Jun 20, 2024

This is part of the work to further optimize the approval subsystems, if you want to understand the full context start with reading #4849 (comment), however that's not necessary, as this change is self-contained and nodes would benefit from it regardless of subsequent changes landing or not.

While testing with 1000 validators I found out that the logic for determining the validators an assignment should be gossiped to is taking a lot of time, because it always iterated through all the peers, to determine which are X and Y neighbours and to which we should randomly gossip(4 samples).

This could be actually optimised, so we don't have to iterate through all peers for each new assignment, by fetching the list of X and Y peer ids from the topology first and then stopping the loop once we took the 4 random samples.

With this improvements we reduce the total CPU time spent in approval-distribution with 15% on networks with 500 validators and 20% on networks with 1000 validators.

Test coverage:

propagates_assignments_along_unshared_dimension and propagates_locally_generated_assignment_to_both_dimensions cover already logic and they passed, confirm that there is no breaking change.

Additionally, the approval voting benchmark measure the traffic sent to other peers, so I confirmed that for various network size there is no difference in the size of the traffic sent to other peers.

@alexggh alexggh changed the title Optimize logic for gossiping assignments [4 / 5] Optimize logic for gossiping assignments Jun 20, 2024
@alexggh alexggh changed the base branch from alexaggh/approval-voting-parallel-3-5 to master July 2, 2024 11:39
@alexggh alexggh force-pushed the alexaggh/approval-voting-parallel-4-5 branch from cb57906 to 4b3f489 Compare July 2, 2024 11:40
@alexggh alexggh changed the title [4 / 5] Optimize logic for gossiping assignments [1 / 5] Optimize logic for gossiping assignments Jul 2, 2024
@alexggh alexggh force-pushed the alexaggh/approval-voting-parallel-4-5 branch 2 times, most recently from 7add070 to 4b3f489 Compare July 2, 2024 12:01
@alexggh alexggh marked this pull request as ready for review July 2, 2024 12:24
@alexggh alexggh added the T0-node This PR/Issue is related to the topic “node”. label Jul 2, 2024
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
@alexggh alexggh force-pushed the alexaggh/approval-voting-parallel-4-5 branch from 4b3f489 to 713aef4 Compare July 2, 2024 12:29
@paritytech-cicd-pr
Copy link

The CI pipeline was cancelled due to failure one of the required jobs.
Job name: test-linux-stable 2/3
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6605079

Copy link
Contributor

@alindima alindima left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Copy link
Member

@ordian ordian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great find!

Copy link
Contributor

@sandreim sandreim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

Copy link
Contributor

@AndreiEres AndreiEres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job!

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
@alexggh
Copy link
Contributor Author

alexggh commented Jul 16, 2024

Double checked that things are working as expected on versi as well. Merging it now.

@alexggh alexggh enabled auto-merge July 16, 2024 08:18
@alexggh alexggh added this pull request to the merge queue Jul 16, 2024
Merged via the queue into master with commit cde2eb4 Jul 16, 2024
155 of 160 checks passed
@alexggh alexggh deleted the alexaggh/approval-voting-parallel-4-5 branch July 16, 2024 10:16
ordian added a commit that referenced this pull request Jul 17, 2024
* master:
  add elastic scaling MVP guide (#4663)
  Send PeerViewChange with high priority (#4755)
  [ci] Update forklift in CI image (#5032)
  Adjust base value for statement-distribution regression tests (#5028)
  [pallet_contracts] Add support for transient storage in contracts host functions (#4566)
  [1 / 5] Optimize logic for gossiping assignments (#4848)
  Remove `pallet-getter` usage from pallet-session (#4972)
  command-action: added scoped permissions to the github tokens (#5016)
  net/litep2p: Propagate ValuePut events to the network backend (#5018)
  rpc: add back rpc logger (#4952)
  Updated substrate-relay version for tests (#5017)
  Remove most all usage of `sp-std` (#5010)
  Use sp_runtime::traits::BadOrigin (#5011)
jpserrat pushed a commit to jpserrat/polkadot-sdk that referenced this pull request Jul 18, 2024
This is part of the work to further optimize the approval subsystems, if
you want to understand the full context start with reading
paritytech#4849 (comment),
however that's not necessary, as this change is self-contained and nodes
would benefit from it regardless of subsequent changes landing or not.

While testing with 1000 validators I found out that the logic for
determining the validators an assignment should be gossiped to is taking
a lot of time, because it always iterated through all the peers, to
determine which are X and Y neighbours and to which we should randomly
gossip(4 samples).

This could be actually optimised, so we don't have to iterate through
all peers for each new assignment, by fetching the list of X and Y peer
ids from the topology first and then stopping the loop once we took the
4 random samples.

With this improvements we reduce the total CPU time spent in
approval-distribution with 15% on networks with 500 validators and 20%
on networks with 1000 validators.

## Test coverage:

`propagates_assignments_along_unshared_dimension` and
`propagates_locally_generated_assignment_to_both_dimensions` cover
already logic and they passed, confirm that there is no breaking change.

Additionally, the approval voting benchmark measure the traffic sent to
other peers, so I confirmed that for various network size there is no
difference in the size of the traffic sent to other peers.

---------

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
ordian added a commit that referenced this pull request Jul 18, 2024
* master: (125 commits)
  add elastic scaling MVP guide (#4663)
  Send PeerViewChange with high priority (#4755)
  [ci] Update forklift in CI image (#5032)
  Adjust base value for statement-distribution regression tests (#5028)
  [pallet_contracts] Add support for transient storage in contracts host functions (#4566)
  [1 / 5] Optimize logic for gossiping assignments (#4848)
  Remove `pallet-getter` usage from pallet-session (#4972)
  command-action: added scoped permissions to the github tokens (#5016)
  net/litep2p: Propagate ValuePut events to the network backend (#5018)
  rpc: add back rpc logger (#4952)
  Updated substrate-relay version for tests (#5017)
  Remove most all usage of `sp-std` (#5010)
  Use sp_runtime::traits::BadOrigin (#5011)
  network/tx: Ban peers with tx that fail to decode (#5002)
  Try State Hook for Bounties (#4563)
  [statement-distribution] Add metrics for distributed statements in V2 (#4554)
  added sync command (#4818)
  Bridges V2 refactoring backport and `pallet_bridge_messages` simplifications (#4935)
  xcm-executor: Improve logging (#4996)
  Remove usage of `sp-std` on templates (#5001)
  ...
TarekkMA pushed a commit to moonbeam-foundation/polkadot-sdk that referenced this pull request Aug 2, 2024
This is part of the work to further optimize the approval subsystems, if
you want to understand the full context start with reading
paritytech#4849 (comment),
however that's not necessary, as this change is self-contained and nodes
would benefit from it regardless of subsequent changes landing or not.

While testing with 1000 validators I found out that the logic for
determining the validators an assignment should be gossiped to is taking
a lot of time, because it always iterated through all the peers, to
determine which are X and Y neighbours and to which we should randomly
gossip(4 samples).

This could be actually optimised, so we don't have to iterate through
all peers for each new assignment, by fetching the list of X and Y peer
ids from the topology first and then stopping the loop once we took the
4 random samples.

With this improvements we reduce the total CPU time spent in
approval-distribution with 15% on networks with 500 validators and 20%
on networks with 1000 validators.

## Test coverage:

`propagates_assignments_along_unshared_dimension` and
`propagates_locally_generated_assignment_to_both_dimensions` cover
already logic and they passed, confirm that there is no breaking change.

Additionally, the approval voting benchmark measure the traffic sent to
other peers, so I confirmed that for various network size there is no
difference in the size of the traffic sent to other peers.

---------

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
ordian added a commit that referenced this pull request Aug 6, 2024
* master: (130 commits)
  add elastic scaling MVP guide (#4663)
  Send PeerViewChange with high priority (#4755)
  [ci] Update forklift in CI image (#5032)
  Adjust base value for statement-distribution regression tests (#5028)
  [pallet_contracts] Add support for transient storage in contracts host functions (#4566)
  [1 / 5] Optimize logic for gossiping assignments (#4848)
  Remove `pallet-getter` usage from pallet-session (#4972)
  command-action: added scoped permissions to the github tokens (#5016)
  net/litep2p: Propagate ValuePut events to the network backend (#5018)
  rpc: add back rpc logger (#4952)
  Updated substrate-relay version for tests (#5017)
  Remove most all usage of `sp-std` (#5010)
  Use sp_runtime::traits::BadOrigin (#5011)
  network/tx: Ban peers with tx that fail to decode (#5002)
  Try State Hook for Bounties (#4563)
  [statement-distribution] Add metrics for distributed statements in V2 (#4554)
  added sync command (#4818)
  Bridges V2 refactoring backport and `pallet_bridge_messages` simplifications (#4935)
  xcm-executor: Improve logging (#4996)
  Remove usage of `sp-std` on templates (#5001)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T0-node This PR/Issue is related to the topic “node”.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants