Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix decentralized learning fail #708

Merged
merged 19 commits into from
Jul 23, 2024
Merged

Conversation

JulienVig
Copy link
Collaborator

@JulienVig JulienVig commented Jul 16, 2024

@JulienVig JulienVig marked this pull request as ready for review July 18, 2024 12:20
@JulienVig JulienVig requested a review from tharvik July 18, 2024 12:20
Copy link
Collaborator

@tharvik tharvik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great work on making decentralized work in browser! 🎉

a few logic/safety errors to fix and it can be merged 👍

discojs/src/aggregator/get.ts Outdated Show resolved Hide resolved
discojs/src/aggregator/get.ts Outdated Show resolved Hide resolved
discojs/src/aggregator/get.ts Outdated Show resolved Hide resolved
discojs/src/aggregator/get.ts Outdated Show resolved Hide resolved
discojs/src/aggregator/mean.ts Outdated Show resolved Hide resolved
this.aggregationResult,
timeout(undefined, "Timeout waiting on the aggregation result promise to resolve")
])
} catch (e) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hum, so if this timeouts, we will still procede to the next communication round, but the payload will be invalid (already used by current round and very probably changed by it). I think breaking would be safer.

also, by creating the timeout in there, it is a timeout per communication round, not for the whole onRoundEnd, which is a bit unexcepted.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes I agree we should break/return here. However I don't see why a timeout per communication round is unexpected. Shouldn't we timeout whenever we may wait on a promise forever?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However I don't see why a timeout per communication round is unexpected. Shouldn't we timeout whenever we may wait on a promise forever?

yes, we should timeout for sure, but as each aggregator has a various number of communication round, the "onRoundEnd" timeout varies here which I find a bit weird. you can create the timeout promise at the beginning of the function and reference the same timeout in every communication round.
but yeah, maybe I'm overthinking this (won't be the first time), if you don't find it strange, let's keep it as it is (we don't really have a policy on timeouts anyway).

discojs/src/client/decentralized/peer.ts Show resolved Hide resolved
discojs/src/client/event_connection.ts Outdated Show resolved Hide resolved
discojs/src/client/federated/base.ts Outdated Show resolved Hide resolved
discojs/src/default_tasks/mnist.ts Show resolved Hide resolved
@JulienVig JulienVig requested a review from tharvik July 23, 2024 13:57
@JulienVig JulienVig merged commit edbfce9 into develop Jul 23, 2024
23 checks passed
@JulienVig JulienVig deleted the 694-decentralized-fail-julien branch July 23, 2024 14:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants