Instantiate the RemoteClusterWatcher with gatewayAlive=true (#8590) #8725

Merged
merged 1 commit into from
Jun 23, 2022

Conversation

chenaoxd (Contributor) commented Jun 22, 2022

If we don't instantiate the RemoteClusterWatcher
with gatewayAlive=true, there is a short period during
which all services fail fast unexpectedly.

The fix is simply to instantiate the RemoteClusterWatcher
with gatewayAlive=true.

Fix #8590

Signed-off-by: Ao Chen <chenao3220@gmail.com>
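
To make the failure mode in the description concrete, here is a minimal sketch of why the initial value matters. It is not the actual service-mirror code: `clusterWatcher`, `newClusterWatcher`, `onProbeResult` and `mirrorEndpoints` are hypothetical names, and the assumption is that mirrored services get no usable endpoints while the watcher believes the gateway is down.

```go
package main

import "sync"

// clusterWatcher is a hypothetical, simplified stand-in for the
// RemoteClusterWatcher. Only the gateway liveness flag is modelled.
type clusterWatcher struct {
	mu           sync.Mutex
	gatewayAlive bool
}

// newClusterWatcher takes the initial liveness explicitly. Passing false
// means the gateway is treated as down until the first probe result
// arrives; passing true avoids that window after a watcher restart.
func newClusterWatcher(initialGatewayAlive bool) *clusterWatcher {
	return &clusterWatcher{gatewayAlive: initialGatewayAlive}
}

// onProbeResult records the outcome of a liveness probe.
func (w *clusterWatcher) onProbeResult(alive bool) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.gatewayAlive = alive
}

// mirrorEndpoints returns the gateway addresses to publish for mirrored
// services. While the gateway is considered down it returns nil, which is
// assumed here to be what makes clients fail fast.
func (w *clusterWatcher) mirrorEndpoints(gatewayAddrs []string) []string {
	w.mu.Lock()
	defer w.mu.Unlock()
	if !w.gatewayAlive {
		return nil
	}
	return gatewayAddrs
}
```

Under these assumptions, a watcher created with `false` publishes nothing until a probe succeeds, while one created with `true` keeps serving the gateway addresses and only drops them once a probe actually fails.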

mateiidavid (Member) left a comment

Thank you for the PR! I don't think there's any need to test this. Have you checked whether this fixes your problem?

@@ -204,6 +204,8 @@ func restartClusterWatcher(
repairPeriod,
probeWorker.Liveness,
enableHeadlessSvc,
// always instantiate the gatewayAlive=0 to prevent unexpected service fail fast
Member

nit: would it make more sense for gatewayAlive=1 here? generally true evaluates to the constant 1, and false to 0.

Contributor Author

Sorry for the mistake, I'll correct this comment.

Member

No problem! :)

@chenaoxd
Contributor Author

@mateiidavid I've published this change to our cluster, but no watch restart has happened yet. I'll confirm in about 12 hours.

@chenaoxd
Contributor Author

@mateiidavid Confirmed: the watch restart happened and no service failed fast. It fixes my problem.

kleimkuhler (Contributor) left a comment

Great! Thanks @chenaoxd

@mateiidavid mateiidavid merged commit 9d868c0 into linkerd:main Jun 23, 2022
kleimkuhler pushed a commit that referenced this pull request Jun 30, 2022
kleimkuhler added a commit that referenced this pull request Jul 1, 2022
@chenaoxd chenaoxd deleted the chenao/fix-liveness branch July 10, 2022 07:59
Successfully merging this pull request may close these issues.

[multicluster] Fail fast caused by delayed gateway liveness probe.