Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linkmgrd subscribing State DB route event #13

Merged
merged 23 commits into from
Jan 19, 2022

Conversation

zjswhhh
Copy link
Contributor

@zjswhhh zjswhhh commented Dec 15, 2021

Description of PR

Summary:
Fixes # (issue)

This PR is to make linkmgrd subscribe events from ROUTE_TABLE in State DB, and react accordingly:

  • If any of the two default route state appears to be 'na', linkmgrd should switch to standby.
  • If both are 'ok', there will be a mux state probing and what happens next depends on linkmgrd state machine and the probing response.

Type of change

  • Bug fix
  • Test case(new/improvement)
  • New feature

Approach

What is the motivation for this PR?

To make linkmgrd subscribe state DB route event, and handle the switchovers.

How did you do it?

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

Related PR: sonic-net/sonic-swss#2009

zjswhhh and others added 11 commits December 15, 2021 03:58
…net#8)

Description of PR
Summary:
Fixes # (issue)

Fix unstable unit tests.

People observed that linkmgrd unit tests appeared to be flaky, and kept failing PR checks. After checking the build logs, I found that in most of the failures, the state change handler didn't seem to be invoked: Linkmgrd's composite state remained as it was before the state change was post.

This PR updates code to invoke io_service::reset(), sets the io_service to no longer be in a stopped state, allowing subsequent calls to run_one() to invoke handlers. (When io_service is stopped, any call to run_one() will return immediately without invoking any handler. )

Signed-off-by: Jing Zhang zhangjing@microsoft.com

Type of change
 Bug fix

Approach
What is the motivation for this PR?
Stabilize linkmgrd unit tests.

How did you do it?
Reset stopped io_service before calling to run_one(), to guarantee state change handlers are invoked as expected.

How did you verify/test it?
Tested building locally.

This issue is hard to reproduce, the current build failure data is 2.98% (56/1877 in last 30 days). We should expect to see improvement after this change.
Add pull request template for linkmgrd repo.

sign-off: Jing Zhang zhangjing@microsoft.com
}

std::string nextState = "na";
if (mIpv4DefaultRouteState == "ok" && mIpv6DefaultRouteState == "ok") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lguohan do we have concern if either IPv4 or IPv6 BGP is not configured? Can we assume we will have them both eventually?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated per discussion in Gemini sync-up meeting.

{
MUXLOGWARNING(boost::format("%s: state db default route state: %s") % mMuxPortConfig.getPortName() % routeState);

if(mComponentInitState.test(MuxStateComponent)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add space after if

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated accordingly.


if(mComponentInitState.test(MuxStateComponent)) {
if (ms(mCompositeState) == mux_state::MuxState::Label::Active &&
routeState == "na") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest putting '{' on a new line, especially when if condition expands multiple lines. new line helps to identify the beginning of the if body.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated accordingly.

}

PortMapIterator portMapIterator = mPortMap.begin();
while(portMapIterator != mPortMap.end()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after while please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated accordingly.

MUXLOGWARNING(boost::format("%s: state db default route state: %s") % mMuxPortConfig.getPortName() % routeState);

if (mComponentInitState.test(MuxStateComponent)) {
if (ms(mCompositeState) == mux_state::MuxState::Label::Active && routeState == "na") {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking we should check for != Standby. What if mux is in a transient state and we got this event? What do you think?

Copy link
Contributor Author

@zjswhhh zjswhhh Jan 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think this is a good point. I will evaluate the behavior and update accordingly.
Updated, thanks!

if (ms(mCompositeState) == mux_state::MuxState::Label::Active && routeState == "na") {
CompositeState nextState = mCompositeState;
enterLinkProberState(nextState, link_prober::LinkProberState::Wait);
switchMuxState(nextState, mux_state::MuxState::Label::Standby);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does it behave if both ToRs have default route as "na" ?

Copy link
Contributor Author

@zjswhhh zjswhhh Jan 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both ToRs will switch to standby, the one switches first will be "active" at the end.

Do we expect both ToRs to be "standby" in this scenario? This is not supported currently by linkmgrd as I know, we might need find other workaround if so.

@zjswhhh zjswhhh requested a review from prsunny January 14, 2022 22:27
lolyu and others added 3 commits January 19, 2022 02:32
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
@zjswhhh
Copy link
Contributor Author

zjswhhh commented Jan 19, 2022

boost::asio::ip::make_address is unable to parse "::/0" and "0.0.0.0/0", and considering the key will always be these two strings, I updated the code and removed make_address(). Changes have been tested on virtual dual testbeds and worked as expected.

@prsunny could you review again? Thanks!

@zjswhhh zjswhhh requested a review from prsunny January 19, 2022 17:15
@zjswhhh zjswhhh merged commit 0c23756 into sonic-net:master Jan 19, 2022
@zjswhhh zjswhhh deleted the subscribeRouteEvent branch January 19, 2022 20:59
zjswhhh added a commit to zjswhhh/sonic-linkmgrd that referenced this pull request Jan 20, 2022
Summary:
Fixes # (issue)

This PR is to make linkmgrd subscribe events from ROUTE_TABLE in State DB, and react accordingly:
- If any of the two default route state appears to be 'na', linkmgrd should switch to standby.
- If both are 'ok', there will be a mux state probing and what happens next depends on linkmgrd state machine and the probing response.

Type of change:
New feature

Motivation for this PR:
To make linkmgrd subscribe state DB route event, and handle the switchovers.

Documentation:
Related PR: sonic-net/sonic-swss#2009
zjswhhh added a commit to zjswhhh/sonic-linkmgrd that referenced this pull request Jan 24, 2022
Summary:
Fixes # (issue)

This PR is to make linkmgrd subscribe events from ROUTE_TABLE in State DB, and react accordingly:  
- If any of the two default route state appears to be 'na', linkmgrd should switch to standby. 
- If both are 'ok', there will be a mux state probing and what happens next depends on linkmgrd state machine and the probing response.  

Type of change:
New feature

Motivation for this PR:
To make linkmgrd subscribe state DB route event, and handle the switchovers. 

Documentation: 
Related PR: sonic-net/sonic-swss#2009
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants