Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HLD for changing teamd expiry timer #1073

Merged

Conversation

saiarcot895
Copy link
Contributor

@saiarcot895 saiarcot895 commented Aug 26, 2022

This PR adds a HLD for changing the duration of teamd's expiry timer, by sending a message to the peer device with the number of retries it should do for this LAG.

Code PRs:

Repo PR title State
sonic-buildimage teamd: Add support for custom retry counts for LACP sessions GitHub issue/pull request detail
sonic-mgmt Add test cases for teamd retry count feature GitHub issue/pull request detail
sonic-utilities Add CLI configuration options for teamd retry count feature GitHub issue/pull request detail

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

## Requirements

- Switch running a supported SONiC with patches in libteam for this feature on
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this feature (patch) stay as patch in libteam specifically for SONiC or will be pushed to libteam community as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is effectively breaking the LACP protocol, I'm not planning on submitting this patch upstream.

# Protocol

To change the number of retries, an Ethernet packet of the fillowing structure
will be sent. This Ethernet packet will have an ethertype of 0x6300, and will
Copy link
Collaborator

@venkatmahalingam venkatmahalingam Aug 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for ethertype 0x6300 selection?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just meant to be a custom ethertype that appears to be unused. I needed something that was unused and is unlikely to get treated like a "normal" data packet.


# CLI

No new CLI options or config options will be added, as this is not meant to be
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Config knob may be required to avoid sending unnecessary 0x6300 packets during warm-reboot when SONiC is connected to Non-SONiC device.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see a configurable option to enable this feature, default should be left disabled with no custom TLV to interfere with WB. The HLD diverged from standard LACP protocol definition, using custom TLV to overcome SONiC WB timing, and potentially a phase2 send and ack mechanism in the future that could potentially block WB from proceeding. For deployment where non standard protocol packets are forbidden, we need a configurable option to control this behavior. Preferably leaving it disable by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HLD has been updated with a new design/approach. By default, this feature is disabled, so there won't be any custom packets unless configured.


Now, in addition to refreshing the PDUs timer, the above-specified Ethernet
packet (with ethertype 0x6300) will be sent to the peer devices, with the new
retry count set to 5. This notifies the peer device that for this device that
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why cant we make this retry-count as user configurable with default value 5?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HLD has been updated with a new design/approach. The retry count is now user-configurable.

with ethertype 0x6300, and the data will contain the Actor Information, Partner
Information, and Retry Count TLVs. The receiving device must validate the actor
and partner information, and then update the retry count as specified. No
acknowledgment packet is sent back.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you ensure the packet reached the peer without ack?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack will be added into the protocol.


| Value | Description |
|-------|---------------------|
| 0x01 | Actor Information |

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about sending this special packet at LAG level, instead of sending on every lag member which can be heavy if the lag size is significantly high; for example 64 member lag

@zhangyanzhao
Copy link
Collaborator

@saiarcot895 can you please help to add the code PRs into this HLD by referring to #806 ? Thanks.

@zhangyanzhao
Copy link
Collaborator

@saiarcot895 can you please add the code PRs by referring to #806 ? Thanks.

@saiarcot895
Copy link
Contributor Author

No code has been committed yet for this. There are changes to the design of this feature, and this HLD needs to be updated. I'm moving this PR to draft status.

@saiarcot895 saiarcot895 marked this pull request as draft December 7, 2022 22:45
@saiarcot895 saiarcot895 marked this pull request as ready for review February 27, 2023 19:42
@zhangyanzhao
Copy link
Collaborator

@zhangyanzhao
Copy link
Collaborator

added to 202305 release

@zhangyanzhao
Copy link
Collaborator

move to post 202305 release. code PRs are not ready

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
@yxieca yxieca merged commit c875c38 into sonic-net:master Sep 7, 2023
@saiarcot895 saiarcot895 deleted the teamd-customize-number-of-retries branch September 7, 2023 01:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

6 participants