Enhance orchagent and buffer manager in error handling #2414

Merged
merged 4 commits into from
Sep 9, 2022

stephenxs
Collaborator

@stephenxs stephenxs commented Aug 15, 2022

What I did
Enhance orchagent and buffer manager

  • Buffer manager: do not insert a buffer queue into the cache if its profile is invalid. This prevents an empty string from being written to APPL_DB during initialization.
  • orchagent: handle the case where a field that references other objects is an empty string.
    Logic for this existed before but was broken by a PR last year.
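
The orchagent item can be illustrated with a minimal sketch. Everything named here (RefStatus, resolveReference, the knownObjects map) is hypothetical and is not the orchagent API. The sketch only shows the idea of reporting an empty reference field as its own outcome, so the caller can tell it apart from a name that cannot be resolved.

```cpp
#include <map>
#include <string>

// Hypothetical result codes; the real orchagent uses its own status type.
enum class RefStatus { Success, Empty, NotResolved };

// Hypothetical resolver: looks up the object named by a reference field.
// An empty field value is reported explicitly instead of being looked up,
// so the caller can decide how to treat "no reference given".
RefStatus resolveReference(const std::map<std::string, int> &knownObjects,
                           const std::string &fieldValue,
                           int &objectId)
{
    if (fieldValue.empty())
    {
        return RefStatus::Empty;        // nothing to resolve
    }

    auto it = knownObjects.find(fieldValue);
    if (it == knownObjects.end())
    {
        return RefStatus::NotResolved;  // named object does not exist (yet)
    }

    objectId = it->second;
    return RefStatus::Success;
}
```

A caller can then skip or log the Empty case without treating it as a dependency that might be satisfied later.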

Signed-off-by: Stephen Sun <stephens@nvidia.com>

Why I did it
Enhance the error handling logic.
In most cases, a user will not encounter such scenarios in a production environment, because it is the responsibility of the front ends (e.g., the CLI) to identify an incorrect configuration and prevent it from being inserted into CONFIG_DB.
However, in some cases, such as when an incorrect config_db.json is composed and copied to the switch, the front ends cannot prevent that.

How I verified it
Manual and mock tests.

Details if related
For the improvement in buffer manager:

  • Previously, the logic was:
    • Declare a reference portQueue to m_portQueueLookup[port][queues], and then assign fvValue(i) to portQueue.running_profile_name.
    • However, the [] operator on a C++ map has a side effect: it inserts a new element into the map if the key is not already present. If the validation check in checkBufferProfileDirection failed and there was no existing entry in the map, portQueue.running_profile_name would remain empty. This is not what we want.
    • If an entry was already configured in the map, we should not remove it on failure, because we want to shield the user from the effects of the misconfiguration and alert the user to correct the error. There is a log message in checkBufferProfileDirection.
  • Now it is improved as follows:
    • Avoid using the reference, and initialize m_portQueueLookup[port][queues] only if a valid egress profile is configured.
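
The difference between the two patterns can be sketched in isolation. This is an illustrative reduction, not the buffer manager code: QueueInfo, QueueLookup, isValidEgressProfile, setProfileOld and setProfileNew are hypothetical names, and the validity rule (profile name starts with "egress_") is only a stand-in for whatever checkBufferProfileDirection actually verifies.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the per-queue cache entry.
struct QueueInfo
{
    std::string running_profile_name;
};

// Hypothetical stand-in for m_portQueueLookup: port -> queues -> entry.
using QueueLookup = std::map<std::string, std::map<std::string, QueueInfo>>;

// Stand-in validity rule (an assumption for this sketch only).
bool isValidEgressProfile(const std::string &profile)
{
    return profile.rfind("egress_", 0) == 0;  // true if it starts with "egress_"
}

// Old pattern: operator[] runs before validation, so a failed check
// still leaves a default-constructed (empty) entry in the map.
bool setProfileOld(QueueLookup &lookup, const std::string &port,
                   const std::string &queues, const std::string &profile)
{
    QueueInfo &portQueue = lookup[port][queues];  // inserts if missing
    if (!isValidEgressProfile(profile))
    {
        return false;
    }
    portQueue.running_profile_name = profile;
    return true;
}

// New pattern: validate first and touch the map only on success, so a
// failure neither creates an empty entry nor disturbs an existing one.
bool setProfileNew(QueueLookup &lookup, const std::string &port,
                   const std::string &queues, const std::string &profile)
{
    if (!isValidEgressProfile(profile))
    {
        return false;
    }
    lookup[port][queues].running_profile_name = profile;
    return true;
}
```

With an empty cache, a rejected profile passed to setProfileOld leaves an entry whose running_profile_name is empty, while setProfileNew leaves the cache untouched. An entry that was configured earlier survives a later failed update in both versions.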

@stephenxs
Collaborator Author

/azpw run

@mssonicbld
Collaborator

/AzurePipelines run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@stephenxs stephenxs changed the title Enhance buffer manager and buffer orchagent Enhance orchagent and buffer manager in error handling Aug 16, 2022
@stephenxs stephenxs marked this pull request as ready for review August 16, 2022 07:30
@stephenxs
Collaborator Author

/azpw run

@mssonicbld
Collaborator

/AzurePipelines run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Stephen Sun <stephens@nvidia.com>
@liat-grozovik
Collaborator

@neethajohn, @prsunny, a kind reminder: this is a bug fix for 202205, so we would appreciate it if you could prioritize it.

@prsunny
Collaborator

prsunny commented Sep 9, 2022

@neethajohn, can you please review and sign off?

@neethajohn neethajohn merged commit b62c716 into sonic-net:master Sep 9, 2022
@stephenxs stephenxs deleted the enhance-buffer-mgr-oa branch September 10, 2022 00:32
stephenxs added a commit to stephenxs/sonic-swss that referenced this pull request Sep 14, 2022
liat-grozovik pushed a commit that referenced this pull request Sep 19, 2022
… (#2449)

Cherry-pick #2414