[Fast/Warm restart] Implement helper class for waiting restart done #691

Merged 9 commits on Oct 18, 2022


Junchao-Mellanox
Collaborator

Depends on:

Why I did this?

Daemons that are not involved in warm/fast restart can degrade its performance. A hardcoded startup delay is the current workaround.

This PR implements a function that waits until warm/fast restart is done, giving daemons an efficient and graceful way to wait instead of relying on a fixed delay.

How I did it?

Implement a utility function, RestartWaiter::waitRestartDone. It waits for the warm restart done flag in STATE DB and returns true if the flag is set by the warm restart finalizer. The function is also exposed as a Python extension so that Python daemons can use it.

This PR depends on the new fastboot design: https://github.com/sonic-net/SONiC/blob/master/doc/fast-reboot/Fast-reboot_Flow_Improvements_HLD.md
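
The wait-with-timeout behavior described under "How I did it?" can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: the review snippet suggests the PR waits on a select with a timeout, whereas this sketch simply polls. `FlagReader` and `waitFlagSet` are hypothetical names, and the callback stands in for reading the restart-done flag from STATE DB.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-in for "read the restart-done flag from STATE DB".
// The real helper talks to Redis; this callback only models the result.
using FlagReader = std::function<bool()>;

// Polls the flag until it is set or maxWait elapses.
// Returns true if the flag was observed set, false on timeout.
bool waitFlagSet(const FlagReader &isDone,
                 std::chrono::milliseconds maxWait,
                 std::chrono::milliseconds pollInterval)
{
    assert(pollInterval > std::chrono::milliseconds::zero());

    // A fixed deadline avoids any drift from accumulating per-iteration delays.
    const auto deadline = std::chrono::steady_clock::now() + maxWait;
    while (true)
    {
        if (isDone())
        {
            return true;
        }

        const auto now = std::chrono::steady_clock::now();
        if (now >= deadline)
        {
            return false;
        }

        // Sleep for the poll interval, but never past the deadline.
        const auto remaining =
            std::chrono::duration_cast<std::chrono::milliseconds>(deadline - now);
        std::this_thread::sleep_for(std::min(pollInterval, remaining));
    }
}
```

A daemon could call something like this once at startup in place of a fixed sleep, and continue with its normal initialization whether the call returns true (flag seen) or false (timed out).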

stephenxs previously approved these changes Sep 19, 2022
@Junchao-Mellanox
Collaborator Author

/azpw run Azure.sonic-swss-common

@mssonicbld
Collaborator

/AzurePipelines run Azure.sonic-swss-common

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@Junchao-Mellanox
Collaborator Author

/azpw run Azure.sonic-swss-common

@mssonicbld
Collaborator

/AzurePipelines run Azure.sonic-swss-common

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

stephenxs previously approved these changes Sep 23, 2022
@Junchao-Mellanox
Collaborator Author

Hi @qiluo-msft @liuh-80, could you please review and sign-off?

common/restart_waiter.cpp (outdated review thread, resolved)
common/restart_waiter.cpp (outdated review thread, resolved)
common/restart_waiter.cpp (outdated review thread, resolved)
@stephenxs stephenxs self-requested a review September 23, 2022 11:18
common/restart_waiter.h (outdated review thread, resolved)
stephenxs previously approved these changes Sep 26, 2022
@Junchao-Mellanox
Collaborator Author

/azpw run Azure.sonic-swss-common

@mssonicbld
Collaborator

/AzurePipelines run Azure.sonic-swss-common

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@qiluo-msft qiluo-msft self-requested a review October 9, 2022 20:42
return false;
}

selectTimeout -= delay;
Contributor

@qiluo-msft qiluo-msft Oct 9, 2022


delay

I suspect a bug: delay is measured from a fixed start time, so it should not be subtracted from selectTimeout on every loop iteration. #Closed

Collaborator Author

@Junchao-Mellanox Junchao-Mellanox Oct 10, 2022


Not sure I understand. Let's say maxWaitTime is 90 seconds, so the initial selectTimeout is 90000 ms. In the first iteration, restart is not done and delay is 10000 ms. To keep the total wait at maxWaitTime (90 seconds), selectTimeout should be adjusted to (90000 - 10000) = 80000 ms, because we have already waited 10 seconds.
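
The arithmetic in this exchange can be worked through for more than one iteration. The sketch below is illustrative only and is not taken from the PR: it assumes, as the reviewer states, that delay is the time elapsed since a fixed start, and it uses hypothetical wake-up times of 10, 20 and 30 seconds. The function names are invented for the example. It compares recomputing the remaining timeout from the start time with subtracting the cumulative delay inside the loop.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Remaining timeout recomputed from the fixed start at each wake-up:
// remaining = maxWait - elapsedSinceStart.
std::vector<long> remainingRecomputed(long maxWaitMs,
                                      const std::vector<long> &elapsedSinceStartMs)
{
    assert(maxWaitMs >= 0);
    std::vector<long> out;
    for (long elapsed : elapsedSinceStartMs)
    {
        out.push_back(std::max(0L, maxWaitMs - elapsed));
    }
    return out;
}

// Remaining timeout when the cumulative elapsed time is subtracted from the
// running timeout on every iteration (the pattern the reviewer questions).
std::vector<long> remainingSubtractedInLoop(long maxWaitMs,
                                            const std::vector<long> &elapsedSinceStartMs)
{
    assert(maxWaitMs >= 0);
    std::vector<long> out;
    long timeout = maxWaitMs;
    for (long elapsed : elapsedSinceStartMs)
    {
        timeout = std::max(0L, timeout - elapsed);
        out.push_back(timeout);
    }
    return out;
}
```

With maxWaitMs = 90000 and wake-ups at 10000, 20000 and 30000 ms, the first function yields 80000, 70000, 60000, while the second yields 80000, 60000, 30000. The two agree on the first iteration, which matches the example above, and diverge from the second iteration on, shortening the total wait below maxWaitTime.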

@Junchao-Mellanox
Collaborator Author

/azpw run Azure.sonic-swss-common

@mssonicbld
Collaborator

/AzurePipelines run Azure.sonic-swss-common

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@qiluo-msft qiluo-msft self-requested a review October 10, 2022 04:29
@Junchao-Mellanox
Collaborator Author

/azpw run Azure.sonic-swss-common

@mssonicbld
Collaborator

/AzurePipelines run Azure.sonic-swss-common

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

common/restart_waiter.h (outdated review thread, resolved)
@Junchao-Mellanox
Collaborator Author

Hi @vaibhavhd @liuh-80 @stephenxs @arfeigin , could you please review and sign-off?

@qiluo-msft qiluo-msft merged commit 2cae742 into sonic-net:master Oct 18, 2022
dprital added a commit to dprital/sonic-buildimage that referenced this pull request Oct 28, 2022
Update sonic-swss-common submodule pointer to include the following:
* abda263 Make the loglevel persistent by moving the LOGGER table from the LOGLEVEL DB to the CONFIG DB ([sonic-net#687](sonic-net/sonic-swss-common#687))
* d0fdf62 Check whether a pointer created by dynamic_cast is null before using it. ([sonic-net#689](sonic-net/sonic-swss-common#689))
* 2cae742 [Fast/Warm restart] Implement helper class for waiting restart done ([sonic-net#691](sonic-net/sonic-swss-common#691))

Signed-off-by: dprital <drorp@nvidia.com>