Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the rw folder from the image after installing in KVM #8746

Merged
merged 4 commits into from
Dec 10, 2021

Conversation

saiarcot895
Copy link
Contributor

Why I did it

When the image is installed from within KVM and then loaded, some files
(such as timer stamp files) are created as part of that bootup that then
get into the final image. This can cause some side effects, such as
systemd thinking that some persistent timers need to run because the
last trigger time got missed.

Therefore, at the end of the check_install.py script, remove the rw
folder so that it doesn't exist in the image, and that when this image
is started up in a KVM setup for the first time, it starts with a truly
clean slate.

Without this change, the issue seen was that for fstrim.timer, a stamp
file would be present in /var/lib/systemd/timers (and for other timers
that are marked as persistent). This would then cause fstrim.service to
get started immediately when starting a QEMU setup if the timer for that
service missed a trigger, and not wait 10 minutes after bootup. In the
case of fstrim.timer, that means if the image was started in QEMU after
next Monday, since that timer is scheduled to be triggered weekly.

Signed-off-by: Saikrishna Arcot sarcot@microsoft.com

How I did it

How to verify it

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106

Description for the changelog

A picture of a cute animal (not mandatory but encouraged)

When the image is installed from within KVM and then loaded, some files
(such as timer stamp files) are created as part of that bootup that then
get into the final image. This can cause some side effects, such as
systemd thinking that some persistent timers need to run because the
last trigger time got missed.

Therefore, at the end of the check_install.py script, remove the rw
folder so that it doesn't exist in the image, and that when this image
is started up in a KVM setup for the first time, it starts with a truly
clean slate.

Without this change, the issue seen was that for fstrim.timer, a stamp
file would be present in /var/lib/systemd/timers (and for other timers
that are marked as persistent). This would then cause fstrim.service to
get started immediately when starting a QEMU setup if the timer for that
service missed a trigger, and not wait 10 minutes after bootup. In the
case of fstrim.timer, that means if the image was started in QEMU after
next Monday, since that timer is scheduled to be triggered weekly.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
@yxieca
Copy link
Contributor

yxieca commented Sep 14, 2021

test is failing, did this remove being too aggressive?

@saiarcot895
Copy link
Contributor Author

Yes, the firstboot file in the /host/ directory was no longer present, and so none of the first boot steps ran. One of these steps is to create the /var/platform directory, which is then bind-mounted into the mgmt-framework container.

I'm working on splitting up the check_install.py script into two separate scripts. The first script will install the image, and the second script will boot into the image and verify that things are working. The KVM instance for the second script will use the -snapshot flag, so that any changes made by that KVM instance aren't actually written to disk.

Just removing the rw directory causes other issues, since the first boot
tasks no longer run since that file isn't present. Also, just recreating
that file doesn't completely help, because there are some files that are
moved from the /host folder into the base filesystem layer, and so are
no longer available.

Instead, split the installation of SONiC and doing the test bootup into
two separate scripts and two separate KVM instances. The first KVM
instance is the one currently being run, while the second one has the
`-snapshot` flag added in, which means any changes to the disk image
don't take effect.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
@lgtm-com
Copy link

lgtm-com bot commented Sep 15, 2021

This pull request introduces 4 alerts when merging 2f4ac58 into 1863e1f - view on LGTM.com

new alerts:

  • 4 for Unused local variable

check_install.py Outdated Show resolved Hide resolved
Since SONiC will be booted up with -snapshot in the KVM flags, the rw
folder doesn't need to be removed, since changes there won't be saved to
the image file.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
@lgtm-com
Copy link

lgtm-com bot commented Sep 15, 2021

This pull request introduces 3 alerts and fixes 3 when merging 5554290 into 1863e1f - view on LGTM.com

new alerts:

  • 3 for Unused local variable

fixed alerts:

  • 3 for Unused local variable

yxieca
yxieca previously approved these changes Oct 11, 2021
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
@lgtm-com
Copy link

lgtm-com bot commented Oct 12, 2021

This pull request fixes 3 alerts when merging 42339f7 into 7d40384 - view on LGTM.com

fixed alerts:

  • 3 for Unused local variable

@saiarcot895 saiarcot895 merged commit 4803847 into sonic-net:master Dec 10, 2021
@saiarcot895 saiarcot895 deleted the remove-rw-content branch December 10, 2021 21:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants