[config] Enable/disable container monitoring when starting/stopping the services. (sonic-net#1471)

What I did
When we ran sudo config load, sudo config reload, or sudo config load_minigraph, the containers swss, snmp, lldp, teamd, syncd, bgp, radv, pmon, dhcp_relay, telemetry, and restapi were stopped and then restarted. While the containers were down, the container_checker script run by Monit wrote false alerts to syslog reporting that some containers were not running. This PR prevents Monit from generating those false alarm messages.

How I did it
Before stopping the services, we tell Monit to stop monitoring the running status of the containers. After the services have been restarted, we tell Monit to resume monitoring them.
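The pause-and-resume pattern described above can be sketched as a small context manager. This is only an illustration, not the code in this PR: `container_monitoring_paused` and `reload_services` are hypothetical names, `run_command` is an injected callable so the sketch runs without Monit installed, and the `systemctl restart` command is an assumption (the diff only shows the stop command). The `try`/`finally` is also a variation; the PR itself issues the two Monit commands sequentially inside `_stop_services()` and `_restart_services()`.

```python
from contextlib import contextmanager

MONIT_CHECK = "container_checker"  # Monit check name used in this PR


@contextmanager
def container_monitoring_paused(run_command):
    """Pause Monit's container check while the wrapped block runs.

    `run_command` is any callable taking a shell command string; it is
    injected (an assumption of this sketch) so no real Monit is needed.
    """
    run_command("sudo monit unmonitor {}".format(MONIT_CHECK))
    try:
        yield
    finally:
        # Resume monitoring even if the wrapped block raises.
        run_command("sudo monit monitor {}".format(MONIT_CHECK))


def reload_services(run_command):
    """Hypothetical stop/restart sequence wrapped by the pause."""
    with container_monitoring_paused(run_command):
        run_command("sudo systemctl stop sonic.target")
        # Assumed restart command; not shown in the diff.
        run_command("sudo systemctl restart sonic.target")
        run_command("sudo monit reload")


issued = []
reload_services(issued.append)
print("\n".join(issued))
```

Recording the commands in a list instead of executing them makes the ordering easy to inspect: the unmonitor command comes first, and the monitor command comes last, after the Monit reload.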

How to verify it
I deliberately reduced Monit's monitoring interval from 60 seconds to 10 seconds to ensure that container_checker generated alerting messages during sudo config reload, sudo config load, and sudo config load_minigraph. After adding this change to _stop_services(...) and _restart_services(...), I confirmed that the alerting messages from container_checker no longer appeared in syslog.

I verified this change on the device str-a7050-acs-3.

Previous command output (if the output of a command-line utility has changed)
admin@vlab-01:~$ sudo config reload -y
Executing stop of service telemetry...
Warning: Stopping telemetry.service, but it can still be activated by:
  telemetry.timer
Executing stop of service swss...
Executing stop of service lldp...
Executing stop of service pmon...
Executing stop of service bgp...
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db
Running command: /usr/local/bin/db_migrator.py -o migrate
Executing reset-failed of service bgp...
Executing reset-failed of service dhcp_relay...
Executing reset-failed of service hostname-config...
Executing reset-failed of service interfaces-config...
Executing reset-failed of service lldp...
Executing reset-failed of service ntp-config...
Executing reset-failed of service pmon...
Executing reset-failed of service radv...
Executing reset-failed of service rsyslog-config...
Executing reset-failed of service snmp...
Executing reset-failed of service swss...
Executing reset-failed of service syncd...
Executing reset-failed of service teamd...
Executing reset-failed of service telemetry...
Executing restart of service hostname-config...
Executing restart of service interfaces-config...
Executing restart of service ntp-config...
Executing restart of service rsyslog-config...
Executing restart of service swss...
Executing restart of service bgp...
Executing restart of service pmon...
Executing restart of service lldp...
Executing restart of service telemetry...
Reloading Monit configuration ...
Reinitializing monit daemon
New command output (if the output of a command-line utility has changed)
admin@vlab-01:~$ sudo config reload -y
Disabling container monitoring ...
Executing stop of service telemetry...
Warning: Stopping telemetry.service, but it can still be activated by:
  telemetry.timer
Executing stop of service swss...
Executing stop of service lldp...
Executing stop of service pmon...
Executing stop of service bgp...
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db
Running command: /usr/local/bin/db_migrator.py -o migrate
Executing reset-failed of service bgp...
Executing reset-failed of service dhcp_relay...
Executing reset-failed of service hostname-config...
Executing reset-failed of service interfaces-config...
Executing reset-failed of service lldp...
Executing reset-failed of service ntp-config...
Executing reset-failed of service pmon...
Executing reset-failed of service radv...
Executing reset-failed of service rsyslog-config...
Executing reset-failed of service snmp...
Executing reset-failed of service swss...
Executing reset-failed of service syncd...
Executing reset-failed of service teamd...
Executing reset-failed of service telemetry...
Executing restart of service hostname-config...
Executing restart of service interfaces-config...
Executing restart of service ntp-config...
Executing restart of service rsyslog-config...
Executing restart of service swss...
Executing restart of service bgp...
Executing restart of service pmon...
Executing restart of service lldp...
Executing restart of service telemetry...
Enabling container monitoring ...
Reloading Monit configuration ...
Reinitializing monit daemon
yozhao101 authored Mar 3, 2021
1 parent dd3c2c3 commit 4a78c01
Showing 2 changed files with 9 additions and 1 deletion.
6 changes: 6 additions & 0 deletions config/main.py
@@ -669,6 +669,9 @@ def _get_disabled_services_list(config_db):


 def _stop_services():
+    click.echo("Disabling container monitoring ...")
+    clicommon.run_command("sudo monit unmonitor container_checker")
+
     click.echo("Stopping SONiC target ...")
     clicommon.run_command("sudo systemctl stop sonic.target")

@@ -692,6 +695,9 @@ def _restart_services():
     click.echo("Reloading Monit configuration ...")
     clicommon.run_command("sudo monit reload")

+    click.echo("Enabling container monitoring ...")
+    clicommon.run_command("sudo monit monitor container_checker")
+

 def interface_is_in_vlan(vlan_member_table, interface_name):
     """ Check if an interface is in a vlan """
4 changes: 3 additions & 1 deletion tests/config_test.py
@@ -12,12 +12,14 @@
 from utilities_common.db import Db

 load_minigraph_command_output="""\
+Disabling container monitoring ...
 Stopping SONiC target ...
 Running command: /usr/local/bin/sonic-cfggen -H -m --write-to-db
 Running command: pfcwd start_default
 Running command: config qos reload --no-dynamic-buffer
 Restarting SONiC target ...
 Reloading Monit configuration ...
+Enabling container monitoring ...
 Please note setting loaded from minigraph will be lost after system reboot. To preserve setting, run `config save`.
 """

@@ -49,7 +51,7 @@ def test_load_minigraph(self, get_cmd_module, setup_single_broadcom_asic):
             traceback.print_tb(result.exc_info[2])
             assert result.exit_code == 0
             assert "\n".join([l.rstrip() for l in result.output.split('\n')]) == load_minigraph_command_output
-            assert mock_run_command.call_count == 7
+            assert mock_run_command.call_count == 9

     @classmethod
     def teardown_class(cls):
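The expected `call_count` rises from 7 to 9 because the change adds two more `run_command` invocations (the Monit unmonitor and monitor commands), and a mock counts every call made through it. A minimal sketch of that counting behaviour follows; `stop_and_restart` is a hypothetical stand-in with a shortened command list, not the real `_stop_services()` / `_restart_services()` helpers.

```python
from unittest import mock


def stop_and_restart(run_command, monitoring_guard=True):
    """Hypothetical, shortened stand-in for the stop/restart helpers."""
    if monitoring_guard:
        run_command("sudo monit unmonitor container_checker")
    run_command("sudo systemctl stop sonic.target")
    run_command("sudo monit reload")
    if monitoring_guard:
        run_command("sudo monit monitor container_checker")


# Behaviour before the change: no Monit unmonitor/monitor commands.
before = mock.MagicMock()
stop_and_restart(before, monitoring_guard=False)

# Behaviour after the change: two extra commands go through the mock.
after = mock.MagicMock()
stop_and_restart(after, monitoring_guard=True)

extra_calls = after.call_count - before.call_count
print(extra_calls)  # prints 2
```

A stricter test could also check the order of the commands, for example with `after.assert_has_calls([...])`, rather than only the total count.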