[memory_checker] Add a specific log message in a case when the docker service is not running. (#16018)

#### Why I did it
To fix the logic introduced by [[memory_checker] Do not check memory usage of containers which are not created #11129](#11129).
A scenario can occur before a reboot in which:
1. The `docker` service has stopped.
2. Within a very short period of time, the monit service performs the check seen in `root@sonic:/home/admin# monit status container_memory_telemetry`.

In this scenario, the `memory_checker` script writes an error to the syslog:
```
ERR memory_checker: Failed to retrieve the running container list from docker daemon! Error message is: 'Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))'
```
This is, however, expected behavior: when the docker service is stopped, its Unix socket is destroyed, which is why the `FileNotFoundError(2, 'No such file or directory')` exception appears in the syslog.
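The failure mode described above can be reproduced without Docker at all. The following is a minimal sketch using only the Python standard library; the helper name `probe_unix_socket` and the socket path are hypothetical and are not part of `memory_checker`. It shows that connecting to a Unix socket path that no longer exists fails with `ENOENT` (errno 2), the same error the Docker client reports above.

```python
import errno
import socket


def probe_unix_socket(path):
    """Try to connect to a Unix socket.

    Returns None if the connection succeeds, otherwise the errno of the failure.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
    except OSError as err:
        return err.errno
    finally:
        sock.close()
    return None


if __name__ == "__main__":
    # Hypothetical path standing in for a docker socket that was removed
    # when the service stopped.
    result = probe_unix_socket("/tmp/no-such-dir/docker.sock")
    print("socket missing:", result == errno.ENOENT)
```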

#### How I did it
Changed the log severity to warning and changed the return value.

#### How to verify it
It is very hard to catch the exact moment described in the `Why I did it` section.
To check the logic:
1. Change the Unix socket path to a non-existent one in the [/usr/bin/memory_checker](https://github.com/sonic-net/sonic-buildimage/blob/47742dfc2c0d1fa27198d69c9183ddc044e11b22/files/image_config/monit/memory_checker#L139) file on the switch.
2. Execute `root@sonic:/home/admin# monit restart container_memory_telemetry`
3. Check the syslog for messages like these:
```
WARNING memory_checker: Failed to retrieve the running container list from docker daemon! Error message is: 'Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))'

INFO memory_checker: [memory_checker] Exits without checking memory usage since container 'telemetry' is not running!
```
vadymhlushko-mlnx authored and mssonicbld committed Sep 3, 2023
1 parent edc1e48 commit b7dfc5b
Showing 1 changed file with 5 additions and 0 deletions.
5 changes: 5 additions & 0 deletions files/image_config/monit/memory_checker
@@ -140,6 +140,11 @@ def get_running_container_names():
         running_container_list = docker_client.containers.list(filters={"status": "running"})
         running_container_names = [ container.name for container in running_container_list ]
     except (docker.errors.APIError, docker.errors.DockerException) as err:
+        if not is_service_active("docker"):
+            syslog.syslog(syslog.LOG_INFO,
+                          "[memory_checker] Docker service is not running. Error message is: '{}'".format(err))
+            return []
+
         syslog.syslog(syslog.LOG_ERR,
                       "Failed to retrieve the running container list from docker daemon! Error message is: '{}'"
                       .format(err))
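The control flow added by this change can be exercised in isolation. The sketch below is not the script's actual code: the Docker client call and the service check are passed in as the hypothetical parameters `list_running` and `is_docker_active`, and a broad `Exception` stands in for `docker.errors.APIError` and `docker.errors.DockerException`. What the real script does after logging the error is not visible in the diff, so this sketch simply re-raises.

```python
import syslog


def get_running_container_names(list_running, is_docker_active, log=syslog.syslog):
    """Return running container names, or [] if the docker service is down.

    list_running and is_docker_active are injected stand-ins (assumptions)
    for the Docker client call and is_service_active("docker").
    """
    try:
        return list(list_running())
    except Exception as err:  # the real script catches the docker.errors types
        if not is_docker_active():
            # Expected case: the service is stopped, so only log at INFO level.
            log(syslog.LOG_INFO,
                "[memory_checker] Docker service is not running. "
                "Error message is: '{}'".format(err))
            return []

        # Unexpected case: the service is active but the query failed.
        log(syslog.LOG_ERR,
            "Failed to retrieve the running container list from docker daemon! "
            "Error message is: '{}'".format(err))
        raise  # assumption: the follow-up handling is not shown in the diff


if __name__ == "__main__":
    def broken_client():
        raise RuntimeError("Connection aborted.")

    captured = []
    names = get_running_container_names(
        broken_client,
        is_docker_active=lambda: False,
        log=lambda priority, message: captured.append((priority, message)),
    )
    print(names, captured)
```

Injecting the two dependencies keeps the sketch runnable on a machine without Docker or systemd, which mirrors the manual verification steps: a broken socket path combined with an inactive service should yield an empty list and no error-level log.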
