Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

corefile uploader: Updates per review comments offline #3915

Merged
merged 5 commits into from
Dec 30, 2019

Conversation

renukamanavalan
Copy link
Contributor

  1. core_uploader service waits for syslog.service
  2. core_uploader service enabled for restart on failure
  3. Use mtime instead of file size + ample time to be robust.

- What I did

- How I did it

- How to verify it

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

1) core_uploader service waits for syslog.service
2) core_uploader service enabled for restart on failure
3) Use mtime instead of file size + ample time to be robust.
@renukamanavalan
Copy link
Contributor Author

I have not tested the code yet. Can I get your comments please, before I do my last round testing.

1) If rc file is missing or required data missing, it periodically logs error in forever loop.
2) If upload fails, retry every hour with a error log, forever.
@renukamanavalan renukamanavalan changed the title Updates per review comments corefile uploader: Updates per review comments offline Dec 16, 2019
@renukamanavalan
Copy link
Contributor Author

Changes:

  1. The service dependency switched to syslog.service
  2. file_write_complete switch to watching st_mtime instead of size.
  3. Added some time buffer to ensure all write completes.
  4. Upon failure to upload, keep retrying every hour with an error message,

daemonname = fname.split(".")[0]
i = 0
fail_msg = ""

while i <= MAX_RETRIES:
while True:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can do retry in the big loop. if it fails, retry in the big loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess no. The bigger loop is either scan/wait-for-core. In either case, the next one would suffer the same fate. So I rather spew log & retry, until either I succeed or service is restarted, by someone alerted by these log messages.


[Service]
Type=simple
ExecStart=/usr/bin/core_uploader.py
StandardOutput=null
Restart=on-failure
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need back off here? if core_uploader is constantly restarting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need. The only failure that crashes the service, is some fatal things like, sonic_version.yaml does not have expected attributes or some fatal system related failure (which should crash many more ...), and running out of disk space.

The one thing, I would need to take care of "running out of disk space". This I can take care of inside the script.

@renukamanavalan
Copy link
Contributor Author

vs & vsimage build failure are unrelated to this PR.

console o/p:
"
_```
Step 6/28 : RUN apt-get update && apt-get install -y curl ca-certificates gcc make libdpkg-perl ipmitool
---> Running in bf1b21cffd19
Ign:1 http://debian-archive.trafficmanager.net/debian stretch InRelease
Hit:2 http://debian-archive.trafficmanager.net/debian-security stretch/updates InRelease
Hit:3 http://debian-archive.trafficmanager.net/debian stretch-backports InRelease
Hit:4 http://debian-archive.trafficmanager.net/debian stretch Release
Reading package lists...
�[91mE: Release file for http://debian-archive.trafficmanager.net/debian/dists/stretch-backports/InRelease is expired (invalid since 12min 13s). Updates for this repository will not be applied.
�[0mThe command '/bin/sh -c apt-get update && apt-get install -y curl ca-certificates gcc make libdpkg-perl ipmitool' returned a non-zero code: 100
[ FAIL LOG END ] [ target/docker-snmp-sv2.gz ]
slave.mk:526: recipe for target 'target/docker-snmp-sv2.gz' failed

"

@renukamanavalan renukamanavalan merged commit 78db080 into sonic-net:master Dec 30, 2019
abdosi pushed a commit that referenced this pull request Jan 3, 2020
* Updates per review comments
1) core_uploader service waits for syslog.service
2) core_uploader service enabled for restart on failure
3) Use mtime instead of file size + ample time to be robust.

* Avoid reloading already uploaded file, by marking the names with a prefix.

* Updated failing path.
1) If rc file is missing or required data missing, it periodically logs error in forever loop.
2) If upload fails, retry every hour with a error log, forever.

* Fix few bugs

* The binary update_json.py will come from sonic-utilities.
yxieca pushed a commit that referenced this pull request Jan 6, 2020
* Updates per review comments
1) core_uploader service waits for syslog.service
2) core_uploader service enabled for restart on failure
3) Use mtime instead of file size + ample time to be robust.

* Avoid reloading already uploaded file, by marking the names with a prefix.

* Updated failing path.
1) If rc file is missing or required data missing, it periodically logs error in forever loop.
2) If upload fails, retry every hour with a error log, forever.

* Fix few bugs

* The binary update_json.py will come from sonic-utilities.
@renukamanavalan renukamanavalan deleted the core_uploader_upd branch April 17, 2022 22:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants