Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consul-template sometimes does not handle external signals, probably due to abundance of SIGUSR internal signals #1548

Closed
jema-ta opened this issue Mar 3, 2022 · 3 comments · Fixed by #1575
Labels
Milestone

Comments

@jema-ta
Copy link

jema-ta commented Mar 3, 2022

Configuration

We use default configuration for the consul-template itself, the templates doesn't seem to be relevant and not needed details

Command

	exec \
		/usr/bin/timeout -s SIGTERM 120 \
		/usr/local/consul-template/bin/consul-template \
			-log-level info \
			-kill-signal SIGTERM \
			-reload-signal SIGHUP \
			-vault-renew-token=false \
			-max-stale 10m \
			-config=/etc/consul-template/configs \
			-config=/etc/consul-template/templates.hcl

Expected behavior

The timeout command's SIGTERM signal should always cause the consul-template to terminate.

Actual behavior

After running for sometime, intermittently, the consul-template doesn't get terminated upon timeout command's SIGTERM signal, thus it keeps running until something else happens, but not as expected on the given timeout.

Debug output

According to my investigation and debug, what we concluded is that the SIGTERM signal from the timeout command comes at exact really short time as a SIGUSR signal, internally produced by go and the SIGTERM signal is not handled, due to the signal channel buffer set to only 1:

signalCh: make(chan os.Signal, 1),

The following logs are from strace of the timeout command running the consul-template and bellow is the log from consul-template itself at exactly the same 1/1000 of second - 12:29:49.789.
Just after that it shows that it had received SIGCONT - which is sent from the timeout command after the SIGTERM signal - thus the SIGTERM has not been handled by the consul-template process:

[pid 77569] 12:29:49.789258 <... wait4 resumed> 0x7ffc3950d3bc, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid 77569] 12:29:49.789440 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
[pid 77640] 12:29:49.789794 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=77569, si_uid=0} ---
[pid 77569] 12:29:49.789854 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=77569, si_uid=0} ---
[pid 77594] 12:29:49.791460 --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=77569, si_uid=0} ---
[pid 77569] 12:29:49.792050 --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=77569, si_uid=0} ---


2022-03-03 12:29:49.789211 2022-03-03T12:29:49.789Z [INFO] (cli) receiving signal "urgent I/O condition"
2022-03-03 12:29:49.789539 2022-03-03T12:29:49.789Z [INFO] (cli) receiving signal "urgent I/O condition"
2022-03-03 12:29:49.790111 2022-03-03T12:29:49.790Z [INFO] (cli) receiving signal "urgent I/O condition"
2022-03-03 12:29:49.792238 2022-03-03T12:29:49.792Z [INFO] (cli) receiving signal "continued"
2022-03-03 12:29:49.792413 2022-03-03T12:29:49.792Z [INFO] (cli) receiving signal "urgent I/O condition"
2022-03-03 12:29:49.792575 2022-03-03T12:29:49.792Z [INFO] (cli) receiving signal "urgent I/O condition"

Here I am running just slightly modified version - just changed logging level - of the latest consul-template in order to get only the needed logs:

$ consul-template -v
consul-template v0.27.2 (b26e2c6c)

Steps to reproduce

  1. Run the consul-template as provided above with the timeout command - doesn't matter of the configuration - it just needs to do something, so the internal SIGUSR sent by the go routines is being sent to the consul-template handler
  2. The above should be run in supervisor of your choosing so the process is started everytime it terminates
  3. Monitor that after some point the consul-template is never terminated by the timeout command

References

There has been a complaint regarding the abundance of SIGUSR produced logs and an implementation has been provided to ignore them - but as far as I understand it - the implementation, doesn't account for the fact that the SIGUSR signals are still received by the go channel, which handles them, with buffer set to 1 - which seems, as described seems too small for the amount of signals received due to internal SIGUSR signals.

@eikenb eikenb added the bug label Mar 3, 2022
@jema-ta
Copy link
Author

jema-ta commented Mar 7, 2022

Just to add an update - we tried with the obvious fix of increasing the channel buffer - at first to 10, but the issue appeared again.
Increasing it to 100, seems to be solving the issue on our end - just to add our findings for completeness

--- cli.go
+++ cli.go
@@ -55,7 +55,7 @@ func NewCLI(out, err io.Writer) *CLI {
 	return &CLI{
 		outStream: out,
 		errStream: err,
-		signalCh:  make(chan os.Signal, 1),
+		signalCh:  make(chan os.Signal, 100),
 		stopCh:    make(chan struct{}),
 	}
 }

@eikenb eikenb added this to the v0.29.0 milestone Apr 13, 2022
@eikenb
Copy link
Contributor

eikenb commented Apr 13, 2022

I'm just going to bump the channel size to 100 as you have done. Makes almost no difference to consul-template except a small bit of added memory and should prevent this from being an issue for anyone in the future.

Thanks for the report and followup!

@eikenb
Copy link
Contributor

eikenb commented Apr 13, 2022

Hmm.. looking at the signal/ code we already have handy lists of the signals we care about. I'm going to look quickly into the idea of passing the list to Notify (limiting what we receive) instead of receiving everything and filtering them out locally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants