Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pdns_recursor_service_sysvinit behaves different when using Vagrant or Docker as providers #77

Closed
therobot opened this issue Jun 27, 2017 · 11 comments · Fixed by #78
Closed

Comments

@therobot
Copy link
Contributor

therobot commented Jun 27, 2017

Our internal wrapper cookbook has been failing converge after 3.3.1 release.

The reason is that the service block that stops the initial pdns-recursor service installed by the package itself behaves different in Vagrant and Docker as providers.

The following block part on pdns_recursor_service_sysvinit:

  service 'pdns-recursor' do
    supports restart: true, status: true
    action :disable
  end
  • Effectively stop pdns-recursor and disables by running update-rc.d.
  • Don't stop pdns-recursor but it disables by running update-rc.d

I have been able to reproduce this by simply running a recipe:

pdns_recursor_install 'instance' do
  action :install
end

pdns_recursor_config 'instance' do
   action :create
 end

pdns_recursor_service 'instance' do
   action [:enable, :start]
 end

And running it with Vagrant and Docker providers (using: export KITCHEN_LOCAL_YAML=.kitchen.dokken.yml for Docker).

Chef fails in Vagrant, but converges fine with Docker.

@martinisoft
Copy link
Contributor

Have you opened an upstream issue with kitchen? I have a feeling someone may have run into this.

@therobot
Copy link
Contributor Author

I have not @martinisoft since I'm not really sure where the problem comes (kitchen, kitchen-dokken or even chef).

@martinisoft
Copy link
Contributor

Can you run Chef in debug mode to show what actions it is taking versus the service setup?

@therobot
Copy link
Contributor Author

@martinisoft that's interesting. I've been able to run Chef in debug mode by attaching log_level: 'debug' into our wrapper cookbook. But both .kitchen.yml and .kitchen.dokken.yml have ignored the setting.

@martinisoft
Copy link
Contributor

that's interesting. I've been able to run Chef in debug mode by attaching log_level: 'debug' into our wrapper cookbook. But both .kitchen.yml and .kitchen.dokken.yml have ignored the setting.

There is a difference between chef debug and kitchen debug.

@therobot
Copy link
Contributor Author

@martinisoft the important bytes come when running chef debug.

This is the trace of our internal cookbook (Vagrant), Chef fails to recognize the recursor process (and therefore it decides that it does not exist so it does not stop it). I don't know what happens in Docker because the issue mentioned above

         * service[pdns-recursor] action disable[2017-06-27T16:36:03+00:00] INFO: Processing service[pdns-recursor] action disable (dnsimple-pdns::recursor line 25)
       [2017-06-27T16:36:03+00:00] DEBUG: Providers for generic service resource enabled on node include: [Chef::Provider::Service::Upstart, Chef::Provider::Service::Debian, Chef::Provider::Service::Invokercd, Chef::Provider::Service::Init]
       [2017-06-27T16:36:03+00:00] DEBUG: Provider for action disable on resource service[pdns-recursor] is Chef::Provider::Service::Debian
       [2017-06-27T16:36:03+00:00] DEBUG: service[pdns-recursor] falling back to process table inspection
       [2017-06-27T16:36:03+00:00] DEBUG: service[pdns-recursor] attempting to match 'pdns-recursor' (/pdns-recursor/) against process list
       UID        PID  PPID  C STIME TTY          TIME CMD
       root         1     0  0 16:22 ?        00:00:01 /sbin/init
       root         2     0  0 16:22 ?        00:00:00 [kthreadd]
       root         3     2  0 16:22 ?        00:00:00 [ksoftirqd/0]
       root         4     2  0 16:22 ?        00:00:00 [kworker/0:0]
       root         5     2  0 16:22 ?        00:00:00 [kworker/0:0H]
       root         6     2  0 16:22 ?        00:00:00 [kworker/u2:0]
       root         7     2  0 16:22 ?        00:00:00 [rcu_sched]
       root         8     2  0 16:22 ?        00:00:00 [rcuos/0]
       root         9     2  0 16:22 ?        00:00:00 [rcu_bh]
       root        10     2  0 16:22 ?        00:00:00 [rcuob/0]
       root        11     2  0 16:22 ?        00:00:00 [migration/0]
       root        12     2  0 16:22 ?        00:00:00 [watchdog/0]
       root        13     2  0 16:22 ?        00:00:00 [khelper]
       root        14     2  0 16:22 ?        00:00:00 [kdevtmpfs]
       root        15     2  0 16:22 ?        00:00:00 [netns]
       root        16     2  0 16:22 ?        00:00:00 [writeback]
       root        17     2  0 16:22 ?        00:00:00 [kintegrityd]
       root        18     2  0 16:22 ?        00:00:00 [bioset]
       root        19     2  0 16:22 ?        00:00:00 [kworker/u3:0]
       root        20     2  0 16:22 ?        00:00:00 [kblockd]
       root        21     2  0 16:22 ?        00:00:00 [ata_sff]
       root        22     2  0 16:22 ?        00:00:00 [khubd]
       root        23     2  0 16:22 ?        00:00:00 [md]
       root        24     2  0 16:22 ?        00:00:00 [devfreq_wq]
       root        25     2  0 16:22 ?        00:00:01 [kworker/0:1]
       root        27     2  0 16:22 ?        00:00:00 [khungtaskd]
       root        28     2  0 16:22 ?        00:00:00 [kswapd0]
       root        29     2  0 16:22 ?        00:00:00 [vmstat]
       root        30     2  0 16:22 ?        00:00:00 [ksmd]
       root        31     2  0 16:22 ?        00:00:00 [khugepaged]
       root        32     2  0 16:22 ?        00:00:00 [fsnotify_mark]
       root        33     2  0 16:22 ?        00:00:00 [ecryptfs-kthrea]
       root        34     2  0 16:22 ?        00:00:00 [crypto]
       root        46     2  0 16:22 ?        00:00:00 [kthrotld]
       root        47     2  0 16:22 ?        00:00:00 [kworker/u2:1]
       root        48     2  0 16:22 ?        00:00:00 [scsi_eh_0]
       root        49     2  0 16:22 ?        00:00:00 [scsi_eh_1]
       root        70     2  0 16:22 ?        00:00:00 [deferwq]
       root        71     2  0 16:22 ?        00:00:00 [charger_manager]
       root       113     2  0 16:22 ?        00:00:00 [kpsmoused]
       root       114     2  0 16:22 ?        00:00:00 [scsi_eh_2]
       root       122     2  0 16:22 ?        00:00:00 [kdmflush]
       root       123     2  0 16:22 ?        00:00:00 [bioset]
       root       125     2  0 16:22 ?        00:00:00 [kdmflush]
       root       128     2  0 16:22 ?        00:00:00 [bioset]
       root       183     2  0 16:22 ?        00:00:00 [jbd2/dm-0-8]
       root       184     2  0 16:22 ?        00:00:00 [ext4-rsv-conver]
       root       272     2  0 16:22 ?        00:00:00 [kworker/u3:1]
       root       317     1  0 16:22 ?        00:00:00 upstart-udev-bridge --daemon
       root       324     1  0 16:22 ?        00:00:00 /lib/systemd/systemd-udevd --daemon
       root       371     2  0 16:22 ?        00:00:00 [ext4-rsv-conver]
       root       401     2  0 16:22 ?        00:00:00 [rpciod]
       root       403     2  0 16:22 ?        00:00:00 [iprt-VBoxWQueue]
       root       411     2  0 16:22 ?        00:00:00 [nfsiod]
       message+   415     1  0 16:22 ?        00:00:00 dbus-daemon --system --fork
       root       424     1  0 16:22 ?        00:00:00 rpc.idmapd
       root       437     1  0 16:22 ?        00:00:00 /lib/systemd/systemd-logind
       syslog     455     1  0 16:22 ?        00:00:00 rsyslogd
       root       476     1  0 16:22 ?        00:00:00 upstart-file-bridge --daemon
       root       577     1  0 16:22 ?        00:00:00 rpcbind
       statd      613     1  0 16:22 ?        00:00:00 rpc.statd -L
       root       627     1  0 16:22 ?        00:00:00 upstart-socket-bridge --daemon
       root       726     1  0 16:22 ?        00:00:00 dhclient -1 -v -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases eth0
       root       830     2  0 16:22 ?        00:00:00 [ttm_swap]
       root       982     1  0 16:22 tty4     00:00:00 /sbin/getty -8 38400 tty4
       root       985     1  0 16:22 tty5     00:00:00 /sbin/getty -8 38400 tty5
       root       990     1  0 16:22 tty2     00:00:00 /sbin/getty -8 38400 tty2
       root       991     1  0 16:22 tty3     00:00:00 /sbin/getty -8 38400 tty3
       root       993     1  0 16:22 tty6     00:00:00 /sbin/getty -8 38400 tty6
       root      1021     1  0 16:22 ?        00:00:00 acpid -c /etc/acpi/events -s /var/run/acpid.socket
       root      1023     1  0 16:22 ?        00:00:00 cron
       daemon    1024     1  0 16:22 ?        00:00:00 atd
       root      1134     1  0 16:22 ?        00:00:00 /usr/sbin/VBoxService --pidfile /var/run/vboxadd-service.sh
       root      1163     1  0 16:22 tty1     00:00:00 /sbin/getty -8 38400 tty1
       root      1175     2  0 16:23 ?        00:00:00 [kauditd]
       root      1620     1  0 16:23 ?        00:00:00 /usr/sbin/sshd -D
       root      2097  1620  0 16:23 ?        00:00:00 sshd: vagrant [priv]
       vagrant   2158  2097  0 16:23 ?        00:00:00 sshd: vagrant@pts/2
       vagrant   2159  2158  0 16:23 pts/2    00:00:00 -bash
       pdns     11737     1  0 16:24 ?        00:00:00 /usr/sbin/pdns_recursor --daemon=yes
       root     12103     1  0 16:33 ?        00:00:00 /lib/systemd/systemd-hostnamed
       root     12613  1620  0 16:35 ?        00:00:00 sshd: vagrant [priv]
       vagrant  12677 12613  6 16:35 ?        00:00:01 sshd: vagrant@pts/0
       vagrant  12695 12677  0 16:35 pts/0    00:00:00 sh -c  TEST_KITCHEN="1"; export TEST_KITCHEN sudo -E /opt/chef/bin/chef-client --local-mode --config /tmp/kitchen/client.rb --log_level debug --force-formatter --no-color --json-attributes /tmp/kitchen/dna.json --chef-zero-port 8889
       root     12696 12695  0 16:35 pts/0    00:00:00 sudo -E /opt/chef/bin/chef-client --local-mode --config /tmp/kitchen/client.rb --log_level debug --force-formatter --no-color --json-attributes /tmp/kitchen/dna.json --chef-zero-port 8889
       root     12697 12696  5 16:35 pts/0    00:00:01 /opt/chef/embedded/bin/ruby --disable-gems /opt/chef/bin/chef-client --local-mode --config /tmp/kitchen/client.rb --log_level debug --force-formatter --no-color --json-attributes /tmp/kitchen/dna.json --chef-zero-port 8889
       root     12703 12697 17 16:35 pts/0    00:00:03 chef-client worker: ppid=12697;start=16:35:42;
       root     13077 12703  0 16:36 ?        00:00:00 ps -ef
       [2017-06-27T16:36:03+00:00] DEBUG: service[pdns-recursor] running: false
       [2017-06-27T16:36:03+00:00] DEBUG: service[pdns-recursor] runlevel 0, action stop, priority 85
       [2017-06-27T16:36:03+00:00] DEBUG: service[pdns-recursor] runlevel 1, action stop, priority 85
       [2017-06-27T16:36:03+00:00] DEBUG: service[pdns-recursor] runlevel 2, action start, priority 19
       [2017-06-27T16:36:03+00:00] DEBUG: service[pdns-recursor] runlevel 3, action start, priority 19
       [2017-06-27T16:36:03+00:00] DEBUG: service[pdns-recursor] runlevel 4, action start, priority 19
       [2017-06-27T16:36:03+00:00] DEBUG: service[pdns-recursor] runlevel 5, action start, priority 19
       [2017-06-27T16:36:03+00:00] DEBUG: service[pdns-recursor] runlevel 6, action stop, priority 85
        Removing any system startup links for /etc/init.d/pdns-recursor ...
          /etc/rc0.d/K85pdns-recursor
          /etc/rc1.d/K85pdns-recursor
          /etc/rc2.d/S19pdns-recursor
          /etc/rc3.d/S19pdns-recursor
          /etc/rc4.d/S19pdns-recursor
          /etc/rc5.d/S19pdns-recursor
          /etc/rc6.d/K85pdns-recursor
       update-rc.d: warning:  start runlevel arguments (none) do not match pdns-recursor Default-Start values (2 3 4 5)
       update-rc.d: warning:  stop runlevel arguments (2 3 4 5) do not match pdns-recursor Default-Stop values (0 1 6)
        Adding system startup for /etc/init.d/pdns-recursor ...
          /etc/rc2.d/K80pdns-recursor -> ../init.d/pdns-recursor
          /etc/rc3.d/K80pdns-recursor -> ../init.d/pdns-recursor
          /etc/rc4.d/K80pdns-recursor -> ../init.d/pdns-recursor
          /etc/rc5.d/K80pdns-recursor -> ../init.d/pdns-recursor
       [2017-06-27T16:36:03+00:00] INFO: service[pdns-recursor] disabled

           - disable service service[pdns-recursor]

@jmauro
Copy link
Contributor

jmauro commented Jun 27, 2017

Hello,

I think I had the same issue on kitchen vagrant that's why I put the guardfile:
only_if { ::File.exist?(/var/run/pdns-recursor.pid}
if I recall correctly.

@therobot
Copy link
Contributor Author

This is the trace when running kitchen converge recursor-multi-ubuntu-1404 on this cookbook:

    * service[pdns-recursor] action disable[2017-06-24T19:26:14+00:00] INFO: Processing service[pdns-recursor] action disable (/opt/kitchen/cache/cookbooks/pdns/resources/pdns_recursor_service_sysvinit.rb line 46)
[2017-06-24T19:26:14+00:00] DEBUG: Providers for generic service resource enabled on node include: [Chef::Provider::Service::Upstart, Chef::Provider::Service::Debian, Chef::Provider::Service::Invokercd, Chef::Provider::Service::Init]
[2017-06-24T19:26:14+00:00] DEBUG: Provider for action disable on resource service[pdns-recursor] is Chef::Provider::Service::Debian
[2017-06-24T19:26:14+00:00] DEBUG: service[pdns-recursor] supports status, running
[2017-06-24T19:26:14+00:00] DEBUG: service[pdns-recursor] runlevel 0, action stop, priority 85
[2017-06-24T19:26:14+00:00] DEBUG: service[pdns-recursor] runlevel 1, action stop, priority 85
[2017-06-24T19:26:14+00:00] DEBUG: service[pdns-recursor] runlevel 2, action start, priority 19
[2017-06-24T19:26:14+00:00] DEBUG: service[pdns-recursor] runlevel 3, action start, priority 19
[2017-06-24T19:26:14+00:00] DEBUG: service[pdns-recursor] runlevel 4, action start, priority 19
[2017-06-24T19:26:14+00:00] DEBUG: service[pdns-recursor] runlevel 5, action start, priority 19
[2017-06-24T19:26:14+00:00] DEBUG: service[pdns-recursor] runlevel 6, action stop, priority 85
[2017-06-24T19:26:15+00:00] INFO: service[pdns-recursor] disabled

      - disable service service[pdns-recursor]

@therobot
Copy link
Contributor Author

A better trace of the wrapper cookbook:

           * service[pdns-recursor] action disable[2017-06-28T08:59:38+00:00] INFO: Processing service[pdns-recursor] action disable (/tmp/kitchen/cache/cookbooks/pdns/resources/pdns_recursor_service_sysvinit.rb line 46)
       [2017-06-28T08:59:38+00:00] INFO: Processing service[pdns-recursor] action disable (/tmp/kitchen/cache/cookbooks/pdns/resources/pdns_recursor_service_sysvinit.rb line 46)
       [2017-06-28T08:59:38+00:00] DEBUG: Providers for generic service resource enabled on node include: [Chef::Provider::Service::Upstart, Chef::Provider::Service::Debian, Chef::Provider::Service::Invokercd, Chef::Provider::Service::Init]
       [2017-06-28T08:59:38+00:00] DEBUG: Providers for generic service resource enabled on node include: [Chef::Provider::Service::Upstart, Chef::Provider::Service::Debian, Chef::Provider::Service::Invokercd, Chef::Provider::Service::Init]
       [2017-06-28T08:59:38+00:00] DEBUG: Provider for action disable on resource service[pdns-recursor] is Chef::Provider::Service::Debian
       [2017-06-28T08:59:38+00:00] DEBUG: Provider for action disable on resource service[pdns-recursor] is Chef::Provider::Service::Debian
       [2017-06-28T08:59:38+00:00] DEBUG: service[pdns-recursor] supports status, running
       [2017-06-28T08:59:38+00:00] DEBUG: service[pdns-recursor] supports status, running
        * pdns_recursor is running
       [2017-06-28T08:59:38+00:00] DEBUG: service[pdns-recursor] is running
       [2017-06-28T08:59:38+00:00] DEBUG: service[pdns-recursor] is running
        Removing any system startup links for /etc/init.d/pdns-recursor ...
          /etc/rc0.d/K85pdns-recursor
          /etc/rc1.d/K85pdns-recursor
          /etc/rc2.d/S19pdns-recursor
          /etc/rc3.d/S19pdns-recursor
          /etc/rc4.d/S19pdns-recursor
          /etc/rc5.d/S19pdns-recursor
          /etc/rc6.d/K85pdns-recursor
       [2017-06-28T08:59:38+00:00] DEBUG: service[pdns-recursor] runlevel 0, action stop, priority 85
       [2017-06-28T08:59:38+00:00] DEBUG: service[pdns-recursor] runlevel 0, action stop, priority 85
       [2017-06-28T08:59:38+00:00] DEBUG: service[pdns-recursor] runlevel 1, action stop, priority 85
       [2017-06-28T08:59:38+00:00] DEBUG: service[pdns-recursor] runlevel 1, action stop, priority 85
       [2017-06-28T08:59:38+00:00] DEBUG: service[pdns-recursor] runlevel 2, action start, priority 19
       [2017-06-28T08:59:38+00:00] DEBUG: service[pdns-recursor] runlevel 2, action start, priority 19
       [2017-06-28T08:59:38+00:00] DEBUG: service[pdns-recursor] runlevel 3, action start, priority 19
       [2017-06-28T08:59:38+00:00] DEBUG: service[pdns-recursor] runlevel 3, action start, priority 19
       [2017-06-28T08:59:38+00:00] DEBUG: service[pdns-recursor] runlevel 4, action start, priority 19
       [2017-06-28T08:59:38+00:00] DEBUG: service[pdns-recursor] runlevel 4, action start, priority 19
       [2017-06-28T08:59:38+00:00] DEBUG: service[pdns-recursor] runlevel 5, action start, priority 19
       [2017-06-28T08:59:38+00:00] DEBUG: service[pdns-recursor] runlevel 5, action start, priority 19
       [2017-06-28T08:59:38+00:00] DEBUG: service[pdns-recursor] runlevel 6, action stop, priority 85
       [2017-06-28T08:59:38+00:00] DEBUG: service[pdns-recursor] runlevel 6, action stop, priority 85
        Removing any system startup links for /etc/init.d/pdns-recursor ...
          /etc/rc0.d/K85pdns-recursor
          /etc/rc1.d/K85pdns-recursor
          /etc/rc2.d/S19pdns-recursor
          /etc/rc3.d/S19pdns-recursor
          /etc/rc4.d/S19pdns-recursor
          /etc/rc5.d/S19pdns-recursor
          /etc/rc6.d/K85pdns-recursor
       update-rc.d: warning:  start runlevel arguments (none) do not match pdns-recursor Default-Start values (2 3 4 5)
       update-rc.d: warning:  stop runlevel arguments (2 3 4 5) do not match pdns-recursor Default-Stop values (0 1 6)
        Adding system startup for /etc/init.d/pdns-recursor ...
          /etc/rc2.d/K80pdns-recursor -> ../init.d/pdns-recursor
          /etc/rc3.d/K80pdns-recursor -> ../init.d/pdns-recursor
          /etc/rc4.d/K80pdns-recursor -> ../init.d/pdns-recursor
          /etc/rc5.d/K80pdns-recursor -> ../init.d/pdns-recursor
       [2017-06-28T08:59:38+00:00] INFO: service[pdns-recursor] disabled
       [2017-06-28T08:59:38+00:00] INFO: service[pdns-recursor] disabled

             - disable service service[pdns-recursor]

@therobot
Copy link
Contributor Author

therobot commented Jun 28, 2017

With this round of findings, I have the issue narrowed down and a solution.

Behaviour in:

Docker

  • The default pdns-recursor is not started after it's installed.
  • Chef disables pdns-recursor and it's not necessary to stop it since it was never started in the beginning.

Vagrant

  • The default pdns-recursor is started after it's installed.
  • Chef acknowledges that is running but the commands behind :disable (update-rc.d) don't stop the process, they just disable the service.

Conclusion

The erratic behaviour in docker (and in CentOS, see addendum below) mislead me on thinking that :disable also stopped the service, but it does not.

Solution

Run a :stop command on the default pdns-recursor installed.

Addendum

There is an additional problem on CentOS 6.7 that surfaces only when we try to stop the default pdns-recursor on top on disabling it (see code example below). This which is the simplest and ideal solution to the issue will break tests on CentOS 6.7. This piece of code stops one of the recursors created by recursor_install_multi.rb recipe. I will explain how this happens.

  service 'pdns-recursor' do
    supports restart: true, status: true
    action [:stop, :disable]
  end

When running ce kitchen verify recursor-multi-centos-67 this causes the following sequence of actions.

  • Chef stops the default pdns-recursor (it's not started at the beginning ) as part of the block pdns_recursor_service 'server_01' do (code above).
  • Chef starts pdns-recursor_server_01 instance as part of the block pdns_recursor_service 'server_01' do
  • Chef stops pdns-recursor_server_01 as part of the block pdns_recursor_service 'server_02' do. (BUG! from the code above, the init script fails to detect that pdns-recursor_server_01 is not pdns-recursor).
  • Chef starts pdns-recursor_server_02 as part of the block pdns_recursor_service 'server_02' do

Why this happens?

  • When the block pdns_recursor_service 'server_02' do is executed Chef calls the recursor init script to see detect if there is any instance of pdns-recursor running by calling /etc/init.d/pdns-recursor status.
  • The init script resorts to call __pids_pidof pdns_recursor of /etc/rc.d/init.d/functions in order to return the pidfile to Chef. When called this function returns a list of PIDs it detect (something like: pidof -c -m -o 4686 -o 4685 -o %PPID -x pdns_recursor).
  • Chef then receives a this list of PIDs that don't match (Chef asked for pdns-recursor, and got the pids for pdns-recursor_server_01 and pdns-recursor_server_02)
  • Chef asumes that the PID it got is from pdns-recursor so it stops it.
  • Chef run completes with only pdns-recursor_server_02 running
  • Tests fail since it expects two recursors running and there is only

 Final solution

Stopping the recursor only on debian based distros:

  service 'pdns-recursor' do
    supports restart: true, status: true
    action :stop
    only_if { node['platform_family'] == 'debian' }
  end

  service 'pdns-recursor' do
    supports restart: true, status: true
    action :disable
  end

Notes

The mix of different behaviours between Centos6.7 (addressed first in 3.3.2 release), Vagrant and Docker has been confused me for the time being and it has not being until a couple of heavy debugging sessions that I've got all the pieces together.

therobot added a commit that referenced this issue Jun 28, 2017
@therobot
Copy link
Contributor Author

therobot commented Jun 28, 2017

@jmauro the problem with your solution is that it did not work well on Ubuntu. This workaround makes both CentOS and Ubuntu happy. Also works in both Docker and Vagrant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants