[Agent] Elastic Agent Unenroll Action Leaves Dormant Service and Process Running #24568

EricDavisX · 2021-03-16T17:56:32Z

Describe the bug
I installed a 7.12.0 BC4 Agent onto Windows 10 using the provided command line. It installed itself and the Endpoint.

When I unenrolled the Agent via Security App, it uninstalled Endpoint but left the Agent process running and the service registered:

Process, service, and files are still present:

C:\Users\user\Desktop\elastic-agent-7.12.0-windows-x86_64>dir /b "C:\Program Files\Elastic"
Agent

C:\Users\user\Desktop\elastic-agent-7.12.0-windows-x86_64>sc query "Elastic Agent"

SERVICE_NAME: Elastic Agent
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 4  RUNNING
                                (STOPPABLE, NOT_PAUSABLE, ACCEPTS_SHUTDOWN)
        WIN32_EXIT_CODE    : 0  (0x0)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x0
        WAIT_HINT          : 0x0

C:\Users\user\Desktop\elastic-agent-7.12.0-windows-x86_64>tasklist | grep elastic
elastic-agent.exe             8488 Services                   0     11,312 K

This is documented behavior, but I believe it goes against user expectations. Why would a user want a zombie Agent left running on their endpoint after explicitly unenrolling it?

If we change this behavior, we may want to update the Security App verbiage to "Uninstall Endpoint".

Desktop (please complete the following information):

OS: Windows 10 1903
Kibana Version: 7.12.0 BC4
Endpoint Version: 7.12.0 BC4
Agent Version: 7.12.0 BC4

And thank you to @gabriellandau for logging this initially - I'll close the private issue in the security team repo in favor of this (cannot transfer from private to public repo).


elasticmachine · 2021-03-16T17:56:34Z

Pinging @elastic/agent (Team:Agent)

ph · 2021-03-16T19:26:35Z

This looks like something we need to investigate with our users, @mostlyjason. As engineers, we decided we shouldn't touch services after unenrollment is done, and we expect the user to manually uninstall the Elastic Agent on the machine. Can we clarify our users' expectations here?

mostlyjason · 2021-03-17T11:15:46Z

Good point! I don't think it's a bug because it is performing the requested action, which is to unenroll the agent. However, I agree it's a little confusing: what is the point of running without an agent policy? What about disabling the services when the agent does not have an agent policy? That should stop the processes without uninstalling them.

I think we expect the user to uninstall the agent after unenrolling it. Perhaps we can optimize this in the future to happen in one step instead of two.

That said, I think we should prioritize getting agents enrolled, and come back to optimizing the unenrolling/disabling behavior later. I'd say we should track this item on the backlog for now.

ph · 2021-03-17T12:24:42Z

@mostlyjason thanks for the input, +1 to adding this to the backlog.

botelastic · 2022-03-17T12:52:50Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

gabriellandau · 2022-03-17T15:38:43Z

> what is the point of running without an agent policy?

++

michalpristas · 2022-03-23T17:02:04Z

@gabriellandau the idea was that if we exit agent while it is managed by a service manager (e.g. systemd), it will get restarted, and we end up in a restart loop, because without a policy it has no reason to keep running.
By keeping a 'zombie' agent we use few or no resources, since we close any handles the agent holds. The agent appears as running in the service manager and can be enrolled again later (after the restart that is part of enrollment, it will act as usual).
The other solution would be to handle every service manager.
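The reasoning above can be sketched as a small decision function. This is an illustration only, not the Agent's actual code: the names `Action` and `next_action` are hypothetical, and the choice to exit when no service manager is involved is an assumption added to complete the example.

```python
from enum import Enum


class Action(Enum):
    RUN = "run"    # policy present: operate normally
    IDLE = "idle"  # no policy: release handles but keep the process alive
    EXIT = "exit"  # no policy and nothing would restart the process


def next_action(has_policy: bool, supervised: bool) -> Action:
    """Pick what an unenrolled agent does, per the restart-loop argument."""
    if has_policy:
        return Action.RUN
    # Exiting under a supervisor (e.g. systemd) that restarts the process
    # would loop forever, so the process stays up in a dormant state.
    # Assumption: without a supervisor, exiting is safe.
    return Action.IDLE if supervised else Action.EXIT


print(next_action(has_policy=False, supervised=True).value)
```

The dormant branch is what keeps the service showing as running after an unenroll.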

gabriellandau · 2022-03-23T17:19:26Z

Thanks for the response @michalpristas

> if we exit agent

I am suggesting that we uninstall Agent, not exit it. Rename Unenroll Agent to Uninstall Agent (or present them side-by-side) and run the same code that executes when the user logs into the machine and runs "C:\Program Files\Elastic\Agent\elastic-agent.exe" uninstall. This removes it from the Windows service manager.

zez3 · 2022-03-23T19:43:24Z

This sounds very similar to the issue that I had in
elastic/elastic-agent#127
which will hopefully be fixed in V2:
elastic/elastic-agent#189

zez3 · 2022-10-06T08:46:18Z

Running 8.4.2 now and the issue still persists

ps aux | grep 'Z'
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 445286 0.0 0.0 0 0 ? Zs Jun17 0:00 [elastic-agent]
root 445936 0.0 0.0 0 0 ? Zs Jun17 0:00 [elastic-agent]
root 507152 0.0 0.0 0 0 ? Zs Jun20 0:00 [elastic-agent]
root 507182 0.0 0.0 0 0 ? Zs Jun20 0:00 [elastic-agent]
root 886183 0.0 0.0 0 0 ? Zs Oct04 0:00 [elastic-agent]
root 886213 0.0 0.0 0 0 ? Zs Oct04 0:00 [elastic-agent]
root 1416116 0.0 0.0 0 0 ? Zs Jul11 0:00 [elastic-agent]
root 2086332 0.0 0.0 0 0 ? Zs Jul20 0:00 [elastic-agent]
root 2086362 0.0 0.0 0 0 ? Zs Jul20 0:00 [elastic-agent]
root 3302312 0.0 0.0 0 0 ? Zs Aug25 0:00 [elastic-agent]
root 3302346 0.0 0.0 0 0 ? Zs Aug25 0:00 [elastic-agent]
root 3999249 0.0 0.0 0 0 ? Zs Sep05 0:00 [elastic-agent]
root 3999278 0.0 0.0 0 0 ? Zs Sep05 0:00 [elastic-agent]
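To tally entries like these across hosts, a minimal sketch that counts defunct processes in `ps aux` text may help. It assumes the 11-column layout shown above (STAT in the eighth column, COMMAND last); the function name `count_zombies` is hypothetical.

```python
def count_zombies(ps_output: str, name: str = "elastic-agent") -> int:
    """Count rows whose STAT starts with 'Z' and whose COMMAND contains name."""
    count = 0
    for line in ps_output.splitlines():
        # Split into at most 11 fields so COMMAND keeps any embedded spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skips the header row and malformed lines
        stat, command = fields[7], fields[10]
        if stat.startswith("Z") and name in command:
            count += 1
    return count


# Two rows copied from the listing above, plus a made-up healthy process.
sample = """USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 445286 0.0 0.0 0 0 ? Zs Jun17 0:00 [elastic-agent]
root 445936 0.0 0.0 0 0 ? Zs Jun17 0:00 [elastic-agent]
root 1234 0.1 0.5 1000 500 ? Ssl Jun17 1:23 /usr/bin/other-daemon --flag
"""
print(count_zombies(sample))
```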

zez3 · 2022-10-06T09:17:32Z

All of my Linux hosts where Agents are running show the same behavior

gabriellandau · 2023-05-26T16:41:18Z

Bump @ph @michalpristas @cmacknz @nimarezainia @bjmcnic

This over-two-year-old confusing behavior has now resulted in at least one user with zombie Endpoints and Agents installed. The user still has access to Kibana, but has no way to uninstall these Agents and Endpoints. This can happen, for example, when IR / MDR contracts end.

> the other solution would be to handle every service manager

We already do this for local uninstalls (e.g. "C:\Program Files\Elastic\Agent\elastic-agent.exe" uninstall). The code already exists in Agent. It's the opposite of install.

We're willing to add a service when users install Agent locally.
We're willing to add a service when users enable the Endpoint Security integration remotely.
We're willing to remove a service when users disable the Endpoint Security integration remotely.
We're willing to remove a service when users run "C:\Program Files\Elastic\Agent\elastic-agent.exe" uninstall locally.

Why are we requiring them to shell into each machine to uninstall agent?

cmacknz · 2023-05-29T15:18:35Z

> Why are we requiring them to shell into each machine to uninstall agent?

I don't think there's a good reason for this, considering that we already support the uninstall command that does remove the agent correctly on every supported platform.

I'll leave it to @nimarezainia to prioritize this from the product side, although improvements here could possibly be tied in with the ongoing tamper protection work, since we are already changing the uninstallation flow there. CC @roxana-gheorghe

blakerouse · 2023-06-05T13:50:58Z

Support for full removal will be based on the installation type. We should really only support a full uninstall during unenrollment if the Elastic Agent was installed using the elastic-agent install command. Otherwise we would be uninstalling DEBs and RPMs, which we should not be doing.
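As a rough sketch of that rule, the check could be expressed as a gate on the installation type. The `InstallType` values and the function name below are hypothetical labels for illustration, not identifiers from the Agent codebase.

```python
from enum import Enum


class InstallType(Enum):
    # Hypothetical labels for how the agent ended up on the host.
    INSTALL_COMMAND = "install-command"  # via "elastic-agent install"
    DEB = "deb"                          # owned by the system package manager
    RPM = "rpm"                          # owned by the system package manager


def may_uninstall_on_unenroll(install_type: InstallType) -> bool:
    """Allow full removal only when the agent installed itself."""
    return install_type is InstallType.INSTALL_COMMAND


for kind in InstallType:
    print(kind.value, may_uninstall_on_unenroll(kind))
```

Package-managed installs stay in place, so removing them remains the package manager's job.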

zez3 · 2023-11-20T13:02:54Z

still happening on the latest 8.11.1

cmacknz · 2023-11-21T19:38:57Z

The zombie processes after a restart are now tracked in elastic/elastic-agent#2190 (comment)

gabriellandau · 2023-11-21T20:58:22Z

@cmacknz perhaps the title of this issue was a misnomer. I just renamed it.

cmacknz · 2023-11-22T15:32:54Z

Ah yes, I just pattern matched on the word zombie. Thanks, that clarifies things: it's a different issue and yes, still a problem.

EricDavisX added the Team:Elastic-Agent (Label for the Agent team) label Mar 16, 2021

ph added the enhancement label Mar 17, 2021

botelastic bot added the Stalled label Mar 17, 2022

botelastic bot removed the Stalled label Mar 17, 2022

gabriellandau changed the title ~~[Agent] Elastic Agent Unenroll Action Leaves Zombie Service and Process~~ [Agent] Elastic Agent Unenroll Action Leaves Dormant Service and Process Running Nov 21, 2023
