elastic-agent: Windows failing to enroll due to CreateFile error with temporary files and permissions #27836

Closed
adam-stokes opened this issue Sep 9, 2021 · 11 comments · Fixed by #27846
Labels
bug, Team:Elastic-Agent, v7.16.0

Comments

@adam-stokes

adam-stokes commented Sep 9, 2021

This was found by the automation scenarios in the e2e-testing repo, nice.

  • Version: 8.0.0-SNAPSHOT
  • Operating System: Windows 2019

Pipeline failure: https://beats-ci.elastic.co/blue/organizations/jenkins/e2e-tests%2Fe2e-testing-mbp/detail/PR-1539/13/pipeline

Error:

[2021-09-08T20:48:19.050Z] time="2021-09-08T20:48:18Z" level=error msg="Error executing command" args="[install -e -v --force --insecure --enrollment-token=cDdNcHgzc0JvSUxob0swY3M1VWU6WFRxUERmdjVSR3lHMk54ODF5S2xaZw== --url http://10.224.1.88:8220]" baseDir=. command="C:\\elastic-agent\\elastic-agent.exe" error="exit status 1" stderr="2021-09-08T20:47:35.591Z\tINFO\t[composable.providers.docker]\tdocker/docker.go:43\tDocker provider skipped, unable to connect: protocol not available\n2021-09-08T20:47:35.693Z\tINFO\tcapabilities/capabilities.go:59\tcapabilities file not found in C:\\elastic-agent\\capabilities.yml\n2021-09-08T20:47:40.849Z\tWARN\t[tls]\ttlscommon/tls_config.go:98\tSSL/TLS verifications disabled.\n2021-09-08T20:47:41.806Z\tINFO\tcmd/enroll_cmd.go:432\tStarting enrollment to URL: http://10.224.1.88:8220/\nError: failed to fix permissions: CreateFile C:\\Program Files\\Elastic\\Agent\\data\\elastic-agent-20be50\\install\\tmp122602367: The system cannot find the file specified.\nFor help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.0/fleet-troubleshooting.html\n2021-09-08T20:48:17.979Z\tINFO\t[composable.providers.docker]\tdocker/docker.go:43\tDocker provider skipped, unable to connect: protocol not available\n2021-09-08T20:48:18.081Z\tINFO\tcapabilities/capabilities.go:59\tcapabilities file not found in C:\\elastic-agent\\capabilities.yml\n2021-09-08T20:48:18.564Z\tINFO\t[composable.providers.docker]\tdocker/docker.go:43\tDocker provider skipped, unable to connect: protocol not available\n2021-09-08T20:48:18.666Z\tINFO\tcapabilities/capabilities.go:59\tcapabilities file not found in C:\\elastic-agent\\capabilities.yml\nError: enroll command failed with exit code: 1\nFor help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.0/fleet-troubleshooting.html\n"

Specific Error:

Error: failed to fix permissions: CreateFile C:\Program Files\Elastic\Agent\data\elastic-agent-20be50\install\tmp122602367: The system cannot find the file specified.

This now happens consistently, each time on a fresh Windows VM.

@botelastic (bot) added the needs_team label Sep 9, 2021
@EricDavisX added the Team:Elastic-Agent label Sep 9, 2021
@elasticmachine
Collaborator

Pinging @elastic/agent (Team:Agent)

@botelastic (bot) removed the needs_team label Sep 9, 2021
@ruflin added the bug label Sep 9, 2021
@EricDavisX
Contributor

Michal's first suggestion to Michel, sent via Slack for review, was:
...can you modify 'fixPermissions' to ignore the NotExist error?

@blakerouse
Contributor

This seems odd, given that others are not currently seeing this on other Windows installations. Does this path exist: C:\Program Files\Elastic\Agent\data\elastic-agent-20be50\install?

@EricDavisX
Contributor

@adam-stokes hi, do you have the ability to spin up a host and check that path?

@adam-stokes
Author

adam-stokes commented Sep 10, 2021

This is the output from the file listings:

PS C:\Program Files\Elastic\Agent> ls .\data\


    Directory: C:\Program Files\Elastic\Agent\data

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----            9/9/2021  2:43 PM                elastic-agent-20be50
-a---            9/9/2021  2:42 PM              0 agent.lock

PS C:\Program Files\Elastic\Agent> ls .\data\elastic-agent-20be50\


    Directory: C:\Program Files\Elastic\Agent\data\elastic-agent-20be50

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----            9/9/2021  2:42 PM                downloads
d----            9/9/2021  2:43 PM                install
d----            9/9/2021  2:42 PM                logs
d----            9/9/2021  2:43 PM                run
-a---            9/9/2021  2:42 PM       42204656 elastic-agent.exe

PS C:\Program Files\Elastic\Agent> ls .\data\elastic-agent-20be50\install\


    Directory: C:\Program Files\Elastic\Agent\data\elastic-agent-20be50\install

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----            9/9/2021  2:42 PM                filebeat-8.0.0-SNAPSHOT-windows-x86_64
d----            9/9/2021  2:42 PM                metricbeat-8.0.0-SNAPSHOT-windows-x86_64

I'm not sure where the tmp file is created, or at what point it goes out of scope and gets cleaned up. The only explanation I can think of is that when we fix permissions we do a filepath.Walk within that directory and the tmp file is there, but by the time we apply the ACL changes the file has disappeared. I think the fix that @michel-laterman is proposing is good to have in place regardless, since we can't guarantee the lifetime of a tmp file. wdyt?

@adam-stokes
Author

@skearns64 reports that this also affects 7.15.0 BC5 on Windows 10.

@michel-laterman
Contributor

I'll merge and backport to 7.15 and 7.16

@michel-laterman
Contributor

@adam-stokes, @skearns64 the fix is now part of 8.0, 7.16, and 7.15. Please reopen the issue (and ping me!) if it occurs again.

@EricDavisX
Contributor

I believe this is still reproducible via PR runs of the e2e-testing scenario. Recent run:
https://beats-ci.elastic.co/blue/organizations/jenkins/e2e-tests%2Fe2e-testing-mbp%2Fmaster/detail/master/1580/pipeline/

@adam-stokes
Author

adam-stokes commented Sep 28, 2021

This is a different error that I am tracking down:

[2021-09-28T06:23:32.430Z] time="2021-09-28T06:23:31Z" level=error msg="Error executing command" args="[install -e -v --force --insecure --enrollment-token=WS1JUUszd0J0TzRsX0ppXzBMVXE6N3JGVjFGOUxSd2lHVktfdG84T3ZjUQ== --url http://10.224.0.24:8220]" baseDir=. command="C:\\elastic-agent\\elastic-agent.exe" error="exit status 1" stderr="2021-09-28T06:23:27.512Z\tINFO\t[composable.providers.docker]\tdocker/docker.go:43\tDocker provider skipped, unable to connect: protocol not available\n2021-09-28T06:23:27.612Z\tINFO\tcapabilities/capabilities.go:59\tcapabilities file not found in C:\\elastic-agent\\capabilities.yml\n2021-09-28T06:23:30.035Z\tWARN\t[tls]\ttlscommon/tls_config.go:98\tSSL/TLS verifications disabled.\n2021-09-28T06:23:30.178Z\tINFO\tcmd/enroll_cmd.go:432\tStarting enrollment to URL: http://10.224.0.24:8220/\nError: fail to enroll: fail to execute request to fleet-server: status code: 400, fleet-server returned an error: BadRequest\nFor help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.0/fleet-troubleshooting.html\n2021-09-28T06:23:31.126Z\tINFO\t[composable.providers.docker]\tdocker/docker.go:43\tDocker provider skipped, unable to connect: protocol not available\n2021-09-28T06:23:31.226Z\tINFO\tcapabilities/capabilities.go:59\tcapabilities file not found in C:\\elastic-agent\\capabilities.yml\n2021-09-28T06:23:31.475Z\tINFO\t[composable.providers.docker]\tdocker/docker.go:43\tDocker provider skipped, unable to connect: protocol not available\n2021-09-28T06:23:31.578Z\tINFO\tcapabilities/capabilities.go:59\tcapabilities file not found in C:\\elastic-agent\\capabilities.yml\nError: enroll command failed with exit code: 1\nFor help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.0/fleet-troubleshooting.html\n"

We can re-close this issue, since the error originally reported here has been fixed.

@EricDavisX
Contributor

Adam is right. Let's re-close. Sorry for the confusion.
