
Elastic-agent gets stuck in Updating - How to troubleshoot and where to look for logs? #1034

Closed
Blason opened this issue Dec 28, 2021 · 6 comments


Blason commented Dec 28, 2021

Hi Team,

On certain Windows hosts we observed that the elastic-agent gets stuck in the Updating phase as soon as it enrolls. Can someone guide me on where to look for debug logs, both on the Fleet Server and on the Windows host?
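
In case it helps others, here is a minimal sketch of where to start looking on the Windows host, assuming a default install under C:\Program Files\Elastic\Agent (the exact data directory name varies by agent version, and on some releases the diagnostics subcommand is just 'diagnostics'):

    # Run from an elevated PowerShell prompt on the affected host
    cd "C:\Program Files\Elastic\Agent"
    .\elastic-agent.exe status                 # current state of the agent and its components
    .\elastic-agent.exe diagnostics collect    # writes a zip archive with logs and config
    # Log files live under the versioned data directory, e.g.:
    dir ".\data\elastic-agent-*\logs"
    # On the Fleet Server host, fleet-server runs under Elastic Agent, so its logs are in the same location there.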


mimeie commented Feb 6, 2022

Hi

Did restarting the service help?
I had the same issue, and after a restart it worked fine.
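
For reference, on Windows the restart can be done from an elevated prompt, assuming the default service name "Elastic Agent":

    # Restart the Elastic Agent service on the affected host
    net stop "Elastic Agent"
    net start "Elastic Agent"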

@matthiasledergerber

Observed the same issue. Restarting the Elastic-Agent service helped to get the agent from Updating to Healthy state in Fleet. Elastic-Stack 8.4.0. We rolled out the agents in an industrial environment where bandwidth is very low due to other bandwidth-demanding traffic, so maybe there was an issue with the connection during the rollout. Peeked into the logs but didn't find anything interesting.

What would be interesting is if the agent rollout could be simulated in an environment with low bandwidth / packet loss etc., like https://docs.vmware.com/en/VMware-Workstation-Pro/16.0/com.vmware.ws.using.doc/GUID-7BFFA8B3-C134-4801-A0AD-3DA53BBAC5CA.html
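
Outside of a hypervisor feature like the VMware one above, a constrained link can also be roughly emulated on a Linux gateway or test box with tc/netem; a sketch, where the interface name and the numbers are placeholders:

    # Emulate a slow, lossy uplink on eth0 (run as root), then exercise enrollment/upgrade
    tc qdisc add dev eth0 root netem rate 256kbit delay 200ms loss 3%
    # ...run the agent rollout test...
    tc qdisc del dev eth0 root    # remove the emulation afterwards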


gonzo919 commented Oct 3, 2022

@mimeie I have a customer with a similar issue, but restarting did not work. It ended up creating 3 instances in Healthy, Offline and Updating status:
[Screenshot: the same agent listed three times in Fleet, in Healthy, Offline and Updating status]

Anything else to try? Thanks
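
One way to see exactly which agent documents exist for a host before cleaning up the stale entries is the Fleet agents API from the Dev Tools app; a sketch, with the hostname filter as a placeholder:

    GET kbn:/api/fleet/agents?kuery=local_metadata.host.hostname:"MY-WINDOWS-HOST"&showInactive=true

The stale entries can then be unenrolled individually with POST kbn:/api/fleet/agents/<id>/unenroll.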


cmacknz commented Oct 3, 2022

Possible duplicate of elastic/elastic-agent#760 which we have now root caused.

@joshdover is there a link we can share for recovering agents in this state?

@joshdover

We don't have a public doc, but we should add this to our troubleshooting page.

The simplest workaround is to force the agent to upgrade again via the API from the Dev Tools app:

POST kbn:/api/fleet/agents/<id>/upgrade
{ "version": "8.4.1", "force": true }

A slightly less disruptive option, if you have many agents in this state, is to manually fix the documents in the agent index:

All we need to do is un-mark the Elastic Agents as upgrading, but this is a bit tricky. Run the following commands as a superuser:

  1. Create a new service token:
    curl -XPOST --user elastic:${SUPERUSER_PASS} -H'x-elastic-product-origin:fleet' -H'content-type:application/json' "https://${ELASTICSEARCH_HOST}/_security/service/elastic/fleet-server/credential/token/fix-agents"
  2. Next, run an update_by_query. The query finds all Elastic Agents that are marked as updating and marks the update as complete. Replace ${TOKEN} below with the token created in step 1 so the query can complete successfully:
    curl -XPOST -H"Authorization: Bearer ${TOKEN}" -H'x-elastic-product-origin:fleet' -H'content-type:application/json' "https://${ELASTICSEARCH_HOST}/.fleet-agents/_update_by_query" -d '{"query": {"bool": {"must": [{ "exists": { "field": "upgrade_started_at" } }],"must_not": [{ "exists": { "field": "upgraded_at" } }]}},"script": {"source": "ctx._source.upgraded_at = ctx._source.upgrade_started_at; ctx._source.upgrade_started_at = null;","lang": "painless"}}'
  3. Finally, clean up the created service token:
    curl -XDELETE --user elastic:${SUPERUSER_PASS} -H'x-elastic-product-origin:fleet' -H'content-type:application/json' "https://${ELASTICSEARCH_HOST}/_security/service/elastic/fleet-server/credential/token/fix-agents"
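
To verify the cleanup worked, the same condition can be counted and should return 0; a sketch reusing the token from step 1 (run it before step 3 deletes the token):

    curl -XPOST -H"Authorization: Bearer ${TOKEN}" -H'x-elastic-product-origin:fleet' -H'content-type:application/json' "https://${ELASTICSEARCH_HOST}/.fleet-agents/_count" -d '{"query": {"bool": {"must": [{ "exists": { "field": "upgrade_started_at" } }],"must_not": [{ "exists": { "field": "upgraded_at" } }]}}}'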


jlind23 commented May 27, 2024

Closing this one as outdated/not prioritised; we will reopen it later if required.
cc @ycombinator

jlind23 closed this as not planned on May 27, 2024