Cherry-pick #19764 to 7.7: [Auditbeat] Fix up socket dataset runaway CPU usage #19783

Merged: 3 commits into elastic:7.7 on Jul 9, 2020


@andrewstucki (Contributor) commented Jul 9, 2020

Cherry-pick of PR #19764 to the 7.7 branch. Original message:

What does this PR do?

Fix for Auditbeat runaway CPU usage: #19141

Here's the explanation. Almost everything was as described in the previous PR (#19033); the only additional things I found were:

  1. When a *socket is terminated by another socket with a different kernel TID, it is moved to the closing LRU list.
  2. The new *socket is added to the state's socks map under the same ptr key, so that entry now points to the new socket.
  3. The reaper comes along and hits the following code path:
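	// Process closing-list entries whose timestamp is before the deadline.
	for item := s.closing.peek(); item != nil && item.Timestamp().Before(deadline); {
		if sock, ok := item.(*socket); ok {
			// For the old, already-terminated socket this runs termination a second time.
			s.onSockTerminated(sock)
		} else {
			// Not a *socket: just pop it off the closing list.
			s.closing.get()
		}
		item = s.closing.peek()
	}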
  4. The old "terminated" socket is now in the "closing" state, so onSockTerminated is called on it again.
  5. In onSockTerminated, the socket is pruned from the socks map a second time via delete(s.socks, sock.sock).
  6. The problem is that the socks map entry for that key now refers to the new *socket rather than the old one, so it is the new socket's entry that gets deleted.
  7. Eventually, if the new *socket times out, onSockDestroyed is called on it from the reaper code that peeks at socketLRU.
  8. That call used the socket pointer whose map entry had been deleted in step 5.
  9. onSockDestroyed was running the following code:
	sock, found = s.socks[ptr]
	if !found {
		// The entry was already deleted, so this returns without
		// removing the socket from socketLRU.
		return nil
	}
  10. found was false, so the function returned early without cleaning up the socket.
  11. Because the reaper uses s.socketLRU.peek(), which does not remove the item, the same socket was returned over and over, wedging the reaper routine in a tight for loop (hence the high CPU usage).
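The failure mode in the steps above can be reproduced with a small, self-contained Go sketch. This is a hypothetical model, not the auditbeat code: socket, lru, state, reap and the maxIter cap are simplified stand-ins introduced for illustration, and the closing list is folded away so that only the stale delete and the ptr-based lookup remain. Because onSockDestroyed returns early without touching the LRU, peek keeps handing back the same socket; the iteration cap exists only so the demo terminates.

```go
package main

import "fmt"

// socket is a hypothetical, simplified stand-in for the auditbeat socket
// type; only the kernel pointer used as the map key is modeled.
type socket struct {
	sock uintptr
}

// lru is a minimal FIFO with peek/remove, standing in for socketLRU.
type lru struct{ items []*socket }

func (l *lru) add(s *socket) { l.items = append(l.items, s) }

// peek returns the oldest entry without removing it.
func (l *lru) peek() *socket {
	if len(l.items) == 0 {
		return nil
	}
	return l.items[0]
}

func (l *lru) remove(s *socket) {
	for i, it := range l.items {
		if it == s {
			l.items = append(l.items[:i], l.items[i+1:]...)
			return
		}
	}
}

type state struct {
	socks     map[uintptr]*socket
	socketLRU lru
}

// onSockDestroyed models the pre-fix behavior: it looks the socket up by
// ptr and returns early if the map entry is gone, leaving the LRU untouched.
func (s *state) onSockDestroyed(ptr uintptr) {
	sock, found := s.socks[ptr]
	if !found {
		return
	}
	delete(s.socks, ptr)
	s.socketLRU.remove(sock)
}

// reap drains the LRU via peek. maxIter caps the loop so the demo ends;
// it reports how many iterations ran and whether the cap was hit.
func (s *state) reap(maxIter int) (int, bool) {
	n := 0
	for item := s.socketLRU.peek(); item != nil; item = s.socketLRU.peek() {
		if n == maxIter {
			return n, true
		}
		s.onSockDestroyed(item.sock)
		n++
	}
	return n, false
}

func main() {
	const ptr = uintptr(0xdead)
	s := &state{socks: map[uintptr]*socket{}}

	// Old socket terminated; a new socket takes over the same ptr key.
	old := &socket{sock: ptr}
	newer := &socket{sock: ptr}
	s.socks[ptr] = newer
	s.socketLRU.add(newer)

	// Termination runs again for the old socket and deletes by key,
	// which removes the new socket's entry instead.
	delete(s.socks, old.sock)

	iters, stalled := s.reap(1000)
	fmt.Println(iters, stalled) // prints "1000 true"
}
```

Without the cap, the loop in reap would never exit, since every iteration leaves the LRU exactly as it found it.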

The fix

The fix passes a reference to the *socket object in the reaper's onSockDestroyed call. That way, onSockDestroyed no longer has to look the socket up in s.socks and can handle the socket closure directly.
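A minimal sketch of the fix, using hypothetical simplified types (not the actual auditbeat implementation): onSockDestroyed receives the *socket itself, so it no longer depends on s.socks to find it and always removes it from the LRU. That guarantees each reaper iteration makes progress. The identity check before deleting the map entry is an extra safeguard added for this sketch; the PR text does not say the real code does this.

```go
package main

import "fmt"

// socket and lru are hypothetical, simplified stand-ins, not the auditbeat types.
type socket struct {
	sock uintptr // kernel socket pointer, used as the map key
}

type lru struct{ items []*socket }

func (l *lru) add(s *socket) { l.items = append(l.items, s) }

// peek returns the oldest entry without removing it.
func (l *lru) peek() *socket {
	if len(l.items) == 0 {
		return nil
	}
	return l.items[0]
}

func (l *lru) remove(s *socket) {
	for i, it := range l.items {
		if it == s {
			l.items = append(l.items[:i], l.items[i+1:]...)
			return
		}
	}
}

type state struct {
	socks     map[uintptr]*socket
	socketLRU lru
}

// onSockDestroyed takes the *socket directly, so it never needs the socks
// map to find it and always removes it from the LRU.
func (s *state) onSockDestroyed(sock *socket) {
	// Sketch-only safeguard: drop the map entry only if it still refers to
	// this socket, so a newer socket reusing the same ptr is left alone.
	if cur, ok := s.socks[sock.sock]; ok && cur == sock {
		delete(s.socks, sock.sock)
	}
	s.socketLRU.remove(sock)
}

// reap drains the LRU; each iteration removes the peeked item, so it terminates.
func (s *state) reap() int {
	n := 0
	for item := s.socketLRU.peek(); item != nil; item = s.socketLRU.peek() {
		s.onSockDestroyed(item)
		n++
	}
	return n
}

func main() {
	const ptr = uintptr(0xdead)
	s := &state{socks: map[uintptr]*socket{}}

	old := &socket{sock: ptr}
	newer := &socket{sock: ptr}
	s.socks[ptr] = newer
	s.socketLRU.add(newer)

	// Same stale delete as in the bug scenario: the map entry is already gone.
	delete(s.socks, old.sock)

	n := s.reap()
	fmt.Println(n, len(s.socketLRU.items)) // prints "1 0"
}
```

Because the LRU removal no longer depends on a successful map lookup, a stale or missing socks entry can no longer leave the reaper peeking at the same socket forever.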

Related issues

* Fix up socket dataset
* Add Changelog entry

(cherry picked from commit cb4cedc)
@andrewstucki requested a review from a team as a code owner July 9, 2020 13:51
@botelastic bot added the needs_team label (indicates that the issue/PR needs a Team:* label) Jul 9, 2020
@elasticmachine (Collaborator) commented:

Pinging @elastic/siem (Team:SIEM)

@botelastic bot removed the needs_team label Jul 9, 2020
CHANGELOG.next.asciidoc: review comment resolved
@andrewstucki merged commit c6f9c26 into elastic:7.7 Jul 9, 2020
@andrewstucki deleted the backport_19764_7.7 branch July 9, 2020 18:59
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…unaway CPU usage (elastic#19783)

* [Auditbeat] Fix up socket dataset runaway CPU usage (elastic#19764)

* Fix up socket dataset
* Add Changelog entry

(cherry picked from commit f1ef970)

* fix up changelog

* Fix changelog