Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add dynamic buffer size for log publication #1630

Merged
merged 5 commits into from
Mar 1, 2024

Conversation

zackattack01
Copy link
Contributor

These changes are to address the log buffering fragility issue noted here.

The adjustment logic took a few different forms based on testing - i also have a version of this that keeps a queue of latest results if anyone is interested in going down that route let me know (I ended up trimming this down to a more simple mechanism after seeing the complexity there).

I am more than happy to take suggestions on the adjustment logic - it's easy to make the backoff more dramatic if desired but I found it to be safest to keep the changes to small increments, if the network is really in a bad state the batch limit will be adjusted eventually.

complications/lessons learned

  • increasing the batch limit:
    • We publish logs in batches every minute by log type. This means that while some log types may become backlogged others can always publish successfully - causing the batch limit to re-adjust upwards too quickly. To get around this we only increase the limit if the previous batch limit was both hit and published successfully.
  • decreasing the batch limit:
    • depending on how bad the network connection is and how it is terminated, any number of errors can pop out (not just network/transport errors, sometimes just failures to parse the response) . To get around this we keep track of when the batch was started, and only reduce the limit if an error occurred after a 20 second duration

RebeccaMahany
RebeccaMahany previously approved these changes Feb 29, 2024
Copy link
Contributor

@RebeccaMahany RebeccaMahany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice -- this was easy to read/follow.

pkg/osquery/extension.go Show resolved Hide resolved
pkg/osquery/log_publication_state.go Outdated Show resolved Hide resolved
pkg/osquery/log_publication_state.go Show resolved Hide resolved
Copy link
Contributor

@directionless directionless left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty solid. Not sure if it's worth stashing the values for next startup.

currentBatchStartTime time.Time
}

func NewLogPublicationState(maxBytesPerBatch int) *logPublicationState {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be exported?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh good catch I had meant to ask for feedback here - technically none of these need to be exported (logic is entirely self-contained in osquery). I opted to export some of these just to make it extremely clear which bits we expect to be used (NewLogPublicationState -> StartBatch -> EndBatch) vs the bits that were internal to logPublicationState. I can move those all to be internal only if people would prefer?

Comment on lines +99 to +101
if lps.currentMaxBytesPerBatch >= lps.maxBytesPerBatch {
return
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. I suspect we'll need to figure out how to grow beyond this -- some customers have it coded really small. But maybe this is fine to start, and we can make a real max value later

Copy link
Contributor

@James-Pickett James-Pickett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

@zackattack01 zackattack01 added this pull request to the merge queue Mar 1, 2024
Merged via the queue into main with commit 7b1b2f1 Mar 1, 2024
29 checks passed
@zackattack01 zackattack01 deleted the zack/log_buffering_backoff branch March 1, 2024 17:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants