Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

thriftbp: Add a counter for opener calls in client pool #594

Merged
merged 1 commit into from
Jan 23, 2023

Conversation

fishy
Copy link
Member

@fishy fishy commented Jan 20, 2023

We did some work previously to try to move dns out of hot path as much as possible in thrift client pool, by doing background refreshes, etc.. But there are always a (hopefully small) portion of client calls would still do dns on hot path, and we lack observability into how often that actually happens.

To get the precise data would require some breaking changes to clientpool package (to pass in required prometheus labels), so adding a counter to the opener call instead, which is the next best thing, as in we only need to ignore the spikes when a new pod comes up and need to fill in the initial clients when initialize a client pool.

We did some work previously to try to move dns out of hot path as much
as possible in thrift client pool, by doing background refreshes, etc..
But there are always a (hopefully small) portion of client calls would
still do dns on hot path, and we lack observability into how often that
actually happens.

To get the precise data would require some breaking changes to
clientpool package (to pass in required prometheus labels), so adding a
counter to the opener call instead, which is the next best thing, as in
we only need to ignore the spikes when a new pod comes up and need to
fill in the initial clients when initialize a client pool.
@fishy fishy requested a review from a team as a code owner January 20, 2023 22:41
@fishy fishy requested review from kylelemons, konradreiche and mterwill and removed request for a team January 20, 2023 22:41
Copy link
Contributor

@kylelemons kylelemons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we know this ignores case 1? do you have a way to ignore case 1 programmatically or is it more of a "we'll ignore any spikes we see during a rollout" or more "we'll use the prometheus rate bug (ignoring anything before the first sample) in our favor"?

@fishy
Copy link
Member Author

fishy commented Jan 20, 2023

How do we know this ignores case 1? do you have a way to ignore case 1 programmatically or is it more of a "we'll ignore any spikes we see during a rollout" or more "we'll use the prometheus rate bug (ignoring anything before the first sample) in our favor"?

Oh I didn't even realize the first scrape thing can be used in our favor in this case :)

What I meant is basically a caveat on how to read/interpret the data it presents (e.g. if you see a spike, double check whether you have a new pod started around that time)

@fishy fishy merged commit 31ad3c4 into reddit:master Jan 23, 2023
@fishy fishy deleted the thriftbp-opener-counter branch January 23, 2023 17:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants