Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Use Redis ConnectionManager to reconnect on failures #34

Merged
merged 4 commits into from
Jun 21, 2024

Conversation

sbernauer
Copy link
Member

@sbernauer sbernauer commented Jun 21, 2024

Previously trino-lb would not recover from a closed connection

2024-06-21T08:00:17.924954Z  INFO Fetching current query counters: trino_lb::maintenance::query_count_fetcher: QueryCountFetcher: Updated query counters from 1 remote clusters
2024-06-21T08:00:22.720512Z ERROR Fetching current query counters: trino_lb::maintenance::query_count_fetcher: QueryCountFetcher: Failed to get last QueryCountFetcher update timestamp err=RedisError { source: GetLastQueryCountFetcherUpdate { source: broken pipe } }
2024-06-21T08:00:27.719737Z ERROR Fetching current query counters: trino_lb::maintenance::query_count_fetcher: QueryCountFetcher: Failed to get last QueryCountFetcher update timestamp err=RedisError { source: GetLastQueryCountFetcherUpdate { source: broken pipe } }
2024-06-21T08:00:32.720312Z ERROR Fetching current query counters: trino_lb::maintenance::query_count_fetcher: QueryCountFetcher: Failed to get last QueryCountFetcher update timestamp err=RedisError { source: GetLastQueryCountFetcherUpdate { source: broken pipe } }

Now the first call to redis will fail, but the following calls will work again

2024-06-21T07:58:57.264989Z  INFO Fetching current query counters: trino_lb::maintenance::query_count_fetcher: QueryCountFetcher: Updated query counters from 1 remote clusters
2024-06-21T07:59:02.085295Z ERROR Fetching current query counters: trino_lb::maintenance::query_count_fetcher: QueryCountFetcher: Failed to get last QueryCountFetcher update timestamp err=RedisError { source: GetLastQueryCountFetcherUpdate { source: broken pipe } }
2024-06-21T07:59:07.086128Z  INFO Fetching current query counters: trino_lb::maintenance::query_count_fetcher: QueryCountFetcher: Updated query counters from 0 remote clusters
2024-06-21T07:59:12.085915Z  INFO Fetching current query counters: trino_lb::maintenance::query_count_fetcher: QueryCountFetcher: Updated query counters from 0 remote clusters

@sbernauer sbernauer changed the title fix: Use redis ConnectionManager fix: Use Redis ConnectionManager to re-connect on broken pipe Jun 21, 2024
@sbernauer sbernauer changed the title fix: Use Redis ConnectionManager to re-connect on broken pipe fix: Use Redis ConnectionManager to re-connect on failures Jun 21, 2024
@sbernauer sbernauer self-assigned this Jun 21, 2024
@Techassi
Copy link
Member

How many times does the connection manager try to reconnect? Is there an exponential timeout? Are these parameters configurable?

Because these reconnect attempts can cause performance issues on the Redis server side which could lead to DoS.

@maltesander maltesander self-requested a review June 21, 2024 10:14
@sbernauer
Copy link
Member Author

Seems like they are using https://docs.rs/tokio-retry/latest/tokio_retry/ for that. They have some default values in https://docs.rs/redis/latest/src/redis/aio/connection_manager.rs.html#101-109, which I guess should be a good starting point

maltesander
maltesander previously approved these changes Jun 21, 2024
Copy link
Member

@maltesander maltesander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if sascha is happy :)

CHANGELOG.md Outdated Show resolved Hide resolved
Co-authored-by: Malte Sander <malte.sander.it@gmail.com>
@Techassi
Copy link
Member

Seems like they are using https://docs.rs/tokio-retry/latest/tokio_retry/ for that. They have some default values in https://docs.rs/redis/latest/src/redis/aio/connection_manager.rs.html#101-109, which I guess should be a good starting point

Thanks! That looks reasonable.

Copy link
Member

@maltesander maltesander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Member

@Techassi Techassi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one question (non-blocking).

CHANGELOG.md Show resolved Hide resolved
@sbernauer sbernauer changed the title fix: Use Redis ConnectionManager to re-connect on failures fix: Use Redis ConnectionManager to reconnect on failures Jun 21, 2024
@sbernauer sbernauer added this pull request to the merge queue Jun 21, 2024
Merged via the queue into main with commit 74560bc Jun 21, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants