Automatic reconnect clarification #1210

Open · Arttii opened this issue Feb 29, 2024 · 10 comments
Labels: feedback required (reporter response needed), stale (requires a follow-up)

Comments

Arttii commented Feb 29, 2024

Describe the bug

In our app we use a shared instance of the stdlib database/sql connection, and today the connection seemingly stopped being alive. Queries were retried, but nothing was happening, which leads me to believe that the connection was no longer up.

Restarting the app resolved the issue.

Expected behaviour

There are mentions of automatic reconnects in the lib, but I was unable to find any docs on when and how this happens. I guess it should reconnect in this case?

Configuration

Environment

  • Client version: v2.3.0
  • Language version: Go
  • OS: Linux
  • Interface: database/sql compatible driver

ClickHouse server

  • ClickHouse Server version: ClickHouse Cloud (latest)
Arttii added the bug label on Feb 29, 2024
jkaflik (Contributor) commented Feb 29, 2024

@Arttii Hi! Is there any reason to use such an old version?

Arttii (Author) commented Feb 29, 2024

@jkaflik That is a good point. We are planning to upgrade in the next few days. Do you think that could impact this?

Arttii (Author) commented Mar 3, 2024

Upgrading did not really help. Same behavior after the app has been inactive for a while.

jkaflik (Contributor) commented Mar 4, 2024

@Arttii could you clarify this?

Queries were retried, but nothing was happening

What does that mean? Was the query blocked, with no response coming from the server? Is this the TCP or the HTTP protocol?
Can you try enabling debug logs? https://github.com/ClickHouse/clickhouse-go?tab=readme-ov-file#dsn (see debug)
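
For illustration, a minimal sketch of turning that on with the database/sql driver: the debug=true DSN parameter is the one documented in the README linked above, while the Options.Debug/Debugf fields used to route output to a custom logger are an assumption about this clickhouse-go v2 API:

package main

import (
	"log"

	"github.com/ClickHouse/clickhouse-go/v2"
)

func main() {
	// debug=true in the DSN enables the driver's debug output (see the README link above).
	opts, err := clickhouse.ParseDSN("clickhouse://default:@127.0.0.1:9000/activities?debug=true")
	if err != nil {
		log.Fatal(err)
	}

	// Assumption: clickhouse.Options exposes Debug and Debugf, which send the
	// debug output to a custom logger instead of stdout.
	opts.Debug = true
	opts.Debugf = func(format string, v ...any) {
		log.Printf("[clickhouse] "+format, v...)
	}

	db := clickhouse.OpenDB(opts)
	defer db.Close()

	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}
}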

Arttii (Author) commented Mar 4, 2024

We are using the TCP protocol, and yes, it seems no response came and the query was blocked. The connection is concurrency-safe, right? We have a pool being reused by several goroutines at the same time.

I will enable the debug logs and report back.

jkaflik (Contributor) commented Mar 4, 2024

@Arttii the connection pool is safe, but the connection itself is not thread-safe.
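
To make the distinction concrete, a minimal sketch (not the reporter's code) of the safe pattern: the *sql.DB returned by clickhouse.OpenDB is a pool and can be shared by many goroutines, because each query checks its own connection out of the pool:

package main

import (
	"context"
	"log"
	"sync"

	"github.com/ClickHouse/clickhouse-go/v2"
)

func main() {
	opts, err := clickhouse.ParseDSN("clickhouse://default:@127.0.0.1:9000/activities")
	if err != nil {
		log.Fatal(err)
	}
	db := clickhouse.OpenDB(opts) // *sql.DB is a pool and is safe for concurrent use
	defer db.Close()

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			var one uint8
			// Each call borrows a connection from the pool and returns it when done.
			if err := db.QueryRowContext(context.Background(), "SELECT 1").Scan(&one); err != nil {
				log.Println("query failed:", err)
			}
		}()
	}
	wg.Wait()
}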

Arttii (Author) commented Mar 4, 2024

I am basically doing this:

opts, err := clickhouse.ParseDSN(cfg.DataDBUrl)
if err != nil {
	return err
}

clickhouseConn := clickhouse.OpenDB(opts)
clickhouseConn.SetMaxIdleConns(5)
clickhouseConn.SetMaxOpenConns(10)

and the DSN is something like this: 'clickhouse://default:@127.0.0.1:9000/activities?dial_timeout=200ms&max_execution_time=60&compress=lz4&dial_timeout=30s&debug=true'

Is there anything obvious that is wrong?
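
As an aside, and not necessarily the cause here: with a long-lived shared pool it can also help to set a connection lifetime, so connections that went stale while the app was idle get recycled instead of being reused. A minimal sketch on top of the snippet above, using the standard database/sql settings; the durations are illustrative and "time" must be imported:

clickhouseConn := clickhouse.OpenDB(opts)
clickhouseConn.SetMaxIdleConns(5)
clickhouseConn.SetMaxOpenConns(10)
// Standard database/sql knobs: recycle connections after a while so the pool
// re-dials instead of handing out sockets the server may have already closed.
clickhouseConn.SetConnMaxLifetime(10 * time.Minute) // illustrative value
clickhouseConn.SetConnMaxIdleTime(5 * time.Minute)  // illustrative value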

jkaflik (Contributor) commented Mar 5, 2024

Yes, debug=true is enough to get things logged to stdout. Do you have it?

How do you handle your connections? Are you sure you properly close all sql.Rows and so on? Connections might not be released, and then the pool blocks waiting for a free one.
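
For illustration, a minimal sketch of the pattern being asked about: every sql.Rows must be closed (deferring Close is the usual way), otherwise its connection is never returned to the pool and later queries block waiting for a free one. The table and column names are made up:

package main

import (
	"context"
	"database/sql"
)

// countRows is a sketch only; the table and column are hypothetical.
func countRows(ctx context.Context, db *sql.DB) (int, error) {
	rows, err := db.QueryContext(ctx, "SELECT id FROM events")
	if err != nil {
		return 0, err
	}
	defer rows.Close() // without this, the connection stays checked out of the pool

	n := 0
	for rows.Next() {
		var id uint64
		if err := rows.Scan(&id); err != nil {
			return 0, err
		}
		n++
	}
	return n, rows.Err()
}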

Arttii (Author) commented Mar 5, 2024

We just pass the pointer around to the goroutines. For querying, we use https://github.com/georgysavva/scany to scan into structs (looking at the code, it seems the Rows are being closed by the package).

What is the best way to understand which statements have an impact on this? Just looking at our metrics, I think you are right: it seems it is waiting for connections.

# HELP db_sql_connections_active_ratio Count of active connections in the pool
# TYPE db_sql_connections_active_ratio gauge
db_sql_connections_active_ratio{db_name="activities",db_system="clickhouse",otel_scope_name="go.nhat.io/otelsql",otel_scope_version=""} 10
# HELP db_sql_connections_idle_closed_ratio The total number of connections closed due to SetMaxIdleConns
# TYPE db_sql_connections_idle_closed_ratio gauge
db_sql_connections_idle_closed_ratio{db_name="activities",db_system="clickhouse",otel_scope_name="go.nhat.io/otelsql",otel_scope_version=""} 4
# HELP db_sql_connections_idle_ratio Count of idle connections in the pool
# TYPE db_sql_connections_idle_ratio gauge
db_sql_connections_idle_ratio{db_name="activities",db_system="clickhouse",otel_scope_name="go.nhat.io/otelsql",otel_scope_version=""} 0
# HELP db_sql_connections_lifetime_closed_ratio The total number of connections closed due to SetConnMaxLifetime
# TYPE db_sql_connections_lifetime_closed_ratio gauge
db_sql_connections_lifetime_closed_ratio{db_name="activities",db_system="clickhouse",otel_scope_name="go.nhat.io/otelsql",otel_scope_version=""} 0
# HELP db_sql_connections_open_ratio Count of open connections in the pool
# TYPE db_sql_connections_open_ratio gauge
db_sql_connections_open_ratio{db_name="activities",db_system="clickhouse",otel_scope_name="go.nhat.io/otelsql",otel_scope_version=""} 10
# HELP db_sql_connections_wait_count_ratio The total number of connections waited for
# TYPE db_sql_connections_wait_count_ratio gauge
db_sql_connections_wait_count_ratio{db_name="activities",db_system="clickhouse",otel_scope_name="go.nhat.io/otelsql",otel_scope_version=""} 1189
# HELP db_sql_connections_wait_duration_milliseconds The total time blocked waiting for a new connection
# TYPE db_sql_connections_wait_duration_milliseconds gauge
db_sql_connections_wait_duration_milliseconds{db_name="activities",db_system="clickhouse",otel_scope_name="go.nhat.io/otelsql",otel_scope_version=""} 932384.413475

The wait duration specifically.

Would setting max_execution_time perhaps fix this? We basically have several queries that always run at an interval. Maybe they are starving the pool?

jkaflik (Contributor) commented Mar 7, 2024

Would setting max_execution_time perhaps fix this? We basically have several queries that always run at an interval. Maybe they are starving the pool?

Does that mean these queries do not finish in time? You could play with a timeout context and see if these queries are problematic.
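
For illustration, a minimal sketch of that approach; the deadline and the query text are made up, and the point is that a query exceeding its context deadline fails fast and releases its pool connection instead of blocking the other goroutines:

package main

import (
	"context"
	"database/sql"
	"time"
)

// queryWithTimeout is a sketch only; the query text is hypothetical.
func queryWithTimeout(db *sql.DB) error {
	// Bound the query with a deadline so a slow run returns an error
	// instead of holding its pool connection indefinitely.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	rows, err := db.QueryContext(ctx, "SELECT id FROM events WHERE ts > now() - INTERVAL 1 HOUR")
	if err != nil {
		return err
	}
	defer rows.Close()

	for rows.Next() {
		var id uint64
		if err := rows.Scan(&id); err != nil {
			return err
		}
	}
	return rows.Err()
}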

Could you provide a code snippet more or less reflecting how you query?

jkaflik added the stale and feedback required labels and removed the bug label on Mar 25, 2024