Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cluster Connect works partially, with a full timeout. #2251

Closed
iteplov opened this issue Sep 9, 2022 · 14 comments · Fixed by #2268
Closed

Cluster Connect works partially, with a full timeout. #2251

iteplov opened this issue Sep 9, 2022 · 14 comments · Fixed by #2268
Assignees

Comments

@iteplov
Copy link
Contributor

iteplov commented Sep 9, 2022

AWS Redis 6.2.5 cluster 2 shards, each with replica, with a config endpoint. FWIW, the config endpoint is a DNS alias of a random cluster node, I believe.

Originally, a .NET Core app uses SE.Redis 2.2.88, configured with that single config endpoint. Everything works.

After upgrading to SE.Redis 2.6.66, random Redis commands started to return non-sensical errors that were reproducible somewhat in 50% of the cases.

E.g. a simple transaction with a single operation against a single random key would give EXECABORT in 50% of the runs. (was not the root issue but the specific reason for that was that in 50% tx would be sent to the server as an empty "multi"/"exec" pair, i.e. missing the actual op, according to redis-cli monitor).

Additionally, the time of the initial ConnectionMultiplexer.Connect would be always around ConnectTimeout, regardless of the value. Very much similar to what is described in this issue here.

All errors/delays would disappear if all 4 endpoints would be spelled out in configuration, i.e. the issue would take place only if the muxer has to discover remaining endpoints from the provided one.

The described issues were reproduced in tests, so what the app does and where it runs seems irrelevant. The configuration is mostly vanilla (dumped below) and didn't change during the upgrade. The app uses custom SSL callback that allows "*.amazonaws.com" certs, not sure if that makes a difference here.

Logs

SE.Redis 2.6.66 connection sequence (ConnectTimeout = 30s), from ConnectionMultplexer.Connect

**** Connecting...
01:32:28.2322: Connecting (sync) on .NET 6.0.7 (StackExchange.Redis: v2.6.66.47313)
01:32:28.3465: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380,keepAlive=60,syncTimeout=5000,asyncTimeout=5000,allowAdmin=False,version=5.0,connectTimeout=30000,password=*****,tiebreaker=__Booksleeve_TieBreak,ssl=True,sslHost=localhost,configChannel=__Booksleeve_MasterChanged,abortConnect=False,resolveDns=False,connectRetry=3,proxy=None,configCheckSeconds=60,defaultDatabase=0
01:32:28.3548: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Connecting...
01:32:28.3777: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: BeginConnectAsync
01:32:28.4253: 1 unique nodes specified (with tiebreaker)
01:32:28.4301: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync init (State=Connecting)
01:32:28.4350: Allowing 1 endpoint(s) 00:00:30 to respond...
01:32:28.4446: Awaiting 1 available task completion(s) for 30000ms, IOCP: (Busy=0,Free=1000,Min=12,Max=1000), WORKER: (Busy=1,Free=32766,Min=12,Max=32767), POOL: (Threads=5,QueuedItems=0,CompletedItems=5)
01:32:28.5670: Configuring TLS
01:32:28.7636: TLS connection established successfully using protocol: Tls12
01:32:28.7692: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Connected 
01:32:28.7768: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Server handshake
01:32:28.7811: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Authenticating (password)
01:32:28.8240: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Setting client name: W-ITEPLO-TJ5N7(SE.Redis-v2.6.66.47313)
01:32:28.8331: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Auto-configuring...
01:32:28.8421: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Requesting tie-break (Key="__Booksleeve_TieBreak")...
01:32:28.8506: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Writing: GET __Booksleeve_TieBreak
01:32:28.8588: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Sending critical tracer (handshake): ECHO
01:32:28.8620: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Writing: ECHO
01:32:28.8711: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Flushing outbound buffer
01:32:28.8761: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: OnEstablishingAsync complete
01:32:28.8791: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Starting read
01:32:28.9559: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Auto-configured (INFO) role: primary
01:32:28.9617: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Auto-configured (INFO) version: 6.2.5
01:32:28.9678: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Auto-configured (INFO) server-type: cluster
01:32:29.0186: Response from clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive / GET __Booksleeve_TieBreak: Error: MOVED 6202 services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380
01:32:29.0252: Response from clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive / ECHO: BulkString: 16 bytes
01:32:29.0342: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync completed (From command: ECHO)
01:32:29.0390: All 1 available tasks completed cleanly, IOCP: (Busy=1,Free=999,Min=12,Max=1000), WORKER: (Busy=2,Free=32765,Min=12,Max=32767), POOL: (Threads=10,QueuedItems=0,CompletedItems=20)
01:32:29.0441: Endpoint summary:
01:32:29.0482:   clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Endpoint is ConnectedEstablished
01:32:29.0536:   clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Returned with success as Cluster primary (Source: From command: ECHO)
01:32:29.1106: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync init (State=Connecting)
01:32:29.1156: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync init (State=Connecting)
01:32:29.1202: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync init (State=Connecting)
01:32:29.1236: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync init (State=Connecting)
01:32:29.1273: Allowing 4 endpoint(s) 00:00:29.3080000 to respond...
01:32:29.1315: Awaiting 4 available task completion(s) for 29308ms, IOCP: (Busy=0,Free=1000,Min=12,Max=1000), WORKER: (Busy=2,Free=32765,Min=12,Max=32767), POOL: (Threads=10,QueuedItems=0,CompletedItems=26)
01:32:58.4418: Not all available tasks completed cleanly (from ReconfigureAsync#1284, timeout 29308ms), IOCP: (Busy=0,Free=1000,Min=12,Max=1000), WORKER: (Busy=2,Free=32765,Min=12,Max=32767), POOL: (Threads=6,QueuedItems=0,CompletedItems=73)
01:32:58.4475:   Server[0] (services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380) Status: WaitingForActivation (inst: 12, qs: 0, in: 0, qu: 12, aw: False, in-pipe: -1, out-pipe: -1, bw: Inactive, rs: ReadAsync. ws: Idle)
01:32:58.4510:   Server[1] (services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380) Status: WaitingForActivation (inst: 11, qs: 0, in: 0, qu: 11, aw: False, in-pipe: -1, out-pipe: -1, bw: Inactive, rs: ReadAsync. ws: Idle)
01:32:58.4550:   Server[2] (services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380) Status: WaitingForActivation (inst: 12, qs: 0, in: 0, qu: 12, aw: False, in-pipe: -1, out-pipe: -1, bw: Inactive, rs: ReadAsync. ws: Idle)
01:32:58.4591:   Server[3] (services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380) Status: WaitingForActivation (inst: 11, qs: 0, in: 0, qu: 11, aw: False, in-pipe: -1, out-pipe: -1, bw: Inactive, rs: ReadAsync. ws: Idle)
01:32:58.4632: Endpoint summary:
01:32:58.4668:   services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Endpoint is ConnectedEstablished
01:32:58.4737:   services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Endpoint is ConnectedEstablished
01:32:58.4795:   services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Endpoint is ConnectedEstablished
01:32:58.4833:   services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Endpoint is ConnectedEstablished
01:32:58.4875:   services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Did not respond
01:32:58.4908:   services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Did not respond
01:32:58.4940:   services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Did not respond
01:32:58.4969:   services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Did not respond
01:32:58.5004: Cluster: 16384 of 16384 slots covered
01:32:58.5055: Endpoint Summary:
01:32:58.5101:   clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Cluster v6.2.5, primary; keep-alive: 00:01:00; int: ConnectedEstablished; sub: ConnectedEstablished, 1 active
01:32:58.5155:   clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: int ops=12, qu=0, qs=0, qc=0, wr=0, socks=1; sub ops=4, qu=0, qs=0, qc=0, wr=0, subs=1, socks=1
01:32:58.5243:   clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Circular op-count snapshot; int: 0+12=12 (1.20 ops/s; spans 10s); sub: 0+4=4 (0.40 ops/s; spans 10s)
01:32:58.5276:   services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Cluster v6.2.5, primary; keep-alive: 00:01:00; int: ConnectedEstablished; sub: n/a; not in use: DidNotRespond
01:32:58.5302:   services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: int ops=11, qu=0, qs=0, qc=0, wr=0, socks=1; sub ops=0, qu=0, qs=0, qc=0, wr=0
01:32:58.5332:   services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Circular op-count snapshot; int: 0+11=11 (1.10 ops/s; spans 10s); sub:
01:32:58.5361:   services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Cluster v6.2.5, primary; keep-alive: 00:01:00; int: ConnectedEstablished; sub: n/a; not in use: DidNotRespond
01:32:58.5385:   services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: int ops=11, qu=0, qs=0, qc=0, wr=0, socks=1; sub ops=0, qu=0, qs=0, qc=0, wr=0
01:32:58.5409:   services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Circular op-count snapshot; int: 0+11=11 (1.10 ops/s; spans 10s); sub:
01:32:58.5432:   services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Cluster v6.2.5, replica; keep-alive: 00:01:00; int: ConnectedEstablished; sub: n/a; not in use: DidNotRespond
01:32:58.5456:   services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: int ops=12, qu=0, qs=0, qc=0, wr=0, socks=1; sub ops=0, qu=0, qs=0, qc=0, wr=0
01:32:58.5481:   services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Circular op-count snapshot; int: 0+12=12 (1.20 ops/s; spans 10s); sub:
01:32:58.5510:   services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Cluster v6.2.5, replica; keep-alive: 00:01:00; int: ConnectedEstablished; sub: n/a; not in use: DidNotRespond
01:32:58.5535:   services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: int ops=12, qu=0, qs=0, qc=0, wr=0, socks=1; sub ops=0, qu=0, qs=0, qc=0, wr=0
01:32:58.5560:   services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Circular op-count snapshot; int: 0+12=12 (1.20 ops/s; spans 10s); sub:
01:32:58.5587: Sync timeouts: 0; async timeouts: 0; fire and forget: 0; last heartbeat: -1s ago
01:32:58.5610: Starting heartbeat...
01:32:58.5643: Total connect time: 30,332 ms
**** Connected

Theory

After debugging for some time, I believe the issue was introduced in #1947. Specifically this:

The connection completion handler will now fire when the second bridge/connection completes, this means we won't have interactive connected but subscription in an unknown state - both are connected before we fire the handler meaning the moment we come back from connect, subscriptions are in business.

Even more specifically, this line, that was later slightly updated in #2001, but not in a way to help here (all my cluster nodes use subscriptions).

Super-specifically, during activation of the discovered endpoints, the connection sequence is awaiting on these tasks. These are TCS tasks that are set completed when this happens.

And this now (well, since 2.5.*) requires a Subscription PhysicalBridge present on that endpoint. For the life of me I don't see it being created during the Reconfigure flow anywhere but here, which clearly happens only for the initially provided endpoints.

$1M Question

How come it's been out for that long and nobody noticed? Things go rather bad in this partially connected state. So I mostly hope that my theory is missing something obvious and all of that is fixable with a simple option flip.

Thanks for looking into this and all your work!

@iteplov
Copy link
Contributor Author

iteplov commented Sep 9, 2022

For reference, same setup but with 2.2.88, same connection sequence, everything works:

**** Connecting...
Connecting (sync) on .NET 6.0.7
02:01:13.1094: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380,keepAlive=60,syncTimeout=5000,asyncTimeout=5000,allowAdmin=False,version=2.0,connectTimeout=30000,password=*****,tiebreaker=__Booksleeve_TieBreak,ssl=True,sslHost=localhost,configChannel=__Booksleeve_MasterChanged,abortConnect=False,resolveDns=False,connectRetry=3,proxy=None,configCheckSeconds=60,defaultDatabase=0
02:01:13.1247: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Connecting...
02:01:13.1459: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: BeginConnectAsync
02:01:13.3024: 1 unique nodes specified
02:01:13.3078: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync init (State=Connecting)
02:01:13.3120: Allowing 1 endpoint(s) 00:00:30 to respond...
02:01:13.3237: Awaiting 1 available task completion(s) for 30000ms, IOCP: (Busy=0,Free=1000,Min=12,Max=1000), WORKER: (Busy=1,Free=32766,Min=12,Max=32767)
02:01:13.4446: Configuring TLS
02:01:13.7256: TLS connection established successfully using protocol: Tls12
02:01:13.7319: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Connected 
02:01:13.7411: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Server handshake
02:01:13.7453: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Authenticating (password)
02:01:13.8084: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Setting client name: W-ITEPLO-TJ5N7
02:01:13.8187: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Auto-configuring...
02:01:13.8523: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Sending critical tracer (handshake): ECHO
02:01:13.8576: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Writing: ECHO
02:01:13.8629: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Flushing outbound buffer
02:01:13.8689: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: OnEstablishingAsync complete
02:01:13.8743: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Starting read
02:01:13.9612: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Auto-configured (INFO) version: 6.2.5
02:01:13.9655: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Auto-configured (INFO) server-type: cluster
02:01:13.9715: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Auto-configured (INFO) role: replica
02:01:13.9994: Response from clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive / ECHO: BulkString: 16 bytes
02:01:14.0091: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync completed (From command: ECHO)
02:01:14.0152: All 1 available tasks completed cleanly, IOCP: (Busy=1,Free=999,Min=12,Max=1000), WORKER: (Busy=2,Free=32765,Min=12,Max=32767)
02:01:14.0260: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Endpoint is ConnectedEstablished
02:01:14.0314: Election: Gathering tie-breakers...
02:01:14.0413: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Requesting tie-break (Key="__Booksleeve_TieBreak")...
02:01:14.0528: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Writing: GET __Booksleeve_TieBreak
02:01:14.0634: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Returned with success as Cluster replica (Source: From command: ECHO)
02:01:14.0974: Response from clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive / GET __Booksleeve_TieBreak: Error: MOVED 6202 services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380
02:01:14.1177: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync init (State=Connecting)
02:01:14.1241: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync init (State=Connecting)
02:01:14.1303: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync init (State=Connecting)
02:01:14.1399: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync init (State=Connecting)
02:01:14.1456: Allowing 4 endpoint(s) 00:00:29.1670000 to respond...
02:01:14.1535: Awaiting 4 available task completion(s) for 29167ms, IOCP: (Busy=2,Free=998,Min=12,Max=1000), WORKER: (Busy=2,Free=32765,Min=12,Max=32767)
02:01:14.2754: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync completed (From command: ECHO)
02:01:14.2802: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync completed (From command: ECHO)
02:01:14.3543: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync completed (From command: ECHO)
02:01:14.3985: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync completed (From command: ECHO)
02:01:14.4043: All 4 available tasks completed cleanly, IOCP: (Busy=0,Free=1000,Min=12,Max=1000), WORKER: (Busy=3,Free=32764,Min=12,Max=32767)
02:01:14.4102: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Endpoint is ConnectedEstablished
02:01:14.4149: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Endpoint is ConnectedEstablished
02:01:14.4199: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Endpoint is ConnectedEstablished
02:01:14.4245: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Endpoint is ConnectedEstablished
02:01:14.4277: Election: Gathering tie-breakers...
02:01:14.4305: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Requesting tie-break (Key="__Booksleeve_TieBreak")...
02:01:14.4337: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Writing: GET __Booksleeve_TieBreak
02:01:14.4374: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Requesting tie-break (Key="__Booksleeve_TieBreak")...
02:01:14.4424: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Writing: GET __Booksleeve_TieBreak
02:01:14.4474: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Requesting tie-break (Key="__Booksleeve_TieBreak")...
02:01:14.4527: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Writing: GET __Booksleeve_TieBreak
02:01:14.4576: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Requesting tie-break (Key="__Booksleeve_TieBreak")...
02:01:14.4620: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Writing: GET __Booksleeve_TieBreak
02:01:14.4665: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Returned with success as Cluster replica (Source: From command: ECHO)
02:01:14.4715: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Returned with success as Cluster primary (Source: From command: ECHO)
02:01:14.4770: Response from services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive / GET __Booksleeve_TieBreak: Error: MOVED 6202 services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380
02:01:14.5155: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Returned with success as Cluster primary (Source: From command: ECHO)
02:01:14.5212: Response from services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive / GET __Booksleeve_TieBreak: Error: MOVED 6202 services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380
02:01:14.5258: Response from services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive / GET __Booksleeve_TieBreak: (null)
02:01:14.5301: Response from services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive / GET __Booksleeve_TieBreak: (null)
02:01:14.5337: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Returned with success as Cluster replica (Source: From command: ECHO)
02:01:14.5384: Cluster: 16384 of 16384 slots covered
02:01:14.5451: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Cluster v6.2.5, replica; keep-alive: 00:01:00; int: ConnectedEstablished; sub: ConnectedEstablished, 1 active
02:01:14.5513: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: int ops=12, qu=0, qs=0, qc=0, wr=0, socks=1, non-pref=1; sub ops=4, qu=0, qs=0, qc=0, wr=0, subs=1, socks=1
02:01:14.5649: clustercfg.services.rfikkv.usw2.cache.amazonaws.com:6380: Circular op-count snapshot; int: 0+12=12 (1.20 ops/s; spans 10s); sub: 0+4=4 (0.40 ops/s; spans 10s)
02:01:14.5702: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Cluster v6.2.5, master; keep-alive: 00:01:00; int: ConnectedEstablished; sub: n/a
02:01:14.5774: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: int ops=10, qu=0, qs=0, qc=0, wr=0, socks=1; sub ops=0, qu=0, qs=0, qc=0, wr=0
02:01:14.5825: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Circular op-count snapshot; int: 0+10=10 (1.00 ops/s; spans 10s); sub:
02:01:14.5869: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Cluster v6.2.5, master; keep-alive: 00:01:00; int: ConnectedEstablished; sub: n/a
02:01:14.5906: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: int ops=10, qu=0, qs=0, qc=0, wr=0, socks=1; sub ops=0, qu=0, qs=0, qc=0, wr=0
02:01:14.5941: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Circular op-count snapshot; int: 0+10=10 (1.00 ops/s; spans 10s); sub:
02:01:14.5979: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Cluster v6.2.5, replica; keep-alive: 00:01:00; int: ConnectedEstablished; sub: n/a
02:01:14.6019: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: int ops=11, qu=0, qs=0, qc=0, wr=0, socks=1; sub ops=0, qu=0, qs=0, qc=0, wr=0
02:01:14.6052: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Circular op-count snapshot; int: 0+11=11 (1.10 ops/s; spans 10s); sub:
02:01:14.6090: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Cluster v6.2.5, replica; keep-alive: 00:01:00; int: ConnectedEstablished; sub: n/a
02:01:14.6130: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: int ops=11, qu=0, qs=0, qc=0, wr=0, socks=1; sub ops=0, qu=0, qs=0, qc=0, wr=0
02:01:14.6161: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Circular op-count snapshot; int: 0+11=11 (1.10 ops/s; spans 10s); sub:
02:01:14.6200: Sync timeouts: 0; async timeouts: 0; fire and forget: 0; last heartbeat: -1s ago
02:01:14.6233: Starting heartbeat...
**** Connected

@iteplov
Copy link
Contributor Author

iteplov commented Sep 9, 2022

For another reference, same setup with 2.6.66 and spelled out endpoints. everything works.

**** Connecting...
19:23:53.0062: Connecting (sync) on .NET 6.0.7 (StackExchange.Redis: v2.6.66.47313)
19:23:53.0998: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380,services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380,services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380,services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380,keepAlive=60,syncTimeout=5000,asyncTimeout=5000,allowAdmin=False,version=5.0,connectTimeout=30000,password=*****,tiebreaker=__Booksleeve_TieBreak,ssl=True,sslHost=localhost,configChannel=__Booksleeve_MasterChanged,abortConnect=False,resolveDns=False,connectRetry=3,proxy=None,configCheckSeconds=60,defaultDatabase=0
19:23:53.1061: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Connecting...
19:23:53.1228: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: BeginConnectAsync
19:23:53.1652: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Connecting...
19:23:53.1674: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: BeginConnectAsync
19:23:53.1739: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Connecting...
19:23:53.1763: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: BeginConnectAsync
19:23:53.1788: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Connecting...
19:23:53.1814: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: BeginConnectAsync
19:23:53.1845: 4 unique nodes specified (with tiebreaker)
19:23:53.1874: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync init (State=Connecting)
19:23:53.1894: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync init (State=Connecting)
19:23:53.1915: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync init (State=Connecting)
19:23:53.1940: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync init (State=Connecting)
19:23:53.1966: Allowing 4 endpoint(s) 00:00:30 to respond...
19:23:53.2043: Awaiting 4 available task completion(s) for 30000ms, IOCP: (Busy=0,Free=1000,Min=12,Max=1000), WORKER: (Busy=1,Free=32766,Min=12,Max=32767), POOL: (Threads=5,QueuedItems=0,CompletedItems=4)
19:23:53.3736: Configuring TLS
19:23:53.3768: Configuring TLS
19:23:53.3796: Configuring TLS
19:23:53.3823: Configuring TLS
19:23:53.5613: TLS connection established successfully using protocol: Tls12
19:23:53.5645: TLS connection established successfully using protocol: Tls12
19:23:53.5679: TLS connection established successfully using protocol: Tls12
19:23:53.5717: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Connected 
19:23:53.5756: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Connected 
19:23:53.5795: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Connected 
19:23:53.5836: TLS connection established successfully using protocol: Tls12
19:23:53.5883: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Server handshake
19:23:53.5924: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Server handshake
19:23:53.5958: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Authenticating (password)
19:23:53.5994: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Authenticating (password)
19:23:53.6032: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Server handshake
19:23:53.6070: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Connected 
19:23:53.6105: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Authenticating (password)
19:23:53.6151: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Server handshake
19:23:53.6186: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Authenticating (password)
19:23:53.6602: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Setting client name: W-ITEPLO-TJ5N7(SE.Redis-v2.6.66.47313)
19:23:53.6629: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Setting client name: W-ITEPLO-TJ5N7(SE.Redis-v2.6.66.47313)
19:23:53.6653: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Setting client name: W-ITEPLO-TJ5N7(SE.Redis-v2.6.66.47313)
19:23:53.6680: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Setting client name: W-ITEPLO-TJ5N7(SE.Redis-v2.6.66.47313)
19:23:53.6707: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Auto-configuring...
19:23:53.6731: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Auto-configuring...
19:23:53.6752: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Auto-configuring...
19:23:53.6775: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Auto-configuring...
19:23:53.6919: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Requesting tie-break (Key="__Booksleeve_TieBreak")...
19:23:53.6968: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Requesting tie-break (Key="__Booksleeve_TieBreak")...
19:23:53.6995: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Requesting tie-break (Key="__Booksleeve_TieBreak")...
19:23:53.7019: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Requesting tie-break (Key="__Booksleeve_TieBreak")...
19:23:53.7067: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Writing: GET __Booksleeve_TieBreak
19:23:53.7095: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Writing: GET __Booksleeve_TieBreak
19:23:53.7124: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Sending critical tracer (handshake): ECHO
19:23:53.7147: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Sending critical tracer (handshake): ECHO
19:23:53.7178: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Writing: ECHO
19:23:53.7204: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Writing: ECHO
19:23:53.7232: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Flushing outbound buffer
19:23:53.7266: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Flushing outbound buffer
19:23:53.7300: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: OnEstablishingAsync complete
19:23:53.7333: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Starting read
19:23:53.7362: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: OnEstablishingAsync complete
19:23:53.7390: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Writing: GET __Booksleeve_TieBreak
19:23:53.7417: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Starting read
19:23:53.7449: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Sending critical tracer (handshake): ECHO
19:23:53.7479: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Writing: GET __Booksleeve_TieBreak
19:23:53.7506: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Writing: ECHO
19:23:53.7533: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Sending critical tracer (handshake): ECHO
19:23:53.7560: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Flushing outbound buffer
19:23:53.7602: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Writing: ECHO
19:23:53.7639: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: OnEstablishingAsync complete
19:23:53.7666: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Flushing outbound buffer
19:23:53.7691: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Starting read
19:23:53.7718: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: OnEstablishingAsync complete
19:23:53.7743: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Starting read
19:23:53.8003: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Auto-configured (INFO) role: replica
19:23:53.8044: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Auto-configured (INFO) role: replica
19:23:53.8070: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Auto-configured (INFO) role: primary
19:23:53.8097: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Auto-configured (INFO) version: 6.2.5
19:23:53.8124: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Auto-configured (INFO) version: 6.2.5
19:23:53.8149: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Auto-configured (INFO) server-type: cluster
19:23:53.8174: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Auto-configured (INFO) server-type: cluster
19:23:53.8200: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Auto-configured (INFO) version: 6.2.5
19:23:53.8226: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Auto-configured (INFO) role: primary
19:23:53.8254: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Auto-configured (INFO) server-type: cluster
19:23:53.8287: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Auto-configured (INFO) version: 6.2.5
19:23:53.8316: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Auto-configured (INFO) server-type: cluster
19:23:53.8394: Response from services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive / GET __Booksleeve_TieBreak: Error: MOVED 6202 services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380
19:23:53.8429: Response from services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive / GET __Booksleeve_TieBreak: Error: MOVED 6202 services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380
19:23:53.8463: Response from services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive / ECHO: BulkString: 16 bytes
19:23:53.8493: Response from services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive / ECHO: BulkString: 16 bytes
19:23:53.8521: Response from services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive / GET __Booksleeve_TieBreak: Error: MOVED 6202 services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380
19:23:53.8550: Response from services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive / GET __Booksleeve_TieBreak: (null)
19:23:53.8582: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync completed (From command: ECHO)
19:23:53.8628: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync completed (From command: ECHO)
19:23:53.8666: Response from services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive / ECHO: BulkString: 16 bytes
19:23:53.8698: Response from services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive / ECHO: BulkString: 16 bytes
19:23:53.8728: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync completed (From command: ECHO)
19:23:53.8754: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync completed (From command: ECHO)
19:23:53.8787: All 4 available tasks completed cleanly, IOCP: (Busy=0,Free=1000,Min=12,Max=1000), WORKER: (Busy=3,Free=32764,Min=12,Max=32767), POOL: (Threads=23,QueuedItems=0,CompletedItems=68)
19:23:53.8821: Endpoint summary:
19:23:53.8848:   services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Endpoint is ConnectedEstablished
19:23:53.8873:   services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Endpoint is ConnectedEstablished
19:23:53.8903:   services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Endpoint is ConnectedEstablished
19:23:53.8935:   services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Endpoint is ConnectedEstablished
19:23:53.8974:   services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Returned with success as Cluster replica (Source: From command: ECHO)
19:23:53.9369:   services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Returned with success as Cluster primary (Source: From command: ECHO)
19:23:53.9413:   services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Returned with success as Cluster replica (Source: From command: ECHO)
19:23:53.9445:   services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Returned with success as Cluster primary (Source: From command: ECHO)
19:23:53.9482: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync already connected start
19:23:53.9519: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Writing: PING
19:23:53.9563: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync already connected start
19:23:53.9596: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Writing: PING
19:23:53.9638: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync already connected start
19:23:53.9666: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Writing: PING
19:23:53.9695: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync already connected start
19:23:53.9727: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive: Writing: PING
19:23:53.9763: Allowing 4 endpoint(s) 00:00:29.2200000 to respond...
19:23:53.9798: Awaiting 4 available task completion(s) for 29220ms, IOCP: (Busy=0,Free=1000,Min=12,Max=1000), WORKER: (Busy=2,Free=32765,Min=12,Max=32767), POOL: (Threads=23,QueuedItems=0,CompletedItems=74)
19:23:53.9864: Response from services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive / PING: SimpleString: PONG
19:23:53.9915: services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync already connected end
19:23:53.9956: Response from services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive / PING: SimpleString: PONG
19:23:53.9986: services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync already connected end
19:23:54.0019: Response from services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive / PING: SimpleString: PONG
19:23:54.0048: services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync already connected end
19:23:54.0075: Response from services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380/Interactive / PING: SimpleString: PONG
19:23:54.0107: services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: OnConnectedAsync already connected end
19:23:54.0138: All 4 available tasks completed cleanly, IOCP: (Busy=0,Free=1000,Min=12,Max=1000), WORKER: (Busy=2,Free=32765,Min=12,Max=32767), POOL: (Threads=23,QueuedItems=0,CompletedItems=82)
19:23:54.0165: Endpoint summary:
19:23:54.0191:   services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Endpoint is ConnectedEstablished
19:23:54.0225:   services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Endpoint is ConnectedEstablished
19:23:54.0252:   services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Endpoint is ConnectedEstablished
19:23:54.0281:   services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Endpoint is ConnectedEstablished
19:23:54.0310:   services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Returned with success as Cluster primary (Source: Already connected)
19:23:54.0338:   services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Returned with success as Cluster replica (Source: Already connected)
19:23:54.0367:   services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Returned with success as Cluster replica (Source: Already connected)
19:23:54.0401:   services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Returned with success as Cluster primary (Source: Already connected)
19:23:54.0445: Cluster: 16384 of 16384 slots covered
19:23:54.0483: Endpoint Summary:
19:23:54.0522:   services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Cluster v6.2.5, replica; keep-alive: 00:01:00; int: ConnectedEstablished; sub: ConnectedEstablished, 1 active
19:23:54.0562:   services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: int ops=14, qu=0, qs=0, qc=0, wr=0, socks=1, non-pref=1; sub ops=4, qu=0, qs=0, qc=0, wr=0, subs=1, socks=1
19:23:54.0641:   services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Circular op-count snapshot; int: 0+14=14 (1.40 ops/s; spans 10s); sub: 0+4=4 (0.40 ops/s; spans 10s)
19:23:54.0684:   services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Cluster v6.2.5, primary; keep-alive: 00:01:00; int: ConnectedEstablished; sub: ConnectedEstablished, 1 active
19:23:54.0719:   services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: int ops=12, qu=0, qs=0, qc=0, wr=0, socks=1; sub ops=4, qu=0, qs=0, qc=0, wr=0, subs=1, socks=1
19:23:54.0754:   services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Circular op-count snapshot; int: 0+12=12 (1.20 ops/s; spans 10s); sub: 0+4=4 (0.40 ops/s; spans 10s)
19:23:54.0786:   services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Cluster v6.2.5, replica; keep-alive: 00:01:00; int: ConnectedEstablished; sub: ConnectedEstablished, 1 active
19:23:54.0823:   services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: int ops=13, qu=0, qs=0, qc=0, wr=0, socks=1; sub ops=4, qu=0, qs=0, qc=0, wr=0, subs=1, socks=1
19:23:54.0861:   services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Circular op-count snapshot; int: 0+13=13 (1.30 ops/s; spans 10s); sub: 0+4=4 (0.40 ops/s; spans 10s)
19:23:54.0902:   services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Cluster v6.2.5, primary; keep-alive: 00:01:00; int: ConnectedEstablished; sub: ConnectedEstablished, 1 active
19:23:54.0942:   services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: int ops=12, qu=0, qs=0, qc=0, wr=0, socks=1; sub ops=4, qu=0, qs=0, qc=0, wr=0, subs=1, socks=1
19:23:54.0988:   services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Circular op-count snapshot; int: 0+12=12 (1.20 ops/s; spans 10s); sub: 0+4=4 (0.40 ops/s; spans 10s)
19:23:54.1027: Sync timeouts: 0; async timeouts: 0; fire and forget: 0; last heartbeat: -1s ago
19:23:54.1059: Starting heartbeat...
19:23:54.1112: Total connect time: 1,105 ms
**** Connected

EDIT: initially grabbed it from 2.2.88 by mistake, now put the correct one.

@NickCraver
Copy link
Collaborator

Thanks for the detailed info here! It may be 1-2 days before I get to poke hard at this (kiddo birthday tomorrow) but lots to go on, thanks a ton for that <3

@NickCraver NickCraver self-assigned this Sep 9, 2022
@NickCraver
Copy link
Collaborator

Apologies on time here - a day after posting this said 8yo also brought COVID round 2 home for fun - still catching up on work and OSS here.

@iteplov
Copy link
Contributor Author

iteplov commented Sep 15, 2022

no worries, hope you all are well. this is not extremely urgent, considering the circumstances. thank you for the heads up.

@pthargis
Copy link

@iteplov - can you let me know how did you enable the detailed logging ?

@iteplov
Copy link
Contributor Author

iteplov commented Sep 23, 2022

can you let me know how did you enable the detailed logging ?

something along these lines. not working code, i just typed it here but hopefully you get the gist

var logger = new MyLogger();
var multiplexer = ConnectionMultiplexer.Connect(redisOptions, logger);
public class MyLogger: TextWriter
{
    public override Encoding Encoding => Encoding.Default;
    public override void WriteLine(string value) => Debug.WriteLine(value);
    public override void Write(string value) => Debug.Write(value);
}

@pthargis
Copy link

Thank you @iteplov

@mgravell
Copy link
Collaborator

Looking at what is happening here, our hypothesis is that pub/sub simply isn't enabled in this environment; if this is correct, a pragmatic workaround here might be to simply tell the muxer to not try pub/sub - which can be done by adding ,$SUBSCRIBE= to the end of the connection-string. Is this possible to test? Thanks.

@iteplov
Copy link
Contributor Author

iteplov commented Oct 12, 2022

@mgravell i was not smart enough to figure out how to tweak the connection string but i made this change here, which i think should result in the same outcome:

        /// <summary>
        /// The default commands specified by redis.
        /// </summary>
        public static CommandMap Default { get; } = CreateImpl(null, exclusions: new HashSet<RedisCommand> { RedisCommand.SUBSCRIBE, RedisCommand.PSUBSCRIBE });

it does indeed fix the issue - connections can be established. unfortunately, it breaks the app, since we do need pub/sub and been using it for some time.

please see this comment as well #2265 (comment). i think these two issues can be combined as a dup.

18:51:52.9772: Endpoint summary:
18:51:52.9788:   services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Endpoint is ConnectedEstablished
18:51:52.9804:   services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Endpoint is ConnectedEstablished
18:51:52.9821:   services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Endpoint is ConnectedEstablished
18:51:52.9840:   services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Endpoint is ConnectedEstablished
18:51:52.9857:   services-0001-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Returned with success as Cluster replica (Source: From command: ECHO)
18:51:52.9872:   services-0001-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Returned with success as Cluster primary (Source: From command: ECHO)
18:51:52.9889:   services-0002-001.services.rfikkv.usw2.cache.amazonaws.com:6380: Returned with success as Cluster replica (Source: From command: ECHO)
18:51:52.9910:   services-0002-002.services.rfikkv.usw2.cache.amazonaws.com:6380: Returned with success as Cluster primary (Source: From command: ECHO)

@iteplov
Copy link
Contributor Author

iteplov commented Oct 12, 2022

i took a stab at it and this "works on my machine": #2268

NickCraver added a commit that referenced this issue Oct 15, 2022
… discovery (#2268)

Fix for #2251 and #2265 ensuring subscription connections are proactively created in all cases.

Co-authored-by: Nick Craver <nrcraver@gmail.com>
@NickCraver
Copy link
Collaborator

Thanks @iteplov! Fix is on MyGet as 2.6.70 now - if you could test this version to make double sure there's nothing extra odd against AWS, I'll get it on NuGet proper right after <3

@iteplov
Copy link
Contributor Author

iteplov commented Oct 17, 2022

2.6.70 is confirmed to pass our internal tests and seems to be working fine in test environments.

Thanks a bunch for your attention to this issue and the prompt release.

@NickCraver
Copy link
Collaborator

Thanks a ton for all the help here, very much appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants