EXECABORT Transaction discarded because of MOVED error in Azure Redis #1901

Open
chaochao0815 opened this issue Jul 11, 2024 · 1 comment


chaochao0815 commented Jul 11, 2024

ioredis version 5.3.2

Problem: We use the bull library, which issues Redis transactions. We intermittently get transaction-aborted errors caused by MOVED errors on Azure Redis. The same code works without any issue on AWS Redis.

Sample Error message:

ReplyError: EXECABORT Transaction discarded because of previous errors.
    at parseError (/app/node_modules/redis-parser/lib/parser.js:179:12)
    at parseType (/app/node_modules/redis-parser/lib/parser.js:302:14) {
  previousErrors: [
    ReplyError: MOVED 2405 <IP>:15000
        at parseError (/app/node_modules/redis-parser/lib/parser.js:179:12)
        at parseType (/app/node_modules/redis-parser/lib/parser.js:302:14) {
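To confirm from application code that an EXECABORT was caused by a MOVED redirection, the `previousErrors` array shown in the error above can be inspected. The following is a minimal sketch: `findMovedCause` is a hypothetical helper (not part of ioredis), and the IP address in the sample is a placeholder, since the real one is redacted in the report.

```javascript
// Hypothetical helper, not part of ioredis: inspect an EXECABORT error and
// report whether a MOVED redirection is among its previous errors.
function findMovedCause(err) {
  if (!err || typeof err.message !== "string" || !err.message.startsWith("EXECABORT")) {
    return null;
  }
  const previous = Array.isArray(err.previousErrors) ? err.previousErrors : [];
  for (const prev of previous) {
    // MOVED replies have the form "MOVED <slot> <host>:<port>".
    const match = /^MOVED (\d+) (\S+)$/.exec((prev && prev.message) || "");
    if (match) {
      return { slot: Number(match[1]), target: match[2] };
    }
  }
  return null;
}

// Error shaped like the one in the report; 10.0.0.1 is a placeholder address.
const sample = Object.assign(
  new Error("EXECABORT Transaction discarded because of previous errors."),
  { previousErrors: [new Error("MOVED 2405 10.0.0.1:15000")] }
);
console.log(findMovedCause(sample));
```

Logging the slot and target this way makes it easier to correlate aborted transactions with the connection close events in the ioredis debug log.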

After digging into the problem by adding more logging to the ioredis library, we found the following:

  1. With Azure Redis, a node (connection) is sometimes removed from connectionPool, which causes the transaction to be sent to the wrong node for the key. The log line below appears with Azure Redis; this did not happen with AWS Redis. It might be related to a network issue, or Azure Redis may be closing the connection. We are checking with Microsoft about this behavior.
    2024-07-11T06:59:41.140Z ioredis:redis status[<ip>:15000]: close -> end

  2. In ioredis 5, slotsRefreshInterval is disabled by default. In ioredis 4 the problem is less likely to appear because it is enabled by default with a 5-second interval: the refreshSlotsCache event resets the nodes in the connection pool, which restores the missing node to connectionPool.

  3. There is a race between reaching the retry limit for MOVED errors and refreshSlotsCache. If the retry limit (16) is reached before refreshSlotsCache restores the node to connectionPool, the transaction is aborted.
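
The race in point 3 can be illustrated with a rough back-of-the-envelope model. This is a simplification and not ioredis internals: it assumes each MOVED retry waits `retryDelayOnMoved` milliseconds, ignores network round-trip time, and treats the slot refresh as taking a fixed `refreshMs`. Both function names are hypothetical.

```javascript
// Simplified model, not ioredis internals: total time the client keeps
// retrying a MOVED reply before the retry limit is exhausted.
function retryBudgetMs(maxRedirections, retryDelayOnMoved) {
  return maxRedirections * retryDelayOnMoved;
}

// The transaction survives only if the node is back in the pool before
// the last permitted retry has been spent.
function transactionSurvives(refreshMs, maxRedirections, retryDelayOnMoved) {
  return refreshMs < retryBudgetMs(maxRedirections, retryDelayOnMoved);
}

// With no delay between retries the budget is 0 ms, so a 200 ms refresh loses the race.
console.log(transactionSurvives(200, 16, 0));
// With a 50 ms delay the budget is 16 * 50 = 800 ms, so the same refresh wins.
console.log(transactionSurvives(200, 16, 50));
```

Under this model, a non-zero retryDelayOnMoved widens the window in which refreshSlotsCache can restore the node, which is consistent with the workaround described below.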

As a workaround, we enabled slotsRefreshInterval and set retryDelayOnMoved, which solves the problem.
However, I noticed that for a MOVED error outside a transaction, the code below is called to recreate the node in connectionPool:
_this.connectionPool.findOrCreate(_this.natMapper(key));
Is it possible to do the same for a MOVED error inside a transaction? It would be more efficient than waiting for refreshSlotsCache to finish.
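
For reference, a workaround along these lines might be configured as in the sketch below. The numeric values are illustrative assumptions only (the report does not state which values were used), `buildClusterOptions` is a hypothetical helper, and the `<host>` placeholder must be replaced with a real endpoint.

```javascript
// Illustrative values only; the report does not say which values were used.
function buildClusterOptions() {
  return {
    slotsRefreshInterval: 5000, // ms between periodic slot-map refreshes
    slotsRefreshTimeout: 1000,  // ms to wait for a slot refresh to complete
    retryDelayOnMoved: 100,     // ms to wait before retrying after a MOVED reply
    maxRedirections: 16,        // retry limit mentioned in the report
  };
}

// Usage (needs ioredis and a reachable cluster, so it is left commented out):
// const { Cluster } = require("ioredis");
// const cluster = new Cluster([{ host: "<host>", port: 15000 }], buildClusterOptions());

console.log(buildClusterOptions());
```

Suitable values depend on how long a slot refresh takes in the given environment, so they would need tuning against real logs.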

@Toritos01

Hi @chaochao0815, could I ask which values you set for retryDelayOnMoved, slotsRefreshInterval, and slotsRefreshTimeout, and any other config values you have set?
I have been facing similar issues, but I am still seeing the MOVED errors after setting all of those options.
