Struggle with etcd timeout, help #9159

Closed
ximenzaoshi opened this issue Jan 17, 2018 · 25 comments

@ximenzaoshi

ximenzaoshi commented Jan 17, 2018

We use k8s in production, with about 300 nodes. Several days ago the etcd cluster became abnormal: no leader could be elected and all client requests timed out.
Here is the node config:

/data/bin/etcd-3.2
    --name betcd05
    --initial-cluster betcd05=http://tzk05:3380,betcd03=http://tzk03:3380,betcd01=http://tzk01:3380,betcd04=http://tzk04:3380,betcd02=http://tzk02:3380
    --initial-advertise-peer-urls http://10.3.8.18:3380
    --data-dir /data/data/etcd-bay4
    --wal-dir /data/log/etcd-bay4
    --listen-peer-urls http://10.3.8.18:3380
    --listen-client-urls http://10.3.8.18:3379,http://127.0.0.1:3379
    --advertise-client-urls http://10.3.8.18:3379
    --initial-cluster-state existing
    --quota-backend-bytes=8589934592
    --heartbeat-interval '1000'
    --election-timeout '10000'

etcd version 3.2.14
Here is the master node endpoint status:

ETCDCTL_API=3 etcdctl --endpoints=tzk01:3379 endpoint status
tzk01:3379, 6c31ba6970ba5e26, 3.2.14, 3.4 GB, true, 346, 123773332

The master node log shows that queries take too long:

2018-01-17 18:51:46.388490 W | etcdserver: apply entries took too long [31.038059099s for 1 entries]
2018-01-17 18:51:46.388505 W | etcdserver: avoid queries with large range/delete range!
2018-01-17 18:51:59.666084 W | etcdserver: apply entries took too long [12.772745279s for 1 entries]
2018-01-17 18:51:59.666101 W | etcdserver: avoid queries with large range/delete range!
2018-01-17 18:52:13.236024 W | etcdserver: apply entries took too long [13.016438597s for 1 entries]
2018-01-17 18:52:13.236046 W | etcdserver: avoid queries with large range/delete range!
2018-01-17 18:52:26.511690 W | etcdserver: apply entries took too long [12.756021442s for 1 entries]
2018-01-17 18:52:26.511711 W | etcdserver: avoid queries with large range/delete range!
2018-01-17 18:52:39.861618 W | etcdserver: apply entries took too long [12.822592077s for 1 entries]
2018-01-17 18:52:39.861636 W | etcdserver: avoid queries with large range/delete range!
2018-01-17 18:52:54.475787 W | etcdserver: apply entries took too long [14.083658231s for 1 entries]
2018-01-17 18:52:54.475806 W | etcdserver: avoid queries with large range/delete range!
2018-01-17 18:53:07.687890 W | etcdserver: apply entries took too long [12.71080088s for 1 entries]
2018-01-17 18:53:07.687908 W | etcdserver: avoid queries with large range/delete range!
2018-01-17 18:53:20.919569 W | etcdserver: apply entries took too long [12.708351937s for 1 entries]
2018-01-17 18:53:20.919585 W | etcdserver: avoid queries with large range/delete range!
2018-01-17 18:53:21.169769 W | etcdserver: apply entries took too long [103.492288ms for 1 entries]
2018-01-17 18:53:21.169795 W | etcdserver: avoid queries with large range/delete range!
2018-01-17 18:53:40.588217 W | etcdserver: apply entries took too long [19.100170034s for 1 entries]
2018-01-17 18:53:40.588232 W | etcdserver: avoid queries with large range/delete range!
2018-01-17 18:53:55.389231 W | etcdserver: apply entries took too long [14.270018598s for 1 entries]
2018-01-17 18:53:55.389248 W | etcdserver: avoid queries with large range/delete range!
2018-01-17 18:53:55.408950 I | mvcc: store.index: compact 2476681628
2018-01-17 18:53:55.836416 W | etcdserver: apply entries took too long [427.854965ms for 1 entries]
2018-01-17 18:53:55.836435 W | etcdserver: avoid queries with large range/delete range!
2018-01-17 18:54:02.478435 I | mvcc: finished scheduled compaction at 2476681628 (took 6.641971303s)
2018-01-17 18:54:09.624533 W | etcdserver: apply entries took too long [13.735092749s for 1 entries]
2018-01-17 18:54:09.624553 W | etcdserver: avoid queries with large range/delete range!
2018-01-17 18:54:24.352842 W | etcdserver: apply entries took too long [14.197159572s for 1 entries]
2018-01-17 18:54:24.352859 W | etcdserver: avoid queries with large range/delete range!
2018-01-17 18:54:38.258577 W | etcdserver: apply entries took too long [13.372680952s for 1 entries]
2018-01-17 18:54:38.258593 W | etcdserver: avoid queries with large range/delete range!
2018-01-17 18:54:52.069577 W | etcdserver: apply entries took too long [13.249941442s for 1 entries]
2018-01-17 18:54:52.069595 W | etcdserver: avoid queries with large range/delete range!
2018-01-17 18:55:06.132372 W | etcdserver: apply entries took too long [13.491814966s for 1 entries]
2018-01-17 18:55:06.132391 W | etcdserver: avoid queries with large range/delete range!
2018-01-17 18:55:21.022463 I | wal: segmented wal file /data/log/etcd-bay4/0000000000000405-000000000760e9ba.wal is created
2018-01-17 18:55:25.017074 W | etcdserver: apply entries took too long [18.367109561s for 1 entries]
2018-01-17 18:55:25.017427 W | etcdserver: avoid queries with large range/delete range!
2018-01-17 18:55:38.776670 W | etcdserver: apply entries took too long [13.190163366s for 1 entries]
2018-01-17 18:55:38.776686 W | etcdserver: avoid queries with large range/delete range!

Here are the master node log and metrics files:
master-node.log
master-node-metrics.log

Disk, memory, and network all look OK. The problem was strange and caused a disaster for us.
If you need more info, please tell me.

@hexfusion
Contributor

One thing I notice in your master-node logs is this.

2018-01-16 00:21:31.092323 W | etcdserver: backend quota 167772160000 exceeds maximum recommended quota 8589934592

/cc @gyuho @xiang90

@gyuho
Contributor

gyuho commented Jan 17, 2018

Disk, memory, network, all is OK.

How do you know? Is that an HDD?

@xiang90
Contributor

xiang90 commented Jan 17, 2018

no leader can be elected and all the client requests timeout.

From what I can see in the logs you provided, the leader is elected. The client requests timed out because the execution time of each request is long (more than 10 seconds).

You probably need to figure out why expensive requests are being sent to etcd.

@ximenzaoshi
Author

This was a config mistake; I changed it to 8589934592. @hexfusion
We have host monitoring. @gyuho
After the election config was changed (to 1000 ms and 10000 ms, as you can see), the leader could be elected; it could not be elected before the change. But the most important thing now is the timed-out requests. Thank you for your advice; I tried it, but with no result. Is there any tool for this, or can etcd tell which key took so much time? @xiang90

@xiang90
Contributor

xiang90 commented Jan 17, 2018

Is there any tool for this, or can etcd tell which key took so much time?

check out #9111

@ximenzaoshi
Author

ETCDCTL_API=3 etcdctl --endpoints=localhost:3379 put foo bar --command-timeout=100s
This update took about 12s. As far as I can see, every update operation takes about the same time now...
@xiang90

@xiang90
Contributor

xiang90 commented Jan 17, 2018

@ximenzaoshi

etcd serializes requests. You might have expensive requests waiting in the queue.

@ximenzaoshi
Author

What kind of requests? Do you mean requests with a huge body, or maybe a transaction? @xiang90

@ximenzaoshi
Author

I compiled the master branch and ran a single-node cluster from the snapshot, with no writes or reads at all. I found that updates still time out. There are lots of log entries like 2018-01-19 10:07:38.538741 W | etcdserver: failed to revoke 754f60fad2005caf ("lease not found"). All the pending proposals are related to lease revoking.
So why can the leases being revoked not be found, and is this the reason for the timeouts?
@xiang90

@xiang90
Contributor

xiang90 commented Jan 19, 2018

@ximenzaoshi

As the error suggests, you are trying to revoke a lease that does not exist. I have no idea why your code tries to do that so aggressively. I am not sure if this is the reason for the timeouts either.

@ximenzaoshi
Author

ximenzaoshi commented Jan 19, 2018

But there is no program accessing the single-node etcd cluster created from the snapshot. Or do you mean the etcd snapshot stores the earlier user requests? That seems a little unreasonable. So I guess it is etcd itself that does the revoking, since the leases have expired. But why the expired leases are not found is still a question. And after a revoke request fails, I think it should not be retried forever, or the queue will fill up with revoke requests and lead to timeouts.
Anyway, why etcd would revoke a lease that is not found is the original question. @xiang90

@xiang90
Contributor

xiang90 commented Jan 19, 2018

But there is no program accessing the single-node etcd cluster created from the snapshot.

You did not mention this before. I thought there were clients accessing the new etcd cluster.

I am not sure why etcd would still be processing requests when there is no client at all. Can you confirm that is the case? We have never heard of this before.

If you believe this is the case, we would like to take a look if you can share your snapshot file and tell us how to reproduce the issue you hit.

@ximenzaoshi
Author

Yes, I am sure that there is no client. Can heavy lease expiration influence client requests? I'm sure there are no client requests; the revoke requests come from etcd itself.
Maybe there is a version problem? The original snapshot was created by version 3.1.9, and I used the latest master branch to test. Are there any important lease-related changes in recent versions?
The snapshot will be provided later.

@xiang90
Contributor

xiang90 commented Jan 19, 2018

Can heavy lease expiration influence client requests?

It can, but we fixed that by jittering the revoke time. There is also a bug report in the k8s repo.

What I do not understand is why etcd reports "lease not found" if the revoking comes from etcd itself.
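For illustration only, a minimal Go sketch of the jittering idea: spread lease expiry times by a random offset so a wave of revocations does not hit the apply loop all at once. scheduleExpiry and maxJitter are hypothetical names; this is not etcd's actual lessor code.

package main

import (
    "fmt"
    "math/rand"
    "time"
)

// scheduleExpiry is a hypothetical helper showing jittered lease expiry:
// instead of expiring exactly at grant time + TTL, each lease expires at a
// slightly randomized moment, so mass revocations are spread out rather
// than being processed in one burst.
func scheduleExpiry(ttl, maxJitter time.Duration) time.Time {
    jitter := time.Duration(rand.Int63n(int64(maxJitter)))
    return time.Now().Add(ttl).Add(jitter)
}

func main() {
    // Example: a 60s lease expires somewhere between 60s and 65s from now.
    fmt.Println(scheduleExpiry(60*time.Second, 5*time.Second))
}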

@xiang90
Contributor

xiang90 commented Jan 19, 2018

Lease revoking without affecting client requests was fixed in 3.2.2: https://github.com/coreos/etcd/blob/master/CHANGELOG.md#improved-2.

However, I am not sure if you hit the same issue I mentioned above around "lease not found".

@ximenzaoshi
Author

The snapshot is a little large, about 1.3 GB. Is there a good way to share it? @xiang90

@xiang90
Contributor

xiang90 commented Jan 19, 2018

@ximenzaoshi Could you upload it somewhere secure and send the URL to me by email at xiang.li@coreos.com? Thanks.

@ximenzaoshi
Author

@xiang90
Problem solved. Thanks for your help.
Kubernetes created too many leases in etcd, about 20000K. Leases were being created faster than they were revoked, and after a long time the huge number of leases became a big problem: etcd spends too much time ranging over the lease map while holding a lock, which has a big impact on updates.
Creating a huge number of leases in etcd reproduces our problem.
I made a PR (#9226) for this problem.
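For anyone who wants to reproduce this, a minimal sketch using the Go clientv3 client; the endpoint, key prefix, lease count, and TTL below are placeholder assumptions, and the import path is the coreos one used in the 3.2 era.

package main

import (
    "context"
    "fmt"
    "log"
    "time"

    "github.com/coreos/etcd/clientv3"
)

func main() {
    cli, err := clientv3.New(clientv3.Config{
        Endpoints:   []string{"http://127.0.0.1:3379"}, // placeholder endpoint
        DialTimeout: 5 * time.Second,
    })
    if err != nil {
        log.Fatal(err)
    }
    defer cli.Close()

    // Grant a large number of short-TTL leases and attach one key to each.
    // Once enough leases exist, revocations queue up faster than they are
    // processed and ordinary writes start to stall behind them.
    for i := 0; i < 100000; i++ {
        ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
        lease, err := cli.Grant(ctx, 30) // 30-second TTL
        if err == nil {
            _, err = cli.Put(ctx, fmt.Sprintf("/repro/key-%d", i), "x", clientv3.WithLease(lease.ID))
        }
        cancel()
        if err != nil {
            log.Printf("request %d failed: %v", i, err)
        }
    }
}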

@xiang90
Contributor

xiang90 commented Jan 25, 2018

What I do not understand is why etcd reports "lease not found" if the revoking comes from etcd itself.

I still do not understand this. Also, as I mentioned, k8s should not create this many leases. There is an issue in the k8s repo.

@xiang90
Contributor

xiang90 commented Jan 25, 2018

/cc @jpbetz

@ximenzaoshi
Author

Yes, k8s should not create so many leases; we will fix that later. The "not found" problem may also be caused by the huge number of leases: there can be two revoke requests for the same lease in the queue. Maybe this is the reason? @xiang90
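A small sketch of that hypothesis, assuming a local cluster on a placeholder endpoint: revoking the same lease ID twice makes the second revoke fail, which is one way "lease not found" can surface when duplicate revokes are queued.

package main

import (
    "context"
    "log"
    "time"

    "github.com/coreos/etcd/clientv3"
)

func main() {
    cli, err := clientv3.New(clientv3.Config{
        Endpoints:   []string{"http://127.0.0.1:3379"}, // placeholder endpoint
        DialTimeout: 5 * time.Second,
    })
    if err != nil {
        log.Fatal(err)
    }
    defer cli.Close()

    ctx := context.Background()
    lease, err := cli.Grant(ctx, 60)
    if err != nil {
        log.Fatal(err)
    }

    // The first revoke removes the lease.
    if _, err := cli.Revoke(ctx, lease.ID); err != nil {
        log.Fatal(err)
    }

    // A second revoke of the same ID fails because the server no longer knows
    // the lease, mirroring the "lease not found" warnings seen in the logs.
    if _, err := cli.Revoke(ctx, lease.ID); err != nil {
        log.Printf("second revoke: %v", err)
    }
}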

@gyuho
Contributor

gyuho commented Feb 27, 2018

Kubernetes created too many leases in etcd, about 20000K. Leases were being created faster than they were revoked, and after a long time the huge number of leases became a big problem: etcd spends too much time ranging over the lease map while holding a lock, which has a big impact on updates. Creating a huge number of leases in etcd reproduces our problem.

#9229 won't solve this problem.
It only makes lease lookup calls non-blocking when no grant or revoke routine is running.
Lease Grant and Revoke will still contend on the same write lock.

Related to #9360.
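To illustrate the contention described above, a minimal sketch (not etcd's lessor code) of a lease map guarded by a single sync.RWMutex: lookups can share the read lock, but Grant and Revoke both need the write lock, so they serialize with each other and with anything that ranges the whole map under that lock.

package main

import (
    "fmt"
    "sync"
)

// leaseStore is an illustrative stand-in for a lease map guarded by one lock.
type leaseStore struct {
    mu     sync.RWMutex
    leases map[int64]int64 // lease ID -> TTL, simplified
}

// Lookup only needs the read lock, so many lookups can run concurrently.
func (s *leaseStore) Lookup(id int64) (int64, bool) {
    s.mu.RLock()
    defer s.mu.RUnlock()
    ttl, ok := s.leases[id]
    return ttl, ok
}

// Grant and Revoke both take the write lock, so they serialize with each
// other; with millions of leases, anything that also ranges the whole map
// under this lock makes every write wait behind it.
func (s *leaseStore) Grant(id, ttl int64) {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.leases[id] = ttl
}

func (s *leaseStore) Revoke(id int64) {
    s.mu.Lock()
    defer s.mu.Unlock()
    delete(s.leases, id)
}

func main() {
    s := &leaseStore{leases: make(map[int64]int64)}
    s.Grant(1, 30)
    fmt.Println(s.Lookup(1))
    s.Revoke(1)
}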

@ximenzaoshi
Author

Yes, you are right: lease Grant and Revoke will still contend on the same write lock. Might splitting leases into different timespans help? That many leases should not be a very common case, though.

@gyuho
Contributor

gyuho commented Mar 1, 2018

@ximenzaoshi I am planning to investigate this code path further. Let's move this discussion to #9360.

@kannanvr

How did you resolve this issue? I am facing a similar issue.
