memory leak: KeyValue.Unmarshal or txReadBuffer looks like the cause #12256
Can you provide more info, such as the etcd version, whether you are using the v2 or v3 API, etcd logs, and etcd metrics?
Hi tangcong, we are on the etcd master branch, newer than 3.4.10.
It seems to be caused by concurrent read-tx buffer copies and large query responses. Can you provide the value of the db_open_read_transactions metric?
How many keys does your cluster have? Why is there no etcd log? Slow queries print a warning message. When you restarted the apiserver, did process_open_fds fail to drop significantly?
This is a test etcd cluster for 10k k8s nodes, not in production, so we did not keep the etcd logs.
Maybe I've caught the bug; it really is the txn read buffer causing the memory leak.
I ran a test and watched memory usage. I found that etcd holds a lot of memory in the txn read buffer; when other read operations arrive, memory moves from [HeapSys] to [HeapInuse]/[HeapAlloc]. I have two questions.
runtime.MemStats.HeapInuse, value, 1.68GB
Maybe the fix is to have k8s modify its query pattern.
Hi tangcong, I have some questions; I hope you can clear up my confusion.
The txn read buffer only holds recently written data; the backend commits data every 100ms (by default) and resets the read buffer, so as long as commits proceed normally and the db_open_read_transactions value stays small, the buffer does not contain much data. The key issue is that you have to avoid ranging over all data, which costs too much memory. Please follow the best practices and see the k8s SLA.
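The commit/reset cycle described above can be sketched in a few lines. This is a simplified model, not etcd's actual implementation: `bucketBuffer` here is a plain slice of pairs, where the real one stores bucket pages, but it shows why the buffer stays small under normal commits and why copies for concurrent readers multiply memory:

```go
package main

import "fmt"

// kv is a simplified key/value pair (etcd's real buffer stores bucket data).
type kv struct{ key, val string }

// bucketBuffer is a minimal sketch of the read buffer caching recent writes.
type bucketBuffer struct{ entries []kv }

// Copy duplicates the buffer. Each concurrent read transaction takes its own
// copy, so a large buffer times many open read txns inflates memory.
func (b *bucketBuffer) Copy() *bucketBuffer {
	dup := make([]kv, len(b.entries))
	copy(dup, b.entries)
	return &bucketBuffer{entries: dup}
}

// reset models the periodic (default 100ms) backend commit: once data is
// persisted, the shared buffer is emptied and holds only new writes again.
func (b *bucketBuffer) reset() { b.entries = b.entries[:0] }

func main() {
	buf := &bucketBuffer{}
	for i := 0; i < 3; i++ {
		buf.entries = append(buf.entries, kv{fmt.Sprintf("k%d", i), "v"})
	}
	snapshot := buf.Copy() // a read txn takes its private copy
	buf.reset()            // backend commit clears the shared buffer
	fmt.Println(len(snapshot.entries), len(buf.entries)) // prints "3 0"
}
```

If commits are blocked or delayed, `reset` never runs, the shared buffer keeps growing, and every new read transaction copies the ever-larger buffer.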
Hi tangcong, if the read buffer caches so little data, why does pprof show that most memory is allocated through func (bb *bucketBuffer) Copy()?
Can you reproduce it stably? We could add some debug logs to see whether commit is blocked, and print the buffer size.
Hi tangcong, my thinking was wrong. Most of the memory is indeed used by storeTxnRead.rangeKeys() -> KeyValue.Unmarshal(). The txReadBuffer.unsafeCopy() -> bucketBuffer.Copy() path only shows up in heap alloc_space, which is an accumulated allocation count, not live memory. So you are right: "The key issue is that you have to avoid ranging all data and it costs too much memory." The key point is not the txn read buffer Copy(); it is that ranging over all data makes storeTxnRead.rangeKeys() cost too much memory.
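The standard way to avoid ranging over all data is pagination: read the keyspace in bounded pages so each request decodes a limited number of values. A minimal sketch over an in-memory sorted key list (the function `rangePage` is hypothetical; on a real cluster the equivalent is etcd's range limit option or k8s list pagination):

```go
package main

import "fmt"

// rangePage returns at most `limit` keys starting at `start` from a sorted
// key list, plus the key to resume from. Bounding the page size bounds the
// per-request allocation, unlike one range over the entire keyspace.
func rangePage(keys []string, start string, limit int) (page []string, next string) {
	for _, k := range keys {
		if k < start {
			continue
		}
		if len(page) == limit {
			return page, k // resume from this key on the next call
		}
		page = append(page, k)
	}
	return page, "" // no more pages
}

func main() {
	keys := []string{"a", "b", "c", "d", "e"}
	start := "a"
	for {
		page, next := rangePage(keys, start, 2)
		fmt.Println(page)
		if next == "" {
			break
		}
		start = next
	}
}
```

Each page still pays the Unmarshal cost for its own values, but peak memory is proportional to the page size rather than to the whole keyspace.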
Good, got it. Rate-limiting is necessary for expensive reads; it can keep them from costing too much memory.
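One simple way to rate-limit expensive reads, sketched here with only the standard library: a counting semaphore built on a buffered channel, so at most N large range requests allocate their response buffers at the same time. This is a generic pattern, not etcd's own limiter:

```go
package main

import "fmt"

// sem is a counting semaphore: acquire a slot before serving an expensive
// range request, release it afterward. With capacity N, at most N expensive
// reads run concurrently, bounding peak response-buffer memory.
type sem chan struct{}

func newSem(n int) sem { return make(sem, n) }

func (s sem) acquire() { s <- struct{}{} } // blocks when all slots are taken
func (s sem) release() { <-s }

func main() {
	s := newSem(2)
	s.acquire()
	s.acquire()
	// A third acquire here would block until release is called.
	fmt.Println(len(s), cap(s)) // prints "2 2"
	s.release()
	fmt.Println(len(s)) // prints "1"
}
```

In a request handler, wrap only the expensive path (large ranges) with acquire/release so cheap point reads are never queued behind it.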
Hi yangxuanjia, I've run into a difficulty: I cannot get the following command to work. Please help me, thanks.

curl -k --cert client.pem --key client-key.pem 'https://172.28.200.99:2379/debug/pprof/heap' -o ppppp1.dump
When k8s ranges over all data across many concurrent connections, memory rises rapidly and never declines; it may be a memory leak.
We have seen memory rise to 110GB or 240GB on a physical machine and never decline.
We stopped all the k8s servers and restarted k8s, but it changed nothing: the high memory usage persists in etcd.
CentOS 8.1
etcd master branch