
High GC CPU consumption when running sysbench point select #25573

Closed
sticnarf opened this issue Jun 18, 2021 · 15 comments
Labels
sig/sql-infra (SIG: SQL Infra), type/enhancement (The issue or PR belongs to an enhancement.)

Comments

@sticnarf
Contributor

Bug Report

1. Minimal reproduce step (Required)

3 TiDB instances behind one HAProxy, 3 TiKV instances.

sysbench with 16 tables, 10,000,000 rows per table.

Run the sysbench point select workload.

2. What did you expect to see? (Required)

GC should not consume much CPU

3. What did you see instead (Required)

gcBgMarkWorker sometimes consumes a large share of CPU:
[image: CPU profile showing gcBgMarkWorker]

However, it cannot be reproduced every time.

4. What is your TiDB version? (Required)

TiDB 5.1 compiled with Go 1.16.4

@sticnarf added the type/bug (The issue is confirmed as a bug.) label on Jun 18, 2021
@sticnarf
Contributor Author

sticnarf commented Jun 18, 2021

In this profile zip, two TiDB instances (38_135.prof, 39_41.prof) encounter the issue, while the other one (35_167.prof) does not.
The archive also contains heap profiling results.
510_prof.zip

@zhangjinpeng87
Contributor

any update?

@zhangjinpeng87
Contributor

Who is following this issue?

@sticnarf
Contributor Author

sticnarf commented Jun 21, 2021

@zhangjinpeng1987

The cause is that the strategy for triggering a GC in Go is too simplistic:

A collection is triggered when the ratio of freshly allocated data to live data remaining after the previous collection reaches this percentage. The default is GOGC=100.

When running a workload with low memory usage (e.g. point select), the Go runtime triggers GC collections too frequently, leading to high CPU usage.

It seems impossible to fully solve the problem in the real world as long as we use Go. Adjusting GOGC to a very high value is unreasonable in practice because TiDB also needs to support workloads that use a lot of memory.

And Go does not support more advanced GC-triggering strategies. This is by design: the Go team wants to keep it simple.
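
For reference, GOGC is the only knob Go exposes here; it can be set through the environment variable or changed at runtime with runtime/debug.SetGCPercent. A minimal sketch (not TiDB code; the value is purely illustrative):

```go
// Minimal sketch: the one GC tuning knob Go exposes.
// Raising the percentage makes collections less frequent at the cost of RAM.
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// Equivalent to starting the process with GOGC=400 in the environment.
	// 400 is an illustrative value, not a recommendation.
	prev := debug.SetGCPercent(400)
	fmt.Printf("GOGC raised from %d to 400\n", prev)
}
```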

Related to golang/go#42430

@zhangjinpeng87
Contributor

@sticnarf Can we close this issue?

@sticnarf
Contributor Author

This is a case that needs to be optimized, but we don't have a good solution yet.

So I would like to keep this issue open, but we can lower the severity to minor because it is not a real performance regression. cc @aytrack

@tiancaiamao
Contributor

GOGC=100 is the default value of the Go runtime.

What we should do is optimize object allocation.
I will take a look.

@sticnarf
Contributor Author

sticnarf commented Jun 25, 2021

@tiancaiamao
I don't think it is a real regression in TiDB 5.1; it can also happen on older versions of TiDB.

My thought is that it is actually unnecessary for the Go runtime to collect garbage that fast. For a server-side application that usually owns all the resources of the machine, keeping the actual memory usage low is not very meaningful. In this case, lowering the GC frequency by increasing GOGC greatly improves performance.

Optimizing object allocation feels more like a workaround...

@tiancaiamao
Contributor

Keeping GC CPU below 25% of the total throughput is very, very important.
You can think of the Go runtime as dedicating part of the CPU resources to mark & sweep. However, when the GC cannot keep up with the allocation rate, the runtime has to slow down the application and ask the goroutines to assist with the collection. That makes the application slower and causes latency jitter.
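
As a rough way to check where a process stands against that 25% budget, the runtime reports the GC's cumulative CPU share and cycle count. A minimal sketch (not TiDB code) that could run alongside a workload:

```go
// Minimal sketch: observe GC CPU share and GC frequency from inside a Go process.
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	prevGC := ms.NumGC

	for i := 0; i < 5; i++ {
		time.Sleep(time.Second)
		runtime.ReadMemStats(&ms)
		// GCCPUFraction is the fraction of this program's available CPU
		// time consumed by the GC since the program started.
		fmt.Printf("GC CPU fraction: %.2f%%, GC cycles in the last second: %d\n",
			ms.GCCPUFraction*100, ms.NumGC-prevGC)
		prevGC = ms.NumGC
	}
}
```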

I found some places to optimize allocation, but it's more important to find a way to observe these performance changes.
#25754

Without a micro performance benchmark, even if we make some small optimizations when the problem emerges, we can't stop it from happening again. Everyone changes the code base and every change is a potential suspect, so we can't rely on someone optimizing it occasionally.
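
The kind of micro benchmark meant here reports allocs/op through the standard testing package, so allocation regressions show up as plain numbers. A minimal sketch, where buildPointGetKey is a hypothetical stand-in for the real code path exercised in #25754:

```go
// Minimal sketch of an allocation-tracking micro benchmark.
// buildPointGetKey is a hypothetical stand-in for the code under test;
// the real benchmark exercises the prepared point-get path.
package session

import "testing"

func buildPointGetKey(tableID, handle int64) []byte {
	buf := make([]byte, 0, 19)
	buf = append(buf, 't')
	for i := 7; i >= 0; i-- {
		buf = append(buf, byte(tableID>>(8*i)))
	}
	buf = append(buf, '_', 'r')
	for i := 7; i >= 0; i-- {
		buf = append(buf, byte(handle>>(8*i)))
	}
	return buf
}

// Run with: go test -bench=PointGetKey -benchmem
// b.ReportAllocs (or -benchmem) adds the B/op and allocs/op columns
// that the numbers in this thread compare.
func BenchmarkPointGetKey(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		_ = buildPointGetKey(42, int64(i))
	}
}
```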

@tiancaiamao
Contributor

Increasing GOGC is a trade-off: the process uses more RAM in exchange for better GC throughput.

@dbsid
Contributor

dbsid commented Jun 25, 2021

Should we increase GOGC from the default value in order to mitigate this issue? I think in most cases we want TiDB to be more aggressive with memory allocation.

@tiancaiamao
Contributor

Should we increase GOGC from the default value in order to mitigate this issue?

I don't think this is the best solution.
For some users that may be a good option, but for others the change is not acceptable.

We should optimize TiDB to reduce unnecessary object allocation.

@tiancaiamao
Contributor

tiancaiamao commented Jul 6, 2021

Some hacks to check how far we can go.
The observation is: if we can cut allocs/op by about 43% (from 101 to 57, per the results below), we get a performance improvement of nearly 35% in time per operation (from 9578 ns/op to roughly 6200 ns/op).
That is really a big deal; we should do it!

BenchmarkPreparedPointGet-16    	  115844	      9578 ns/op	    7792 B/op	     101 allocs/op


EnableCollectExecutionInfo: false
BenchmarkPreparedPointGet-16    	  132540	      8613 ns/op	    7224 B/op	      89 allocs/op


Avoid allocation in func (txn *LazyTxn) recreateTxnInfo()
BenchmarkPreparedPointGet-16    	 4501532	      7929 ns/op	    6973 B/op	      87 allocs/op
BenchmarkPreparedPointGet-16    	 4485886	      7964 ns/op	    6845 B/op	      86 allocs/op


Avoid  resolvedLocks allocation in snapshot
BenchmarkPreparedPointGet-16    	 4558528	      8039 ns/op	    6621 B/op	      81 allocs/op

Allocate KVSnapshot directly from  KVTxn
BenchmarkPreparedPointGet-16    	 4580366	      7869 ns/op	    6605 B/op	      80 allocs/op

Allocate UnionStore from KVTxn
BenchmarkPreparedPointGet-16    	 4611040	      7782 ns/op	    6613 B/op	      79 allocs/op

About ResetStmtCtx:
CTEStorageMap optimization
Avoid calling ClearStmtVars() every time
Remove the id => tableinfo mapping in tikvTxn
BenchmarkPreparedPointGet-16    	 4602043	      7682 ns/op	    6469 B/op	      76 allocs/op

Avoid the memtracker allocation
BenchmarkPreparedPointGet-16    	 4749812	      7488 ns/op	    6437 B/op	      72 allocs/op

Avoid SysVar  Clone()
BenchmarkPreparedPointGet-16    	 4737213	      7291 ns/op	    6245 B/op	      71 allocs/op

Allocate StmtCtx directly from  SessionVars 
BenchmarkPreparedPointGet-16    	 5254296	      7008 ns/op	    5221 B/op	      70 allocs/op

Optimize planbuilder object
BenchmarkPreparedPointGet-16    	 5242168	      6925 ns/op	    5173 B/op	      69 allocs/op

memtracker: avoid the OOM action allocation
BenchmarkPreparedPointGet-16    	 5036514	      7116 ns/op	    5109 B/op	      67 allocs/op

adapter.go InitTxnWithStartTS duplicates point_get.go Open; both create a txn / snapshot
BenchmarkPreparedPointGet-16    	 5577004	      6390 ns/op	    4300 B/op	      60 allocs/op

Avoid the  ExecStmt object allocation
BenchmarkPreparedPointGet-16    	 5754445	      6272 ns/op	    4092 B/op	      59 allocs/op

Cache  ExecuteStmt PlanBuilder  directly in  session
BenchmarkPreparedPointGet-16    	 5824114	      6196 ns/op	    3612 B/op	      57 allocs/op
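
Most of the steps above share one pattern: keep the per-statement object inside a longer-lived owner (session or transaction) and reset it, instead of allocating a fresh one per request. A minimal sketch of that pattern with hypothetical types, not the actual TiDB structs:

```go
// Minimal sketch: reuse an embedded per-statement object instead of
// allocating a new one for every request.
package main

import "fmt"

// stmtCtx stands in for a per-statement context object.
type stmtCtx struct {
	warnings []string
}

// sessionVars stands in for a long-lived, per-session owner.
type sessionVars struct {
	// cachedStmtCtx is embedded by value, so getting a fresh statement
	// context is just a reset rather than a heap allocation.
	cachedStmtCtx stmtCtx
}

// newStmtCtx resets and reuses the embedded object rather than returning
// &stmtCtx{} each time, saving one allocation per statement.
func (s *sessionVars) newStmtCtx() *stmtCtx {
	s.cachedStmtCtx.warnings = s.cachedStmtCtx.warnings[:0]
	return &s.cachedStmtCtx
}

func main() {
	sv := &sessionVars{}
	for i := 0; i < 3; i++ {
		ctx := sv.newStmtCtx()
		ctx.warnings = append(ctx.warnings, fmt.Sprintf("stmt %d", i))
		fmt.Println(len(ctx.warnings))
	}
}
```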

@bb7133 removed the severity/major and type/bug (The issue is confirmed as a bug.) labels on Jul 22, 2021
@bb7133
Member

bb7133 commented Jul 22, 2021

I do not think this is a bug and have changed it to 'enhancement' instead.

@tiancaiamao
Contributor

I think we can close it now.
We're monitoring the performance changes now; if something goes wrong, we will find it soon.
