
High GC CPU consumption when running sysbench point select #25573

Closed
sticnarf opened this issue Jun 18, 2021 · 15 comments
Labels
sig/sql-infra (SIG: SQL Infra), type/enhancement (The issue or PR belongs to an enhancement.)

Comments

@sticnarf
Contributor

Bug Report

1. Minimal reproduce step (Required)

3 TiDB instances behind one HAProxy, 3 TiKV instances.

sysbench with 16 tables, 10,000,000 rows per table.

Run the sysbench point select workload.

2. What did you expect to see? (Required)

GC should not consume much CPU

3. What did you see instead (Required)

gcBgMarkWorker sometimes consumes a large share of CPU:
[image: CPU profile showing gcBgMarkWorker]

However, it cannot be reproduced every time.

4. What is your TiDB version? (Required)

TiDB 5.1 compiled with Go 1.16.4

@sticnarf added the type/bug (The issue is confirmed as a bug.) label on Jun 18, 2021
@sticnarf
Contributor Author

sticnarf commented Jun 18, 2021

In this profile zip, two TiDB instances (38_135.prof, 39_41.prof) encounter the issue, while the other one (35_167.prof) does not.
The archive also contains heap profiling results.
510_prof.zip

@zhangjinpeng87
Contributor

any update?

@zhangjinpeng87
Contributor

Who is following this issue?

@sticnarf
Contributor Author

sticnarf commented Jun 21, 2021

@zhangjinpeng1987

The cause is that the strategy for triggering a GC in Go is too simplistic:

A collection is triggered when the ratio of freshly allocated data to live data remaining after the previous collection reaches this percentage. The default is GOGC=100.

When running a workload with low memory usage (e.g. point select), the Go runtime triggers GC collections too frequently, leading to high CPU usage.

It seems impossible to fully solve the problem in the real world as long as we use Go. Adjusting GOGC to a very high value is unreasonable in practice because TiDB also needs to support workloads that use a lot of memory.

And Go does not support more advanced GC-triggering strategies. This is by design: the Go team wants to keep it simple.
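
For reference, GOGC is the only knob Go exposes here; it can be set through the environment variable or changed at runtime with runtime/debug.SetGCPercent. A minimal sketch (not TiDB code; the value is purely illustrative):

```go
// Minimal sketch: the one GC tuning knob Go exposes.
// Raising the percentage makes collections less frequent at the cost of RAM.
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// Equivalent to starting the process with GOGC=400 in the environment.
	// 400 is an illustrative value, not a recommendation.
	prev := debug.SetGCPercent(400)
	fmt.Printf("GOGC raised from %d to 400\n", prev)
}
```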

Related to golang/go#42430

@zhangjinpeng87
Contributor

@sticnarf Can we close this issue?

@sticnarf
Contributor Author

This is a case that needs to be optimized, but we don't have a good solution yet.

So I would like to keep this issue open, but we can lower the severity to minor because it is not a real performance regression. cc @aytrack

@tiancaiamao
Contributor

GOGC=100 is the default value of the Go runtime.

What we should do is optimize object allocation.
I will take a look.

@sticnarf
Contributor Author

sticnarf commented Jun 25, 2021

@tiancaiamao
I don't think it is a real regression in TiDB 5.1; it can also happen on older versions of TiDB.

My thought is that it is actually unnecessary for the Go runtime to collect garbage that fast. For a server-side application that usually owns all the resources of the machine, keeping the actual memory usage low is not very meaningful. In this case, lowering the GC frequency by increasing GOGC greatly improves performance.

Optimizing object allocation feels more like a workaround...

@tiancaiamao
Contributor

Keeping GC CPU below 25% of the total throughput is very, very important.
You can think of the Go runtime as dedicating part of the CPU resources to mark & sweep. However, when the GC cannot keep up with the allocation rate, the runtime has to slow down the application and ask the goroutines to assist with the collection. That makes the application slower and causes latency jitter.
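
As a rough way to check where a process stands against that 25% budget, the runtime reports the GC's cumulative CPU share and cycle count. A minimal sketch (not TiDB code) that could run alongside a workload:

```go
// Minimal sketch: observe GC CPU share and GC frequency from inside a Go process.
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	prevGC := ms.NumGC

	for i := 0; i < 5; i++ {
		time.Sleep(time.Second)
		runtime.ReadMemStats(&ms)
		// GCCPUFraction is the fraction of this program's available CPU
		// time consumed by the GC since the program started.
		fmt.Printf("GC CPU fraction: %.2f%%, GC cycles in the last second: %d\n",
			ms.GCCPUFraction*100, ms.NumGC-prevGC)
		prevGC = ms.NumGC
	}
}
```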

I found some places to optimize allocation, but it's more important to find a way to observe these performance changes.
#25754

Without a micro performance benchmark, even if we make some small optimizations when the problem emerges, we can't stop it from happening again. Everyone changes the code base and every change is a potential suspect, so we can't rely on someone optimizing it occasionally.
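
The kind of micro benchmark meant here reports allocs/op through the standard testing package, so allocation regressions show up as plain numbers. A minimal sketch, where buildPointGetKey is a hypothetical stand-in for the real code path exercised in #25754:

```go
// Minimal sketch of an allocation-tracking micro benchmark.
// buildPointGetKey is a hypothetical stand-in for the code under test;
// the real benchmark exercises the prepared point-get path.
package session

import "testing"

func buildPointGetKey(tableID, handle int64) []byte {
	buf := make([]byte, 0, 19)
	buf = append(buf, 't')
	for i := 7; i >= 0; i-- {
		buf = append(buf, byte(tableID>>(8*i)))
	}
	buf = append(buf, '_', 'r')
	for i := 7; i >= 0; i-- {
		buf = append(buf, byte(handle>>(8*i)))
	}
	return buf
}

// Run with: go test -bench=PointGetKey -benchmem
// b.ReportAllocs (or -benchmem) adds the B/op and allocs/op columns
// that the numbers in this thread compare.
func BenchmarkPointGetKey(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		_ = buildPointGetKey(42, int64(i))
	}
}
```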

@tiancaiamao
Contributor

Increasing GOGC is a trade-off: the process uses more RAM in exchange for better GC throughput.

@dbsid
Contributor

dbsid commented Jun 25, 2021

Should we increase GOGC from the default value in order to mitigate this issue? I think in most cases we want TiDB to be more aggressive with memory allocation.

@tiancaiamao
Contributor

Should we increase GOGC from the default value in order to mitigate this issue?

I don't think this is the best solution.
For some users that may be a good option, but for others the change is not acceptable.

We should optimize TiDB to reduce unnecessary object allocation.

@tiancaiamao
Contributor

tiancaiamao commented Jul 6, 2021

Some hacks to check how far we can go.
The observation is: if we can cut allocs/op by about 43% (from 101 to 57, per the results below), we get a performance improvement of nearly 35% in time per operation (from 9578 ns/op to roughly 6200 ns/op).
That is really a big deal; we should do it!

BenchmarkPreparedPointGet-16    	  115844	      9578 ns/op	    7792 B/op	     101 allocs/op


EnableCollectExecutionInfo: false
BenchmarkPreparedPointGet-16    	  132540	      8613 ns/op	    7224 B/op	      89 allocs/op


Avoid allocation in func (txn *LazyTxn) recreateTxnInfo()
BenchmarkPreparedPointGet-16    	 4501532	      7929 ns/op	    6973 B/op	      87 allocs/op
BenchmarkPreparedPointGet-16    	 4485886	      7964 ns/op	    6845 B/op	      86 allocs/op


Avoid  resolvedLocks allocation in snapshot
BenchmarkPreparedPointGet-16    	 4558528	      8039 ns/op	    6621 B/op	      81 allocs/op

Allocate KVSnapshot directly from  KVTxn
BenchmarkPreparedPointGet-16    	 4580366	      7869 ns/op	    6605 B/op	      80 allocs/op

Allocate UnionStore from KVTxn
BenchmarkPreparedPointGet-16    	 4611040	      7782 ns/op	    6613 B/op	      79 allocs/op

About ResetStmtCtx:
CTEStorageMap optimization
Avoid calling ClearStmtVars() every time
Remove the id => tableinfo mapping in tikvTxn
BenchmarkPreparedPointGet-16    	 4602043	      7682 ns/op	    6469 B/op	      76 allocs/op

Avoid the memtracker allocation
BenchmarkPreparedPointGet-16    	 4749812	      7488 ns/op	    6437 B/op	      72 allocs/op

Avoid SysVar  Clone()
BenchmarkPreparedPointGet-16    	 4737213	      7291 ns/op	    6245 B/op	      71 allocs/op

Allocate StmtCtx directly from  SessionVars 
BenchmarkPreparedPointGet-16    	 5254296	      7008 ns/op	    5221 B/op	      70 allocs/op

Optimize planbuilder object
BenchmarkPreparedPointGet-16    	 5242168	      6925 ns/op	    5173 B/op	      69 allocs/op

memtracker: avoid the OOM action allocation
BenchmarkPreparedPointGet-16    	 5036514	      7116 ns/op	    5109 B/op	      67 allocs/op

adapter.go InitTxnWithStartTS duplicates point_get.go Open; both create a txn / snapshot
BenchmarkPreparedPointGet-16    	 5577004	      6390 ns/op	    4300 B/op	      60 allocs/op

Avoid the  ExecStmt object allocation
BenchmarkPreparedPointGet-16    	 5754445	      6272 ns/op	    4092 B/op	      59 allocs/op

Cache  ExecuteStmt PlanBuilder  directly in  session
BenchmarkPreparedPointGet-16    	 5824114	      6196 ns/op	    3612 B/op	      57 allocs/op
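
Most of the steps above share one pattern: keep the per-statement object inside a longer-lived owner (session or transaction) and reset it, instead of allocating a fresh one per request. A minimal sketch of that pattern with hypothetical types, not the actual TiDB structs:

```go
// Minimal sketch: reuse an embedded per-statement object instead of
// allocating a new one for every request.
package main

import "fmt"

// stmtCtx stands in for a per-statement context object.
type stmtCtx struct {
	warnings []string
}

// sessionVars stands in for a long-lived, per-session owner.
type sessionVars struct {
	// cachedStmtCtx is embedded by value, so getting a fresh statement
	// context is just a reset rather than a heap allocation.
	cachedStmtCtx stmtCtx
}

// newStmtCtx resets and reuses the embedded object rather than returning
// &stmtCtx{} each time, saving one allocation per statement.
func (s *sessionVars) newStmtCtx() *stmtCtx {
	s.cachedStmtCtx.warnings = s.cachedStmtCtx.warnings[:0]
	return &s.cachedStmtCtx
}

func main() {
	sv := &sessionVars{}
	for i := 0; i < 3; i++ {
		ctx := sv.newStmtCtx()
		ctx.warnings = append(ctx.warnings, fmt.Sprintf("stmt %d", i))
		fmt.Println(len(ctx.warnings))
	}
}
```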

@bb7133 removed the severity/major and type/bug (The issue is confirmed as a bug.) labels on Jul 22, 2021
@bb7133
Member

bb7133 commented Jul 22, 2021

I do not think this is a bug and have changed it to 'enhancement' instead.

@tiancaiamao
Contributor

I think we can close it now.
We're monitoring the performance changes now; if something goes wrong, we will find it soon.
