A mish-mash of micro-optimizations #113116

nnethercote · 2023-06-28T07:29:59Z

These were aimed at speeding up LLVM codegen, but ended up affecting other places as well.

r? @bjorn3

rustbot · 2023-06-28T07:30:04Z

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

nnethercote · 2023-06-28T07:30:39Z

@bors try @rust-timer queue

bors · 2023-06-28T07:30:49Z

⌛ Trying commit 55e83afbd09a0533be2eb0be351787bb33cde518 with merge aa657f7fbe68ce69f72c77fb7c6b8d0c12aa4513...

bors · 2023-06-28T08:47:52Z

☀️ Try build successful - checks-actions
Build commit: aa657f7fbe68ce69f72c77fb7c6b8d0c12aa4513 (aa657f7fbe68ce69f72c77fb7c6b8d0c12aa4513)

oli-obk · 2023-06-28T08:57:53Z

@rust-timer build aa657f7fbe68ce69f72c77fb7c6b8d0c12aa4513

rust-timer · 2023-06-28T10:25:56Z

Finished benchmarking commit (aa657f7fbe68ce69f72c77fb7c6b8d0c12aa4513): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.9% | [-1.8%, -0.4%] | 11 |
| Improvements ✅ (secondary) | -1.4% | [-2.0%, -0.9%] | 19 |
| All ❌✅ (primary) | -0.9% | [-1.8%, -0.4%] | 11 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.2% | [2.2%, 5.0%] | 3 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -1.6% | [-1.6%, -1.6%] | 1 |
| All ❌✅ (primary) | 3.2% | [2.2%, 5.0%] | 3 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 1.9% | [1.9%, 1.9%] | 1 |
| Improvements ✅ (primary) | -1.9% | [-2.9%, -1.4%] | 5 |
| Improvements ✅ (secondary) | -5.6% | [-8.8%, -2.8%] | 11 |
| All ❌✅ (primary) | -1.9% | [-2.9%, -1.4%] | 5 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 662.942s -> 662.822s (-0.02%)

`lookup_debug_loc` calls `SourceMap::lookup_line`, which does a binary search over the files, and then a binary search over the lines within the found file. It then calls `SourceFile::line_begin_pos`, which redoes the binary search over the lines within the found file. This commit removes the second binary search over the lines, instead getting the line starting pos directly using the result of the first binary search over the lines. (And likewise for `get_span_loc`, in the cranelift backend.)
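A minimal sketch of the idea, not the actual rustc code: the `SourceFile` type below is a simplified, hypothetical stand-in that stores only the starting offset of each line. Once the line index is known from the first binary search, the line's starting position is a plain array lookup, so no second search is needed.

```rust
/// Hypothetical, simplified stand-in for a source file (not the rustc type).
/// `lines[i]` is the byte offset at which line `i` starts, in increasing order.
struct SourceFile {
    lines: Vec<u32>,
}

impl SourceFile {
    /// Binary search for the index of the line that contains `pos`.
    fn lookup_line(&self, pos: u32) -> Option<usize> {
        // Count the line starts that are <= pos; the containing line is one less.
        self.lines.partition_point(|&start| start <= pos).checked_sub(1)
    }

    /// Redundant version: finds the line, then searches again to get its start.
    fn line_and_col_twice(&self, pos: u32) -> Option<(usize, u32)> {
        let line = self.lookup_line(pos)?;
        let line_again = self.lookup_line(pos)?; // second binary search
        Some((line, pos - self.lines[line_again]))
    }

    /// Single-search version: reuses the index from the first search.
    fn line_and_col_once(&self, pos: u32) -> Option<(usize, u32)> {
        let line = self.lookup_line(pos)?;
        Some((line, pos - self.lines[line]))
    }
}
```

Both methods return the same answer; the second simply skips the repeated search.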

`lookup_debug_loc` finds a file, line, and column, which requires two binary searches. But this call site only needs the file. This commit replaces the call with `lookup_source_file`, which does a single binary search.

This makes it (a) a little simpler, and (b) more similar to `SourceFile::lookup_line`.
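As a rough illustration of the two styles, here is a sketch over a plain slice of file start offsets. The function names are hypothetical, and it assumes the slice is non-empty, strictly increasing, and that `starts[0] <= pos`.

```rust
/// Index of the file containing `pos`, written with `binary_search`.
/// On a miss, `binary_search` returns the insertion point, so the
/// containing file is the one just before it.
fn file_idx_binary_search(starts: &[u32], pos: u32) -> usize {
    starts
        .binary_search(&pos)
        .unwrap_or_else(|insert_at| insert_at - 1)
}

/// The same lookup written with `partition_point`: count the files that
/// start at or before `pos`, then step back by one.
fn file_idx_partition_point(starts: &[u32], pos: u32) -> usize {
    starts.partition_point(|&start| start <= pos) - 1
}
```

Under the stated assumptions the two functions agree; the `partition_point` form has no hit/miss distinction to handle.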

I don't know why `SmallStr` was used here; some ad hoc profiling showed this code is not that hot, the string is usually empty, and when it's not empty it's usually very short. However, the use of a `SmallStr<1024>` does result in a 1024-byte `memcpy` call on each execution, which shows up when I do `memcpy` profiling. So using a normal string makes the code both simpler and very slightly faster.
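To make the size difference concrete, here is a sketch with a hypothetical `InlineStr1024` type. It is only a stand-in for a small-string type with a 1024-byte inline buffer, not the removed `SmallStr`. Moving such a value by value may copy the whole buffer, whereas a `String` is only a few machine words.

```rust
/// Hypothetical stand-in for a small-string type with 1024 bytes of inline
/// storage. Unlike a real small-string type, it has no heap fallback.
struct InlineStr1024 {
    len: usize,
    buf: [u8; 1024],
}

impl InlineStr1024 {
    fn new() -> Self {
        InlineStr1024 { len: 0, buf: [0; 1024] }
    }

    fn push_str(&mut self, s: &str) {
        let end = self.len + s.len();
        assert!(end <= self.buf.len(), "sketch has no heap fallback");
        self.buf[self.len..end].copy_from_slice(s.as_bytes());
        self.len = end;
    }

    fn as_str(&self) -> &str {
        // Only whole `&str` values are ever appended, so the bytes are valid UTF-8.
        std::str::from_utf8(&self.buf[..self.len]).unwrap()
    }
}

/// Size in bytes of the inline type: at least the 1024-byte buffer.
fn inline_size() -> usize {
    std::mem::size_of::<InlineStr1024>()
}

/// Size in bytes of a `String` handle (pointer, length, capacity).
fn string_size() -> usize {
    std::mem::size_of::<String>()
}
```

For a string that is usually empty or very short, the small handle is the cheaper thing to create and pass around.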

It no longer has any uses. If it's needed in the future, it can be easily reinstated. Or a crate such as `smallstr` can be used, much like we use `smallvec`.

Other callsites already do this, but these two were missed. This avoids some allocations.
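A small sketch of the pattern, using hypothetical helpers rather than the real `push_item_name`: the caller reserves space up front so that appending the path segments is less likely to trigger repeated reallocations. The capacity of 64 is an arbitrary value chosen for illustration.

```rust
/// Hypothetical helper: appends `segments` to `out`, joined by "::".
fn push_path(segments: &[&str], out: &mut String) {
    for (i, seg) in segments.iter().enumerate() {
        if i > 0 {
            out.push_str("::");
        }
        out.push_str(seg);
    }
}

/// Builds a qualified name into a string whose capacity is set in advance.
fn qualified_name(segments: &[&str]) -> String {
    // 64 is an illustrative guess at a typical name length.
    let mut name = String::with_capacity(64);
    push_path(segments, &mut name);
    name
}
```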

They never have a length of more than two. So this commit changes them to `SmallVec<[_; 2]>`. Also, we possibly push `None` values and then filter those `None` values out again with `retain`. So this commit removes the `retain` and instead only pushes the values if they are `Some(_)`.
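The push-then-filter change can be sketched as follows. This uses a plain `Vec` and `u32` values as stand-ins so that the example needs no external crate; the function names are hypothetical. A `SmallVec<[_; 2]>` could be substituted to keep up to two elements inline.

```rust
/// Before: push both optional values, then filter out the `None`s.
fn collect_with_retain(a: Option<u32>, b: Option<u32>) -> Vec<Option<u32>> {
    let mut items = vec![a, b];
    items.retain(|item| item.is_some());
    items
}

/// After: only push values that are present, so no filtering pass is needed
/// and the element type no longer has to be an `Option`.
fn collect_present(a: Option<u32>, b: Option<u32>) -> Vec<u32> {
    let mut items = Vec::with_capacity(2);
    if let Some(x) = a {
        items.push(x);
    }
    if let Some(x) = b {
        items.push(x);
    }
    items
}
```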

After the last commit, they contain `Option<&OperandBundleDef<'a>>` but the values are always `Some(_)`. This commit removes the needless `Option` wrapper. This also simplifies the type signatures of `LLVMRustBuild{Invoke,Call}`, which were relying on the fact that the representation of `Option<&T>` is the same as `&T` for non-`None` values.
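The layout fact mentioned here can be checked directly. In the sketch below, `Bundle` is a hypothetical stand-in for the operand bundle type; the point is that `Option<&T>` has the same size as `&T` (with `None` represented as a null pointer), and that a slice of plain references gives a simpler signature when every element is known to be present.

```rust
/// Hypothetical stand-in for an operand bundle; not the real wrapper type.
struct Bundle {
    tag: u32,
}

/// Signature with the `Option` wrapper: callers must handle `None`.
fn sum_tags_optional(bundles: &[Option<&Bundle>]) -> u32 {
    bundles.iter().flatten().map(|b| b.tag).sum()
}

/// Signature without the wrapper: every element is a valid reference.
fn sum_tags(bundles: &[&Bundle]) -> u32 {
    bundles.iter().map(|b| b.tag).sum()
}

/// `Option<&T>` uses the null pointer for `None`, so it is pointer-sized.
fn same_layout() -> bool {
    std::mem::size_of::<Option<&Bundle>>() == std::mem::size_of::<&Bundle>()
}
```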

`DerefChecker` can just hold a reference instead. This avoids quite a lot of allocations for some benchmarks.
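A sketch of the general pattern, with hypothetical types standing in for `DerefChecker` and the local declarations: a checker that only reads the data can borrow it for its own lifetime instead of owning a cloned copy.

```rust
/// Hypothetical stand-in for a local declaration.
#[derive(Clone)]
struct LocalDecl {
    ty: String,
}

/// Owning version: constructing it clones every declaration.
struct CloningChecker {
    local_decls: Vec<LocalDecl>,
}

impl CloningChecker {
    fn new(decls: &[LocalDecl]) -> Self {
        CloningChecker { local_decls: decls.to_vec() }
    }
}

/// Borrowing version: holds a reference, so construction allocates nothing.
struct BorrowingChecker<'a> {
    local_decls: &'a [LocalDecl],
}

impl<'a> BorrowingChecker<'a> {
    fn new(decls: &'a [LocalDecl]) -> Self {
        BorrowingChecker { local_decls: decls }
    }

    /// Read-only access works the same as with an owned copy.
    fn ty_of(&self, local: usize) -> &str {
        &self.local_decls[local].ty
    }
}
```

The borrowing version ties the checker's lifetime to the declarations it inspects, which is fine when the checker is short-lived.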

nnethercote · 2023-06-29T02:15:58Z

The walltime/cycles improvements of up to 8% for tt-muncher look spurious; I think it has just become noisy on those metrics. Everything else should be real, though. I think the last commit ("Avoid cloning LocalDecls") is the most effective change.

compiler/rustc_codegen_llvm/src/builder.rs

oli-obk · 2023-06-29T10:50:45Z

r? @oli-obk

@bors r+

bors · 2023-06-29T10:50:47Z

📌 Commit 7e786e8 has been approved by oli-obk

It is now in the queue for this repository.

bors · 2023-06-30T00:35:23Z

⌛ Testing commit 7e786e8 with merge 8aed93d...

bors · 2023-06-30T02:59:29Z

☀️ Test successful - checks-actions
Approved by: oli-obk
Pushing 8aed93d to master...

rust-timer · 2023-06-30T05:20:06Z

Finished benchmarking commit (8aed93d): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.8% | [-1.7%, -0.3%] | 16 |
| Improvements ✅ (secondary) | -1.4% | [-2.3%, -0.8%] | 15 |
| All ❌✅ (primary) | -0.8% | [-1.7%, -0.3%] | 16 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.6% | [0.4%, 1.0%] | 4 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -2.0% | [-2.0%, -2.0%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.1% | [-2.0%, 1.0%] | 5 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.5% | [0.5%, 0.5%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.5% | [0.5%, 0.5%] | 1 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 662.146s -> 661.088s (-0.16%)

rustbot added the S-waiting-on-review and T-compiler labels Jun 28, 2023

nnethercote marked this pull request as draft June 28, 2023 07:30

nnethercote added 9 commits June 29, 2023 11:26

Replace a lookup_debug_loc call.

b4c6e19

`lookup_debug_loc` finds a file, line, and column, which requires two binary searches. But this call site only needs the file. This commit replaces the call with `lookup_source_file`, which does a single binary search.

Use partition_point in SourceMap::lookup_source_file_idx.

45fcd1d

This makes it (a) a little simpler, and (b) more similar to `SourceFile::lookup_line`.

Remove SmallStr.

f2d863f

It no longer has any uses. If it's needed in the future, it can be easily reinstated. Or a crate such as `smallstr` can be used, much like we use `smallvec`.

Set capacity of the string passed to push_item_name.

d20b1a8

Other callsites already do this, but these two were missed. This avoids some allocations.

Avoid cloning LocalDecls.

7e786e8

`DerefChecker` can just hold a reference instead. This avoids quite a lot of allocations for some benchmarks.

nnethercote force-pushed the codegen-opts branch from 55e83af to 7e786e8 Compare June 29, 2023 02:13

rustbot assigned bjorn3 Jun 29, 2023

nnethercote marked this pull request as ready for review June 29, 2023 02:26

nnethercote changed the title ~~Codegen optimizations~~ A mish-mash of micro-optimizations Jun 29, 2023

oli-obk reviewed Jun 29, 2023

compiler/rustc_codegen_llvm/src/builder.rs (outdated, resolved)

oli-obk approved these changes Jun 29, 2023

rustbot assigned oli-obk and unassigned bjorn3 Jun 29, 2023

bors added the S-waiting-on-bors label and removed the S-waiting-on-review label Jun 29, 2023

bors added the merged-by-bors label Jun 30, 2023

bors merged commit 8aed93d into rust-lang:master Jun 30, 2023
11 checks passed

rustbot added this to the 1.72.0 milestone Jun 30, 2023

nnethercote deleted the codegen-opts branch July 3, 2023 06:22

matthiaskrgr mentioned this pull request Apr 28, 2024

ICE: inconsistent resolution for an import #124490

Closed
