perf(semantic): calculate number of nodes, scopes, symbols, references before visiting AST #4328

Dunqing · 2024-07-17T15:30:38Z

Note by @overlookmotel 20/7/24: Even though this PR shows as merged, that was by some weird malfunction of Github. It was immediately reverted, and the merge scrubbed from history of main branch. #4367 continues the work.

codspeed-hq · 2024-07-17T15:39:08Z

CodSpeed Performance Report

Merging #4328 will improve performances by 30.11%

_{Comparing test-visit (8a2971b) with main (f68b659)}

Summary

⚡ 12 improvements
✅ 20 untouched benchmarks

Benchmarks breakdown

	Benchmark	`main`	`test-visit`	Change
⚡	`minifier[antd.js]`	344.3 ms	304.6 ms	+13.05%
⚡	`minifier[react.development.js]`	3.9 ms	3.5 ms	+10.46%
⚡	`minifier[typescript.js]`	692 ms	616.4 ms	+12.27%
⚡	`semantic[RadixUIAdoptionSection.jsx]`	108.8 µs	102 µs	+6.71%
⚡	`semantic[antd.js]`	175.7 ms	135 ms	+30.11%
⚡	`semantic[cal.com.tsx]`	56.4 ms	48.8 ms	+15.47%
⚡	`semantic[checker.ts]`	104.9 ms	83.5 ms	+25.56%
⚡	`semantic[pdf.mjs]`	26.5 ms	22.2 ms	+19.36%
⚡	`transformer[antd.js]`	298.5 ms	258 ms	+15.7%
⚡	`transformer[cal.com.tsx]`	75.4 ms	65.4 ms	+15.34%
⚡	`transformer[checker.ts]`	170.2 ms	152.1 ms	+11.91%
⚡	`transformer[pdf.mjs]`	45.9 ms	39.9 ms	+15.21%

overlookmotel · 2024-07-17T15:46:26Z

@Dunqing Is this related to oxc-project/backlog#31?

If so, a couple of thoughts:

I'd imagined counting them in parser, rather than doing a whole extra AST pass in Semantic.
But even doing it in Semantic, as in this PR, if you use the collected counts to initialize ScopeTree, SymbolTable and Nodes's Vecs with sufficient capacity, it may turn out that delivers a perf boost which is greater than the loss shown at present on this PR. Resizing the Vecs is really costly. So overall it could be positive.

graphite-app · 2024-07-17T15:50:43Z

Your org has enabled the Graphite merge queue for merging into main

Add the label “merge” to the PR and Graphite will automatically add it to the merge queue when it’s ready to merge. Or use the label “hotfix” to add to the merge queue as a hot fix.

You must have a Graphite account and log in to Graphite in order to use the merge queue. Sign up using this link.

Dunqing · 2024-07-17T15:59:46Z

@Dunqing Is this related to oxc-project/backlog#31?

If so, a couple of thoughts:

I'd imagined counting them in parser, rather than doing a whole extra AST pass in Semantic.

But even doing it in Semantic, as in this PR, if you use the collected counts to initialize ScopeTree, SymbolTable and Nodes's Vecs with sufficient capacity, it may turn out that delivers a perf boost which is greater than the loss shown at present on this PR. Resizing the Vecs is really costly. So overall it could be positive.

Yes, We'd better be able to do this in parser, and can expect this to be a sizeable performance gain

overlookmotel · 2024-07-17T18:10:42Z

Wow! This is banging! Much more than I expected.

Only problem is semantic[RadixUIAdoptionSection.jsx]. Personally, I think that particular benchmark is quite important because it's the most representative of actual application code.

I think the reason it's suffering is:

The loss is the extra traversal in Collect.
The gain is the Vecs not growing, and avoiding the memory copying involved.
The cost of growing the Vecs is massive when they're big, not so much when they're small.
So for the large file benchmarks, the gain outweighs the loss, but for little RadixUIAdoptionSection.jsx, the loss is the stronger factor.

In my view we should not merge this as is because of the significant negative effect on that 1 benchmark. But what this experiment has done is show us there's a massive speed boost there for the taking, and that we should probably prioritize it above other perf work.

Boshen · 2024-07-17T18:17:32Z

I'm in favor of merging this and have a new issue on removing this ast pass, which is a clearer task.

overlookmotel · 2024-07-17T18:22:15Z

Does Rolldown's process involve running Semantic at all? If it does, merging this I'd imagine could hurt them, because RadixUIAdoptionSection.jsx is probably the closest of our benchmarks to real-world app code Rolldown processes.

A compromise would be to use a heuristic to enable/not enable the extra pass. e.g. only run on source files larger than 10 KB. That'd probably put us in the green for all benchmarks. What do you think of that option?

DonIsaac · 2024-07-17T21:24:17Z

I've opened a similar PR tackling this in a slightly different way: #4332. Sorry I didn't know what this one was doing, I was a little confused by the title! #4332 uses source text length as a heuristic for reserving memory instead of performing a traversal. It seems to perform worse overall than @Dunqing's approach, which I find quite interesting.

overlookmotel · 2024-07-17T21:39:52Z

Don's approach of estimating the required size of the Vecs from source text size has pluses and minuses:

Advantage: Avoid the overhead of the Collect pass.
Disadvantage: Potentially getting the estimate wrong.

He's getting even better results on some benchmarks than this PR, but worse on others.

If I can throw in my 2 cents as well...

From my (fairly little) knowledge of OS-level memory stuff, I believe that allocating too much memory is not necessarily a problem. On architectures with virtual memory (which includes all the ones Oxc targets except WASM), allocating a huge slab of memory (e.g. 1 GB) doesn't actually consume 1 GB of physical memory. It only consumes 1 GB of virtual memory space. Pages of physical memory to "back" that virtual memory will only be allocated when that chunk of the memory is first written to. Like I said, I could be wrong about this - it's not an area I really know - but that's my rough understanding.

Which is all a long way of saying: possibly the best strategy (except on WASM) is to estimate too high because that costs very little, whereas estimating too low will lead to the Vecs reallocating, which is really expensive when they're large.

The big question though is where is this -6% perf penalty on the RadixUIAdoptionSection.jsx benchmark coming from? Don's approach is also getting -6%, despite taking a different approach. So the explanation I gave in comment above is probably wrong.

I don't really have much idea on that one. A good first step might be to figure out what the optimal size of all these Vecs is for each benchmark (i.e. how many scopes/symbols/nodes are actually needed for each) and test initializing the Vecs to exactly those sizes. That'd show us the upper bound on perf improvement we can get, and then we could measure different strategies against that, and try to figure out where the -6% is coming from. @DonIsaac would you like to test that out on your PR?

DonIsaac · 2024-07-18T01:05:16Z

Sure, I can try that out. Another possible option is to record symbols/AST nodes/etc that we find during the parse step. Thoughts?

overlookmotel · 2024-07-18T07:13:48Z

Sure, I can try that out. Another possible option is to record symbols/AST nodes/etc that we find during the parse step. Thoughts?

Yes, that is a good option. Please see my comment above. But:

Your PR perf(semantic): initialize builder storage with pre-allocated capacity #4332 has raised another way of doing it which I think is also worth investigating.
Counting everything in the parser is going to be complicated, and we should do that as a follow-up.

Sorry, I think I've made everything too complicated! As Boshen said above, let's get this PR merged, and then iterate on it.

So I propose:

Merge this PR now.
Follow-up PR ASAP to add a guard which disables the Collect pass if file is below a certain size, to get the RadixUIAdoptionSection benchmark back up to where it was before (which we must merge before next Oxc release, to avoid hurting downstream consumers).
Open another issue to discuss further improvement, where we can also consider your approach from perf(semantic): initialize builder storage with pre-allocated capacity #4332 and/or counting in parser.

@Dunqing, @DonIsaac Does that sound OK to you?

Dunqing · 2024-07-18T08:15:57Z

Merge this PR now.

Follow-up PR ASAP to add a guard which disables the Collect pass if file is below a certain size, to get the RadixUIAdoptionSection benchmark back up to where it was before (which we must merge before next Oxc release, to avoid hurting downstream consumers).

Open another issue to discuss further improvement, where we can also consider your approach from perf(semantic): initialize builder storage with pre-allocated capacity #4332 and/or counting in parser.

@Dunqing, @DonIsaac Does that sound OK to you?

Sounds good to me. Both of these approaches look good. Which one you would favor, I have no opinion on that.

Boshen · 2024-07-18T16:29:28Z

109.1 µs -> 116.6 µs

Don't forget to look at the absolute numbers, -6% is huge but 7 µs is small.

Boshen · 2024-07-18T16:30:32Z

crates/oxc_semantic/src/builder.rs

@@ -100,6 +100,39 @@ pub struct SemanticBuilderReturn<'a> {
    pub errors: Vec<OxcDiagnostic>,
 }

+#[derive(Default)]
+pub struct Collector {


We should move this to another file and have some doc comments explaning the why before merge.

Boshen · 2024-07-18T16:33:02Z

Counting everything in the parser is going to be complicated

AST builder can do the counting :-)

overlookmotel · 2024-07-18T19:06:22Z

AST builder can do the counting :-)

Yes, but currently AstBuilder is Copy which we'll need to change to make it stateful, some places bypass the builder, and counting scopes is dependent on some surrounding context because some scopes are conditional. All totally doable, but not a completely trivial amount of work.

Don't forget to look at the absolute numbers, -6% is huge but 7 µs is small.

I thought % change was more important, on grounds that a large application might contain 10,000+ files this size (particularly if you include node_modules), so the perf impact will scale. Is my logic off?

overlookmotel · 2024-07-19T11:32:35Z

@Dunqing I see you're still working on this, so going to leave you alone to get on with it. Please shout when you're happy with it and would like a review.

Dunqing · 2024-07-19T11:49:46Z

I corrected the issue with the Collector's incorrect count. Currently 100% accurate.

Dunqing · 2024-07-19T11:51:04Z

Maybe we can do more, even count the number of each scope bindings?

overlookmotel · 2024-07-19T12:09:44Z

I corrected the issue with the Collector's incorrect count. Currently 100% accurate.

I assume you're only referring to nodes count being accurate here?

Is symbol count for checker.ts right now? Previously it was an under-estimate, which is a problem.

I don't think we can make symbol (aka binding) count completely accurate without building scope tree (which would defeat the point of Collector) because of var x; var x; (2 x binding identifiers, but only 1 symbol).

As I said above, I don't think we necessarily need to make the counts 100% accurate. Over-estimating is fine. It may be that the cost of complicating Collector to make an accurate count (and so making it a bit slower) is worse than the cost of allocating too much space in the Vecs. It's under-estimating that we absolutely need to avoid.

Maybe we can do more, even count the number of each scope bindings?

Yes, maybe that would be good - could size the bindings hash map for each scope correctly up front. But where do we store those "bindings per scope" counts? Would have to be in a Vec which we don't know the right size for until Collector pass has finished, so it'd resize all the time - i.e. that creates the same kind of problem again that this PR solves! Maybe there's a way around that, but it's a bit more complicated. I'd suggest leaving that to a follow-up.

Dunqing · 2024-07-19T12:36:50Z

I assume you're only referring to nodes count being accurate here?

Is symbol count for checker.ts right now? Previously it was an under-estimate, which is a problem.

I don't think we can make symbol (aka binding) count completely accurate without building scope tree (which would defeat the point of Collector) because of var x; var x; (2 x binding identifiers, but only 1 symbol).

As I said above, I don't think we necessarily need to make the counts 100% accurate. Over-estimating is fine. It may be that the cost of complicating Collector to make an accurate count (and so making it a bit slower) is worse than the cost of allocating too much space in the Vecs. It's under-estimating that we absolutely need to avoid.

I got you mean. You are right. The counting is not complicated as it stands, we are not dealing with var x, var x so comparing the actual number is an overestimate

Dunqing · 2024-07-19T12:42:38Z

Rolldown's codspeed shows 1% ~ 4% faster than before

overlookmotel · 2024-07-19T13:36:40Z

I checked counts on latest version (2nd fix: stack overflow commit):

Radix
- All estimated exactly.
antd
- Symbols over-estimated a bit (no problem).
cal.com:
- Nodes slightly over-estimated (shouldn't happen).
- Symbols very over-estimated (no problem, expected).
- References over-estimated a bit (shouldn't happen but not a big deal).
checker:
- Nodes slightly over-estimated (shouldn't happen).
- Symbols slightly under-estimated (problem).
- References over-estimated a bit (shouldn't happen but not a big deal).
pdf.mjs
- All estimated exactly.

The most problematic one is Collector undercounting scopes in checker.ts. That means the scopes Vecs are re-allocating during SemanticBuilder's traversal, which will be a bit of a slow-down.

Dunqing · 2024-07-19T14:08:32Z

The most problematic one is Collector undercounting scopes in checker.ts. That means the scopes Vecs are re-allocating during SemanticBuilder's traversal, which will be a bit of a slow-down.

I found nodes over-estimated caused by visit_arrow_function missing visit return_type.

When #4366 is merged. Only symbols are very over-estimated.

[crates/oxc_semantic/src/builder.rs:289:13] collector = Collector {
    node: 314759,
    scope: 10554,
    symbol: 15834,
    reference: 78443,
}
[crates/oxc_semantic/src/builder.rs:292:13] Collector {
    node: self.nodes.len(),
    scope: self.scope.len(),
    symbol: self.symbols.len(),
    reference: self.symbols.references.len(),
} = Collector {
    node: 314759,
    scope: 10554,
    symbol: 15674,
    reference: 78443,
}
[Finished running. Exit status: 0]

Found in #4328 Same as #4248

Dunqing · 2024-07-19T15:34:25Z

I don’t why this PR is merged !!!! I didn't click the merge button. And I don't see a revert button

Dunqing · 2024-07-19T15:46:43Z

Reverted, reopen in #4367

overlookmotel · 2024-07-19T16:04:26Z

I don’t why this PR is merged !!!! I don't click the merge button. And I don't see a revert button

LOLs! Github loves your perf work, it couldn't resist merging.

**Related issue:** - Inspiration: oxc-project/oxc#4328

milesj · 2024-07-20T05:50:21Z

Y'all are CRACKED

…s before visiting AST (#4367) context: #4328

test: visit

c4fbba1

github-actions bot added the A-semantic Area - Semantic label Jul 17, 2024

Dunqing closed this Jul 17, 2024

feat: reserve

59e4c74

Dunqing reopened this Jul 17, 2024

overlookmotel mentioned this pull request Jul 17, 2024

perf(semantic): initialize builder storage with pre-allocated capacity #4332

Closed

overlookmotel mentioned this pull request Jul 18, 2024

Telemetry oxc-project/backlog#73

Open

Boshen reviewed Jul 18, 2024

View reviewed changes

Dunqing added 7 commits July 19, 2024 08:34

feat: improve small file

f573c34

Merge remote-tracking branch 'origin/main' into test-visit

bbf8dc3

fix

a698196

Fix

dc6b249

Fix

53bc5f5

avoid reserve for symbols

e68d3fd

U

221674f

fix: stack overflow

2992c54

fix: stack overflow

9d58b6c

Merge remote-tracking branch 'origin/main' into test-visit

a615891

kdy1 mentioned this pull request Jul 19, 2024

perf(es/minifier): Pre-allocate collections swc-project/swc#9289

Merged

fix: symbols less than collected by collector

d7b8072

Dunqing mentioned this pull request Jul 19, 2024

refactor(ast)!: reorder fields of ArrowFunctionExpression #4364

Merged

chore: add debug assert

1cec151

Boshen pushed a commit that referenced this pull request Jul 19, 2024

refactor(ast)!: reorder fields of ArrowFunctionExpression (#4364)

f68b659

Found in #4328 Same as #4248

Dunqing added 2 commits July 19, 2024 22:17

Merge branch 'main' into test-visit

062e3b3

Merge branch 'main' into test-visit

8a2971b

Dunqing merged commit 8a2971b into main Jul 19, 2024
22 of 26 checks passed

Dunqing deleted the test-visit branch July 19, 2024 15:27

Dunqing restored the test-visit branch July 19, 2024 15:27

Dunqing mentioned this pull request Jul 19, 2024

perf(semantic): calculate number of nodes, scopes, symbols, references before visiting AST #4367

Merged

kdy1 added a commit to swc-project/swc that referenced this pull request Jul 20, 2024

perf(es/minifier): Pre-allocate collections (#9289)

76fe139

**Related issue:** - Inspiration: oxc-project/oxc#4328

Dunqing added a commit that referenced this pull request Jul 22, 2024

perf(semantic): calculate number of nodes, scopes, symbols, reference…

40f9356

…s before visiting AST (#4367) context: #4328

This was referenced Jul 23, 2024

Count nodes, scopes, symbols, references in parser oxc-project/backlog#74

Open

Pre-calculate capacity required in bindings hashmaps for each scope oxc-project/backlog#76

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(semantic): calculate number of nodes, scopes, symbols, references before visiting AST #4328

perf(semantic): calculate number of nodes, scopes, symbols, references before visiting AST #4328

Dunqing commented Jul 17, 2024 •

edited by overlookmotel

Loading

codspeed-hq bot commented Jul 17, 2024 •

edited

Loading

overlookmotel commented Jul 17, 2024 •

edited

Loading

graphite-app bot commented Jul 17, 2024

Dunqing commented Jul 17, 2024

overlookmotel commented Jul 17, 2024 •

edited

Loading

Boshen commented Jul 17, 2024

overlookmotel commented Jul 17, 2024

DonIsaac commented Jul 17, 2024

overlookmotel commented Jul 17, 2024 •

edited

Loading

DonIsaac commented Jul 18, 2024

overlookmotel commented Jul 18, 2024 •

edited

Loading

Dunqing commented Jul 18, 2024

Boshen commented Jul 18, 2024

Boshen Jul 18, 2024

Boshen commented Jul 18, 2024

overlookmotel commented Jul 18, 2024

overlookmotel commented Jul 19, 2024

Dunqing commented Jul 19, 2024

Dunqing commented Jul 19, 2024

overlookmotel commented Jul 19, 2024 •

edited

Loading

Dunqing commented Jul 19, 2024

Dunqing commented Jul 19, 2024

overlookmotel commented Jul 19, 2024

Dunqing commented Jul 19, 2024 •

edited

Loading

Dunqing commented Jul 19, 2024 •

edited

Loading

Dunqing commented Jul 19, 2024 •

edited

Loading

overlookmotel commented Jul 19, 2024

milesj commented Jul 20, 2024

perf(semantic): calculate number of nodes, scopes, symbols, references before visiting AST #4328

perf(semantic): calculate number of nodes, scopes, symbols, references before visiting AST #4328

Conversation

Dunqing commented Jul 17, 2024 • edited by overlookmotel Loading

codspeed-hq bot commented Jul 17, 2024 • edited Loading

CodSpeed Performance Report

Merging #4328 will improve performances by 30.11%

Summary

Benchmarks breakdown

overlookmotel commented Jul 17, 2024 • edited Loading

graphite-app bot commented Jul 17, 2024

Your org has enabled the Graphite merge queue for merging into main

Dunqing commented Jul 17, 2024

overlookmotel commented Jul 17, 2024 • edited Loading

Boshen commented Jul 17, 2024

overlookmotel commented Jul 17, 2024

DonIsaac commented Jul 17, 2024

overlookmotel commented Jul 17, 2024 • edited Loading

DonIsaac commented Jul 18, 2024

overlookmotel commented Jul 18, 2024 • edited Loading

Dunqing commented Jul 18, 2024

Boshen commented Jul 18, 2024

Boshen Jul 18, 2024

Choose a reason for hiding this comment

Boshen commented Jul 18, 2024

overlookmotel commented Jul 18, 2024

overlookmotel commented Jul 19, 2024

Dunqing commented Jul 19, 2024

Dunqing commented Jul 19, 2024

overlookmotel commented Jul 19, 2024 • edited Loading

Dunqing commented Jul 19, 2024

Dunqing commented Jul 19, 2024

overlookmotel commented Jul 19, 2024

Dunqing commented Jul 19, 2024 • edited Loading

Dunqing commented Jul 19, 2024 • edited Loading

Dunqing commented Jul 19, 2024 • edited Loading

overlookmotel commented Jul 19, 2024

milesj commented Jul 20, 2024

Dunqing commented Jul 17, 2024 •

edited by overlookmotel

Loading

codspeed-hq bot commented Jul 17, 2024 •

edited

Loading

overlookmotel commented Jul 17, 2024 •

edited

Loading

overlookmotel commented Jul 17, 2024 •

edited

Loading

overlookmotel commented Jul 17, 2024 •

edited

Loading

overlookmotel commented Jul 18, 2024 •

edited

Loading

overlookmotel commented Jul 19, 2024 •

edited

Loading

Dunqing commented Jul 19, 2024 •

edited

Loading

Dunqing commented Jul 19, 2024 •

edited

Loading

Dunqing commented Jul 19, 2024 •

edited

Loading