Pre-calculate capacity required in bindings hashmaps for each scope #76

overlookmotel opened this issue Jul 23, 2024 · 1 comment

overlookmotel commented Jul 23, 2024

oxc-project/oxc#4367 added an AST traversal at the start of building Semantic, to count how many nodes, scopes, symbols, and references there are in the AST.

It uses those counts to pre-allocate sufficient capacity in AstNodes, ScopeTree and SymbolTable before the main traversal that populates these structures with data. This avoids the structures having to grow during the main traversal, along with all the memory-copying that growth involves. The result was a big performance boost.
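In outline, the pattern looks something like this (a minimal sketch with illustrative stand-in types, not the actual oxc code):

```rust
// Counts gathered by the cheap counting traversal.
struct Counts {
    nodes: usize,
    scopes: usize,
    symbols: usize,
    references: usize,
}

// Stand-in for the real structures (AstNodes, ScopeTree, SymbolTable).
struct Semantic {
    nodes: Vec<u32>,
    scopes: Vec<u32>,
    symbols: Vec<u32>,
    references: Vec<u32>,
}

impl Semantic {
    // Allocating to the exact counts up front means the main traversal
    // can push entries without ever triggering a reallocation.
    fn with_counts(counts: &Counts) -> Self {
        Self {
            nodes: Vec::with_capacity(counts.nodes),
            scopes: Vec::with_capacity(counts.scopes),
            symbols: Vec::with_capacity(counts.symbols),
            references: Vec::with_capacity(counts.references),
        }
    }
}

fn main() {
    // e.g. the counting pass found 100 nodes, 10 scopes, 20 symbols, 50 references.
    let counts = Counts { nodes: 100, scopes: 10, symbols: 20, references: 50 };
    let semantic = Semantic::with_counts(&counts);
    assert!(semantic.nodes.capacity() >= 100);
}
```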

@Dunqing suggested:

Maybe we can do more, and even count the number of bindings in each scope?

I replied:

Yes, maybe that would be good: we could size the bindings hashmap for each scope correctly up front. But where do we store those "bindings per scope" counts? They would have to live in a Vec whose correct size we don't know until the counter pass has finished, so it would resize all the time, i.e. it recreates exactly the kind of problem this PR solves!

I can now see a way around this problem.

ScopeId is just a wrapper around a u32, and the scope_id fields of AST nodes are unused at this point. So we could store the binding counts in the scope_id fields. In SemanticBuilder's main AST pass, it would retrieve the binding count from the scope_id field, use that count to initialize the Bindings hashmap for that scope with the appropriate capacity, and then write the actual ScopeId into the scope_id field.

It's a bit hacky to misuse the scope_id fields in this way, but it should work.
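A minimal sketch of the trick, using illustrative types rather than the real oxc AST (ScopedNode, counter_pass and main_pass are hypothetical names; the scope_id slot is modelled as a plain Cell<u32>):

```rust
use std::cell::Cell;
use std::collections::HashMap;

// Stand-in for the real Bindings map (binding name -> SymbolId).
type Bindings = HashMap<String, u32>;

// Illustrative AST node: in the real AST, scope_id would hold a ScopeId.
struct ScopedNode {
    scope_id: Cell<u32>,
}

// Counter pass: smuggle the scope's binding count into the unused slot.
fn counter_pass(node: &ScopedNode, binding_count: u32) {
    node.scope_id.set(binding_count);
}

// Main pass: read the count back, size the Bindings map up front,
// then overwrite the slot with the real ScopeId.
fn main_pass(node: &ScopedNode, real_scope_id: u32) -> Bindings {
    let binding_count = node.scope_id.get() as usize;
    let bindings = Bindings::with_capacity(binding_count);
    node.scope_id.set(real_scope_id);
    bindings
}

fn main() {
    let node = ScopedNode { scope_id: Cell::new(0) };
    counter_pass(&node, 3); // counting pass found 3 bindings in this scope
    let bindings = main_pass(&node, 7); // scope is assigned ScopeId 7
    assert!(bindings.capacity() >= 3);
    assert_eq!(node.scope_id.get(), 7);
}
```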

What we don't know is whether the gain from pre-allocating the Bindings hashmaps to the required capacity will outweigh the cost of calculating the counts. Maybe it won't, because most scopes contain fewer than 4 bindings anyway (which is probably the default capacity of HashMap), and these hashmaps are usually fairly small, so growing them is not that costly. But it's probably worth trying and measuring.

NB: If we do try this, we may need to take the load factor of HashMap into account. Being able to store 8 items in a hashmap requires the underlying table to have space for more than 8 elements. The docs for HashMap::with_capacity suggest it takes the load factor into account, but we should make sure, or we may misread benchmark results because the map isn't doing quite what we think it is.
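A quick sanity check of that assumption against the standard library (this exercises std's HashMap only, not oxc-specific code) could look like this:

```rust
use std::collections::HashMap;

fn main() {
    // `with_capacity(n)` promises room for at least n elements without
    // reallocating, so it should already account for the load factor.
    let n = 8;
    let mut map: HashMap<u32, u32> = HashMap::with_capacity(n);
    let capacity_before = map.capacity();
    assert!(capacity_before >= n);

    for i in 0..n as u32 {
        map.insert(i, i);
    }

    // If the capacity changed, the map reallocated and `with_capacity`
    // did not do quite what we assumed.
    assert_eq!(map.capacity(), capacity_before);
    println!("capacity for {n} items: {capacity_before}");
}
```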


Dunqing commented Jul 23, 2024

Clever! I'm looking forward to seeing you try this!
