Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(sourcemap): use simd to escape JSON string #4487

Merged

Conversation

Brooooooklyn
Copy link
Contributor

@Brooooooklyn Brooooooklyn commented Jul 26, 2024

Also optimize the memory allocation in string escape. The default size in serde_json is 1024 for String type, we pre allocate string.len() * 2 + 2 for every string to reduce re-allocate in escaping.

I've tried to hand write SIMD implementation, but it's too complex, so I uses the v_jsonescape here. But it doesn't support aarch64 and wasm32 simd implementation, we need to contribute to it!

Copy link
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @Brooooooklyn and the rest of your teammates on Graphite Graphite

Copy link

graphite-app bot commented Jul 26, 2024

Your org has enabled the Graphite merge queue for merging into main

Add the label “merge” to the PR and Graphite will automatically add it to the merge queue when it’s ready to merge. Or use the label “hotfix” to add to the merge queue as a hot fix.

You must have a Graphite account and log in to Graphite in order to use the merge queue. Sign up using this link.

@Brooooooklyn Brooooooklyn force-pushed the 07-26-perf_sourcemap_avoid_allocate_in_encode_if_possible branch from ee510b2 to e989f09 Compare July 26, 2024 15:54
Copy link

codspeed-hq bot commented Jul 26, 2024

CodSpeed Performance Report

Merging #4487 will improve performances by 3.56%

Comparing 07-26-perf_sourcemap_avoid_allocate_in_encode_if_possible (1fd9dd0) with main (affd768)

Summary

⚡ 1 improvements
✅ 31 untouched benchmarks

Benchmarks breakdown

Benchmark main 07-26-perf_sourcemap_avoid_allocate_in_encode_if_possible Change
sourcemap[cal.com.tsx] 77.2 ms 74.5 ms +3.56%

@Brooooooklyn Brooooooklyn marked this pull request as ready for review July 26, 2024 16:03
@Brooooooklyn Brooooooklyn force-pushed the 07-26-perf_sourcemap_avoid_allocate_in_encode_if_possible branch 4 times, most recently from 7d44f5c to 1190237 Compare July 27, 2024 03:58
Copy link
Member

@Dunqing Dunqing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conflict

@overlookmotel
Copy link
Collaborator

overlookmotel commented Jul 27, 2024

SIMD! Now we're talking!

I've made a couple of quick comments, but not reviewed in full. Please don't merge before I've had a chance to do that. I'm afraid I'm tied up this weekend, so that won't be until Monday.

@Brooooooklyn Brooooooklyn force-pushed the 07-26-perf_sourcemap_avoid_allocate_in_encode_if_possible branch 6 times, most recently from 284676b to 6115cc1 Compare July 28, 2024 13:06
@github-actions github-actions bot added the A-codegen Area - Code Generation label Jul 28, 2024
@Brooooooklyn Brooooooklyn force-pushed the 07-26-perf_sourcemap_avoid_allocate_in_encode_if_possible branch from 6115cc1 to c6d1c3d Compare July 28, 2024 13:12
@Brooooooklyn Brooooooklyn changed the title perf(sourcemap): avoid allocate in encode if possible perf(sourcemap): use simd to escape JSON string Jul 28, 2024
@Brooooooklyn Brooooooklyn force-pushed the 07-26-perf_sourcemap_avoid_allocate_in_encode_if_possible branch 2 times, most recently from 19e852f to 3957e5f Compare July 28, 2024 13:47
@Brooooooklyn Brooooooklyn marked this pull request as draft July 28, 2024 14:51
@Brooooooklyn Brooooooklyn force-pushed the 07-26-perf_sourcemap_avoid_allocate_in_encode_if_possible branch 2 times, most recently from 194c148 to cd2fced Compare July 28, 2024 16:32
@Brooooooklyn Brooooooklyn marked this pull request as ready for review July 28, 2024 17:16
@Brooooooklyn Brooooooklyn force-pushed the 07-26-perf_sourcemap_avoid_allocate_in_encode_if_possible branch 3 times, most recently from 673ea5a to 32fa9ec Compare July 28, 2024 17:19
@Brooooooklyn Brooooooklyn requested a review from Boshen July 28, 2024 17:19
crates/oxc_sourcemap/src/encode.rs Outdated Show resolved Hide resolved
@overlookmotel
Copy link
Collaborator

overlookmotel commented Jul 29, 2024

10% speed up!

Unfortunately, I think the way we're using v_jsonescape here may be dangerous. The crate is not well-documented, but from what I can see, it's determining at build time what CPU features (SSE/AVX) are available, and then it assumes that the CPU where the code runs has those same features.

If you build on same machine as you run the code on, no problem. But if you build a binary on a machine which has AVX2 support, that binary will crash if run on another x86 machine which only has SSE2. This seems like a bad idea - people usually expect they're building portable binaries.

build.rs:

fn main() {
    #[cfg(any(target_arch = "x86_64", target_arch = "x86"))]
    {
        if is_x86_feature_detected!("sse2") {
            println!("cargo:rustc-cfg=v_escape_sse");
        }
        if is_x86_feature_detected!("avx2") {
            println!("cargo:rustc-cfg=v_escape_avx");
        }
    }
}

Runtime code:

pub fn b_escape<B: buf_min::Buffer>(s: &[u8], buf: &mut B) {
    #[allow(unused_unsafe)]
    unsafe {
        _b_escape(s, buf)
    }
}

Runtime code:

pub unsafe fn _b_escape<B: buf_min::Buffer>(bytes: &[u8], buf: &mut B) {
    #[cfg(not(v_escape_avx))]
    {
        #[cfg(not(v_escape_sse))]
        {
            scalar::b_escape(bytes, buf)
        }
        #[cfg(v_escape_sse)]
        {
            ranges::sse::b_escape(bytes, buf)
        }
    }
    #[cfg(v_escape_avx)]
    {
        ranges::avx::b_escape(bytes, buf)
    }
}

This problem is mentioned in zzau13/v_escape#54.

It looks like we'd be better off using v_jsonescape::encode which uses runtime CPU feature detection.

Also, do we feel confident depending on this crate? Its tests only run on whatever CI machine Github Actions gives it, so (assuming those machines are all AVX2-capable), not all the SSE2 code is being tested. I think probably this is fine, but should at least ask the question... I was wrong - the SSE2 code paths are being tested.

@overlookmotel
Copy link
Collaborator

overlookmotel commented Jul 29, 2024

Also, perhaps we should merge #4528 first (if it's correct) so we can see effect of this PR on benchmarks once the benchmark is measuring what it's meant to.

@Brooooooklyn
Copy link
Contributor Author

Unfortunately, I think the way we're using v_jsonescape here may be dangerous. The crate is not well-documented, but from what I can see, it's determining at build time what CPU features (SSE/AVX) are available, and then it assumes that the CPU where the code runs has those same features.

I agree; I can't even run cargo test successfully in the v_escape repo. I think it's worth maintaining our own simd JSON escape implementation and supporting the aarch64/wasm32 backend in it.

So let's just pre-allocate in this pr and implement JSON escape in upcoming separated PRs.

@overlookmotel
Copy link
Collaborator

overlookmotel commented Jul 29, 2024

I think we can go forwards with this PR if we switch to using v_jsonescape::encode instead of b_encode. encode uses runtime CPU feature detection, so doesn't have the problem that b_escape does.

I agree that longer term, we'd be better off maintaining our own crate for SIMD ops (lexer and various other parts of Oxc could also benefit a lot from SIMD) and support aarch64 + wasm32 (and also AVX512, which would likely be almost x2 speed boost for some workloads on CPUs that support it).

But for now, this PR is producing a significant speed boost for common case of x86_64, so I think it's worthwhile getting these changes in to Oxc (assuming we can remove the dodgy b_encode usage).

Personally, I think implementing a SIMD crate will not be a straightforward task. Not so much the code itself, but building the testing infra around it to make sure it's completely solid on all targets (some discussion of this in #2285). But maybe you're familiar @Brooooooklyn with the cross-platform testing side of things from napi-rs?

So let's just pre-allocate in this pr and implement JSON escape in upcoming separated PRs.

Yes, I agree it would be a good idea to split the pre-allocation optimization and the switch to v_jsonescape into separate PRs so we can measure the effect of those 2 changes individually.

@Brooooooklyn Brooooooklyn force-pushed the 07-26-perf_sourcemap_avoid_allocate_in_encode_if_possible branch 3 times, most recently from b3d8231 to 506565c Compare July 29, 2024 11:42
crates/oxc_sourcemap/src/encode.rs Outdated Show resolved Hide resolved
crates/oxc_sourcemap/Cargo.toml Outdated Show resolved Hide resolved
@overlookmotel overlookmotel dismissed Dunqing’s stale review July 29, 2024 12:41

The problem Dunqing raised is now solved.

@overlookmotel overlookmotel added the 0-merge Merge with Graphite Merge Queue label Jul 29, 2024
Copy link

graphite-app bot commented Jul 29, 2024

Merge activity

  • Jul 29, 8:41 AM EDT: The merge label 'merge' was detected. This PR will be added to the Graphite merge queue once it meets the requirements.
  • Jul 29, 8:41 AM EDT: overlookmotel added this pull request to the Graphite merge queue.
  • Jul 29, 8:45 AM EDT: overlookmotel merged this pull request with the Graphite merge queue.

Also optimize the memory allocation in string escape. The default size in `serde_json` is 1024 for String type, we pre allocate `string.len() * 2 + 2` for every string to reduce re-allocate in escaping.

I've tried to hand write SIMD implementation, but it's too complex, so I uses the `v_jsonescape` here. But it doesn't support `aarch64` and `wasm32` simd implementation, we need to contribute to it!
@overlookmotel overlookmotel force-pushed the 07-26-perf_sourcemap_avoid_allocate_in_encode_if_possible branch from c01ec4b to 1fd9dd0 Compare July 29, 2024 12:42
@overlookmotel
Copy link
Collaborator

overlookmotel commented Jul 29, 2024

Merging! Hooray. @Brooooooklyn I hope you haven't found my approach to review too annoyingly nit-picky. If it is, please say so. I'm new to being maintainer on a popular project. I know Boshen takes a much more light-touch "if the tests pass, merge it" approach, so I'm not sure if I'm out of line, and would appreciate any feedback.

@graphite-app graphite-app bot merged commit 1fd9dd0 into main Jul 29, 2024
23 checks passed
@graphite-app graphite-app bot deleted the 07-26-perf_sourcemap_avoid_allocate_in_encode_if_possible branch July 29, 2024 12:45
@Brooooooklyn
Copy link
Contributor Author

@overlookmotel don't worry about that, you are the best

@oxc-bot oxc-bot mentioned this pull request Aug 1, 2024
Boshen added a commit that referenced this pull request Aug 1, 2024
## [0.23.0] - 2024-08-01

- 27fd062 sourcemap: [**BREAKING**] Avoid passing `Result`s (#4541)
(overlookmotel)

### Features

- a558492 codegen: Implement `BinaryExpressionVisitor` (#4548) (Boshen)
- 7446e98 codegen: Align more esbuild implementations (#4510) (Boshen)
- 35654e6 codegen: Align operator precedence with esbuild (#4509)
(Boshen)
- b952942 linter: Add eslint/no-unused-vars (⭐ attempt 3.2) (#4445)
(DonIsaac)
- 85e8418 linter: Add react/jsx-curly-brace-presence (#3949) (Don Isaac)
- cf1854b semantic: Remove `ReferenceFlags::Value` from non-type-only
exports that referenced type binding (#4511) (Dunqing)

### Bug Fixes

- b58ed80 codegen: Enable more test cases (#4585) (Boshen)
- 6a94e3f codegen: Fixes for esbuild test cases (#4503) (Boshen)
- d5c4b19 parser: Fix enum member parsing (#4543) (DonIsaac)

### Performance

- 4c6d19d allocator: Use capacity hint (#4584) (Luca Bruno)
- 7585e16 linter: Remove allocations for string comparisons (#4570)
(DonIsaac)
- 55a8763 parser: Faster decoding unicode escapes in identifiers (#4579)
(overlookmotel)
- ae1d38f parser: Fast path for ASCII when checking char after numeric
literal (#4577) (overlookmotel)
- 56ae615 parser: Make not at EOF the hot path in `Source` methods
(#4576) (overlookmotel)
- 25679e6 parser: Optimize `Lexer::hex_digit` (#4572) (overlookmotel)
- bb33bcc parser: Speed up lexing non-decimal numbers (#4571)
(overlookmotel)
- ab8509e parser: Use `-` not `saturating_sub` (#4561) (overlookmotel)
- c9c38a1 parser: Support peeking over bytes (#4304) (lucab)
- 0870ee1 parser: Get and check lookahead token (#4534) (lucab)
- d00014e sourcemap: Elide bounds checks in VLQ encoding (#4583)
(overlookmotel)
- 1fd9dd0 sourcemap: Use simd to escape JSON string (#4487)
(Brooooooklyn)

### Documentation

- 0914e47 ast: Add doc comments to literal nodes (#4551) (DonIsaac)
- c6a11be ast: Auto-generate doc comments for AstBuilder methods (#4471)
(DonIsaac)

### Refactor

- e68ed62 parser: Convert lexer byte handler for `|` to a single match
(#4575) (overlookmotel)
- bba824b parser: Convert `Lexer::read_minus` to a single match (#4574)
(overlookmotel)
- ef5418a parser: Convert `Lexer::read_left_angle` to a single match
(#4573) (overlookmotel)
- 9e5be78 parser: Add `Lexer::consume_2_chars` (#4569) (overlookmotel)
- 649913e parser: Extract `u8` not `&u8` when iterating over bytes
(#4568) (overlookmotel)
- 59f00c0 parser: Rename function (#4566) (overlookmotel)
- 8e3e910 parser: Rename vars (#4565) (overlookmotel)
- 0c0601f parser: Rename function (#4564) (overlookmotel)
- 0acc4a7 parser: Fetch 2 bytes in `?` byte handler (#4563)
(overlookmotel)
- 565eccf parser: Shorten lexer code (#4562) (overlookmotel)
- 148bdb5 parser: Adjust function inlining (#4530) (overlookmotel)
- 16c7b98 semantic: Move CatchClause scope binding logic to
visit_block_statement (#4505) (Dunqing)
- d6974d4 semantic: `AstNodeParentIter` fetch nodes lazily (#4533)
(overlookmotel)
- d914b14 semantic: Reusing the same reference (#4529) (Dunqing)
- 7b5e1f5 semantic: Use `is_empty()` instead of `len() == 0` (#4532)
(overlookmotel)
- 9db4259 semantic: Inline trivial methods (#4531) (overlookmotel)
- 7c42ffc sourcemap: Align Base64 chars lookup table to cache line
(#4535) (overlookmotel)
- 96602bf transformer/typescript: Determine whether to remove
`ExportSpeicifer` by `ReferenceFlags` (#4513) (Dunqing)
- e6a8af6 traverse: Speed up tests (#4538) (overlookmotel)

Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0-merge Merge with Graphite Merge Queue A-codegen Area - Code Generation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants