Add Arm's NEON vectorization #34

Closed
eirnym opened this issue Feb 11, 2024 · 8 comments · Fixed by #35

Comments

@eirnym

eirnym commented Feb 11, 2024

Could you please enable optimizations for MacBooks by default, as you did for x86_64 CPUs?

@DoumanAsh
Owner

Please understand that the current implementation only supports AVX2 and SSE2, so it is impossible to enable this by default: there is no NEON implementation.

As for defaults in general:
NEON cannot be assumed to be available by default in general, but I believe all macOS chips have it, so in theory I could assume it, but only for macOS.

The problem is that when I started this library, NEON support in Rust's std was lacking, and I'm not sure whether the gaps have been filled enough to implement it.
I will try to take another look later.

@eirnym
Author

eirnym commented Feb 12, 2024

Most of the features supported by LLVM have been implemented. As far as I understood the thread, the remaining unsupported features have not been implemented in LLVM itself.

The documentation also describes many NEON intrinsics, some of them available since Rust 1.59.0:

https://doc.rust-lang.org/core/arch/arm/index.html
https://doc.rust-lang.org/core/arch/aarch64/index.html
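
As a rough illustration of what those docs cover, here is a minimal sketch that uses three intrinsics from core::arch::aarch64 (vld1q_u32, vaddq_u32, vst1q_u32) when NEON is enabled at compile time, with a scalar fallback otherwise. The add4 function is a hypothetical helper made up for this example; it is not part of xxhash-rust and says nothing about how the library implements its NEON path.

```rust
// NEON path: compiled only when the target is aarch64 with NEON enabled.
#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
fn add4(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    use core::arch::aarch64::{vaddq_u32, vld1q_u32, vst1q_u32};

    let mut out = [0u32; 4];
    // SAFETY: NEON is enabled at compile time (see the cfg above), and every
    // pointer refers to an array of exactly four u32 values.
    unsafe {
        let sum = vaddq_u32(vld1q_u32(a.as_ptr()), vld1q_u32(b.as_ptr()));
        vst1q_u32(out.as_mut_ptr(), sum);
    }
    out
}

// Scalar fallback for every other target; wraps on overflow like the NEON add.
#[cfg(not(all(target_arch = "aarch64", target_feature = "neon")))]
fn add4(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    let mut out = [0u32; 4];
    for i in 0..4 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}

fn main() {
    let sum = add4([1, 2, 3, 4], [10, 20, 30, 40]);
    println!("{:?}", sum);
}
```

Keeping the two variants as separate cfg-gated functions means the same call site builds on both aarch64 and x86_64.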

@DoumanAsh DoumanAsh changed the title Optimizations for aarch64 macOS Add Arm's NEON vectorization Feb 12, 2024
@DoumanAsh
Owner

@eirnym Can you please give me the output of rustc --print cfg on your M1 laptop?
I'm curious whether NEON is enabled by default on Mac.

If so, you can try testing my branch #35

@eirnym
Author

eirnym commented Feb 16, 2024

I have an M2 laptop running macOS:

$ rustc --print cfg
debug_assertions
panic="unwind"
target_arch="aarch64"
target_endian="little"
target_env=""
target_family="unix"
target_feature="aes"
target_feature="crc"
target_feature="dit"
target_feature="dotprod"
target_feature="dpb"
target_feature="dpb2"
target_feature="fcma"
target_feature="fhm"
target_feature="flagm"
target_feature="fp16"
target_feature="frintts"
target_feature="jsconv"
target_feature="lor"
target_feature="lse"
target_feature="neon"
target_feature="paca"
target_feature="pacg"
target_feature="pan"
target_feature="pmuv3"
target_feature="ras"
target_feature="rcpc"
target_feature="rcpc2"
target_feature="rdm"
target_feature="sb"
target_feature="sha2"
target_feature="sha3"
target_feature="ssbs"
target_feature="vh"
target_has_atomic="128"
target_has_atomic="16"
target_has_atomic="32"
target_has_atomic="64"
target_has_atomic="8"
target_has_atomic="ptr"
target_os="macos"
target_pointer_width="64"
target_vendor="apple"
unix
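
The target_feature="neon" entry in this output is what a crate can test at compile time. A minimal sketch of such a check follows, assuming a hypothetical helper named simd_backend (not part of xxhash-rust) that reports which path a build would select; the order of the checks is an assumption made for illustration.

```rust
/// Reports which SIMD path this build would select at compile time.
/// Hypothetical helper for illustration only.
fn simd_backend() -> &'static str {
    if cfg!(all(target_arch = "aarch64", target_feature = "neon")) {
        "neon"
    } else if cfg!(all(
        any(target_arch = "x86", target_arch = "x86_64"),
        target_feature = "avx2"
    )) {
        "avx2"
    } else if cfg!(all(
        any(target_arch = "x86", target_arch = "x86_64"),
        target_feature = "sse2"
    )) {
        "sse2"
    } else {
        "scalar"
    }
}

fn main() {
    println!("compile-time SIMD backend: {}", simd_backend());

    // On aarch64 the same feature can also be queried at run time.
    #[cfg(target_arch = "aarch64")]
    println!(
        "runtime NEON detected: {}",
        std::arch::is_aarch64_feature_detected!("neon")
    );
}
```

With the cfg output shown above, the first branch would be the one taken on this machine.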

@eirnym
Author

eirnym commented Feb 16, 2024

My test:

Cargo.toml:

[package]
name = "public-id"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
base64 = "0.21.7"
uuid = { version = "1.7.0", features = ["v4", "v7", "v8"] }
#xxhash-rust = { version = "0.8.8", features = ["xxh3"] }
xxhash-rust = { git="https://github.com/DoumanAsh/xxhash-rust.git", branch="neon", features = ["xxh3"] }

src/main.rs:

use base64::{engine::general_purpose::URL_SAFE, Engine as _};

fn main() {
    let v: u64 = xxhash_rust::xxh3::xxh3_64(uuid::Uuid::new_v4().as_bytes());
    let b64 = URL_SAFE.encode(v.to_le_bytes());
    println!("Hello, world! {}", b64);
}

Both apps (with and without NEON optimizations) were compiled with --release; Cargo.lock was removed and fd xxhash-rust . -x rm -rf was run in ~/.cargo.

hyperfine output:

$ hyperfine --warmup 1000 -N -u microsecond './public-id-neon-optimizations' ./public-id-no-optimizations

Benchmark 1: ./public-id-neon-optimizations
  Time (mean ± σ):     728.7 µs ±  16.9 µs    [User: 356.3 µs, System: 186.6 µs]
  Range (min … max):   697.4 µs … 1069.2 µs    4060 runs
 
Benchmark 2: ./public-id-no-optimizations
  Time (mean ± σ):     724.8 µs ±  15.2 µs    [User: 355.4 µs, System: 184.2 µs]
  Range (min … max):   692.9 µs … 920.6 µs    4129 runs
 
Summary
  ./public-id-no-optimizations ran
    1.01 ± 0.03 times faster than ./public-id-neon-optimizations

@DoumanAsh
Owner

Well, it is good that Mac has NEON enabled by default.
I will merge and release a new version later.

@eirnym
Author

eirnym commented Feb 16, 2024

Stats for 256 MB of random data:

hyperfine --warmup 1000 -N -u microsecond './public-id-neon-optimizations' ./public-id-no-optimizations          
Benchmark 1: ./public-id-neon-optimizations
  Time (mean ± σ):     66061.1 µs ± 1809.4 µs    [User: 14959.7 µs, System: 50626.2 µs]
  Range (min … max):   63642.4 µs … 73034.2 µs    44 runs
 
Benchmark 2: ./public-id-no-optimizations
  Time (mean ± σ):     75613.7 µs ± 7321.5 µs    [User: 22832.6 µs, System: 51530.6 µs]
  Range (min … max):   70870.0 µs … 115103.5 µs    41 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  ./public-id-neon-optimizations ran
    1.14 ± 0.12 times faster than ./public-id-no-optimizations

@DoumanAsh
Owner

Released 0.8.9 with NEON support.
