Crandall primes #445

mratsim · 2024-07-27T15:36:50Z

This closes #11 for primes of the form 2ᵐ - c (Crandall primes / pseudo-Mersenne primes), such as the ones used for Curve25519 (2²⁵⁵ - 19) and secp256k1 (Bitcoin/Ethereum).
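The core trick behind Crandall reduction can be shown with a minimal Rust sketch for the secp256k1 prime p = 2²⁵⁶ - C with C = 2³² + 977. Because 2²⁵⁶ ≡ C (mod p), the high half of a 512-bit product is folded into the low half by multiplying it with the small constant C, instead of running a Montgomery reduction. This is not Constantine's implementation (which is written in Nim with x86 assembly). All function names below are hypothetical, and the code branches on data, so it is not constant-time.

```rust
// Sketch: multiplication modulo p = 2^256 - C, C = 2^32 + 977 (secp256k1 base field),
// using the Crandall identity 2^256 ≡ C (mod p). Branches on data: NOT constant-time.

/// C such that p = 2^256 - C.
const C: u128 = 0x1_0000_03D1;

/// p in little-endian 64-bit limbs.
const P: [u64; 4] = [
    0xFFFF_FFFE_FFFF_FC2F,
    0xFFFF_FFFF_FFFF_FFFF,
    0xFFFF_FFFF_FFFF_FFFF,
    0xFFFF_FFFF_FFFF_FFFF,
];

/// Schoolbook 256x256 -> 512-bit product.
fn mul_wide(a: &[u64; 4], b: &[u64; 4]) -> [u64; 8] {
    let mut r = [0u64; 8];
    for i in 0..4 {
        let mut carry: u128 = 0;
        for j in 0..4 {
            let t = (a[i] as u128) * (b[j] as u128) + (r[i + j] as u128) + carry;
            r[i + j] = t as u64;
            carry = t >> 64;
        }
        r[i + 4] = carry as u64;
    }
    r
}

/// Adds a small value (< 2^68) into r, returning the carry out of 2^256 (0 or 1).
fn add_small(r: &mut [u64; 4], small: u128) -> u128 {
    let mut acc = small;
    for limb in r.iter_mut() {
        acc += *limb as u128;
        *limb = acc as u64;
        acc >>= 64;
    }
    acc
}

/// Reduces a 512-bit value modulo p by folding the high half with C.
fn reduce_wide(w: &[u64; 8]) -> [u64; 4] {
    // Fold 1: lo + hi * C, since hi * 2^256 ≡ hi * C (mod p). The sum is < 2^290.
    let mut r = [0u64; 4];
    let mut carry: u128 = 0;
    for i in 0..4 {
        let t = (w[i] as u128) + (w[i + 4] as u128) * C + carry;
        r[i] = t as u64;
        carry = t >> 64;
    }
    // Fold 2: the leftover carry (< 2^34) times C is < 2^67.
    let c1 = add_small(&mut r, carry * C);
    // Fold 3: at most one more 2^256 overflow; r is then < 2^67, so this cannot overflow.
    let c2 = add_small(&mut r, c1 * C);
    debug_assert_eq!(c2, 0);
    // Final subtraction: r >= p  <=>  r + C >= 2^256, and then r - p = r + C - 2^256.
    let mut t = r;
    if add_small(&mut t, C) == 1 { t } else { r }
}

/// Field multiplication modulo p.
fn mul_mod(a: &[u64; 4], b: &[u64; 4]) -> [u64; 4] {
    reduce_wide(&mul_wide(a, b))
}

fn main() {
    let p_minus_1 = [P[0] - 1, P[1], P[2], P[3]];
    // (-1) * (-1) = 1 (mod p)
    assert_eq!(mul_mod(&p_minus_1, &p_minus_1), [1, 0, 0, 0]);
    // 2^128 * 2^128 = 2^256 ≡ C (mod p)
    let two_128 = [0, 0, 1, 0];
    assert_eq!(mul_mod(&two_128, &two_128), [C as u64, 0, 0, 0]);
    println!("ok");
}
```

Two folds plus a single conditional subtraction suffice here because C is tiny compared with 2²⁵⁶. A production version would replace the final branch with a constant-time select.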

Bench Fp vs Constantine master


Analysis

Fp[Edwards25519] mul 1.27x improvement
Fp[Edwards25519] square 1.43x improvement
Fp[Secp256k1] mul 1.94x improvement
Fp[Secp256k1] square 1.28x improvement

Bench EC vs Constantine master


Analysis

EC add projective constant-time improved by 1.36x
EC add jacobian constant-time improved by 1.34x
EC add projective vartime improved by 1.24x
EC add jacobian vartime improved by 1.37x
EC dbl projective constant-time improved by 1.31x
EC dbl jacobian constant-time improved by 1.06x

Bench vs bitcoin/secp256k1

Timings are bitcoin/secp256k1 vs Constantine (this PR); the ratio is bitcoin/secp256k1 time ÷ Constantine time, so a ratio above 1 means Constantine is faster.

field_sqr 12.4ns vs 8ns -> 1.55x
field_mul 15.8ns vs 10ns -> 1.58x
field_inv_ct 1410ns vs 1203ns -> 1.17x
field_inv_vt 820ns vs 848ns -> 0.97x
EC add jacobian var 247ns vs 97ns -> 2.55x
EC dbl jacobian var 97.8ns vs 145ns -> 0.67x
EC mixed add ct 189ns vs 225ns -> 0.84x
EC mixed add var 173ns vs 98ns -> 1.77x

EC scalar-mul ct 28100ns vs 40196ns -> 0.70x

Analysis

The fact that Constantine's field operations are about 1.5x faster BUT its elliptic curve operations are sometimes slower is suspicious. We probably need to check the EC formulae.

TODO

Fix Windows
Bound checks for lazy reduce and lazily-reduced field exponentiation for 256-bit primes, as IACR ePrint 2018/985 indicates in Theorem 4 that its partial reduction may grow by 1 bit for 256-bit primes.
Optimize the EC implementation to avoid the if/else check for ADX and limit input/output movement
Optimize mixed addition


mratsim · 2024-07-27T16:20:03Z

Bench vs RustCrypto/elliptic-curves

https://github.com/RustCrypto/elliptic-curves/ is the current record holder of https://programming-language-benchmarks.vercel.app/problem/secp256k1

We modified it to benchmark some of its internals.

Field implementation

cargo bench --features expose-field -- field

with an extra benchmark:

fn bench_field_element_10adds<'a, M: Measurement>(group: &mut BenchmarkGroup<'a, M>) {
    // Measures 10 field additions per iteration. black_box only hides the inputs:
    // the first nine sums are discarded and the closure returns only the last one,
    // so the compiler may be able to elide the discarded additions.
    let x = test_field_element_x();
    let y = test_field_element_y();
    group.bench_function("10 adds", |b| b.iter(
        || {
            &black_box(x) + &black_box(y);
            &black_box(x) + &black_box(y);
            &black_box(x) + &black_box(y);
            &black_box(x) + &black_box(y);
            &black_box(x) + &black_box(y);
            &black_box(x) + &black_box(y);
            &black_box(x) + &black_box(y);
            &black_box(x) + &black_box(y);
            &black_box(x) + &black_box(y);
            &black_box(x) + &black_box(y)
        }
    ));
}

Timings are k256 vs Constantine; the ratio is k256 time ÷ Constantine time.

10 adds: 25ns vs 12ns - 2.08x
mul (partially normalized in k256): 17.825ns vs 10ns - 1.78x
sqr (partially normalized in k256): 13.846ns vs 8ns - 1.73x

EC implementation (projective with Renes2015 formulae)

use criterion::{
    black_box, criterion_group, criterion_main, measurement::Measurement, BenchmarkGroup, Criterion,
};
use k256::ProjectivePoint;
use elliptic_curve::{
    rand_core::SeedableRng,
    group::Group,
};
use rand_xorshift::XorShiftRng;

fn bench_ec_add<'a, M: Measurement>(group: &mut BenchmarkGroup<'a, M>) {
    let mut rng = XorShiftRng::seed_from_u64(1234u64);
    let p = ProjectivePoint::random(&mut rng);
    let q = ProjectivePoint::random(&mut rng);
    group.bench_function("EC Add", |b| {
        b.iter(|| &black_box(p) + &black_box(q))
    });
}

fn bench_ec_dbl<'a, M: Measurement>(group: &mut BenchmarkGroup<'a, M>) {
    let mut rng = XorShiftRng::seed_from_u64(1234u64);
    let p = ProjectivePoint::random(&mut rng);
    group.bench_function("EC Dbl", |b| {
        b.iter(|| black_box(p).double())
    });
}

fn bench_ec(c: &mut Criterion) {
    let mut group = c.benchmark_group("EC operations");
    bench_ec_add(&mut group);
    bench_ec_dbl(&mut group);
    group.finish();
}

criterion_group!(benches, bench_ec);
criterion_main!(benches);

Timings are k256 vs Constantine; the ratio is k256 time ÷ Constantine time.

EC add proj ct: 195.83ns vs 232ns - 0.84x
EC dbl proj ct: 130.83ns vs 153ns - 0.86x

Analysis

The fact that Constantine's field operations are 1.7x to 2x faster BUT its elliptic curve operations run at only about 0.85x the speed of k256 is extremely suspicious, especially since both implement the same formulae from the Renes2015 paper.

There might be unnecessary copies or parameter-passing overhead similar to #21 and #146.
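To make the "same formulae, faster field" argument concrete, a rough cost model helps: estimate the EC cost from the field costs and the formula's operation counts. This is a sketch assuming the a = 0 counts of the Renes–Costello–Batina complete formulas as we recall them (addition: 12M + 2m₃b + 19a; doubling: 6M + 2S + 1m₃b + 9a; worth double-checking against the paper). The struct and function names are hypothetical, additions are costed at 1.2ns each (12ns for 10 adds above), and a multiplication by 3b is assumed to cost as much as a full multiplication.

```rust
/// Field operation costs in nanoseconds (measured, or assumed where noted).
#[derive(Clone, Copy)]
struct FieldCosts {
    mul: f64,
    sqr: f64,
    add: f64,
    /// Assumption: cost of multiplying by the small constant 3b.
    mul_by_3b: f64,
}

/// Complete projective addition for a = 0 curves: 12M + 2m_3b + 19a (assumed counts).
fn complete_add_cost(c: FieldCosts) -> f64 {
    12.0 * c.mul + 2.0 * c.mul_by_3b + 19.0 * c.add
}

/// Complete projective doubling for a = 0 curves: 6M + 2S + 1m_3b + 9a (assumed counts).
fn complete_dbl_cost(c: FieldCosts) -> f64 {
    6.0 * c.mul + 2.0 * c.sqr + 1.0 * c.mul_by_3b + 9.0 * c.add
}

fn main() {
    // Constantine's field numbers from the benchmark above:
    // mul 10ns, sqr 8ns, 10 adds in 12ns -> 1.2ns per add; m_3b assumed = mul.
    let costs = FieldCosts { mul: 10.0, sqr: 8.0, add: 1.2, mul_by_3b: 10.0 };
    let add = complete_add_cost(costs);
    let dbl = complete_dbl_cost(costs);
    println!("estimated add: {add:.1}ns (measured 232ns)");
    println!("estimated dbl: {dbl:.1}ns (measured 153ns)");
}
```

With these inputs the model predicts about 163ns per addition and 97ns per doubling, against the measured 232ns and 153ns. Under these assumptions roughly 55 to 70ns per operation is not accounted for by field arithmetic, which is the kind of gap that copies or parameter-passing overhead would produce.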

mratsim added 15 commits July 27, 2024 01:08

4bb42a6 feat(special primes accel): Support Crandall primes / Pseudo-Mersenne Prime fast reduction - closes #11
6f7b456 feat(special primes accel): refactoring: p-1 support ompiles on 64-bit, renaming of lazy reduction both in Montgomery and Crandall to lazyReduction
e9ed729 feat(special primes accel): support 32-bit
7aa90c4 chore: lazyReduction->lazyReduce
d18a825 fix: fp mulsquare test
7a2a4f9 feat: Crandall exponentiation
df112c6 feat: initial commit assembly for Crandall reduction, passing secp256k1, failing edwards25519
b3b498d feat(asm-crandall): actually use the assembly
f954f4f feat(asm-crandall): fix sqrt test and short immediate
7ef5b08 feat(crandall reduction): x86-adx reduction
4308ff0 feat(crandall reduction): add final reduce, deactivate adx partial reduce temporarily
15788c8 feat(crandall reduction): fix adx partial reduce
49f0646 feat(crandall reduction): prevent asm for mul on 32-bit
0f8f289 feat(bench): check overhead of field calls
57911e8 feat(crandall reduction): prevent asm for mul on 32-bit reloaded

mratsim added the performance 🏁 label Jul 27, 2024

This was referenced Jul 27, 2024

Low-level: discrepancy between field arithmetic performance and elliptic curve performance #446 (open)
Windows: Secp256k1 tests assembly test frozen #448 (open)
