Missing early clobber #766

ehoffman2 · 2020-07-14T17:42:10Z

In scalar_4x64_impl.h, in the function secp256k1_scalar_reduce_512, the first asm block is missing early-clobber markers for m0~m2.

The input %rsi is still read after m0, m1 and m2 have been written, so the compiler may allocate %rsi for any of those outputs (which will most likely result in a segfault).

Also, field_5x52_asm_impl.h is missing similar early clobbers for tmp1~tmp3.

real-or-random · 2020-07-14T22:53:23Z

My understanding of the GCC ASM extensions is very limited but I think you're right. Would you be willing to open a PR?

ehoffman2 · 2020-07-15T02:30:37Z

Actually, I was not using the code as is, so I don't have the setup to test the issue in the current code itself. I found this bug because I was re-using code from the library in a small home project, where I have field elements that do not use nails (packed in 32 bytes, like the scalars). I do this in my project because I mainly want to use the GPU for some computations, and a packed binary representation helps there (I use the secp256k1 code mainly as a 'helper'; the bulk of the computation will be on the GPU).

So, I had to modify the field code to look like the scalar code, but using modulo P instead of modulo N. Most of the changes just involved using P instead of N as a constant, but I had to rewrite the asm for the 512-bit reduction (much simpler, as you only have one 64-bit single-precision multiplication). However, at some point I hit the issue with the clobber list as it was defined in the scalar code (segfault). Looking at the asm output, I saw that the compiler had placed the 'extract to m2' output in %rsi, and since %rsi was used as the indexed memory source for the next instruction, this caused the segfault. Just defining the m0~m2 outputs as early-clobber (since the %rsi input is not 'consumed' before those outputs are set) fixed the issue.
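To illustrate the mechanism described above, here is a minimal sketch, not the library's actual asm. The function name `load3_limbs` and its layout are hypothetical and chosen only for this example. It assumes GCC or Clang on x86-64 and falls back to plain C elsewhere. The outputs are written while the pointer input is still needed by later instructions, so they carry the `&` (early-clobber) modifier. Without it, the compiler would be allowed to place an output in the same register as `src`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of secp256k1): copies three 64-bit limbs
 * from src into m[0..2]. */
static void load3_limbs(uint64_t m[3], const uint64_t *src) {
    uint64_t m0, m1, m2;
    assert(m != NULL && src != NULL);
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ __volatile__(
        /* %0 is written here, but %3 (src) is still read by the next two
         * instructions. */
        "movq 0(%3), %0\n\t"
        "movq 8(%3), %1\n\t"
        "movq 16(%3), %2\n\t"
        /* "=&r": early clobber, so the register must not overlap any input.
         * With plain "=r" the compiler may reuse the register holding src. */
        : "=&r"(m0), "=&r"(m1), "=&r"(m2)
        : "r"(src)
        : "memory");
#else
    /* Portable fallback so the sketch builds on other targets. */
    m0 = src[0];
    m1 = src[1];
    m2 = src[2];
#endif
    m[0] = m0;
    m[1] = m1;
    m[2] = m2;
}
```

Whether a missing `&` actually misbehaves depends on the register allocator's choices for a given compiler version and optimization level, which would explain why such a bug can stay hidden for a long time.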

real-or-random · 2020-07-15T14:34:22Z

> Actually, I was not using the code as is, so I don't have the setup to test the issue in the current code itself.

Well, neither do I (or anyone else here). We haven't seen bugs due to this, but I guess we were just lucky that compilers happened to get it right.

> I do this in my project because I mainly want to use the GPU to do some computations and it helps using packed binary representation (I use secp256k1 code mainly as a 'helper', the bulk of the computation will be GPU).

Can I ask what your project is about? I'm just curious, it's not related to this issue.

In the existing code, the compiler is allowed to allocate the RSI register for outputs m0, m1, or m2, which are written to before the input in RSI is read from. Fix this by marking them as early clobber. Reported by ehoffman2 in bitcoin-core/secp256k1#766

real-or-random mentioned this issue Aug 7, 2020

Remove lax DER parsing functions from contrib #781

Draft

sipa mentioned this issue May 12, 2023

Mark more assembly outputs as early clobber #1307

Merged

jonasnick closed this as completed in #1307 May 12, 2023
