Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing early clobber #766

Closed
ehoffman2 opened this issue Jul 14, 2020 · 3 comments · Fixed by #1307
Closed

Missing early clobber #766

ehoffman2 opened this issue Jul 14, 2020 · 3 comments · Fixed by #1307

Comments

@ehoffman2
Copy link

In scalar_4x64_impl.h, function secp256k1_scalar_reduce_512. The first asm block is missing early clobber for m0~m2.

The input %rsi is used after writing to m0, m1 and m2. Any of those write can thus re-use %rsi (which will most likely result in segfault).

Also, field_5x52_asm_impl.h is missing similar early clobbers for tmp1~tmp3

@real-or-random
Copy link
Contributor

My understanding of the GCC ASM extensions is very limited but I think you're right. Would you be willing to open a PR?

@ehoffman2
Copy link
Author

Actually, I was not using the code as is, so I don't have the setup to test the issue in the current code itself. I actually found this bug because I was re-using code from the library in a small home project, where I have field elements not using nails (packed in 32 bytes, like the scalars). I do this in my project because I mainly want to use the GPU to do some computations and it help using packed binary representation (I use secp256k1 code mainly as a 'helper', the bulk of the computation will be GPU).

So, I had to modify the field code to look like the scalar code, but using modulo P instead of modulo N. Most code mainly just involved using P instead of N as a constant, but I had to re-write the asm for the 512-bit reduction (much simpler as you only have one 64-bit single-precision multiplication). However, at some point, I hit the issue with the clobber list as it was defined in scalar code (segfault). Looking at asm output, I saw that the compiler did put the 'extract to m2' to %rsi, and since %rsi was used as indexed memory source for the next instruction, this caused segfault. Just defining m0~m2 output list as early-clobber (since the %rsi input is not 'consumed' before those outputs are set) fixed the issue.

@real-or-random
Copy link
Contributor

Actually, I was not using the code as is, so I don't have the setup to test the issue in the current code itself.

Well, neither do I (or anyone else here). We haven't seen bugs due to this but I guess then we were just lucky that compilers got it right arbitrarily.

I do this in my project because I mainly want to use the GPU to do some computations and it help using packed binary representation (I use secp256k1 code mainly as a 'helper', the bulk of the computation will be GPU).

Can I ask what your project is about? I'm just curious, it's not related to this issue.

sipa added a commit to sipa/secp256k1 that referenced this issue May 12, 2023
In the existing code, the compiler is allowed to allocate the RSI register
for outputs m0, m1, or m2, which are written to before the input in RSI is
read from. Fix this by marking them as early clobber.

Reported by ehoffman2 in bitcoin-core#766
sipa added a commit to sipa/secp256k1 that referenced this issue May 12, 2023
In the existing code, the compiler is allowed to allocate the RSI register
for outputs m0, m1, or m2, which are written to before the input in RSI is
read from. Fix this by marking them as early clobber.

Reported by ehoffman2 in bitcoin-core#766
real-or-random pushed a commit to real-or-random/secp256k1 that referenced this issue May 12, 2023
In the existing code, the compiler is allowed to allocate the RSI register
for outputs m0, m1, or m2, which are written to before the input in RSI is
read from. Fix this by marking them as early clobber.

Reported by ehoffman2 in bitcoin-core#766
jonasnick pushed a commit to jonasnick/secp256k1-zkp that referenced this issue May 14, 2023
In the existing code, the compiler is allowed to allocate the RSI register
for outputs m0, m1, or m2, which are written to before the input in RSI is
read from. Fix this by marking them as early clobber.

Reported by ehoffman2 in bitcoin-core/secp256k1#766
dderjoel pushed a commit to dderjoel/secp256k1 that referenced this issue May 23, 2023
In the existing code, the compiler is allowed to allocate the RSI register
for outputs m0, m1, or m2, which are written to before the input in RSI is
read from. Fix this by marking them as early clobber.

Reported by ehoffman2 in bitcoin-core#766
dderjoel pushed a commit to dderjoel/secp256k1 that referenced this issue May 23, 2023
In the existing code, the compiler is allowed to allocate the RSI register
for outputs m0, m1, or m2, which are written to before the input in RSI is
read from. Fix this by marking them as early clobber.

Reported by ehoffman2 in bitcoin-core#766
dderjoel pushed a commit to dderjoel/secp256k1 that referenced this issue Jun 21, 2023
In the existing code, the compiler is allowed to allocate the RSI register
for outputs m0, m1, or m2, which are written to before the input in RSI is
read from. Fix this by marking them as early clobber.

Reported by ehoffman2 in bitcoin-core#766
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants