
Deprecate all floating-point status register functions across architectures #1479

Open · RalfJung opened this issue Oct 9, 2023 · 10 comments

RalfJung (Member) commented Oct 9, 2023

We did this for x86 in #1454 and #1471 (thanks @eduardosm for catching that!), and for RISC-V in #1478. Chances are other architectures will have similar functions; they should all be treated the same way.
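
For concreteness, a rough sketch of why such functions cannot be exposed as ordinary intrinsics (x86_64, using the MXCSR accessors as an example of this class of function): the compiler always assumes the default FP environment, so it may fold or reorder float operations across the control-register write.

```rust
// Illustrative sketch only; not a supported pattern.
#[cfg(target_arch = "x86_64")]
unsafe fn rounding_mode_is_unreliable() {
    use std::arch::x86_64::{_mm_getcsr, _mm_setcsr, _MM_ROUND_DOWN, _MM_ROUND_MASK};

    let saved = _mm_getcsr();
    // Switch the MXCSR rounding mode to round-toward-negative-infinity.
    _mm_setcsr((saved & !_MM_ROUND_MASK) | _MM_ROUND_DOWN);

    // The compiler assumes round-to-nearest here, so this division may be
    // constant-folded or hoisted above the _mm_setcsr call, ignoring the
    // mode we just set.
    let x = 1.0f32 / 3.0f32;
    println!("{x}");

    _mm_setcsr(saved); // restore the previous state
}
```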

RalfJung (Member, Author) commented Nov 4, 2023

In particular, does anyone know if ARM/aarch64 has intrinsics that read or write something like a "floating point status register", something that alters the behavior of floating-point instructions or can detect whether a floating-point instruction ran that caused a particular side-effect? If so, we should deprecate them as well. (And same if there's anything else in the ARM intrinsics that does more than just work on some values locally in some registers. Anything that affects or is affected by previous or later operations is suspicious.)

Let's see if ping groups work here... @rustbot ping arm

rustbot (Collaborator) commented Nov 4, 2023

Error: The feature ping is not enabled in this repository.
To enable it add its section in the triagebot.toml in the root of the repository.

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.

RalfJung (Member, Author) commented Nov 4, 2023

Ah, bummer. Let's do it by hand then.
@Stammark @joaopaulocarreiro @raw-bin @hug-dev

jacobbramley (Contributor) commented:

Sorry, we didn't see this!

Arm architectures support this behaviour using special registers (e.g. FPCR/FPSR on AArch64). However, neither ACLE (C) nor Rust's implementation exposes these directly. For example, in C, implementations that want to configure rounding are supposed to do so using fesetround(), but there's no equivalent in Rust.

I think the comments in #1454 about assembly blocks apply to Arm too; people may need to change FP behaviours in asm! blocks (or FFI functions), and that should be fine as long as they put them back.
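
To sketch what I mean (illustrative only, assuming AArch64 and the usual FPCR bit layout): a temporary FPCR change that is confined to a single asm! block and undone before the block ends should be invisible to the surrounding Rust code.

```rust
// Rough sketch: change FPCR inside one asm! block and restore it before the
// block ends, so the ambient Rust FP environment is never observed modified.
#[cfg(target_arch = "aarch64")]
unsafe fn with_round_toward_zero() {
    use std::arch::asm;
    asm!(
        "mrs {saved}, fpcr",             // save the current FPCR
        "orr {tmp}, {saved}, #0xC00000", // RMode (bits 23:22) = 0b11, round toward zero
        "msr fpcr, {tmp}",
        // ... FP instructions that rely on round-toward-zero would go here ...
        "msr fpcr, {saved}",             // restore FPCR before leaving the block
        saved = out(reg) _,
        tmp = out(reg) _,
        options(nostack),
    );
}
```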

I've noticed one notable issue: the existing transactional memory extension intrinsics (#855) rely on start/commit bracketing around other accesses. I think such accesses may even be implicit (e.g. stack accesses), so this could cause problems if the transaction is cancelled. This probably needs a closer look; I suspect this might need to be wrapped in a single assembly block, in which case we should deprecate the TME intrinsics.

And same if there's anything else in the ARM intrinsics that does more than just work on some values locally in some registers. Anything that affects or is affected by previous or later operations is suspicious.

That's a rather more general concern. If I understand correctly, the main problem here is with intrinsics that change the implicit behaviour of other instructions that Rust might use itself. Is that accurate? I'm not aware of anything (other than FPCR) that does that today. However, we will have intrinsics that affect the behaviour of other intrinsics. Examples:

  1. SME changes the behaviour of some SVE intrinsics in a modal fashion. We've not yet proposed a way to handle this, but this modality is already high on our list of concerns.
  2. SVE itself has some implicit behaviours in places. For example, its mechanism for handling fault-tolerant loads works by setting an implicit register to mark elements that weren't loaded. Some subsequent SVE intrinsics then behave differently, according to this register. However, it shouldn't affect routine Rust code. Is that Ok?

We also have intrinsics that need to remain ordered with respect to others, such as barriers.

RalfJung (Member, Author) commented Feb 3, 2024

I've noticed one notable issue: the existing transactional memory extension intrinsics (#855) rely on start/commit bracketing around other accesses. I think such accesses may even be implicit (e.g. stack accesses), so this could cause problems if the transaction is cancelled. This probably needs a closer look; I suspect this might need to be wrapped in a single assembly block, in which case we should deprecate the TME intrinsics.

Ah! Good catch. If these intrinsics can "un-do" other memory accesses then indeed they can only be used from inline asm.

This ofc applies to transactional memory intrinsics across all targets. I opened an issue for that: #1521.

If I understand correctly, the main problem here is with intrinsics that change the implicit behaviour of other instructions that Rust might use itself.

Yes, that sounds like a good characterization of "FP control bits"-style problems.

There's also "intrinsics that make observable the implicit behavior of other instructions Rust might use itself", which are "FP status bits"-style problems. Those are less problematic since they cannot cause UB, but they still produce unreliable results so we need to watch out for them.

SVE itself has some implicit behaviours in places. For example, its mechanism for handling fault-tolerant loads works by setting an implicit register to mark elements that weren't loaded. Some subsequent SVE intrinsics then behave differently, according to this register. However, it shouldn't affect routine Rust code. Is that Ok?

I know ~nothing about how SVE works on the asm level, so I can't say unfortunately. Is there a good explanation for people without a hardware/ISA background -- something focusing on the abstract high-level behavior in a programming language, and relating the asm-level behavior to that?
If the Rust-generated code is written in a way that "no matter the status of that register, the resulting behavior will be the same", then that should be okay, but will need careful documentation so people know what they can and cannot do in inline assembly blocks.

We also have intrinsics that need to remain ordered with respect to others, such as barriers.

Ordering constraints should be fine, as long as LLVM respects the ordering as well.

However, this might still be a symptom of an actual issue. For instance, if you use Rust atomics, I am not sure how much we guarantee about how exactly they get turned into assembly -- and in particular, we are allowed to optimize them away entirely in some conditions. So mixing Rust atomics with direct use of target-specific hardware intrinsics is probably a bad idea.

As another example of an ordering-related problem, "non-temporal / streaming stores" on x86 are causing major headaches.
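
For reference, a minimal sketch of that non-temporal-store hazard (x86_64; `buf` is assumed to point to 16-byte-aligned memory):

```rust
// Illustrative sketch: _mm_stream_ps is only weakly ordered, so without the
// explicit _mm_sfence another thread that observes `ready == true` through a
// Release/Acquire pair might still not observe the streamed data.
#[cfg(target_arch = "x86_64")]
unsafe fn publish(buf: *mut f32, ready: &std::sync::atomic::AtomicBool) {
    use std::arch::x86_64::{_mm_set1_ps, _mm_sfence, _mm_stream_ps};
    use std::sync::atomic::Ordering;

    _mm_stream_ps(buf, _mm_set1_ps(1.0)); // non-temporal store, bypasses normal ordering
    _mm_sfence();                         // required before publishing
    ready.store(true, Ordering::Release);
}
```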

jacobbramley (Contributor) commented:

However, it shouldn't affect routine Rust code. Is that Ok?

I know ~nothing about how SVE works on the asm level ...
If the Rust-generated code is written in a way that ...

I think that's the crux of my point: we might have intrinsics that affect the behaviour of other intrinsics (or assembly), but can't affect the behaviour of normal Rust code. However, I now wonder if that ever actually happens reliably.

  • My example of SVE fault-tolerant loads appears to have been incorrect anyway, because the ABI doesn't work the way I had assumed. I need to dig a little further but will come back here once I have done so.
  • Even if we can be confident that no Rust-originated code would be affected, these global changes might be forward-compatibility hazards. For example, controlling FP16-specific behaviours might be considered to be Ok today, since no Rust code implicitly uses them, but if we permit this now then it'd block future merging of the f16 support.
    • Even then, it could change the behaviour of inline assembly in pure Rust functions, so I'm not convinced that it's Ok today, even if purely-safe Rust code would be unaffected.

Do we need to go as far as to say "do not modify any global state"? Probably not, because most architectures have global condition flags, and setting them is often unavoidable. Do we need to go as far as describing exactly what can and can't be modified for each architecture? Maybe, but it might be a big, difficult task. At least for Arm, the ABI describes whether or not these modal registers are caller- or callee-saved. For AArch64 Linux, it'd be in AAPCS64. I'd like to think that there's a suitable middle-ground, or a statement of policy that we can make, but I haven't thought of one yet.

Is there a good explanation for people without a hardware/ISA background

The ACLE (C/C++) documentation might help, though it's hard to completely get away from the ISA-level details. There's a section specifically about FFR; actually I think I need to read through that myself, to understand the FFR mechanism and how (or if) it can help us in Rust.

bjorn3 (Member) commented Feb 6, 2024

Even then, it could change the behaviour of inline assembly in pure Rust functions, so I'm not convinced that it's Ok today, even if purely-safe Rust code would be unaffected.

Does the ABI guarantee a specific value for those control registers? If not, inline asm can't assume a specific value either.

For things like flags, inline asm can't assume a specific input value either, and it can produce any output value unless you explicitly declare that you don't touch flags.
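
A quick sketch of that declaration in today's asm! syntax:

```rust
// Illustrative sketch: by default an asm! block is assumed to clobber the
// condition flags; `preserves_flags` is the explicit promise that it does not,
// letting the compiler keep flag-dependent state live across the block.
#[cfg(any(target_arch = "x86_64", target_arch = "aarch64"))]
unsafe fn flags_demo() {
    use std::arch::asm;
    asm!("nop");                                           // flags may be clobbered
    asm!("nop", options(nomem, nostack, preserves_flags)); // flags guaranteed untouched
}
```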

jacobbramley (Contributor) commented:

Does the ABI guarantee a specific value for those control registers?

Strictly, no, and (if I understand correctly) that's Ok for C because it doesn't precisely define FP behaviours, at least not like Rust does. However, Rust does require specific FPCR settings, and it would make sense that it would want to extend that requirement to the f16-specific bits.

bjorn3 (Member) commented Feb 6, 2024

C requires a fixed fpenv unless you use the pragma (#pragma STDC FENV_ACCESS) to respect the fpenv, right?

RalfJung (Member, Author) commented:

Probably not, because most architectures have global condition flags, and setting them is often unavoidable.

AFAIK we allow the asm block to change the condition flags arbitrarily, and for intrinsics LLVM just needs to know which of them clobber these flags. But yes, ideally this would be documented in the asm! docs. I don't think it's a concern for intrinsics though, unless LLVM makes the wrong assumptions (which I don't think is an issue we had in the past).

At least for Arm, the ABI describes whether or not these modal registers are caller- or callee-saved.

Intrinsics (and asm! blocks) are not function calls though so these ABI docs don't necessarily apply.

Even if we can be confident that no Rust-originated code would be affected, these global changes might be forward-compatibility hazards. For example, controlling FP16-specific behaviours might be considered to be Ok today, since no Rust code implicitly uses them, but if we permit this now then it'd block future merging of the f16 support.

Yeah, good point -- which state is "Rust-observable" can change with future versions of Rust. Without an explicit statement that certain state is definitely not affecting Rust code, there's always a risk that modifying some state can cause issues in the future.
