Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve #[may_dangle] for type parameters #3417

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

lcnr
Copy link
Contributor

@lcnr lcnr commented Apr 17, 2023

based on the discussion from rust-lang/rust#103413 and https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/deprecating.20spooky-dropck-at-a-distance/near/349392447. A big thanks to @SoniEx2 for their valuable input in these threads.

I consider this to be solely the responsibility of t-types, but cc @rust-lang/lang, @RalfJung

rendered

@lcnr lcnr added T-types Relevant to the types team, which will review and decide on the RFC. A-drop Proposals relating to the Drop trait or drop semantics labels Apr 17, 2023
0000-template.md Outdated Show resolved Hide resolved
- Regardless of whether `T` implements `Drop`, recurse into all types *owned* by `T`:
- references, raw pointers, function pointers, function items and scalars do not own
anything. They can be trivially dropped.
- tuples and arrays consider their element types to be owned.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for arrays this is currently also a bit inconsistent, see rust-lang/rust#110288 and https://rust-lang.zulipchat.com/#narrow/stream/326866-t-types.2Fnominated/topic/.23110288.3A.20.60.5BT.3B.200.5D.60.20adding.20outlives.20requirements.20to.20dropck for details about this.

Shouldn't matter for this RFC apart from deciding whether to add "[T; 0] does not consider T to be owned".

text/3417-dropck-eyepatch-v3.md Outdated Show resolved Hide resolved
@SoniEx2
Copy link

SoniEx2 commented Apr 17, 2023

it's worth pointing out that you can't drop a lifetime. are these attributes equivalent for lifetimes, should we pick one of them, or split lifetimes out into their own attribute?

@lcnr
Copy link
Contributor Author

lcnr commented Apr 17, 2023

yeah, thought a bit about this myself, and maybe keeping #[may_dangle] for lifetimes and requiring either #[may_dangle(can_drop)] and #[may_dangle(must_not_use)] for types would be good 🤷

@lcnr
Copy link
Contributor Author

lcnr commented Apr 18, 2023

went with #[may_dangle(droppable)] vs #[may_dangle(must_not_use)].

with this the more dangerous variant has a scarier and longer name, which should hopefully push users towards the more correct one even more.

@lcnr lcnr changed the title replace #[may_dangle] with 2 attributes improve #[may_dangle] for type parameters Apr 18, 2023
The status quo of `#[may_dangle]` and "spooky-dropck-at-a-distance" is far from ideal and has already
resulted in unsound issues. Documenting the current behavior makes is more difficult to change later
while not officially documenting it is bound to lead to more issues and confusion going forward.
It is therefore quite important to improve the status quo.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't buy this as an argument against documenting the status quo, given that this is all unstable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that it is not, or well #[may_dangle] is unstable but "spooky-dropck-at-a-distance" affects stable code:

We have the stable behavior that (PhantomData<PrintOnDrop<'s>>, String) requires 's to be live. Similar for ADTs: playground

use std::marker::PhantomData;
struct MyType<T, U> {
    // _marker: PhantomData<T>, // errors
    _marker: PhantomData<*const T>, // doesn't error
    other: U,
}

impl<T, U> MyType<T, U> {
    fn mk(_: T, other: U) -> Self {
        MyType {
            _marker: PhantomData,
            other,
        }
    }
}

struct PrintOnDrop<'s>(&'s str);
impl<'s> Drop for PrintOnDrop<'s> {
    fn drop(&mut self) {
        println!("dropping `{}`", self.0)
    }
}

fn main() {
    let x;
    {
        let temp = String::from("temporary");
        x = MyType::mk(PrintOnDrop(temp.as_str()), String::from("needs_drop"));
    }
}

While it should be fine™ to change this stable behavior even after documenting it as nobody should rely on it, it does make me uncomfortable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see.
Honestly that seems like cheating to me -- the behavior hasn't been documented, but it has existed on stable for years, and it's not a bug -- this is how things were designed to behave, IIUC. Using our lack of documentation as a loophole feels a bit backhanded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly that seems like cheating to me -- the behavior hasn't been documented, but it has existed on stable for years, and it's not a bug -- this is how things were designed to behave, IIUC. Using our lack of documentation as a loophole feels a bit backhanded.

do you mean "we should keep the current behavior because it's a breaking change not to"? Or more "whether the current behavior is documented shouldn't change our decision on whether we should change it"?

this is how things were designed to behave, IIUC

wrt to this I pretty clearly feel like this change is a clarification of underspecified language semantics . I don't think we ever intended to have "does not need Drop but when part of something that does, add outlives requirements" as the behavior for PhantomData.

"whether the current behavior is documented shouldn't change our decision on whether we should change it"

I personally disagree with this perspective if that's what you intended to say. By documenting something publicly we declare that the current behavior is intended as is and something users may rely on.

I do believe that the current state here is a bug. Dropping fields separately should be equivalent to dropping them as a pair. Once this lands I intend to add a check that dropck_outlives_requirements.is_empty() implies needs_drop. i.e. types without drop glue are trivial to drop.

Copy link
Member

@RalfJung RalfJung Apr 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean "we should keep the current behavior because it's a breaking change not to"? Or more "whether the current behavior is documented shouldn't change our decision on whether we should change it"?

The latter. I am totally fine with changing the behavior. But I don't like "we failed to document it" as justification.

Anyway this is probably more of a discussion for rust-lang/rust#103413?


The current rules around dropck and `#[may_dangle]` are confusing and have even resulted in
unsoundness [multiple](https://github.com/rust-lang/rust/issues/76367)
[times](https://github.com/rust-lang/rust/issues/99408). Even without `#[may_dangle]`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-phrasing the key point of this RFC at a slightly higher level, the argument seems to be: inferring the exact effect of may_dangle through field ownership is subtle and easy to get wrong, we should instead explicitly annotate what we want.

I generally agree with this direction. However, this is a departure from what seems to me as a pretty fundamental design decision in Rust: to infer all sorts of properties of a type based on "which other types it owns". We do this for dropck, but also for auto traits and variance. This decisions leads to issues such as rust-lang/rust#99408 where changing the type of a field accidentally also changes this inferred property (that issue was T to ManuallyDrop<T> affecting dropck; I think we also had &mut T to *mut T affecting auto traits -- lucky enough erring on the safe side -- and it's easy to imagine similar situations with variance).

I am not sure how I feel about attacking this fundamental problem in a piecemeal fashion. At the very least, this RFC should state somewhere that the question of inferring may_dangle dropck from field ownership is closely related to inferring auto traits and variance. And long-term, IMO it would be very odd to have this explicit annotation for may_dangle without also having something similar for auto traits and variance. Admittedly that's a lot harder, (a) since those are stable and (b) since we only want this more explicit style when unsafe code is involved -- which for may_dangle is 100% of the cases, but not so for auto traits and variance.

Put differently: if PhantomData is not fit for the purpose of may_dangle, why is it ever fit for any purpose? Given the arguments in this RFC, it seems wrong to keep that type around (well it will stay for backwards compatibility obviously). IMO we should only accept the RFC if we generally agree that PhantomData should be deprecated and replaced by more explicit annotations.

Copy link

@SoniEx2 SoniEx2 Apr 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we feel like there's no room for error with unsafe. inference isn't inherently wrong, but it should fail-safe if you forget something. that appears to hold for Send and Sync but not so much for may_dangle.

an alternative would be to validate the Drop impl based on the inferred bounds/outlives requirements. but that seems more awkward to use. and it doesn't solve "spooky-dropck-at-a-distance" (which is not unstable).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that appears to hold for Send and Sync

I don't think so. If you have a *mut T field and later decide to make it &mut, you might be accidentally adding Send/Sync to your type without realizing. Similar examples exist for variance.

Copy link

@SoniEx2 SoniEx2 Apr 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's not an example, that's an argument.

we can't think of an example where changing *mut T to &mut T causes it to become unsound due to Send/Sync, but plenty of examples where it becomes unsound due to aliasing. we would love to see such an example tho.

on the other hand variance is a footgun, from experience. at the same time, variance is fairly unintuitive, extremely verbose, and it shows up everywhere, so there's a huge added value from having it inferred by default. this is in contrast to #[may_dangle] which (on stable) only shows up on std collections and pointer wrappers.

further, #[may_dangle] opts-in to inference (impl Drop is the opt-out), whereas there's no way to opt-out of variance.

but the main issue is still the mismatch between needs_drop and outlives.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can't think of an example where changing *mut T to &mut T causes it to become unsound due to Send/Sync, but plenty of examples where it becomes unsound due to aliasing. we would love to see such an example tho.

There is such an example: rust-lang/rust#41622

Put differently: if PhantomData is not fit for the purpose of may_dangle, why is it ever fit for any purpose? Given the arguments in this RFC, it seems wrong to keep that type around (well it will stay for backwards compatibility obviously). IMO we should only accept the RFC if we generally agree that PhantomData should be deprecated and replaced by more explicit annotations.

Going to think more about that and will try to come back with an answer later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't a regression, but it was a case of accidental Sync inference: MutexGuard used &mut which propagates Sync; if it had used *mut (which arguably would be more correct as well since the lifetime of the reference is not accurate) the bug would have been avoided.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so maybe we should have more opt-outs, but having an opt-out mixed with a partial opt-in seems like a massive footgun, especially if it's not actually checked for correctness.

But also if custom auto traits are ever to be a thing, how do they fit into all this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, here are my thoughts after thinking about this during lunch:

Put differently: if PhantomData is not fit for the purpose of may_dangle, why is it ever fit for any purpose? Given the arguments in this RFC, it seems wrong to keep that type around (well it will stay for backwards compatibility obviously). IMO we should only accept the RFC if we generally agree that PhantomData should be deprecated and replaced by more explicit annotations.

PhantomData<T> is not appropriate for dropck is because it always implements Copy.

I think it could be appropriate if it were to only implement Copy if T is Copy. In this case for any T with drop glue we could require noop drop glue for PhantomData with the outlives requirements of T. However, such a Copy impl would be limiting other uses of PhantomData as it's quite useful to freely copy it around.

Because it implements Copy we're currently stuck with a type which has liveness requirements when involved in a drop, even though it doesn't get dropped itself. I personally label this behavior to clearly be a bug, hence the effort to change it via this RFC (and rust-lang/rust#110288 for [T; 0]).

Copy link
Member

@RalfJung RalfJung Apr 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you're basically saying, PhantomData for dropck is 'even weirder' than for the other purposes. I can't disagree with that, for the specific Copy case you mention. However that's not the argument the RFC is making, so the RFC should be amended. It also still seems worth mentioning the relationship with auto traits and variance, even if the issue is more pressing for dropck.

(and rust-lang/rust#110288 for [T; 0]).

[T; 0] isn't copy so I don't follow that argument. I'd rather special-case array length 0 in fewer cases than more. But anyway that's not part of this RFC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the motivation and mentioned auto traits + variance in the future possibilities section.

@nikomatsakis
Copy link
Contributor

How does this relate to #3414 ?

Copy link
Contributor

@nikomatsakis nikomatsakis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments on how to improve RFC -- don't yet have an opinion on whether we should do this. I agree some changes in this area are good. Not sure if I feel this is the right change, though perhaps it's a harmless step forward. If nothing else, great to accumulate use cases.

In general I consider PhantomData to be something of a failure. The idea was originally to have a more intuitive version of variance but (a) I don't think it's all that intuitive, quite the leaky abstraction; and (b) we've grown the scope to include dropck. Overall I think it's probably a good idea to move towards something else, though I don't entirely know what I think that should be.

I still like the intuition of "phantomdata means 'act as thought this data is present'", however. But I guess that is past.

[times](https://github.com/rust-lang/rust/issues/99408). `#[may_dangle]` restricts
the impl of `fn Drop::drop`. Whether this method is allowed to drop `T` depends
on the fields of the ADT. While changing `#[may_dangle]` to explicitly state its intended
behavior is necessary due to the change to `PhantomData`, this also simplify its usage.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
behavior is necessary due to the change to `PhantomData`, this also simplify its usage.
behavior is necessary due to the change to `PhantomData`, this also simplifies its usage.

`#[may_dangle]` is currently difficult to use correctly. It has resulted in
unsoundness [multiple](https://github.com/rust-lang/rust/issues/76367)
[times](https://github.com/rust-lang/rust/issues/99408). `#[may_dangle]` restricts
the impl of `fn Drop::drop`. Whether this method is allowed to drop `T` depends
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this sentence means: "the impl of fn Drop::drop"?

[motivation]: #motivation

`PhantomData` is `Copy` but still adds outlives requirements when dropped as a part of
a larger value. This behavior is inconsistent and results in "spooky-dropck-at-a-distance"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally don't find this phrase ("spooky-dropck-at-a-distance") very clear and would recommend we drop it from the RFC and replace it with something more clearly meaningful. My best effort to understand what "spooky dropck at a distance means" is something like "This behavior means that the interaction of PhantomData and dropck cannot be understood in isolation, but instead has to be considered as part of the larger type that the PhantomData is embedded in". Does that sound right?

(As a stylistic side note, although I sometimes forget this myself, I generally find it's better to avoid "cutesy" negative labels for previous systems. Specifically in this case, I find I have a kneejerk reaction when I see the word "spooky" here that makes me feel defensive and want to oppose the change. I know this is silly and I am suppressing it, but it takes effort. In general, avoiding a label that feels pejorative when proposing changes will help people feel good about them.)

even without `#[may_dangle]`:

```rust
use std::marker::PhantomData;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stylistic note: Large blocks of code are hard for me to deal with. Can you add comments or break this into paragraphs to walk me through why each piece matters?


```rust
use std::marker::PhantomData;
struct PrintOnDrop<'a>(&'a str); // requires `'a` to be live on drop.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g., Consider first a type PrintOnDrop which holds a &str reference and has a Drop impl. This impl will access data from the reference when it is dropped. In general the compiler assumes that every drop impl will access all the data from the structs fields.

}
}

fn assign<T>(_: T) -> PhantomData<T> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explain what this is for


A type where `#[may_dangle(must_not_use)]` would be useful is a `Weak` pointer for a variant of `Rc`
where the value is dropped when the last `Rc` goes out of scope. Dropping a `Weak` pointer
would never access `T` in this case.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is helpful and avoids the main problem I saw in the previous RFC -- it'd be great to expand a bit on where the two modes should be used in the stdlib to help reader understand the role of them. Also, what is an example of code that compiles because of this but wouldn't otherwise?

@SoniEx2
Copy link

SoniEx2 commented Apr 23, 2023

@jonhoo
Copy link
Contributor

jonhoo commented Apr 29, 2023

In case another instance of a need for this outside of std is of interest, here's a recent one from the tracing crate: tokio-rs/tracing#2578

(Feel free to mark as off-topic if deemed as such)

@thomcc
Copy link
Member

thomcc commented Jul 23, 2023

If use cases are an open question, another notable case where it's desired is https://crates.io/crates/typed-arena. Several optimizations I'd like to implement are impossible without implementing my own Drop, which would break many use cases of typed_arena.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-drop Proposals relating to the Drop trait or drop semantics T-types Relevant to the types team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants