Allow implicit coercions for cross-borrows. #226

Status: Closed (1 commit)
Changes: 176 additions, 0 deletions in `0000-crossborrow.md`
- Start Date: (fill me in with today's date, YYYY-MM-DD)
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary

Allow implicit coercion from any type which implements `Deref<T>` to a borrowed
reference, `&T` (sometimes called cross-borrowing).

# Motivation

Rust code is littered with `&*expr` expressions. These are ugly, off-putting to
newcomers, and provide little benefit (see the footnote below). In the past, we
allowed ad-hoc cross-borrowing from `Gc<T>` and `Box<T>` to `&T`. Whilst the
previous system privileged built-in pointer types over user-defined pointer
types, it was both usable and ergonomic. I propose using the `Deref` trait to
bring back the ergonomic benefits of cross-borrowing without the ad-hoc nature
of the old system.

The expected outcome if this RFC is implemented is that Rust code is easier to
read and write, and is less off-putting to newcomers.

Data: there are 1557 occurrences of `&*` (and 3257 uses of `as_slice`) in the
Rust compiler and standard libraries (excluding tests), and 565 occurrences of
`&*` (and 540 uses of `as_slice`) in Servo. Note that this proposal would remove
most, but not all, of these occurrences, because some of them occur in positions
where implicit coercions are not allowed.

The fact that the coercion is limited to `Deref` has several consequences for
reasoning about code [attrib: aturon]:

* You're generally quite aware of when you're using smart pointers, and hence of
when this coercion might come into play.
* The `Deref` trait will ultimately make the target type an associated (hence
*output*) type, so this coercion would apply in only a single way to a given
value. That also protects somewhat against the "abuse" of `Deref` to implement
non-smart-pointer coercions.
* We already implicitly run `Deref` code in receiver position (auto-deref on
method calls), so this doesn't seem to open significant new cans of worms on
that front.
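As a minimal sketch of the second point, the block below uses `std::ops::Deref` as it exists in current Rust, where the target did become an associated type named `Target`. `MyBox` and `target_len` are hypothetical names used only for illustration. Because `MyBox<T>` can implement `Deref` at most once, `&*e` has exactly one possible type.

```rust
use std::ops::Deref;

/// Hypothetical smart pointer, used only for illustration.
pub struct MyBox<T>(pub T);

// `Target` is an associated (output) type: `MyBox<T>` can implement `Deref`
// at most once, so dereferencing it has a single, unambiguous target.
impl<T> Deref for MyBox<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.0
    }
}

/// Hypothetical helper: for a `MyBox<String>`, `&*e` can only be a `&String`.
pub fn target_len(e: MyBox<String>) -> usize {
    let s: &String = &*e;
    s.len()
}
```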

Footnote: to expound on the benefit, or otherwise, of explicit borrows (`&`) and
cross-borrows (`&*`): there is a benefit in being explicit about referencing and
dereferencing (i.e., in Rust not auto-borrowing, other than for receivers). It
means that the programmer can reason, using only local knowledge, about the
performance characteristics of a call, potential aliasing, and (in the case of
mutable references) how objects can be mutated by function calls. Furthermore,
it helps the programmer keep in mind the distinction between pointer and value
types, which is essential in a systems language.

By contrast, requiring explicit `&*` (i.e., not implementing this RFC) has none
of these benefits: since we are simply converting from one kind of pointer to
another, the performance, aliasing, and mutability characteristics cannot
change. Nor does eliding `&*` blur the distinction between pointers and values.
Focusing on the mutation argument, eliding `&mut` at a call site does remove
some information about how an argument may be affected by a function call.
However, this can only happen when the argument is already some kind of mutable
pointer, so it is analogous to the case where the argument already has `&mut`
type before coercion (where there is likewise no indication of `&mut` at the
call site).

# Detailed design

If `U` is `&V` and `T` implements the `Deref<V>` trait, or `U` is `&mut V` and
`T` implements `DerefMut<V>`, then `T` may be implicitly coerced to `U` (for
example, where a function's formal parameter has type `U` and the corresponding
actual parameter has type `T`). If the expression with type `T` is `e`, then the
dynamic semantics of the conversion are that `e` (in a coercible position) is
reduced to `e.deref()` (or equivalently, `&*e`), or to `e.deref_mut()` (i.e.,
`&mut *e`) in the mutable case.
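A minimal sketch of these dynamic semantics, written against current Rust (where the trait is spelled `Deref<Target = V>` rather than `Deref<V>`). `Ptr`, `takes_ref`, `call_today` and `same_target` are hypothetical names. The sketch shows the explicit `&*e` a caller must write today, and checks that `e.deref()` and `&*e` produce the same reference, which is what the proposed coercion would insert.

```rust
use std::ops::Deref;

/// Hypothetical user-defined smart pointer (not from the RFC).
pub struct Ptr<V>(Box<V>);

impl<V> Ptr<V> {
    pub fn new(v: V) -> Self {
        Ptr(Box::new(v))
    }
}

impl<V> Deref for Ptr<V> {
    type Target = V;

    fn deref(&self) -> &V {
        &*self.0
    }
}

/// A callee whose formal parameter has type `U = &V`, with `V = i32`.
pub fn takes_ref(x: &i32) -> i32 {
    *x + 1
}

/// Today the caller must write `&*e`; under the proposal, `takes_ref(e)`
/// would type-check and be reduced to `takes_ref(e.deref())`.
pub fn call_today(e: Ptr<i32>) -> i32 {
    takes_ref(&*e)
}

/// For `p: Ptr<i32>`, `p.deref()` and `&*p` are the same reference
/// (here `e` is `&Ptr<i32>`, hence `&**e`).
pub fn same_target(e: &Ptr<i32>) -> bool {
    let via_method: &i32 = Deref::deref(e);
    let via_syntax: &i32 = &**e;
    std::ptr::eq(via_method, via_syntax)
}
```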

Only a single dereference can be elided, and only if there is a matching
reference. For example (with `*` meaning what it means today):

Write today | Write with this proposal
------------|-------------------------
`*e`        | `*e`
`&e`        | `&e`
`&*e`       | `e`
`&**e`      | `*e`
`&&**e`     | `&*e`
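As a concrete instance of the "Write today" column, the sketch below assumes an illustrative `e: Box<Box<i32>>` and annotates the type each expression has in current Rust, so the compiler checks it. The comments restate the table's right-hand column; they describe the proposal and are not compiled. `today_forms` is a hypothetical name.

```rust
/// Type annotations make the compiler check what each left-column
/// expression in the table denotes when `e: Box<Box<i32>>`.
pub fn today_forms() -> i32 {
    let e: Box<Box<i32>> = Box::new(Box::new(7));
    let a: &Box<Box<i32>> = &e; // `&e`: unchanged by the proposal
    let b: &Box<i32> = &*e;     // `&*e`: would be written `e`
    let c: &i32 = &**e;         // `&**e`: would be written `*e`
    let d: &&i32 = &&**e;       // `&&**e`: would be written `&*e`
    ***a + **b + *c + **d
}
```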
> **Review comment (Member):** I don't understand how this can work?
>
> It seems `let x = &&**e;` is equivalent to the following today:
>
>     let tmp = e.deref().deref();
>     let x = &tmp;
>
> while the latter seems like it should just be `e.deref()` (as `&*e` is now)?
> Or is there some deep coercion going on here? If so, that doesn't seem
> backwards compatible, particularly since `&*e` could mean two different things.
>
> **Reply (Member, author):** It might be wrong, actually. I was imagining that
> `*e` would happen first, then we would implicitly coerce to `&**e`, and finally
> the explicit `&` would give `&&**e`. If we allow coercions at `&e` but not at
> `*e`, then I think there is no ambiguity. This distinction does seem a bit ad
> hoc, though (it follows where we have an expected type in the type checker at
> the moment). I have another RFC which goes into depth on coercion sites; for
> this RFC I'm assuming that is nailed down elsewhere.

It would still be legal to write out the references and dereferences explicitly
(i.e., the proposed change is backwards compatible).

I believe `Box` does not currently implement `Deref`; either we implement
`Deref` for `Box`, or we special-case it.

Implicit re-borrows of borrowed references already occur (e.g., to convert
`&mut T` to `&T`).
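A small sketch of that existing re-borrow coercion in current Rust, using hypothetical functions `read` and `reborrow_demo`: a `&mut i32` is passed where `&i32` is expected, with no `&*` at the call site, and the mutable reference remains usable afterwards because it was re-borrowed rather than moved.

```rust
/// A callee that only needs shared access.
pub fn read(x: &i32) -> i32 {
    *x
}

pub fn reborrow_demo() -> i32 {
    let mut n = 5;
    let m: &mut i32 = &mut n;
    *m += 1;
    // `m: &mut i32` is implicitly re-borrowed as `&i32` (i.e. `&*m`).
    let seen = read(m);
    // `m` was re-borrowed, not moved, so it is still usable.
    *m += 1;
    seen + *m
}
```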


# Drawbacks

Makes the ownership metaphor somewhat fuzzier [attrib: aturon]:

* `Box<T>` would coerce to `&T`
* `T` would not coerce to `&T`

You can make some sense of this by thinking of `Box` as a pointer. But, even so,
it's a coercion that silently introduces borrows of owned data. It seems
inconsistent to do this for data that happens to live on the heap via `Box`, but
not for owned data on the stack. It means that when you write

    foo(some_data)

you may or may not be transferring ownership of `some_data`; you have to know
all the types involved. By contrast, in today's Rust, you're always transferring
ownership. With `foo(&some_data)` you're passing ownership of a borrow. Since
ownership and borrowing are so central to Rust, I think consistency here is very
important; I think we should either auto-borrow *and* cross-borrow, or do
neither.
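A small sketch of the two call forms as they look in Rust today, with hypothetical functions `consume` and `inspect`: the `&*` at the call site is what tells the reader that `some_data` is only borrowed. Under the proposal, `inspect(some_data)` would also be accepted and would borrow, which is the ambiguity described above.

```rust
/// Takes ownership of its argument.
pub fn consume(b: Box<String>) -> usize {
    b.len()
}

/// Only borrows its argument.
pub fn inspect(s: &String) -> usize {
    s.len()
}

pub fn ownership_demo() -> usize {
    let some_data = Box::new(String::from("hello"));
    // Today the cross-borrow is spelled out, so the call site shows a borrow.
    let borrowed = inspect(&*some_data);
    // Ownership moves here; any later use of `some_data` would not compile.
    let moved = consume(some_data);
    borrowed + moved
}
```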

[Response] The difference is not between heap and stack allocation: a field
stored by value on the heap has the same behaviour as a value on the stack. So
the distinction is purely whether an object is a value or a pointer, and this
is a distinction that must be at the forefront of any systems programmer's mind
in any case.

I see two issues. One is that we treat `T` and `Box<T>` types differently; in
particular, an argument with no explicit ref/deref has different semantics
depending on its type. Two is that for `Box<T>` you need to know the type of the
formal argument to know whether the actual argument is moved or borrowed. Both
are certainly disadvantages, but I don't think either outweighs the benefit
here.

For the first issue, the programmer must know this difference in any case:
today we use different syntax at call sites (`&` vs `&*`), and the programmer
needs to know the difference for performance reasons as well. For the second
issue, I don't think it matters if you get it wrong: it does not affect
performance (unlike auto-borrowing, we never change the calling semantics). Nor
can it introduce bugs: if you assume a borrow happened but really a move did,
the compiler will prevent you from using the value after the call; if you make
the opposite assumption, you just end up writing code that is too conservative.
Likewise, when reading code, whether ownership was transferred only matters if
the argument is used after the call, and if it is, you can assume that
ownership was not transferred.

I feel that the argument that we should not implicitly execute arbitrary code
and the ownership-model argument both place principle too far above
practicality. I think this is a case where we can be a little less principled
for the sake of ergonomics.


## Other drawbacks

This change makes it less clear where referencing and dereferencing occur.

This change means treating `Box<T>` a bit more like a pointer, rather than a
value.

It allows arbitrary code (the `Deref` implementation) to be executed implicitly.


# Alternatives

Allow cross-borrowing only in the immutable case. That is, only coerce if there
is a `Deref` implementation to an `&` pointer, not in the case of `DerefMut` to
an `&mut` pointer. This has the advantage that any calls where an argument may
be mutated by the callee are indicated at the call site (although this is not
such a great advantage, because of the `&mut self` case for receivers and the
case where the argument already has `&mut` type).

> **Review comment (Member):** I'm in favour of this. For methods, it is
> normally fairly clear that the receiver is being mutated (e.g., `vec.push`),
> but this is not necessarily so clear for general arguments.
>
> I know that some C++ style guides disallow non-const references, although we
> are at a slight advantage, since the effects of 'incorrect' mutation are not
> nearly as catastrophic.
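To make the immutable-only alternative concrete, here is a minimal sketch in current Rust, where `Box` implements `DerefMut`; `push_one` and `mutable_cross_borrow` are hypothetical names. Today the mutable cross-borrow must be written `&mut *p`. The full proposal would also accept `push_one(p)`, while the immutable-only alternative would keep requiring `&mut *p`, so the possible mutation stays visible at the call site.

```rust
/// A callee that mutates its argument through `&mut V`.
pub fn push_one(v: &mut Vec<i32>) {
    v.push(1);
}

pub fn mutable_cross_borrow() -> usize {
    // `Box` implements `DerefMut` in current Rust.
    let mut p: Box<Vec<i32>> = Box::new(Vec::new());
    // Today the mutable cross-borrow is explicit at each call site.
    push_one(&mut *p);
    push_one(&mut *p);
    p.len()
}
```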

More drastic auto-borrowing, such as from any type to a borrowed reference. For
smart-pointer types, this would not actually do what the programmer expects
unless `Deref` were taken into account, and then the rules would be somewhat
confusing: the coercion would have to insert either `&e` or `&*e`, depending on
whether `e` implements `Deref` and on the type being coerced to.
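To see why a general auto-borrow would have to consult `Deref`, consider an `Rc<i32>` (an illustrative choice; the function names are hypothetical). In Rust today `&e` and `&*e` are both valid but have different types, so an implicit rule would have to choose between them based on the expected type.

```rust
use std::rc::Rc;

/// Expects a reference to the pointed-to value.
pub fn takes_value_ref(x: &i32) -> i32 {
    *x
}

/// Expects a reference to the smart pointer itself.
pub fn takes_pointer_ref(x: &Rc<i32>) -> usize {
    Rc::strong_count(x)
}

pub fn two_borrows() -> (i32, usize) {
    let e = Rc::new(3);
    // `&e` borrows the `Rc` itself...
    let count = takes_pointer_ref(&e);
    // ...while `&*e` borrows the `i32` it points to.
    let value = takes_value_ref(&*e);
    (value, count)
}
```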

Stick with the status quo.

# Unresolved questions

Whether to extend this mechanism to allow conversions from `String` to `&str`
and `Vec<T>` to `&[T]`. Since that really hinges on whether these types should
implement `Deref`, I think it is a separate issue. If we did want to take this
approach for solving the `as_slice` problem, then this RFC is a necessary step.
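For reference, in current Rust `String` and `Vec<T>` do implement `Deref`, with targets `str` and `[T]`, so `&*` already yields the slices that the explicit methods return. A minimal check using `String::as_str` and `Vec::as_slice` from the standard library; `as_slice_equivalents` is a hypothetical name.

```rust
/// Returns true if `&*` yields the same slices that the explicit methods do.
pub fn as_slice_equivalents() -> bool {
    let s = String::from("cross-borrow");
    let v = vec![1, 2, 3];
    let s_ref: &str = &*s;   // via `Deref<Target = str>`
    let v_ref: &[i32] = &*v; // via `Deref<Target = [i32]>`
    s_ref == s.as_str() && v_ref == v.as_slice()
}
```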

There are some issues with exactly where and how coercions and other type
conversions happen in Rust. That is out of scope for this RFC and will be
addressed in a separate one, coming soon.