Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Types for enum variants #1450

Closed
wants to merge 3 commits into from
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
335 changes: 335 additions & 0 deletions text/0000-variant-types.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,335 @@
- Feature Name: variant_types
- Start Date: 2016-01-07
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary

This is something of a two-part RFC, it proposes

* making enum variants first-class types,
* untagged enums (aka unions).

The latter is part of the motivation for the former and relies on the former to
be ergonomic.

In the service of making variant types work, there is some digression into
default type parameters for functions. However, that needs its own RFC to be
spec'ed properly.


# Motivation

Enums are a convenient way of dealing with data which can be in one of many
forms. When dealing with such data, it is typical to match, then perform some
operations on the interior data. However, in many cases there is a large amount
of processing to be done. Ideally we would factor that out into a function,
passing the data to the function. However, currently in Rust, enum variants are
not types and so we must choose an unsatisfactory work around - we pass
each field of the variant separately (leading to unwieldy function signatures
and poor maintainability), we pass the whole variant with enum type (and have to
match again, with `unreachable!` arms in the function), or we embed a struct
within the variant and pass the struct (duplicating data structures for no good
reason). It would be much nicer if we could refer to the variant directly in the
type system.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be cool to have some examples here to make the motivations clearer.


When working with FFI code, we need to communicate with C programs which may use
union data types. There is no way to represent a union in Rust, and thus working
with such types is awkward and involves bug-prone transmutes. We should provide
some way for Rust to handle such types.

As we'll see below, variant types allow for an elegant solution to the union
problem.


# Detailed design - variant types

Consider the example enum `Foo`:

```rust
pub enum Foo {
Variant1,
Variant2(i32, &'static str),
Variant3 { f1: i32, f2: &'static str },
}
```

We create new instances by constructing one of the variants. The only type
introduced is `Foo`. Variant names can only be used in patterns and for creating
instances. E.g.,

```rust
fn new_foo() -> Foo {
Foo::Variant2(42, "Hello!")
}
```

This RFC proposes allowing the programmer to use variant names as types, e.g.,

```rust
fn bar(x: Foo::Variant2) {}
struct Baz {
field: Foo::Variant3,
}
```

Both enums and their variants can currently be imported:

```rust
use Foo;
use Foo::Variant1;
```

Importing an enum imports it into both the value and type namespace. Importing
a variant imports it only into the value namespace. To maintain backwards
compatibility, this will remain the default. In order to import an enum variant
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually required for backwards compatibility? I don't think there's currently a way you can have a variant in scope and have a type with the same name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, looks like you're right, that is good news!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't check the code right now, but it looks like variants are already imported into both namespaces (and I do remember that they are defined in both namespaces):

use E::V;

enum E {
    V { field: u8 }
}

type V = u8; // Conflicts as expected

fn main() {
    let e = V{field: 0}; // Compiles as expected 
    match e {
        V{..} => {} // Compiles as expected 
    }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't check the code right now, but it looks like variants are already imported into both namespaces (and I do remember that they are defined in both namespaces):

I was wondering about that, I know we tried to prepare for the possibility that variants would become types...

into the type namespace, one must use the `import_variant_type` attribute:

```rust
use Foo;
#[import_variant_type]
use Foo::Variant1;

fn bar(v: Variant1) {
let _ = Variant1;
}
```

When we release Rust v2.0, we may choose to import variants into both namespaces
by default and remove the attribute.


## Constructors

Consider `let x = Foo::Variant1;`, currently `x` has type `Foo`. In order to
preserve backwards compatibility, this must remain the case. However, it would
be convenient for `let x: Foo::Variant1 = Foo::Variant1;` to also be valid.

The type checker must consider multiple types for an enum construction
expression - both the variant type and the enum type. If there is no further
information to infer one or the other type, then the type checker uses the enum
type by default. This is analogous to the system we use for integer fallback or
default type parameters.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, pretty much what I expected and @nikomatsakis also confirmed it's what he would do (although him and @aturon aren't sure it's enough i.e. compared to full subtyping).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that it gets a bit more complicated if we also support nested enums, though probably the fallback would just be to the root in that case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I wanted to add that with nested enums we would have a chain of potential types to infer to, but I realized that nested enums aren't in the scope of this RFC.


The type of the variants when used as functions must change. Currently they have
a type which maps from the field types to the enum type:

```rust
let x: &Fn(i32, &'static str) -> Foo = &Foo::Variant2;
```

I.e., one could imagine an implicit function definition:

```rust
impl Foo {
fn Variant2(a: i32, b: &'static str) -> Foo { ... }
}
```

This would change to accommodate inferring either the enum or variant type,
imagine

```rust
impl Foo {
fn Variant2<T=Foo>(a: i32, b: &'static str) -> T { ... }
}
```

Since we do not allow generic function types, the result type must be chosen
when the function is referenced:

```rust
let x: &Fn(i32, &'static str) -> Foo = &Foo::Variant2::<Foo>;
let x: &Fn(i32, &'static str) -> Foo::Variant2 = &Foo::Variant2::<Foo::Variant2>;
```

Due to the default type parameter, we remain backwards compatible:

```rust
let x: &Fn(i32, &'static str) -> Foo = &Foo::Variant2;
```

Note that this is an innovation. Default type parameters on functions have
[recently](https://github.com/rust-lang/rust/pull/30724) been feature-gated for
more consideration. The compiler has never accepted referencing a generic
function without specifying type parameters, even when there is a default.
However, I think this should be the expected behaviour. This should be discussed
further in a separate RFC.


## Representation

Enum values have the same representation whether they have enum or variant type.
That is, a value with variant type will still include the discriminant and
padding to the size of the largest variant. This is to make sharing
implementations easier (via coercion), see below.

## Conversions

A variant value may be implicitly coerced to its corresponding enum type (an
upcast). An enum value may be explicitly cast to the type of any of its variants
(a downcast). Such a cast includes a dynamic check of the discriminant and will
panic if the cast is to the wrong variant. Variant values may not be converted
to other variant types. E.g.,

```
let a: Foo::Variant1 = Foo::Variant1;
let b: Foo = a; // Ok
let _: Foo::Variant2 = a; // Compile-time error
let _: Foo::Variant2 = b; // Compile-time error
let _ = a as Foo::Variant2; // Compile-time error
let _ = b as Foo::Variant2; // Runtime error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see checked downcasting here. I would really prefer:

match b {
    x: Foo::Variant2 => {...}
    _ => { /*not Variant2*/ }
}

Then the user can do something useful instead of panicking, and I believe this is necessary if we want to experiment with modelling the DOM as an ADT hierarchy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, I want this, it is super-important, but I forgot to put it in. I expect we can keep the current syntax: x @ Foo::Variant2 => { ... }. I'm not sure if we then apply the same inference approach to x or x always has type Variant2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the inference approach should work, if we pin the variant type on variant-specific operations (like accessing a field).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm strongly against panicking on cast for anything but debugging.

let _ = b as Foo::Variant1; // Ok
```

## impls

`impl`s may exist for both enum and variant types. There is no explicit sharing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this let us fix rust-lang/rust#5244? Example:

enum Option<T> {
    Some(T),
    None,
}

// we add this
impl<T> Copy for Option<T>::None {}

// then, either this just works
let x: [Option<String>; 10] = [None; 10];

// or this works (can this be written without the temporary `t`?)
let t: [Option<String>::None; 10] = [None; 10];
let x: [Option<String>; 10] = t;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would likely require that the variant types have the same size as the enum itself, so they'd have to have unused padding for things like the discriminant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the case according to the RFC.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The correct solution here is not using Copy directly, i.e. by either allowing constants, or ADTs of Copy-able types.

While @japaric's proposal may be equivalent to the latter option, let x: [Option<String>; 10] = [None; 10]; would not work with this RFC as written because None would infer to Option<String> which is not Copy - and there is no conversion from [None<T>; N] to [Option<T>; N] (but there could be? not sure what we can do here).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Fri, Jan 08, 2016 at 07:14:59AM -0800, Jorge Aparicio wrote:

Would this let us fix rust-lang/rust#5244?

Yes, perhaps, that's one of the appealing things about having variants
be types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Fri, Jan 08, 2016 at 10:06:19AM -0800, Eduard-Mihai Burtescu wrote:

While @japaric's proposal may be equivalent to the latter option, let x: [Option<String>; 10] = [None; 10]; would not work with this RFC as written because None would infer to Option<String> which is not Copy - and there is no conversion from [None<T>; N] to [Option<T>; N] (but there could be? not sure what we can do here).

In the RFC as written, I think something like [None: None<String>; 32] would be required?

of impls, and just because as enum has a trait bound, does not imply that the
variant also has that bound. However, the usual conversion rules apply, so if a
method would apply to the enum type, it can be called on a variant value due to
coercion performed by the dot operator.


# Detailed design - untagged enums

An enum may have `#[repr(union)]` as an attibute. This implies `#[repr(C)]`,
i.e., variants will have the layout expected for C structs. More importantly, it
means that the enum is untagged: there is no discriminant. Matching (and `if
let`, etc.) are not allowed on such enums.

The size of a union value is exactly the size of the largest variant (including
any padding). There is no discriminant, nor is it possible to have drop flags.

There is no restriction on the kind of variants that can be used with
`#[repr(union)]`. Unit-like, tuple-like, and struct-like can all be used. Note
that if all variants are unit-like, then the enum is a zero-sized type. If there
are other variants, then unit-like variant values are all padding. I don't see
the utility of such variants, but I see no reason to ban them.

The only operation that can be performed on a union value is casting. An enum
value can be cast to a variant type. This is not checked (it cannot be, since
there is no discriminant) and thus is *unsafe*. Variants can also be cast
'sideways' to other variant types (also unsafe). Like other enums, a variant
value can be implicitly coerced to the enum type; this is a safe operation.

impls work exactly like regular enums.

## Example

```rust
#[repr(union)]
enum MyUnion {
MyInt(i64),
MyBytes(u8, u8, u8, u8),
}

fn foo(m: MyUnion) -> i64 {
#[import_variant_type]
use MyUnion::*;

assert!(size_of::<MyUnion>() == 8);
assert!(size_of::<MyInt>() == 8);
assert!(size_of::<MyBytes>() == 8); // 4 bytes of inaccessible padding

if consult_magic_8_ball() == 42 {
unsafe {
process_bytes(m as MyBytes)
}
} else {
unsafe { m as MyInt }.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A block, including unsafe {...} can only be an rvalue which means this copies MyInt.

While here it wouldn't make much of a difference, showing (m as MyInt).0, maybe even taking a reference thereof or assigning to it, would make the lvalue property clearer.

However, as currently only produces rvalues, so I'm not sure what the overall intent is.

}
}

fn process_bytes(bytes: MyUnion::MyBytes) -> i64 {
// safe code
...
}
```

## Destructors

It would be unsafe for the compiler to assume that a union is a particular
variant, therefore it cannot run destructors for any fields in the union. For
consistency, destructors will not be run even if the union value has a variant
type.

There are two ways to achieve this, either it is forbidden for any field in a
union to implement `Drop`; or, even if a field implements `Drop`, this is
ignored. A compromise solution is that the programmer must opt-in to ignoring
`Drop` on a per-field, per-variant, or per-enum basis, and otherwise fields
which implement `Drop` are forbidden, either with an attribute, or with a
`ManuallyDrop` type (see [RFC PR 197](https://github.com/rust-lang/rfcs/pull/197)).
I prefer this compromise solution.

It will be legal to implement `Drop` for an enum type, but illegal to implement
`Drop` for a variant type (if the variant belongs to an untagged enum). I fear
this must just be an ad-hoc check in the compiler.


# Drawbacks

The variant types proposal is a little bit hairy, in part due to trying to
remain backwards compatible.

One could argue that having both tagged and untagged enums in a language is
confusing. However, I believe the guidance here can be very clear: only use
`#[repr(union)]` for C/FFI interop. The fact that it is an attribute should make
it an obvious second choice.


# Alternatives

An alternative to allowing variants as types is allowing sets of variants as
types, a kind of refinement type. This set could have one member and then would
be equivalent to variant types, or could have all variants as members, making it
equivalent to the enum type. Although more powerful, this approach is more
complex, and I do not believe the complexity is justified.


## Unsafe enums

See [RFC PR 724](https://github.com/rust-lang/rfcs/pull/724) and
[internals dicussion](https://internals.rust-lang.org/t/pre-rfc-unsafe-enums-now-including-a-poll/2873).

Uses `unsafe` rather than an attribute to indicate that an enum is untagged.
Uses an unsafe, irrefutable pattern match (let syntax) to destructure the enum,
giving access to its fields.

Using variant types and unsafe casting as proposed here should be more ergonomic
- it better isolates the operation which is unsafe (discriminating the enum),
from the safe operations (operating on the fields themselves).

## Union structs

See [RFC PR 1444](https://github.com/rust-lang/rfcs/pull/1444).

Annotates structs rather than enums. This has the advantage over RFC 724 that
fields can be accessed directly which is an ergonomic improvement (also true
with this proposal). However, since all field access must be unsafe, it still
requires more unsafe code than you might want.

My preference is for an enum approach (as oppossed to structs) since a union
offers multiple choices of data, like an enum, rather than combining data
together like a struct. That is, enums and unions both 'or' data together,
whereas structs 'and' data together. (In C, structs and unions are syntactically
similar, but semantically very different).

Furthermore, by using enums we allow union variants to have more than one field.
While this is strictly more powerful than is needed for C interop, it is useful
in general. For example, when dealing with binary data, formats will often have
fields which are a given size, but may contain data of different types, an
untagged enum is perfect for this.


# Unresolved questions

There is some potential overlap with some parts of some proposals for efficient
inheritance: if we allow nested enums, then there are many more possible types
for a variant, and generally more complexity. If we allow data bounds (c.f.,
trait bounds, e.g., a struct is a bound on any structs which inherit from it),
then perhaps enum types should be considered bounds on their variant types.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can retain only trait bounds and use e.g. V: VariantOf<E> - which is what I've used in my version of the examples in @nikomatsakis' blog post introducing many of these ideas.

There are also interesting questions around subtyping. However, without a
concrete proposal, it is difficult to deeply consider the issues here.

See destructor question above.