Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix an error in repr(C) struct offset pseudocode #559

Closed
wants to merge 1 commit into from
Closed

Fix an error in repr(C) struct offset pseudocode #559

wants to merge 1 commit into from

Conversation

rkanati
Copy link

@rkanati rkanati commented Apr 6, 2019

There's an obvious error in the code in question. This fixes that.

@ehuss
Copy link
Contributor

ehuss commented Apr 6, 2019

Thanks! The original is clearly very wrong, but I think the updated version is still not correct. I think the algorithm is something closer to this:

struct.alignment = struct.fields().map(|field| field.alignment).max();

let align_to = |offset: u64, alignment: u64| -> u64 {
    let mask = alignment - 1;
    (offset + mask) & !mask
};

let mut current_offset = 0;

for field in struct.fields_in_declaration_order() {
    // Increase the current offset so that it's a multiple of the alignment
    // of this field. For the first field, this will always be zero.
    // The skipped bytes are called padding bytes.
    current_offset = align_to(current_offset, field.alignment);

    struct[field].offset = current_offset;

    current_offset += field.size;
}

struct.size = align_to(current_offset, struct.alignment);

Maybe someone more knowledgeable than me could verify that?

@rkanati
Copy link
Author

rkanati commented Apr 6, 2019

@ehuss unless I need more coffee, that's equivalent for powers of two, and wrong otherwise. And it is, in any case, pseudocode, and modulo communicates the idea better than a mask.

Edit: actually you're half right, I had a brain fart. Fixing now.

@ehuss
Copy link
Contributor

ehuss commented Jul 6, 2019

This seems correct to me, however I'm not confident enough to definitively say whether it is correct. Or maybe it should be closer to what I wrote in the comment above? Or maybe something like the suggestion in #632?

I'm going to randomly ping @RalfJung and @gnzlbg who seem to be knowledgeable about type layout. Can one of you review this and see if it is correct, or find someone who can?

@HeroicKatora
Copy link

@rkanati Didn't notice you spotted this already 👍, forgot to check PRs before the bug report. You might be interested in this branch-free formulation of align_to, making use of power-of-two assumption for correctness. With several lines of derivation, the final line is a succinct version to compute the wanted offset to the aligned address:

// (align - (offset mod align)) mod align
// = (align - offset) mod align // actual integer, not computer word 
// = (2^N - offset) mod align // PoT assumption, align < 2^N
offset.wrapping_neg() % align // 2-complement arithmetic

// Usage, example:
    // Align `current_offset` to field.alignment
    current_offset += current_offset.wrapping_neg() % field.alignment;

The current patch seems to fix the offset of fields, but there is another alignment offsetting in the last line of the algorithm that also needs fixing. The whole struct should be aligned to struct.alignment but the current line contains an error also.

@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 7, 2019

The point of the pseudocode is to explain the layout of repr(C) structs, not how to implement a function that rounds an integer up to some alignment. So I would prefer if this pseudocode would use align_offset(offset, alignment) to abstract over that. This intrinsic should be similar to the core::intrinsics::align_offset intrinsic, but using a offset instead of a pointer as the first argument.

EDIT: If you want to, you could provide an implementation of align_offset after the algorithm, and then bug fixing it would be limited to a single place.

@RalfJung
Copy link
Member

Shouldn't Layout provide all the methods we need for this? And if not, doesn't that show a gap in its API?

Or is that too high-level, not showing enough of the details?

@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 10, 2019

alloc::Layout denotes the "layout" of an untyped allocation - you don't need to know the field offsets for that, and the allocation can hold objects of different types, with different offsets, so :/

@RalfJung
Copy link
Member

RalfJung commented Jul 10, 2019

It has methods for building up a layout of a struct from the layout of its fields, though. Check https://doc.rust-lang.org/nightly/std/alloc/struct.Layout.html#method.extend.

So basically first.extend(second).extend(third).extend(fourth).pad_to_align() should compute the layout of a repr(C) struct containing 4 fields with the given layouts, in that order.

@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 10, 2019

Which kind of method would you like Layout to have?

When concatenating layouts you don't need offsets, and something like "align to" would probably belong in core::mem, or core::num (a method on integers to round to up / down to some power of two), core::ptr (for pointers), etc.

@RalfJung
Copy link
Member

RalfJung commented Jul 12, 2019

What I meant is that the pseudo-code in the reference, instead of manipulating integers directly, could manipulate Layout.

@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 12, 2019

Can you show an example ? When one concatenates layouts, one doesn't get any offsets to any fields. You only get a new layout with the size required for the struct, but the offset of the last field isn't necessarily at size - size_of<Last>() due to trailing padding, nor is the offset of the next field necessarily at size, due to padding that might be required to satisfy the alignment requirements.

@RalfJung
Copy link
Member

RalfJung commented Jul 13, 2019

I mean something like this:

/// Returns the layout of a repr(C) struct with the given field layouts in the given order,
/// and the individual field's offsets.
pub fn repr_c(fields: &[Layout]) -> (Layout, Vec<usize>) {
    let mut offsets = Vec::new();
    let mut layout = Layout::from_size_align(0, 0).unwrap();
    for field in fields {
        let (new_layout, offset) = layout.extend(*field).unwrap();
        offsets.push(offset);
        layout = new_layout;
    }
    // Don't forget trailing padding!
    (layout.pad_to_align().unwrap(), offsets)
}

Could probably be improved to rely less on mutability.

@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 13, 2019

Ah cool, Layout::extend returns the offset from the start of the layout!

@kinseytamsin
Copy link
Contributor

I just noticed this issue too when I looked a bit more carefully at the pseudocode. This seems pretty trivial for a merge, @ehuss @RalfJung @steveklabnik any idea why it hasn't been approved?

@@ -184,7 +184,10 @@ for field in struct.fields_in_declaration_order() {
// Increase the current offset so that it's a multiple of the alignment
// of this field. For the first field, this will always be zero.
// The skipped bytes are called padding bytes.
current_offset += field.alignment % current_offset;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's another "current_offset + current_offset % struct.alignment" further down, why doesn't that need fixing, too?

@RalfJung
Copy link
Member

I indicated above how I think this should be written.

Also I have no approval powers here. Cc @Centril

@Centril
Copy link
Contributor

Centril commented Feb 22, 2020

Also I have no approval powers here. Cc @Centril

@RalfJung I think I handed you write access last week or some such. :)

@RalfJung
Copy link
Member

RalfJung commented Feb 22, 2020

Lol, I had no idea. Also, I didn't say I wanted approval powers here. ;)

Anyway, in terms of the code here, there might be good reasons not to use Layout (like, making the pseudo-code here self-contained), but I think at the very least "pad-to-align" should be a helper method. It's sufficiently subtle to implement, entirely non-obvious to read but has a very intuitive meaning.

Also Cc @gnzlbg and @hanna-kruppe who are much more familiar with data layout matters than I am.

@kinseytamsin
Copy link
Contributor

Heh, maybe it's not so trivial then? But regardless, as @ehuss said in the first reply, the original is clearly very wrong, so this PR deserves to finish being considered. (And if it needs changes and @rkanati ends up being AWOL I might open my own, but hopefully that won't be necessary!)

Also @RalfJung I pinged you specifically because you'd commented on this PR and you're a member of rust-lang, which if I understand correctly means you should have write access to this repo? Anyway, just explaining my logic there XP

@kinseytamsin
Copy link
Contributor

Okay, took some time to actually read more carefully through the comments and understand the arguments and the stuff referencing Layout, so I guess this wasn't merged because @rkanati never made the requested changes? I think I'll open my own PR for this.

@rkanati
Copy link
Author

rkanati commented Feb 22, 2020

Sorry to keep you all waiting - it's been the better part of a year since I submitted this, and I barely remember what's going on.

If the consensus is that this isn't as "trivial" as I thought last April, then I'll close this. Otherwise, I can copy-paste @RalfJung 's suggestion if that's somehow easier for you. Let me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants