Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand binary name section to include type, table, memory, global, and label names. #1064

Conversation

AndrewScheidecker
Copy link

@AndrewScheidecker AndrewScheidecker commented May 15, 2017

This is meant to resolve #750.

[function index space](Modules.md#function-index-space). This may include both
module-defined or imported functions, but is only meaningful for module-defined
functions. The `name_map` for a given function assigns names to labels within
the function, with labels identified by the offset from the start of the function's
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, using raw byte offsets pierces the abstraction level on which the rest of the binary format is defined and makes the mapping a bit less robust. I wonder if there is an alternative. For example, couldn't we just assume an implicit numbering of a function's structured control instructions in order of appearance?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered an implicit numbering of the control flow operators first, but the benefit to using a byte offset is that it doesn't require tracking additional state when decoding the instructions to compute some implicit label index for the instruction being decoded.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That depends on the implementation. If you have already constructed an AST and thrown away your pointers to the binary then it actually requires keeping more information around, and longer.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A third option would be to use an instruction index that can hopefully be reused for other debug information.

I'm fine with any of those options, but I think it should be consistent across the various types of debug metadata. #1051 (source maps) and #990 (standard call stack format) both use file offsets (though not necessarily function-relative). Perhaps there's an argument that those should use instruction indices instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed there's a PR for #990, #1053.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dschuff Any thoughts on this (and its consistency with the source maps and call stack formatting PRs)?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AndrewScheidecker It feels like the purpose of the name section is assigning optional names to 'logical' entities in a module, which ought to work independent of the current representation of that module, be it binary encoding (with possibly different byte sequences encoding the same semantics), text format (where label names would also be used) or some data structure in memory (where the original encoding bytes might not be available anymore, or not yet, or possibly never at all).

As such the name section should probably use something that has meaning independent of any concrete sequence of bytes, either the 'new index space for labels' idea (most consistent with what the name section does for all other kinds of entities) or using an instruction index.
Adding names to arbitrary positions in the instruction sequence is probably a different feature from adding names to control flow labels, though.

I think you're right, there are discussions to be had about whether, in the case of Webassembly, either one or both of stack traces and source maps should rather refer to positions in a concrete file (byte or text offsets, which is what they traditionally do) or to logical instructions in a function instead. I think there's arguments for both approaches. That however seems like a very different thing from adding names to specific entities in the name section, which probably should depend neither on the former two nor on the concrete binary encoding.

@AndrewScheidecker
Copy link
Author

I updated this to index labels sequentially according to the order of the operators that introduce them.

@jfbastien
Copy link
Member

This generally lgtm. Could you put it on the agenda for the next CG meeting (not tomorrow's since that's too tight an addition, but the next one when it's posted)? Would you also be able to attend and present?

@AndrewScheidecker
Copy link
Author

I will be traveling Dec 11-25, so I probably cannot attend the next video call. I can still add it to the agenda if you'd like.

@jfbastien
Copy link
Member

@AndrewScheidecker you can put it in a section of "items to discuss at next meeting", noting your preference to be present. That way we can punt the item until you can attend.

@sbc100
Copy link
Member

sbc100 commented Nov 28, 2017

Can we name memory segments too? This would be useful for simplifying the static linking which requires naming of segments and currently does this using its own custom section.

@binji
Copy link
Member

binji commented Nov 28, 2017

We don't currently reference memory or table segments by index, but the conditional segment initializer proposal will change that with the addition of the init_memory and init_table instructions. So yeah, it will make sense to have names for those too, if we go forward with that proposal.

@kumpera
Copy link

kumpera commented May 4, 2018

We're using twiggy to inspect our generated wasm images (<3 <3 to the rust team for writing it) and the non-code bits are a significant part of our image.

This would help us a lot figure out what's taking all the space.

@AndrewScheidecker
Copy link
Author

How should I proceed with this PR? I don't think it's controversial, but is very stale now.

FWIW, I have been maintaining a fork of LLD that writes this extended name section, with the exception of the label names (I think getting that information in the linker requires some larger changes).

@binji
Copy link
Member

binji commented Jul 8, 2019

@AndrewScheidecker yes, this PR is too old to land as-is. I think it's useful and a reasonable extension of the current name section format. I think the right way to move forward is to make it a new proposal. Currently it would be phase 0, so we'd need to bring it to the CG to move it to phase 1. At that point we can fork the spec repo and make the change to appropriate place in the spec (https://github.com/WebAssembly/spec/blob/master/document/core/appendix/custom.rst)

@AndrewScheidecker
Copy link
Author

I've created a proposal repo here and hope to present it at the CG meeting tomorrow: https://github.com/AndrewScheidecker/wasm-extended-name-section

I'll close this PR now, but if anybody would like to provide feedback here on the proposal repo, I'd appreciate it!

@AndrewScheidecker AndrewScheidecker deleted the expanded-name-section branch July 8, 2019 12:51
@nomeata
Copy link

nomeata commented Sep 19, 2019

https://github.com/AndrewScheidecker/wasm-extended-name-section says it is archived. What does that mean?

@AndrewScheidecker
Copy link
Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Include types, tables, labels, globals, and memories in binary format name section
8 participants