Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Names section extensible #984

Merged
merged 4 commits into from
Feb 17, 2017
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
62 changes: 45 additions & 17 deletions BinaryEncoding.md
Original file line number Diff line number Diff line change
Expand Up @@ -436,34 +436,62 @@ The expectation is that, when a binary WebAssembly module is viewed in a browser
environment, the data in this section will be used as the names of functions
and locals in the [text format](TextFormat.md).

The name section contains a sequence of name subsections:

| Field | Type | Description |
| ----- | ---- | ----------- |
| count | `varuint32` | count of entries to follow |
| entries | `function_names*` | sequence of names |
| name_type | `varuint7` | code identifying type of name contained in this subsection |
| name_payload_len | `varuint32` | size of this subsection in bytes |
| name_payload_data | `bytes` | content of this section, of length `name_payload_len` |

Since name subsections have a given length, unknown or unwanted names can be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"unknown or unwanted name types" ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

skipped over by an engine. The current list of valid `name_type` codes are:

| Name Type | Code | Description |
| --------- | ---- | ----------- |
| [Function](#function-names) | `1` | Assigns names to functions |
| [Local](#local-names) | `2` | Assigns names to locals in functions |

When present, name subsections must appear in this order and at most once. The
end of the last subsection must coincide with the last byte of the name
section to be a well-formed name section.

#### Name List

In any of the subsequent subsections, a `name_list` is encoded as:

The sequence of `function_names` assigns names to the corresponding
[function index](Modules.md#function-index-space). (Note: this assigns names to both
imported and module-defined functions.) The count may differ from the actual number of functions.
| Name Type | Type | Description |
| --------- | ---- | ----------- |
| count | `varuint32` | number of `name` in names |
| names | `name*` | sequence of `name` |

where a `name` is encoded as:
Copy link
Member

@rossberg rossberg Feb 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: How about calling such a pair a naming for more clarity, in contrast to a plain name which intuitively is just a string.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, done.


| Name Type | Type | Description |
| --------- | ---- | ----------- |
| name_len | `varuint32` | number of bytes in name_str |
| name_str | `bytes` | UTF8 encoding of the name |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these have to UTF8 -> UTF16 properly? Be valid JavaScript properties?

Or should we just have a byte array like elsewhere, and restrict the name section in JS.md like everywhere else? That seems better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, I guess we could just leave them plain old byte arrays, but there is a difference. For import/export names, they might be strings, they might be some custom data structure. In this case, I think they're really supposed to be strings, so I think it's beneficial (and consistent with the state before this PR) to say they are UTF8 strings. Also, they don't end up being reflected in JS as property names.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It kinda breaks the "embedder makes sense of byte arrays" approach we had elsewhere. It's weird to mention UTF anywhere but in JS.md.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's more like "embedder makes sense of imports and exports"; in this case they're actually supposed to be names (so strings).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC what we had before was a hacked-up names section that got us limping along, with no real intent of keeping it around. Maybe this would help me understand better: what's the intent of this refactoring? I thought you were going for something that'll stand some test of time, so without designing DWARF / TROLL I'm trying to see what we can minimally get "right". Is that out of scope? Happy to back away if so.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the scope of the names section is just assigning (string) names to various index spaces for the benefit of binary->text rendering. If we want to do something more in the direction of source-level debugging, I think that will require a new section with a lot more structure than a collection of arrays of names; I can't imagine that being able to encode arbitrary binary contents in the names of the names section will be sufficient.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'm not proposing we broaden the scope to more than strings. What I was trying to get to is: are we trying to get this string mapping thing right? Or is this a throwaway thing to tie implementations over?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have a slight preference for not mentioning encodings here either (keep that out of the core spec altogether). Rather say in the JS API spec that names are assumed to be UTF-8 encoded and handled analogously to import/export names.


#### Function names

| Field | Type | Description |
| ----- | ---- | ----------- |
| fun_name_len | `varuint32` | string length, in bytes |
| fun_name_str | `bytes` | valid utf8 encoding |
| local_count | `varuint32` | count of local names to follow |
| local_names | `local_name*` | sequence of local names |
The function names subsection is a `name_list` which assign a name to each
function by [function index](Modules.md#function-index-space). (Note: this
assigns names to both imported and module-defined functions.) The count
may differ from the actual number of functions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A count that's bigger or smaller doesn't seem useful?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just being consistent with what it was before; I don't have a really strong opinion here.


The sequence of `local_name` assigns names to the corresponding local index. The
count may differ from the actual number of locals.
#### Local names

#### Local name
The local names subsection assigns a `name_list` to each function defined inside
the module according to its index in the [Function section](#function-section).
(Note: this does not assign names to imports; imports have no locals.) The
`name_list` for a given function assigns a name to each local. The counts may
differ from both the number of functions and number of locals in a given
function.

| Field | Type | Description |
| ----- | ---- | ----------- |
| local_name_len | `varuint32` | string length, in bytes |
| local_name_str | `bytes` | valid utf8 encoding |

| count | `varuint32` | count of `name_list` in func_locals |
| func_locals | `name_list*` | sequence of `name_list`, one per function |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you can't skip a function? Or to do so you need an empty name_list?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, you'd set count=0 (similar to what you had to do before when the locals were embedded inside each function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems pretty brittle when combined with "you can avoid specifying functions if you want to". At this point why not use a map of { function_number => name } ?

I agree forcing all of them to be present is suboptimal (mixing debug and release is nice, especially if we consider transfer size). I just don't want us to end up with some super weird encoding, even if name sections aren't really standard right now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A map is more general, but more complicated too; I think the normal case here is all-or-nothing (so either no local subsection at all or a full one).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I like the shift in index space. And in fact, there is a use for naming locals for imports: parameter names.

More generally, I wonder whether it's optimal to require dense lists for the name section. Once we extend this to other names we might encounter lots of anonymous entries (consider the type section). And even for functions I can imagine scenarios where you have large modules but only partial name information, e.g., when merging several modules into one for deployment (what I called "static linking" the other day). An index->info map seems more flexible and potentially more compact, while not being all that complex.


# Function Bodies

Expand Down