Add support for Wasm components in wasm-opt #6728

Open
mh4ck-Thales opened this issue Jul 11, 2024 · 13 comments

Comments

@mh4ck-Thales

wasm-opt does not support running on a Wasm component, which is the default with WASIp2. wasm-opt is used, for example, in LLVM to optimize compiled binaries (see llvm/llvm-project#98373), which is not currently possible with WASIp2 because of this lack of support.

A possible solution may be to iterate over a Wasm component and find the core modules within it, apply current wasm-opt optimizations on each core module, then repackage the component.
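As a rough illustration of that idea, here is a minimal Python sketch. It assumes the component binary layout from the current component model draft: an 8-byte preamble followed by sections of the form `id, LEB128 size, body`, where section id 1 holds an embedded core module and id 4 holds a nested component. Those ids, and the helper names `rewrite_component` and `run_wasm_opt`, are assumptions made for illustration and are not part of Binaryen. The format is not yet stable, so treat this as a sketch rather than a working tool.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Tuple

CORE_MODULE_SECTION = 1  # assumed section id of an embedded core module
COMPONENT_SECTION = 4    # assumed section id of a nested component
PREAMBLE_LEN = 8         # magic (4 bytes) + version and layer (4 bytes)


def read_u32_leb(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def write_u32_leb(value: int) -> bytes:
    """Encode an unsigned integer as minimal LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def rewrite_component(component: bytes,
                      optimize: Callable[[bytes], bytes]) -> bytes:
    """Copy a component, passing every core module through `optimize`."""
    out = bytearray(component[:PREAMBLE_LEN])
    pos = PREAMBLE_LEN
    while pos < len(component):
        section_id = component[pos]
        size, body_start = read_u32_leb(component, pos + 1)
        body = component[body_start:body_start + size]
        if section_id == CORE_MODULE_SECTION:
            body = optimize(body)
        elif section_id == COMPONENT_SECTION:
            # Nested components are rewritten recursively.
            body = rewrite_component(body, optimize)
        # Re-emit the section with a size matching the (possibly new) body.
        out.append(section_id)
        out += write_u32_leb(len(body))
        out += body
        pos = body_start + size
    return bytes(out)


def run_wasm_opt(module: bytes) -> bytes:
    """Optimize one core module by shelling out to wasm-opt (must be on PATH)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "in.wasm"
        dst = Path(tmp) / "out.wasm"
        src.write_bytes(module)
        subprocess.run(["wasm-opt", "-O2", str(src), "-o", str(dst)], check=True)
        return dst.read_bytes()
```

Usage would be something like `rewrite_component(data, run_wasm_opt)`. The sketch relies on the rest of the component referring to modules by index and to their imports/exports by name, so changing a module's size should not invalidate other sections, but I have not verified that against real components.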

Remark: I suppose all of binaryen tools can add support for Wasm components. This issue is for wasm-opt specifically but can be transformed into a more general issue.

@kripken
Member

kripken commented Jul 11, 2024

A possible solution may be to iterate over a Wasm component and find the core modules within it, apply current wasm-opt optimizations on each core module, then repackage the component.

I think that makes sense, since I don't think there is anything wasm-opt could do better if we gave it the entire component with its multiple modules. However, wasm-metadce on the other hand could be applied to a graph of modules in a component in order to remove code based on their interactions, though there might already be some such tool in the wasm components space?

If someone is interested to contribute code to Binaryen for either of those, that sounds good, though if parsing/writing components is complex then it might make sense to do it in an external tool that calls wasm-opt instead, I'm not sure.

@tlively
Member

tlively commented Jul 11, 2024

wasm-opt could do things like cross-module inlining within a component. The simplest implementation would be for it to automatically merge all the modules in the component, optimize, then emit a new component with the single optimized module.

@kripken
Member

kripken commented Jul 11, 2024

Good point, inlining would be potentially very useful here.

Merging it all first makes sense, but in that case

  1. Do external tools not exist to do that merging already?
  2. Is it always safe to merge in that way - can a reference to an internal module not be sent out?

@alexcrichton
Contributor

At least as far as I'm aware, there's no tooling for components at this time that acts as a general-purpose component-to-component optimizer. For example, there's no dead-code elimination, merging, etc.

wasm-opt could do things like cross-module inlining within a component
...
can a reference to an internal module not be sent out?

For questions/optimizations like these it would require a much deeper understanding of a component within wasm-opt. Definitely possible to do but is a much larger step up from "just optimize each core wasm module contained in-place".

Components represent a "shared nothing" boundary so you can't import/export a core wasm function/table/memory from a component. That ends up meaning that components all have a "sealed" view of the world. Modules linked together within a component are basically a way of codifying what you might do in JS glue to instantiate a module within the binary format of the component itself. Basically these sorts of optimizations are possible and there's definitely lots of possible ways to optimize a component.

I suspect though that the immediate interest would basically be to just optimize the core wasm modules themselves. Performing wasm-opt as-is should be sufficient for that. The lion's share of a component's work is often just in the "main module" that source code produced, and the more interesting case of multiple modules only starts to come up in the "shared everything dynamic linking" scenarios where you have DLLs all statically included in a component and wired up within a component. That is not all that common today though outside of Python-based components and there's lots of various things to cover there before cross-module-inlining is desired I suspect.

@kripken
Member

kripken commented Jul 11, 2024

I see, thanks @alexcrichton. Yeah, if components have such a sealed view of the world, then merging the modules in a component first before optimizing seems like a good option, as @tlively said. Then things like inlining and duplicate function elimination would work automatically and optimally across what used to be module boundaries.

Is there already a tool that does that merging?

@alexcrichton
Contributor

Not currently, no (or at least not that I'm aware of). It's theoretically not too hard to do but would require deeper knowledge of components and handling of the various dependencies between modules and items within a component.

@tlively
Member

tlively commented Jul 11, 2024

@kripken, IIUC, the point is that to do the merging in the first place, we would have to lower the component interfaces to core Wasm glue code ourselves, then merge the original componentized modules not only with each other but with that generated glue code. Is that right, @alexcrichton?

@kripken
Member

kripken commented Jul 11, 2024

@tlively Yeah, that definitely sounds nontrivial, but isn't that the plan for running components in the browser, to merge/lower them into an MVP module (+JS glue maybe)? I may have misunderstood though! But that is why I've sort of assumed such a tool would exist (and we wouldn't need to do work in Binaryen for it).

@alexcrichton
Contributor

Heh you're sort of both correct and I can try to provide some more background information here. I think I have given some false impressions by accident so I'll try to be more precise as well.

First I'll mention that there is indeed a way to run components on the web today, and that's through jco. That works by "decompiling" a component into core wasm modules plus a wad of JS that wires everything up. This is I think what @kripken you might be referring to? I believe that jco already has options to run wasm-opt over each of the core wasm modules within a component as they're extracted for a browser to run. Nothing currently merges them together, though, but a sufficiently advanced wasm bundler might be able to do so.

Next I'll clarify that when talking about components things can be more or less complicated depending on the components you're considering. This issue, for example, was originally inspired by the output of the wasm32-wasip2 target of Clang which is a sort of "leaf component" where it itself contains no other components internally. Instead this component imports component-style APIs (e.g. WASI) and then internally describes how to hook them all up into the core wasm output of LLVM (there's some nontrivial, but boring, things here so I won't go into much detail). In this component there's no such thing as "glue" or "fused adapters" or anything like that. Luke has talked a lot about how when a component calls another you can fuse those calls together and this fused call can itself be expressed in wasm. This only arises with more than one component, however, and the raw output of LLVM doesn't have more than one component.

So to your point @tlively it sounds like you're talking about how, when components call each other, there's "glue" in between which can be expressed as wasm, and Binaryen should be able to optimize this. That's definitely not something I would consider in the near term but is a possibility long-term. It's not applicable to anything coming out of LLVM at this time. Additionally the "glue" here isn't actually in the component; something would have to create it. It turns out that this is what jco does, though. Jco will detect "fused adapters" and will generate wasm modules that implement the glue between components. So in theory these merging/etc optimizations could be done at that layer, but you're relatively far from components at this point as there's no step right now which repackages that inside of a component.

Now what I've been talking about is something I think is completely different from what y'all are thinking. In components the idea of "shared everything dynamic linking" is that you have, for example, libc.so, libpython.so, libpandas.so, and main.wasm (or something like that). The main.wasm imports libpython.so which uses libc.so and then libpandas.so is a native Python extension that's dynamically loaded as well. This can all be modeled with components at this time where the component internally describes that it has these core wasm modules which are instantiated in a certain order with various imports going into other instantiations (e.g. libpython.so is instantiated with the exports of libc.so, such as malloc, memory, etc). In this situation it might be possible to actually merge everything into one module and start inlining there. That's all happening within a single component.
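To make the "instantiated in a certain order" part concrete, here is a small Python sketch that derives an instantiation order from a dependency map. The map is hypothetical: the edges for main.wasm and libpython.so follow the description above, while the dependencies assumed for libpandas.so are a guess for illustration. A real component encodes this wiring in its instance sections rather than in a table like this.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical map: module -> modules whose exports it is instantiated with.
DEPENDENCIES: Dict[str, Set[str]] = {
    "libc.so": set(),
    "libpython.so": {"libc.so"},                   # gets malloc, memory, etc.
    "libpandas.so": {"libpython.so", "libc.so"},   # assumed for illustration
    "main.wasm": {"libpython.so"},
}


def instantiation_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every module follows the modules it imports from."""
    return list(TopologicalSorter(deps).static_order())


order = instantiation_order(DEPENDENCIES)
```

Any order this returns starts with libc.so, since everything else depends on it directly or indirectly. A merging tool would need the same information to know which imports resolve to which exports.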

So with some of that let me try to also answer directly:

IIUC, the point is that to do the merging in the first place, we would have to lower the component interfaces to core Wasm glue code ourselves, then merge the original componentized modules not only with each other but with that generated glue code. Is that right, @alexcrichton?

This is correct. This is quite a nontrivial task. Jco/Wasmtime do it and if y'all are interested I could talk more about it. The transform today never targets repackaging as a single component so care would have to be taken to ensure it's 100% semantics-preserving with what the original component wanted. Not impossible, just hasn't been a goal yet.

isn't that the plan for running components in the browser, to merge/lower them into an MVP module (+JS glue maybe)?

Yes, that's jco and JS glue is definitely required. No merging happens today though so all the modules generated are just separate modules managed by the JS glue. I do believe though that jco optionally runs wasm-opt over each generated module. I'll note here that not all modules are guaranteed to be present in the original component. The original component's modules are all present but more might be generated by jco itself for the "glue" between components.

@mh4ck-Thales
Author

Is the inlining of modules inside a component really a good idea? I'm not overly familiar with the Wasm component model, but it seems to me that one goal is to allow multiple modules bundled together. If the end result of wasm-opt is to output a component that only contains a single wasm module, isn't it going against the spirit of components? And if not, is this inlining and rebundling issue not more of a compile-time issue and not for LTO?

Also, I'm not sure that bundling modules together before optimizing will produce better results than optimizing each module independently (I'd like to be proven wrong on that). Additionally, having clearly separated modules is a plus for security: we can inspect the behavior of each module independently and have a layer of security between modules. I'm not saying that inlining is a bad idea, but maybe we need both options: optimization of each module inside component without structure changes and optimization with inlining if it can bring upsides.

@kripken
Member

kripken commented Jul 12, 2024

Thanks @alexcrichton !

Ok, then IIUC for the web the "optimize each module" approach is done or at least possible using jco. For non-web, I imagine a tool would call out to wasm-opt like jco can do now, but that tool would then repackage the optimized modules into the component once more. In theory such a tool could be in Binaryen itself, or external.

For "optimize all the modules of a component together" it does seem like a lot more optimization is possible (inlining, duplicate function elimination, etc.). That could be done once there is either a smart enough wasm bundler that "flattens" all the modules in a component out into a single one, or if Binaryen parsed and understood components, so again this could be done in Binaryen or externally.

As there is existing infrastructure for components outside of Binaryen then I'd guess it makes more sense to do things there. But if for efficiency or other reasons someone were interested to work on it here, that would be worth discussing of course.

However, one piece of infrastructure inside Binaryen that could help here is wasm-merge. Actually jco could use it right now, to merge all the modules it emits, and then direct calls (with no jco JS glue between them) could be optimized. And other tools that want to merge a component's modules could use it too, which could be nice as it does all the annoying work of hooking up imports to exports, using multiple memories/tables when necessary and adjusting global indexes, etc.
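For reference, a minimal Python sketch of how a tool might drive wasm-merge over a set of extracted modules. It assumes the basic `wasm-merge a.wasm a b.wasm b -o out.wasm` invocation, where each file is followed by the module name that other modules import it under. The helper name `wasm_merge_command` and the file names are hypothetical, and any feature flags a particular input needs are left out.

```python
import subprocess
from typing import List, Sequence, Tuple


def wasm_merge_command(modules: Sequence[Tuple[str, str]], output: str) -> List[str]:
    """Build a wasm-merge argv from (file path, import name) pairs."""
    argv = ["wasm-merge"]
    for path, name in modules:
        # Each input file is followed by the name other modules import it as.
        argv += [path, name]
    argv += ["-o", output]
    return argv


def merge_and_optimize(modules: Sequence[Tuple[str, str]], output: str) -> None:
    """Merge the modules, then optimize the result in place with wasm-opt."""
    subprocess.run(wasm_merge_command(modules, output), check=True)
    subprocess.run(["wasm-opt", "-O2", output, "-o", output], check=True)
```

For example, `merge_and_optimize([("main.wasm", "main"), ("libc.wasm", "libc")], "merged.wasm")` would merge two hypothetical modules and then let wasm-opt inline across the former boundary.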

@mh4ck-Thales

Fair point that if someone wants the module boundaries inside a component for security or code caching or other reasons then it might not make sense to merge them all. But about this:

Also, I'm not sure that bundling modules together before optimizing will produce better results than optimizing each module independently (I'd like to be proven wrong on that).

There could be huge optimization opportunities there. Imagine one module calls a small function in another:

module A {
  func foo() {
    var sum = 0;
    for (var i in range) {
      sum += B::bar(i);
    }
    return sum;
  }
}

module B {
  func bar(i) {
    return i * i;
  }
}

If we inline between modules, that can be

module A {
  func foo() {
    var sum = 0;
    for (var i in range) {
      sum += i * i;  // inlined code from bar()
    }
    return sum;
  }
}

foo() no longer has a call in what is possibly a hot loop (and func bar is no longer needed at all and can be DCE'd away).

This situation is pretty realistic in situations like main.wasm / libpython.so that @alexcrichton mentioned, as the Python C API and others like it often have such small functions.

@mh4ck-Thales
Author

@kripken thanks for your example, I see the appeal of inlining better now. I guess both use cases exist and are relevant, then. They can be the result of different tools, or the same tool with different options, depending on who is interested in developing what. TIL about wasm-merge; I guess that if we can leverage already existing Wasm component tools, we should be able to develop both optimization scenarios without (too much) trouble.

A good starting point for this in wasm-opt would be to properly exit when dealing with a Wasm component. For now wasm-opt crashes with an "unknown Wasm version" error when trying to parse a Wasm component.
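A minimal sketch of such a check in Python, assuming the preamble layout from the component model draft: both formats start with the `\0asm` magic, followed by a 16-bit version and a 16-bit layer, with core modules using version 1 and layer 0 and components using layer 1. The function name `classify_wasm` is hypothetical. If that layout is right, a parser that reads the four bytes after the magic as a single 32-bit version would see a value other than 1 for a component, which would explain the current error.

```python
import struct

WASM_MAGIC = b"\0asm"


def classify_wasm(data: bytes) -> str:
    """Return 'module', 'component', or 'unknown' based on the 8-byte preamble."""
    if len(data) < 8 or data[:4] != WASM_MAGIC:
        return "unknown"
    # Two little-endian 16-bit fields: version, then layer (assumed layout).
    version, layer = struct.unpack_from("<HH", data, 4)
    if version == 1 and layer == 0:
        return "module"
    if layer == 1:
        return "component"
    return "unknown"
```

A tool could call this before parsing and print a clear "components are not supported yet" message instead of failing on the version field.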

@kripken
Member

kripken commented Jul 16, 2024

@mh4ck-Thales An error makes sense, I agree. I opened #6751 for that now. Please take a look as I am not familiar with the component binary format.
