Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for a simple, universal module loader hooks API to replace require() monkey-patching #52219

Open
joyeecheung opened this issue Mar 26, 2024 · 47 comments
Labels
loaders Issues and PRs related to ES module loaders module Issues and PRs related to the module subsystem.

Comments

@joyeecheung
Copy link
Member

joyeecheung commented Mar 26, 2024

Spinning off from #51977

Background

There has been wide-spread monkey-patching of the CJS loader in the ecosystem to customize the loading process of Node.js (e.g. utility packages that abstract over the patching and get depended on by other packages e.g. require-in-the-middle, pirates, or packages that do this on their own like tsx or ts-node). This includes but is not limited to patching Module.prototype._compile, Module._resolveFilename, Module.prototype.require, require.extensions etc. To avoid breaking them Node.js has to maintain the patchability of the CJS loader (even for the underscored methods on the prototype) and this leads to very convoluted code in the CJS loader and also spreads to the ESM loader. It also makes refactoring of the loaders for any readability or performance improvements difficult.

While the ecosystem can migrate to produce and run real ESM gradually, existing tools still have to maintain support for CJS output (either written as CJS, or transpiled from ESM) while that happens, and the maintenance of CJS loader is still important. If we just provide a universal API that works for both CJS and ESM loading customizations, tooling and their users can benefit from a more seamless migration path.

Why module.register() is not enough

The loader hooks (in the current form, module.register()) were created to address loading customization needs for the ESM loader, so they only work when the graph is handled by the ESM loader. For example it doesn't work when the require() comes from a CJS root (which could be a transpiled result), or from createRequire(import.meta.url). Addressing existing use cases in the CJS loader is not in the scope of the loaders effort, either (it was brought up before was dismissed in a previous PR too). Therefore tooling in the wild still have to maintain both monkey-patching-based hooks for CJS and loader hooks for ESM when they want to provide universal support. Their users either register both hooks, or (if they know what format is actually being run by Node.js) choose one of them.

The module.register() API currently forces the loader hooks to be run on a different worker thread even if the user hooks can do everything synchronously or need to mutate the context on the main thread. While this simplifies de-async-ing of loader code to some extent when they want to run asynchronous code in loader hooks, the value is lost once the loader needs to provide universal support for CJS graphs and has to maintain synchronous hooks for that too (and in that case, they could just spawn their own workers to de-async, e.g. what @babel/register does on top of require() monkey-patching).

For tooling that only has CJS support via require() monkey-patching, if they want to add ESM support, this unconditional worker abstraction as the only way to customize ESM loading makes wiring existing customizations into ESM more complicated that it needs to be. The move to unconditional workers also lead to many issues that are still unaddressed:

For us maintainers, having to support this worker setup as the only way to customize module loading also adds maintenance burden to the already convoluted loaders. It is already difficult to get right in the ESM loader (e.g. having to doge infinite worker creation or having to figure out how to share the loader worker among user workers), let alone in the monkey-patchable CJS loader.

Proposal of a synchronous, in-thread, universal loader hooks API

As such I think we need something simpler than module.register() that:

  1. Allow users to customize module loading universally - both CJS and ESM, no matter how they are loaded.
  2. Easy to migrate to from existing require() monkey-patching-based hooks.
  3. Allow us to fully deprecate require() monkey-patching sooner.

This becomes more important now that we are on a path to support require(esm) and want to help the ecosystem migrate to ESM by providing a path with backwards-compatibility and best-effort interop, instead of providing features that does not work in the existing CJS loader or goes against existing CJS usage patterns, making it difficult for people to migrate.

I propose that we just add a synchronous hooks API that work in both the CJS and the ESM loader as a replacement for the monkey-patchability of require(). The API can be something like this - this is just a straw-person sketch combining existing module.register() APIs and APIs in npm packages like pirates and require-in-the-middle. The key is that we should keep it simple and just take synchronous methods directly, and apply them in-thread.:

// If users want to do something off-thread, they can spawn their own workers here.
// Or import { addHooks } from 'node:module'
require('module').addHooks({
  // Useful for transpiler support of new extensions, etc.
  resolve(specifier, context, nextResolve) { ... },
  // load() + resolve() should be enough to provide what pirates need
  load(specifier, context, nextLoad) {
    // If users want to do things off-thread, e.g. to generate
    // or fetch the source code asynchronously, they could run it
    // in their own worker and block on it using their own
    // Atomics.wait().
  },
  // Should be enough to provide what require-in-the-middle need
  export(specifier, exports, context, nextExport) { ... }
  // Other hooks that we find necessary to replace `require()` monkey-patching
});

The main difference between this and module.register() is that hooks added via module.register() are run on a different worker unconditionally, while module.addHooks() just keeps things simple and runs synchronous hooks synchronously in-thread. If users want to run asynchronous code in the synchronous hooks, they can spawn their own workers - this means technically they could just implement what module.register() offers on top of module.addHooks() themselves. So module.register() just serves as a convenience method for those who want to run the code off-thread and prefer to delegate the worker-atomics-wait handling to Node.js core.

In a graph involving real ESM, module.register() can work in conjunction to module.addHooks(), the hooks are applied in the same order that they are added in the main thread. In a pure CJS graph, module.register() continues to be unsupported, as what's has already been happening. Maybe someday someone would be interested in figuring out how to make module.register() work safely in the CJS loader, but I think the burden from the handling the unconditional workers is just not worth the effort, especially when users can and already do spawn their own workers more safely for this use case. IMO a simple alternative like module.addHooks() would be a more viable plan for universal module loading customization, and it gets us closer to deprecating require() monkey-patching sooner.

Migration plan

Tooling in the wild can maintain just one set of synchronous customizations, and handle the migration path by changing how these customizations are wired into Node.js:

  1. On older versions of Node.js that they still need to support, they continue to fall back to customization wiring via both paths: into CJS via require()-monkey patching, and into ESM via module.register(). This is unfortunately what they already do today if they want to provide universal module support.
  2. On newer versions of Node.js, they can wire the same set of customizations into both CJS and ESM via just one path: module.addHooks(). There is no longer need to maintain two wiring for universal support of CJS and ESM. And, if they didn't support real ESM before, they get to implement ESM support relatively simply by just migrating from require() monkey-patching to module.addHooks().
  3. When they drop support of older versions that lack module.addHooks(), they can remove dependency on require() monkey patching completely.

For us, the migration plan looks like this:

  1. Provide module.addHooks() as a replacement for require() monkey-patching, and make it wired into both require()/createRequire() from the CJS loader and import/require in imported CJS from the ESM loader
  2. Doc-deprecate require() monkey patching and actively encourage user-land packages that rely on patching to migrate. At the mean time require() monkey patching will still work to some extent in conjunction with the new loader hooks, so that packages have a graceful migration period.
  3. When the dependencies on require() monkey patching drop enough in the ecosystem, start emitting runtime warnings when the internal properties of Module are patched, and suggesting to use module.addHooks().
  4. After that we will no longer maintain compatibility hacks for require() monkey patching to work. Packages who monkey-patch Module but don't manage to migrate might still work with newer versions of Node.js - until we do internal changes to the internal properties that they rely on. When we do that and break them, instead of further convoluting internals to make patching work, we'll suggest them to just use module.addHooks() on newer versions of Node.js.
@joyeecheung joyeecheung added loaders Issues and PRs related to ES module loaders module Issues and PRs related to the module subsystem. labels Mar 26, 2024
@joyeecheung
Copy link
Member Author

cc @nodejs/loaders

@joyeecheung
Copy link
Member Author

Also cc @arcanis @Qard from #47999 and @mcollina @legendecas from #52219

@arcanis
Copy link
Contributor

arcanis commented Mar 26, 2024

Having synchronous hooks could also be interesting for 3rd-party "Node.js Resolver" implementations, for which switching their API from sync to async just for loaders is a challenge (ex wooorm/import-meta-resolve#10 (comment)).

@Qard
Copy link
Member

Qard commented Mar 26, 2024

APM vendors would very much like this. We do unspeakable things with import-in-the-middle to make it look similar to require-in-the-middle, and all because ESM loaders were unwilling at the time to give us a way to just patch actual objects rather than only rewriting code text.

@joyeecheung
Copy link
Member Author

joyeecheung commented Mar 26, 2024

Another use case for the universal hooks: users can create packages to e.g. track module information (like #52180) especially in the context of debugging dual packages, or develop tooling for other sorts of module diagnostics e.g. finding duplicate dependencies, cycles. These all require hooks that are supported by both loaders.

@GeoffreyBooth
Copy link
Member

If I’m remembering correctly, part of the theory of the current hooks design was the expectation that the ESM loader would eventually become the only loader, as it can handle everything including CommonJS: #50356. And the hooks would need to be off-thread because they would need the Atomics syncification stuff in order to support the sync require (among other reasons). And once the ESM loader is the only loader, the current hooks would apply to all loaded modules, and we can drop our never-official support for monkey-patching as the new hooks provide roughly equivalent functionality.

I guess my question is, what’s the longer-term vision for maintaining the modules code? I find the current modules codebase borderline incomprehensible, and they’re not two separate loaders; the CommonJS loader calls into the ESM loader for import() and require(esm), and the ESM loader calls into the CommonJS loader for import of CommonJS, among other interconnections ("exports", "imports", etc.). I understand the hesitance in making the ESM loader the default while it’s slower than the CommonJS loader, but that’s probably not something we can ever entirely fix as it just needs to do more things; and it’ll matter less and less as people write their apps using ESM syntax and therefore the CommonJS loader isn’t an option. So I guess what I’m getting at is, what comes after this? Are we still hoping to reorganize the modules codebase and merge things together and clean it up, and hopefully realize some performance improvements as a result (maybe by porting more parts of it into C++)?

And if so, how does this proposal fit into that plan? I don’t see these hooks as necessarily contradicting such a goal, but I think we should design them in such a way that they’re complementary: they should apply to ESM loader-managed modules as well as CommonJS loader-managed modules, for example. And we should probably add this export hook to the existing set of hooks in the current API; I don’t see why we couldn’t.

@GeoffreyBooth GeoffreyBooth added the loaders-agenda Issues and PRs to discuss during the meetings of the Loaders team label Mar 26, 2024
@Qard
Copy link
Member

Qard commented Mar 26, 2024

Probably related is that I had previously attempted to add diagnostics_channel events to the module loading lifecycle, which could have been patchable through subscribers. That ended up stalling though as loaders was just at the start of the off-threading change so too much was changing for me to be able to keep up with my PR at the time.

I think we don't actually need a new and complicated system, we just need some diagnostics_channel hooks at the right time to intercept the exports object and rebuild it.

@joyeecheung
Copy link
Member Author

joyeecheung commented Mar 26, 2024

Are we still hoping to reorganize the modules codebase and merge things together and clean it up, and hopefully realize some performance improvements as a result (maybe by porting more parts of it into C++)?

I think that can still be a long term thing, but deprecating require() monkey patching is a more imminent need - or support for a proper, universal loader hook is really something long overdue and it should've happened many years ago (it also predates ESM). Without this universal hooks I doubt we can really get us out of the messy require() monkey patching situation and the compatibility burden - I see this is a pre-requisite for any substantial module loader refactoring.

and it’ll matter less and less as people write their apps using ESM syntax and therefore the CommonJS loader isn’t an option

Writing in ESM syntax is not the same as running ESM syntax. Many of the dependencies of require() monkey patching are still working with CJS transpiled from ESM for maximum compatibility, so they need to provide CJS support one way or another - and currently, with require() monkey patching. It helps the migration to running real ESM by providing a universal hooks that work for both CJS and ESM loaders, then these tools only need to maintain one set of hooks, and their users don't need to deal with the hook changes when switching to actually running real ESM.

But I think we should design them in such a way that they’re complementary

I think technically the synchronous hooks are just lower level, more powerful hooks that module.register() itself can be built upon. They allow you to do more things and give you more control over the loading process. module.register() is just a high-level convenience API that provides more abstractions and with them, more limitations. If we provided the synchronous hooks long ago, module.register() could've been implemented in the user land (even the ESM loader itself could've been implemented in the user land - which technically is what esm did, using require() monkey-patching).

And we should probably add this export hook to the existing set of hooks in the current API; I don’t see why we couldn’t.

We could but then being off-thread makes it somewhat useless. For example if the module exports a function, then you just can't send a function (a closure) over to a different thread and let the other thread wrap the actual closure somehow, because closures are not serializable. I think that was what #52219 (comment) was talking about. They would then just need to hack their way by modifying the source code that will be compiled and invoked to generate these closures on a different thread (which has nothing to do with exports, modification of the code is done in load), then again it's hacky and likely bug-prone, and if the wrapped closures needs to access some states in the loader it then needs to go through a messaging dance, which again can be bug-prone (e.g. you'll need to employ locks in the wrapped closures to maintain the execution flow, and risk deadlocking yourself as you are already patching code from other people which can be invoked in unpredictable ways). With in-thread export hooks, you can just wrap the exported closures directly in the export hooks' closure, which is also in the same context, and you don't need to use locks because it's in the same thread, and that just simplifies everything.

@joyeecheung
Copy link
Member Author

joyeecheung commented Mar 26, 2024

I understand the hesitance in making the ESM loader the default while it’s slower than the CommonJS loader, but that’s probably not something we can ever entirely fix as it just needs to do more things

I don't think that's true, I think the design of ESM (the specification, not the Node.js ESM loader) provides a lot of room for an implementation with optimal performance. It doesn't need to unconditionally do more things. All the extra things it need to do are subject to what the module graph looks like and it's possible to provide a fast happy path that can be hit by maybe >80% of the pure ESM packages you can find on npm. And I think synchronous loader hooks can be an important piece in this as it also allows user-land hooks to participate in a fast happy path, instead of having no way to opt out of a slow abstraction that they don't necessarily need or could do better themselves. It also paves the way to fully deprecate require() monkey patching and allows us to do more substantial refactoring for optimizations while keeping breakage to the minimum.

@GeoffreyBooth
Copy link
Member

instead of having no way to opt out of a slow abstraction that they don’t necessarily need or could do better themselves.

Is the “slow abstraction” the fact that the hooks are async, or that they’re off-thread? If it’s that they’re async, I guess the question then is how do you handle when hooks do need to be async. If it’s that they’re off-thread, I don’t think we should assume that the hooks thread necessarily makes things slower. In nodejs/modules#351 (comment) it was estimated that spawning the thread incurs about a 10ms cost, but that once the thread exists the overall loading might actually be faster than a fully main thread approach because many CPU-intensive tasks like transpilation would benefit from running in a separate thread concurrent with the application code running in the main thread. And users might be able to achieve similar goals themselves by having their hooks spawn a separate thread, but it’s much more difficult to orchestrate and multiple customization libraries wouldn’t be likely share the same hooks thread, so there are benefits to Node setting this up for them.

That’s all not to say that we can’t have both, but of course shipping both then causes the UX complexity of essentially two APIs for doing the same thing.

@joyeecheung
Copy link
Member Author

joyeecheung commented Mar 26, 2024

If it’s that they’re async, I guess the question then is how do you handle when hooks do need to be async.

I think I've repeated this many times in the OP - hooks can spawn their own workers. They can and they already do (e.g. @babel/register already does this). But there's no need to force this upon every loader, no matter they need this or not.

but it’s much more difficult to orchestrate and multiple customization libraries wouldn’t be likely share the same hooks thread

I would say it's a double-edge sowrd, by trying to take this over we are introducing footguns like #50948 or #47615. It's fine if they can work with the footgun. But I don't think it's fine to make that the only option we provide to our users, and when they come back to us, we provide no actionable mitigations other than documenting the footgun or telling them to not do what they need to do or their dependencies need to do, even though the use case can be simply solved by us not trying to be clever and just giving them an option with fewer abstractions.

shipping both then causes the UX complexity of essentially two APIs for doing the same thing.

The complexity already exists, as I've explained in the OP. Users still need to provide both CJS loader hooks via monkey patching and ESM loader hooks via module.register() when they intend to provide universal support. We are just providing an API for them to just do the wiring for both loaders in one API, instead of with a hack in one loader + an API that only supports the other loader.

@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented Mar 26, 2024

Users still need to provide both CJS loader hooks via monkey patching and ESM loader hooks via module.register() when they intend to provide universal support.

They shouldn’t, though, when the ESM loader handles the entry point, because the current hooks handle require when it’s under the ESM loader; and the ESM loader handles even CommonJS entry points today whenever --import is passed. So if users run something like our recommended node --import some-library/register entry.js, that some-library should be able to define hooks using the current API and cover everything, unless I’m mistaken about something. And if I’m right about this, then the only thing that libraries need to provide CommonJS monkey-patching hooks for is when the CommonJS loader is used for a CommonJS entry point; which would go away once the ESM loader handles all entry points. And libraries can simply avoid needing to support that case by saying in their documentation that they need to be registered via --import.

This still leaves the question of whether the current hooks are somehow incomplete in terms of whether they can do everything that CommonJS monkey-patching can do, but that’s not something that necessarily needs to be solved by creating a whole alternate approach to hooks. And this all isn’t to argue that there aren’t advantages to the proposal, as you list many in the top post and many are still valid. But I don’t think that this point isn’t necessarily one of the benefits.

@joyeecheung
Copy link
Member Author

joyeecheung commented Mar 26, 2024

then the only thing that libraries need to provide CommonJS monkey-patching hooks for is when the CommonJS loader is used for a CommonJS entry point; which would go away once the ESM loader handles all entry points.

Or libraries that haven't started implementing the module.register() hooks, or are still broken by the recent off-thread changes, or still only work with CJS because they only need to work with transpiled CJS, or older versions of libraries that still rely on this directly or indirectly that still get depended on by another part of the ecosystem. module.register() adoption is not something that the ecosystem has already done. What is still being widely adopted now is require() monkey patching. The fact that module.register() works completely different from require() monkey-patching-based hooks isn't helping the adoption, either. Adopting module.register() and asking users to switch to --import is a semver-major change, but refactoring to use the new synchronous hook wherever available can be a semver-patch. If anything, a universal hook that operates closer to e.g. pirates only makes the future switch easier and more seamless - tools migrate to a proper hook that's more reliable than monkey patching, and they get to support ESM at the same time, and they won't be affected by our internal refactoring any more, and the new hooks are close enough to existing hacky hooks that it's simple for them to just drop it in in a semver-patch release (and we can just update e.g. pirates to use our new hooks, thus their dependents get a graceful upgrade, too).

I also think the switch only does more harm than good with the current shape of the ESM loader, and that shouldn't be used as an existing or imminent condition. The existing condition is that the ecosystem still relies on require() monkey-patching, and a universal hook with a model similar to require() monkey-patching-based hooks provides a better migration story (it will also allow both migrated hooks and hooks that are not yet migrated to work together for a graceful period, whereas switching to ESM loader is a hard break because it doesn't support existing code that does require() monkey-patching at the mean time). I think the practice of just breaking what the ecosystem have been doing all along with no graceful migration path, and only providing a solution that's too different from what the existing code is built upon, is what makes ESM adoption slow and basically a repetition of the lack of require(esm).

@Qard
Copy link
Member

Qard commented Mar 28, 2024

If I'm understanding the spec right, we still have the problem of import bindings being immutable, so patching the exports object would not work?

@joyeecheung
Copy link
Member Author

joyeecheung commented Mar 28, 2024

One solution would be to create a module facade via vm.SynthethicModule (currently behind —experimental-vm-modules), then you can re-export them with wrapped implementations. That’s what we do to export the builtins as ESM (with an internal version of it) with an added default export.

@GeoffreyBooth
Copy link
Member

Also, presumably the hook would run before the module is linked, right? So the exports could get patched by the hook before they’re registered in V8. They would be immutable after that point, aside from techniques like @joyeecheung mentions, but this pre-linking customization should cover a lot of use cases?

@Qard
Copy link
Member

Qard commented Mar 28, 2024

Would pre-link work? Basically we need to be able to get a reference to a current exported function or class and replace that export with another that calls the original internally. If we aren't linked yet, can we even get the references?

@joyeecheung
Copy link
Member Author

joyeecheung commented Mar 28, 2024

presumably the hook would run before the module is linked, right?

If you are talking about export hooks, then it needs to happen after evaluation (that's the point when we have the actual namespace after executing the module code with V8). Linking is the phase where import 'foo' gets resolved to a V8 module that's not yet evaluated (but could be compiled). So that happens before evaluation and of course, before export hooks are run.

@JakobJingleheimer
Copy link
Contributor

JakobJingleheimer commented Apr 9, 2024

I like the general idea proposed here 🙂

Jotting down a few points, which may be invalid or a pipe-dream:

  • We (I?) would very much like to eventually eliminate CJSLoader as much as possible (it's old, nobody knows how it works, and everybody is afraid to even breath at it). If that is realistic, I would want to avoid a solution here that works against that goal. That said, providing a proper alternative to monkey-patching is ultimately more important IMO (but don't make me choose 😰).
  • The proposal looks very similar to existing hooks and its mechanisms; from a bird's eye view, it looks to me that it should be able to leverage/build-upon what we already have. I think if we can do that, it would have a lot of benefits (DRY, smaller maintenance surface area, principle of least knowledge, etc)
  • Official ESM mocks is coming (I'm starting the TC39 proposal maybe in the coming weeks); I'm hoping it can fit together with our existing hooks. If we leverage what we already have for CJS too, perhaps we can get mocks/exports for free.
    • On a related note: I want to avoid doing something here that would hinder mocks (like we can't do it because the design we come up with here codes us into a corner down the line). I think if mocks as a spec lands, that takes priority over CJS (which is not official).

Addressing existing use cases in the CJS loader is not in the scope of the loaders effort, either (it was brought up before was dismissed in a previous PR too).

I think that is not completely accurate: I believe it's not that we rejected the idea, but merely that it is not yet possible/feasible (more work would be needed to facilitate it), so it was out of scope of the cited PR. My vague assumption when I imagine modules in the future is that it actually would be this way (but perhaps that's naïve of me).

Also, I'm not married to the off-threading design (but I will cry a lot a lot if it's ripped out after so much blood, sweat, and tears). Off-threading is undeniably a huge complication and maintenance burden. If there's an alternative that achieves the crucial bits that provided, let's hear it!

@joyeecheung
Copy link
Member Author

joyeecheung commented Apr 9, 2024

If that is realistic, I would want to avoid a solution here that works against that goal.

I would say without this proposal doing any substantial changes to the CJS loader would be unrealistic. We cannot get rid of the CJS loader while tens/hundreds of millions of downloads per week on npm rely on patching its underscored methods. We should at least find a way to reduce that to about <1 million of downloads per week before breaking the user-accessible part of the CJS loader, and this proposal hopefully does that.

it should be able to leverage/build-upon what we already have.

I think that's a nice to have though I suspect it could just make maintainability worse, not better, due to the existing complexity of the hooks.

I think if mocks as a spec lands, that takes priority over CJS (which is not official).

I suppose by lands you mean stage 4? That seems pretty far-fetched. ECMA262 proposals don't usually go to stage 4 within a year, especially when they are not a simple helper function. I would estimate something like module mocks to take 2-3 years to go to stage 4 (similar to how long it took for TLA), and leaving CJS monkey-patching in the ecosystem for 2-3 more years doesn't sound like a win for anyone.

One a side note, I think we should learn our lessons in "why require(esm) didn't happen 5 years ago" (TLA was part of it) and be pragmatic. TLA didn't end up contradicting require(esm) (from what I can tell it forced the ESM spec to be conditionally async and only made synchronous-only require(esm) even more sound), and browsers decided to align the module evaluation semantics with the bundlers in the WHATWG spec. I am only seeing a trend of specs being more pragmatic and align with existing usage patterns in the ecosystem - and that's also what the API being proposed here tries to do. It would seem strange that a new ECMA262 proposal goes against the existing usage pattern in the ecosystem.

Also, I'm not married to the off-threading design (but I will cry a lot a lot if it's ripped out after so much blood, sweat, and tears). Off-threading is undeniably a huge complication and maintenance burden. If there's an alternative that achieves the crucial bits that provided, let's hear it!

I think it still serve an important use case for the users (being able to do async work in the loader out of the box), and I don't think it needs to be ripped out. The proposal here is complementary and just provides a lower-level API that allows more customization and more optimization in the module loading process. The off-thread hooks can just be a helper that saves users the trouble of spawning their own workers if they do need them but also don't need control over them.

@Jamesernator
Copy link

Jamesernator commented Apr 12, 2024

    // If users want to do things off-thread, e.g. to generate
    // or fetch the source code asynchronously, they could run it
    // in their own worker and block on it using their own
    // Atomics.wait().

This still means blocking the main thread even in cases where the main thread could continue to run, notably dynamic import would result in pausing the whole main thread even though other things could run. Using userland threads doesn't help as Atomics.wait still ultimately just pauses the main thread allowing nothing else to run.

It wouldn't be that complicated to simply have mirrored hooks for sync and async which are chosen depending on whether require or import is used (with fallback to the sync hooks if async not available), i.e.:

addHooks({
    // Called for require
    resolve(specifier, context, nextResolve) { ... }
    // Called for (dynamic) import, falls back to sync resolve if not provided
    resolveAsync(specifier, context, nextResolveAsync) { ... }
    
    // Ditto for load
});

@GeoffreyBooth
Copy link
Member

I think what we want in the end is this:

  • A single API where users can define resolve and load module customization hooks that apply to any module (CommonJS or ESM or JSON, and eventually also Wasm and maybe .node) handled by either loader (CommonJS or ESM) and either method (require or import).
  • This single API applies regardless of whether the entry point is handled by the CommonJS loader or the ESM loader. The hooks apply to all modules, whether entry points or not.
  • Monkey-patching is deprecated and after sufficient notice, removed.
  • Node maintainers are free to refactor the CommonJS and ESM loaders as much as we want, removing duplication and optimizing performance, merging the loaders together if we want, moving some or all into C++.

It sounds like this proposal can achieve all of this, with the significant caveat that the new hooks would be sync instead of async. So if we want the simplicity of a single set of hooks that work everywhere, we would get rid of the current off-thread async hooks.

I’m okay with jettisoning the current off-thread hooks. They’ve proven difficult to maintain, and if most users don’t need the ability to run async hooks to customize require calls, then we’re burdening ourselves with complexity for little to no benefit. I’d rather not maintain two hooks APIs, one for sync and one for off-sync, as that’s difficult to explain to users and doesn’t save us any maintenance burden. If most users can make do with on-thread sync hooks, and there’s some way to achieve async for the hook authors who really need it, I think that’s good enough.

@Qard
Copy link
Member

Qard commented Apr 15, 2024

The PR implements support for trace events

As far as I understand, those events can only be listened to externally though. Or maybe some C++ to connect with Perfetto somehow? We need this data within the JS main thread though.

In any case, I think this might be derailing a bit, but my main point was I would not assume a sync equivalent of the existing hooks covers all the use cases for patching until we've actually talked to the community and gathered feedback. We probably won't get everyone's attention with that, but we will at least somewhat minimize the impact when we break support for things currently being done with patching. I do still expect we'll catch some unaware and they'll show up in the issues. I just don't want us flooded with hundreds of users all at once complaining that we didn't solve a dozen different use cases before just breaking things.

@joyeecheung
Copy link
Member Author

joyeecheung commented Apr 15, 2024

I just don't want us flooded with hundreds of users all at once complaining that we didn't solve a dozen different use cases before just breaking things.

I think the most aggressive(?) notification before full breakage would be deprecation warnings even inside node_modules (and in the warning, asking users to reply to a tracking issue about their use case maybe?). That can be done in the duration of an LTS to make sure it get enough attention. Technically we don't need to break them all at once, just emit warnings for individual patched methods as we go and leave some time to the feedback cycle before we make breaking changes. The CJS loader is no stranger to this "emit warning if foo is patched and keep the patched foo work in a hacky way, otherwise use a different implementation of foo" pattern.

@timfish
Copy link
Contributor

timfish commented Apr 17, 2024

Would the proposed solution negate the need for additional Node args like --import/--loader to register these new hooks?

Additional Node args are a pain point for a variety of reasons:

  • They add additional complexity and configuration for end users
  • Many framework cli's and dev servers don't even provide a means to pass Node args when developing
  • Passing Node args to packaged Electron apps is not trivial. Last time I had to do this I had to ship additional scripts that started the Electron executable with the required args!

My understanding was that ESM imports are all evaluated first before any other code is run and that is why we're required to use --import to hook ESM loading. How would this proposal work around that? Would it need to work in a similar way to import-in-the-middle to achieve this?

@GeoffreyBooth
Copy link
Member

Would the proposed solution negate the need for additional Node args like --import/--loader to register these new hooks?

The current hooks can be used without a flag, that’s what module.register is for. You just need to ensure that your application code isn’t already loaded before the hooks are registered, for example with an entry point file like this:

import { register } from 'node:module'

await register(new URL('./hooks.js', import.meta.url)) // Note the await here
await import('./app.js') // Note the dynamic import here

It would be similar for the new hooks.

@timfish
Copy link
Contributor

timfish commented Apr 18, 2024

You just need to ensure that your application code isn’t already loaded before the hooks are registered

await register(new URL('./hooks.js', import.meta.url)) // Note the await here
await import('./app.js') // Note the dynamic import here

While this workaround is viable in a lot of use cases, many of the most popular meta frameworks do not expose an entry script where you could add this async import. It is possible to code your own entry point for many of them but you can break compatibility with their tooling so it's often not documented or recommended.

It's clear that in the long term we should be encouraging frameworks to include hooks that allow us to run code before the app code is async imported. However today, the only options for hooking ESM are using the --import flag or instructing users to use non-standard setups and neither of these options are great or work across all frameworks in dev and production!

@joyeecheung
Copy link
Member Author

joyeecheung commented May 24, 2024

I have a POC in https://github.com/joyeecheung/node/tree/sync-hooks - pretty sure it's full of bugs still and there are a bunch of TODOs, but at least it can work with a mini typescript transpiler in the test. Working on a more concrete API design and planning to PR the API doc to the loaders repo (?). I suppose the export hooks will be left to a later iteration.

@mcollina
Copy link
Member

After an in-depth review of the loaders code, I support this proposal.

@joyeecheung
Copy link
Member Author

Updated the WIP a bit and made a whacky require-in-the-middle mock work. The API currently looks like this, which is fairly consistent with the module.register ones (with some stuff like conditions still missing):

/**
 * @typedef {{
 *   parentURL: string,
 * }} ModuleResolveContext
 */
/**
 * @typedef {{
 *   url: string,
 *   format?: string
 * }} ModuleResolveResult
 */
/**
 * @param {string} specifier 
 * @param {ModuleResolveContext} context
 * @param {(specifier: string, context: ModuleResolveContext) => ModuleResolveResult} nextResolve
 * @returns {ModuleResolveResult}
 */
function resolve(specifier, context, nextResolve) {
  const resolved = nextResolve(specifier, context);
  if (resolved.url.endsWith('.esm')) {
    return {
      ...resolved,
      format: 'module'
    };
  }
  return resolved;
}

/**
 * @typedef {{
 *   format?: string,
 * }} ModuleLoadContext
 */
/**
 * @typedef {{
*   format?: string,
*   source: string
* }} ModuleLoadResult
*/
/**
 * @param {string} url
 * @param {ModuleLoadContext} context 
 * @param {(context: ModuleLoadContext) => {ModuleLoadResult}} nextLoad 
 * @returns {ModuleLoadResult}
 */
function load(url, context, nextLoad) {
  const loaded = nextLoad(context);
  const { source: rawSource, format } = loaded;
  if (url.endsWith('.ts')) {
    const transpiled = ts.transpileModule(rawSource, {
      compilerOptions: { module: ts.ModuleKind.NodeNext }
    });

    return {
      ...loaded,
      format: 'commonjs',
      source: transpiled.outputText,
    };
  }

  return loaded;
}

/**
 * @typedef {{
*   format: string,
 * }} ModuleExportsContext
 */
/**
 * @typedef {{
*   exports: any,
* }} ModuleExportsResult
*/
/**
 * @param {string} url
 * @param {ModuleExportsContext} context 
 * @param {(context: ModuleExportsContext) => {ModuleExportsResult}} nextExports 
 * @returns {ModuleExportsResult}
 */
function exports(url, context, nextExports) {
  const { exports: exported } = nextExports(exports, context);
  const replaced = { ...exported, version: 1 };
  return { exports: replaced };
}

I will write a more serious doc and either PR that along side with the code here, or into https://github.com/nodejs/loaders/tree/main/doc/design (depending on how serious that doc turns out to be?)

@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented May 31, 2024

I will write a more serious doc and either PR that along side with the code here, or into nodejs/loaders@main/doc/design (depending on how serious that doc turns out to be?)

I think either works. You could make a new markdown file for the loaders repo alongside the other design docs, or write API docs in a Node core PR (with or without the implementation). The loaders repo design doc might be more useful as it can go into implementation details and discussion that we might not want to include in user-facing documentation; but then we’d have to write it all up again when writing the eventual user docs.

@Qard
Copy link
Member

Qard commented Jun 1, 2024

Does that exports function handle ESM too? Or have you just tested against CJS for that so far? Ideally I would like to have a unified patching interface for both CJS and ESM, but the immutability aspect made that quite a challenge to do externally with IITM. I'm hoping we can find a way around that with a built-in sync API.

Was your thinking that within that function the user would build the vm.SyntheticModule facade we talked about before themselves and only swap out the exported object there? Maybe doable to leave that work up to the user, though I had assumed an interface for that would be provided. 🤔

@GeoffreyBooth
Copy link
Member

Something else we discussed was implementing resolve and load first and following up with exports, as exports raises a lot more questions.

@joyeecheung
Copy link
Member Author

joyeecheung commented Jun 1, 2024

I have got the WIP working for ESM yet, but actually I noticed that for ESM what you need is still
load hooks, because you'd want to swap out the module during linking. So the idea would be something like this:

function load(url, context, nextLoad) {
  if (!shouldBeWrapped(url)) return nextLoad(url, context);

  if (context.format === 'module') {
    const { module: originalModule } = nextLoad(url, context);
    const keys = Object.keys(originalModule.namespace);
    const m = vm.SyntheticModule([keys], () => {
      for (const key of keys) {
        let value = originalModule.namespace[key];
        if (key === 'foo') {
           value = wrap(value);  // Wrap exports.foo
        }
        m.setExports(key, value);
      }
    });
    // Node.js will swap the original module in the loader cache with the returned one, and run
    // the evaluation callback after the evaluation of the original module is completed.
    return { module: m };
  }
  // For CommonJS modules, it's recommended to only replace
  // properties instead of replacing the entire object to avoid
  // getting out of sync if the original module accesses
  // `exports.foo` internally directly
  if (context.format === 'commonjs') {
    exported.foo = wrap(exported.foo);
    return { exports: exported };
  }
  // unreachable?
}

If live binding of unwrapped values are necessary (i.e. if the original module modifies the exported value, you want that modification to still reflect for user code importing that original module), vm.SourceTextModule with a customized linker to resolve customized requests can be used too. You can do some name mangling to avoid conflicts, the gist is by the time the hook is invoked, you already know the keys in the namespace of the original module (but you don't have the values yet). To wrap things up, you put your logic in a wrap method, then export the wrap method in a synthetic module, let the bulit source text module use the wrap method from the synthetic module to wrap the exported value from the original module and re-export it.

    const { module: originalModule } = nextLoad(url, context);
    let source = `import { wrap } from 'util';`;
    for (const key of originalModule.namespace) {
      if (key === 'foo') {
        source += `import { foo as originalFoo } from 'original';`;
        source += `export const foo = wrap(originalFoo);`;
      } else {
        source += `export { ${key} } from 'original';`;  // Export unwrapped values with live binding
      }
    }
    const m = vm.SourceTextModule(source);
    m.linkSync((specifier) => {
       if (specifier === 'original') return originalModule;
       if (specifier === 'util') return util;  // Contains a synthetic module with the wrap method
    });
    // Node.js will swap out the original module in the loader cache with the returned one.
    return { module: m };

@joyeecheung
Copy link
Member Author

Or I think we can introduce a link hook for ESM, and import-in-the-middle would probably look like this:

function link(url, context, nextLink) {
  if (!hasIitm(url)) return nextLink(url, context);

  if (context.format === 'module') {
    const { module: originalModule } = nextLink(url, context);

    const util = new vm.SyntheticModule(['userCallback', 'name', 'basedir'], () => {
      util.setExports('userCallback', userCallback);  // Or a bigger callback folding all the added user callbacks
      const stats = parse(fileURLtoPath(url));
      util.setExports('name', stats.name);
      util.setExports('basedir', stats.basedir);
    });
    let source = `import * as original from 'original';`;
    source += `import { userCallback, name, basedir } from 'util'`;
    source += `const exported = {}`;
    for (const key of originalModule.namespace) {
      source += `let $${key} = original.${key};`;
      source += `export { $${key} as ${key} }`;
      source += `Object.defineProperty(exported, '${key}', { get() { return $${key}; }, set (value) { $${key} = value; }});`;
    }
    source += `userCallback(exported, name, basedir);`;
    const m = vm.SourceTextModule(source);
    m.linkSync((specifier) => {
       if (specifier === 'original') return originalModule;
       // Contains a synthetic module with userCallback, name & basedir computed from url
       if (specifier === 'util') return util;
    });
    return { module: m };
  }
}

@joyeecheung
Copy link
Member Author

Opened nodejs/loaders#198 because I found some higher level design questions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
loaders Issues and PRs related to ES module loaders module Issues and PRs related to the module subsystem.
Projects
None yet
Development

No branches or pull requests

9 participants