Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wasm-ep] Implement DiagnosticServer and startup sessions for WebAssembly #72482

Merged
merged 88 commits into from
Jul 25, 2022

Conversation

lambdageek
Copy link
Member

Add a diagnostic server for WebAssembly. Enable by building the runtime with /p:WasmEnablePerfTracing=true or /p:WasmEnableThreads=true.

To configure a project to start the diagnostic server, add this to the .csproj:

      <WasmExtraConfig Include="diagnostic_options" Value='
{
  "server": { "suspend": false, "connect_url": "ws://localhost:8088/diagnostics" }
}' />

The connect_url should be a WebSocket url serviced by dotnet-dsrouter server-websocket from this branch https://github.com/lambdageek/diagnostics/tree/wasm-server

Note that setting "suspend": true will hang the browser tab until a diagnostic tool such as dotnet-trace collect connects to the dsrouter.


Implement creating VFS file based sessions at runtime startup. Add the following to a .csproj:

    <WasmExtraConfig Include="diagnostic_options" Value='
{
  "sessions": [ { "collectRundownEvents": "true", "providers": "WasmHello::5:EventCounterIntervalSec=1" } ]
}' />

That will create and start one or more EventPipe sessions that will store their results into the VFS.

The startup session can be retrieved via MONO.diagnostics.getStartupSessions(). Each session s should be stopped via s.stop() and the data can then be extraced in a Blob using s.getTraceBlob().

This is orthogonal to the diagnostic server support. You don't need dotnet-dsrouter running on the host. But you do need access to JavaScript on the main thread.


Notes/Issues:

  • Tree shaking: I verified that if threads are not available, all the TypeScript diagnostics code is removed.
  • Right now the server is not very robust to dotnet-dsrouter stopping, or starting after the runtime starts. The ideal order is to start dotnet-dsrouter first, and then open the browser
  • Unrelated housekeeping fixes:
    • Tell wasm.proj about all the subdirectories with .ts files - makes incremental builds notice changes in subdirectories.
    • Add a rollup dependencies property to quiet a warning about node/buffer
    • There's a mock implementation of a "websocket" that was used for protocol testing. I verified that tree-shaking removes this in thread-enabled Release builds.
    • Bump PTHREAD_POOL_SIZE to 4 and set PTHREAD_POOL_SIZE_STRICT=2 (returns EAGAIN from pthread_create if the pool needs to grow). The previous setting PTHREAD_POOL_SIZE_STRING=1 (warn and try to grow the pool) seemed to lead to hangs. Usually that means the main thread is creating a thread and immediately calling pthread_join without returning to JS. We should investigate separately.
    • The only implemented diagnostic server commands are CollectTracing2, StopCollecting and ResumeRuntime. None of the Dump, Process and Profiler commands are implemented and the server will crash if it receives them. It should be relatively straightforward to return a "command unsupported" reply (which would allow clients to gracefully disconnect), but it's not done yet.
  • In some error states the runtime kills the browser tab with the following in a terminal window (if Chrome is started from a terminal: FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory). This probably means we're hitting a loop somewhere that rapidly exhausts JIT memory, but it's difficult to investigate when the JS console dies, too (happens with chrome stable v103 and chrome beta v104).

Fixes #69674, contributes to #72481

lambdageek and others added 30 commits July 16, 2022 13:02
Currently not wired up to the runtime
The issue is that once we're setting up streaming sessions, we will
need to send back a DS IPC reply with the session id before we start
streaming.  So it's better to just call back to JS when we start up
and setup all the EP sessions from JS so that when we return to C
everything is all ready.
more printfs
We won't be using the native C version
Combine mono_wasm_init_diagnostics and configure_diagnostics.
Now that finalize_startup is async, we can pause to wait for the
diagnostic server to startup at this point
once we're able to start the DS server, make it log a ping to the console
Clean up some of the old WIP code - we will probably not send
configuration strings from the diagnostic server back to the main thread.
It doesn't work right now because the MessagePort is not created until
the server thread attaches to Mono, which doesn't happen because it's
started before Mono.

Also it doesn't yet send a resume event, so the main thread just
blocks forever
Start the diagnostic server and have it perform the open/advertise
steps with the mock.
we're not going ot need a JS version of the DS EP streaming sessions.
And file-based auto-stop sessions are not going in
@lambdageek
Copy link
Member Author

It would be good to have JavaScript-only unit tests for complex components at some point. https://jestjs.io/ is good choice I think.

I'll play around with that this afternoon. Probably won't be part of this PR unless it turns out to be completely trivial to add the unit tests to our setup.

lambdageek and others added 3 commits July 22, 2022 09:22
Allow each test project to specify its own mock script.

Also provide TypeScript declarations for the mocking interfaces

Also always use binary protocol commands - don't send json for mocks.
@lambdageek
Copy link
Member Author

verified one more time that with threading or perftracing disabled, all of the related JS is removed by tree-shaking

- `shared/` type definitions to be shared between the worker and browser main thread
- `mock/` a utility to fake WebSocket connectings by playing back a script. Used for prototyping the diagnostic server without hooking up to a real WebSocket.

## Mocking
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is it we are mocking and why ? (high level intro)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand that the diagnosed process is a "server" and the tooling outside of the browser is a "client" ? And so, we are mocking the client here and it's requests ?

if (monoDiagnosticsMock) {
const mockPrefix = "mock:";
const scriptURL = mockURL.substring(mockPrefix.length);
return import(scriptURL).then((mockModule) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CJS version of the runtime is out of scope, right ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, if there's a need for writing the mocking script using CJS, we can add it here. But since the goal here is just to have a tool to improve the server's robustness (for example: make scripts that send garbage and verify that the DS can recover), I don't think there's a need for CJS. This isn't something we're ever going to ship. It's just for better testing.

if (!cwraps.mono_wasm_event_pipe_enable(tracePath, ipcStreamAddr, options.bufferSizeInMB, options.providers, options.rundownRequested, sessionIdOutPtr)) {
return false;
} else {
return memory.getI32(sessionIdOutPtr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getU32 for pointers

for (const sub of subscribers) {
const listener = sub.listener;
if (sub.oneShot) {
this.removeEventListener(event.type, listener);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need copy of subscribers because of the change in the loop ?

/// });
export const currentWorkerThreadEvents: WorkerThreadEventTarget =
MonoWasmThreads ? new EventTarget() : null as any as WorkerThreadEventTarget; // treeshake if threads are disabled

function monoDedicatedChannelMessageFromMainToWorker(event: MessageEvent<string>): void {
console.debug("got message from main on the dedicated channel", event.data);
console.debug("MONO_WASM: got message from main on the dedicated channel", event.data);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After we merge both this and my PR, we could prefix this with if (runtimeHelpers.diagnostic_tracing) so that by default the runtime is not noisy. It matters in console like nodeJS. @maraf suggested it's about a time when we introduce logging component to centralize all this logic to one place.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

Copy link
Member

@pavelsavara pavelsavara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, none of my other comments are blocking merge.

- improve diagnostics mock README
- note that mocking just uses ES6 modules, testing with CJS is not
   supported right now.
- fix iteration over listeners when dispatching a one-shot event in
  the EventTargt polyfill
- use U32 getter in EP session creation
@lambdageek
Copy link
Member Author

https://helix.dot.net/api/2019-06-17/jobs/d29bfb53-4790-47f4-8a1e-aa868ec7afd5/workitems/System.Net.Mail.Functional.Tests/console failure is unrelated

/__w/1/s/src/native/libs/Common/pal_utilities.h:86: int ToFileDescriptor(intptr_t): Assertion `0 <= fd && fd < sysconf(_SC_OPEN_MAX)' failed.

@lambdageek lambdageek merged commit 7aaa279 into dotnet:main Jul 25, 2022
@pavelsavara
Copy link
Member

/azp run runtime-wasm

@ghost ghost locked as resolved and limited conversation to collaborators Aug 26, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-wasm WebAssembly architecture area-Tracing-mono
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[wasm-ep] Add a hook to start an EventPipe Session from runtime startup
3 participants