
Add offline support #137

Closed · wants to merge 3 commits

Conversation

@dmonad (Contributor) commented Jun 4, 2021

This PR adds a service worker to the lab folder. It intercepts all requests to the server and stores the results in a cache. The next time you load the website, the content is served from the cache instead of from the server (while the cache is updated in the background). If there is no internet access, the /lab address should still work.
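For reference, the strategy described above (serve from the cache, refresh it in the background) is commonly sketched like this; the cache name here is illustrative, not necessarily what this PR uses:

```js
// sw.js — a minimal stale-while-revalidate sketch.
const CACHE_NAME = 'jupyterlite-offline-v1'; // illustrative name

self.addEventListener('fetch', (event) => {
  if (event.request.method !== 'GET') {
    return; // only GET responses can be cached
  }
  event.respondWith(
    caches.open(CACHE_NAME).then(async (cache) => {
      const cached = await cache.match(event.request);
      // Always refresh the cache in the background.
      const network = fetch(event.request).then((response) => {
        if (response.ok) {
          cache.put(event.request, response.clone());
        }
        return response;
      });
      // Serve the (possibly stale) cached copy; fall back to the network.
      return cached || network;
    })
  );
});
```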

Note that this might make it harder to debug, because it will serve stale content by default. When debugging the application (e.g. in dev_mode on localhost), you should enable "Bypass for network" in the Application panel of the browser's developer tools.

Maybe we could disable the service worker for localhost altogether.

@bollwyvl (Collaborator) commented Jun 4, 2021

😻 This is a huge step forward, and the reload times are fantastic with e.g. pyodide. Works as advertised when turning off wifi! We'd probably want to enable even more goodies, e.g. have a pyodide kernel shared between multiple tabs.

But indeed, the various deployment gotchas are very real, so it needs to be easy to turn off at build time for someone who knows they will be deploying someplace not so fun, e.g. jupyter-lite.json#/jupyter-config-data/disableServiceWorker.

Also, I think before landing this, we'd also want to land #118 to get deduplicated, cache-busting assets, so at least first-party stuff doesn't have surprising stale experiences on the docs site, especially since we're tracking upstream alphas now.

As we want this to work on every page (and likely know whether we are in a service worker), perhaps we move all this to config-utils.js, so it gets used on both the full lab and the various retro pages, and set useServiceWorker dynamically if not explicitly opted out... if we can use it, we probably always should!
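A minimal sketch of what such an opt-out in config-utils.js could look like, assuming the disableServiceWorker flag lands in jupyter-config-data as proposed above (the script path is hypothetical):

```js
// config-utils.js (sketch) — register only when supported and not opted out.
const configScript = document.getElementById('jupyter-config-data');
const config = configScript ? JSON.parse(configScript.textContent) : {};

if (!config.disableServiceWorker && 'serviceWorker' in navigator) {
  navigator.serviceWorker
    .register('./service-worker.js') // hypothetical bundle path
    .catch((err) => console.warn('Service worker registration failed:', err));
}
```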

@dmonad (Contributor, author) commented Jun 4, 2021

Yep, this all makes sense. Let's wait a bit before merging this; I'll try to keep it up to date. I think it would be nice to also add a web app manifest so that users can install this as a web app. Maybe the service worker should only be active when this app is served as a web app. That would be another way to circumvent some of the issues.

Btw, the documentation generated a working example of this:

https://jupyterlite--137.org.readthedocs.build/en/137/_static/lab/index.html

@bollwyvl (Collaborator) commented Jun 4, 2021

> documentation generated a working example

Oh yeah, tried it out immediately, did the ceremonial network-turn-off and everything! This really makes slightly-larger-than-trivial compute reasonable for a documentation site.

@dmonad (Contributor, author) commented Jun 4, 2021

I added a manifest so you can install this as a web app now :)

[Screenshot from 2021-06-04 16-22-33]

@bollwyvl (Collaborator) commented Jun 4, 2021

Very cool. We'll definitely want to hoist the icons, label, theme colors, etc. to something a user "build" (e.g. copying and adding fewer than 5 JSON files or a notebook) can configure in #41.

there's already a schema 🎉

@bollwyvl (Collaborator) commented

So with #173, we've got some more structure in place for configuring how a site builds, etc. I left a placeholder for the service worker stuff, but wasn't sure how to proceed. It would be interesting to get main merged into this so we can start looking at options for moving forward.

@bollwyvl (Collaborator) commented

Another angle: the webpack docs point out workbox, which seems to make some of this a little more manageable over time. Not sure how this would play with our desire to be able to tweak things after the webpack build, but it might still be interesting.

@martinRenou (Member) commented

I'd like to take over this PR and rebase it, if that's fine with everyone.

I've been looking at service workers for the past few working days, and at how to make use of their advantages from the Python kernel.

I'm not only interested in the caching logic they bring (for offline support etc.); I'm also interested in the Python kernel being able to synchronously access data from the virtual file system (local storage). Since the Python kernel runs in a web worker, it can make blocking HTTP requests that are intercepted by the service worker, and the service worker can answer with whatever data it finds. We could monkey-patch the open global function with something that does exactly that.
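To illustrate the blocking-request part (a sketch; the endpoint and message shape are made up): synchronous XMLHttpRequest is still permitted inside web workers, so the kernel can block until the service worker answers:

```js
// In the Python kernel's web worker (sketch).
function readFileSync(path) {
  const xhr = new XMLHttpRequest();
  // The third argument `false` makes the request synchronous, which is
  // allowed in workers (but not on the main thread).
  xhr.open('POST', '/api/drive', false); // hypothetical intercepted URL
  xhr.setRequestHeader('Content-Type', 'application/json');
  xhr.send(JSON.stringify({ method: 'readFile', path }));
  return JSON.parse(xhr.responseText);
}
```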

@joemarshall (Contributor) commented

@martinRenou I was just thinking about this. I think the right approach for flexibility would be to register a very simple service worker which just allows the registration of URL handlers. So one could make a cache handler extension, a sleep extension, an import extension, etc.

It would need some way to identify which main browser thread owns each web worker client; I'm not sure if that can be done through fetch request detection in the service worker, or if it needs a wrapper around the web worker. (A sketch of such a handler registry follows.)
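One way the barebones registry could look inside the service worker (all names here are illustrative, not an existing API):

```js
// sw.js (sketch) — extensions claim URL prefixes; everything else falls
// through to the default network/cache behaviour.
const handlers = new Map(); // prefix -> async (Request) => Response

function registerHandler(prefix, handler) {
  handlers.set(prefix, handler);
}

self.addEventListener('fetch', (event) => {
  const { pathname } = new URL(event.request.url);
  for (const [prefix, handler] of handlers) {
    if (pathname.startsWith(prefix)) {
      event.respondWith(handler(event.request));
      return;
    }
  }
});

// e.g. a sleep extension could claim its own prefix:
registerHandler('/sw/sleep', async (request) => {
  const { ms } = await request.json();
  await new Promise((resolve) => setTimeout(resolve, ms));
  return new Response('ok');
});
```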

@joemarshall (Contributor) commented

Oh hang on, the client ID, at least in Chrome, appears to be for a whole session, i.e. main page and workers. That's easier than I thought.

@joemarshall (Contributor) commented

I just checked, and it is trivial to associate the client IDs of web workers with the client ID of the main window (see the sketch after this list):

  1. In Chrome, all fetches seem to come with the main window's client ID.
  2. In Firefox, when you first fetch the web worker, the fetch has a resultingClientId for the worker, which is a new client ID. Worker requests then come with that client ID and the same URL. So if you're in a web worker, the client ID of the main window is trivial to get.
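Based on the Firefox behaviour described above, the association could be recorded in the service worker roughly like this (a sketch, not tested across browsers):

```js
// sw.js (sketch) — remember which window spawned each worker client.
const workerOwners = new Map(); // worker clientId -> owning window clientId

self.addEventListener('fetch', (event) => {
  // The request that loads a worker script carries both the creating
  // client's id and a resultingClientId for the new worker.
  if (event.resultingClientId && event.clientId) {
    workerOwners.set(event.resultingClientId, event.clientId);
  }
});
```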

So a basic URL handler API for pyodide or similar stuff, where you essentially want synchronous calls to async JavaScript things (files, sleep, etc.), would look like this:

  1. The service worker gets a POST request to a special URL with some content (as JSON or something).
  2. It makes this into a promise and posts it via postMessage to the correct main window (and returns a new, unresolved promise to the fetch request).
  3. The main app converts that to a Jupyter message and sends it wherever it needs to go.
  4. The handler extension makes a response and sends it back to the main app, which posts it back to the service worker, which then resolves the promise made in step 2.
  5. Tada, the XMLHttpRequest in the web worker finishes, we're all good.

I think caching should maybe be handled in the main service worker for performance reasons. But the rest of the service worker should stay absolutely barebones: it just takes requests and turns them into Jupyter messages if they are a POST request to the special URL. (A sketch of that round trip follows.)
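A sketch of steps 2 and 4 of the flow above on the service-worker side (the URL and message shapes are illustrative):

```js
// sw.js (sketch) — hold a pending promise per intercepted request.
const pending = new Map(); // request id -> resolve callback
let nextId = 0;

self.addEventListener('fetch', (event) => {
  const url = new URL(event.request.url);
  if (url.pathname !== '/api/bridge' || event.request.method !== 'POST') {
    return; // everything else falls through to the network/cache
  }
  event.respondWith(
    (async () => {
      const body = await event.request.json();
      const id = nextId++;
      const reply = new Promise((resolve) => pending.set(id, resolve));
      // Step 2: forward the request to the owning window.
      const client = await self.clients.get(event.clientId);
      client.postMessage({ id, body });
      // Wait until the window posts the handler's response back.
      const result = await reply;
      return new Response(JSON.stringify(result), {
        headers: { 'Content-Type': 'application/json' },
      });
    })()
  );
});

// Step 4: the main app posts the response back to the service worker.
self.addEventListener('message', (event) => {
  const { id, result } = event.data;
  if (pending.has(id)) {
    pending.get(id)(result);
    pending.delete(id);
  }
});
```

Note that the browser may terminate an idle service worker, so long-lived pending promises like this are fragile in practice; a real implementation would need to account for that.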

@martinRenou (Member) commented

Thanks for your comments @joemarshall !

My comments about monkey-patching the open global function in Python are invalidated now, so please discard them.

I'm exploring implementing a custom file system (in the emscripten sense) that we mount for Python to use, and that exposes the files of the current JupyterLab drive. The work is happening in #655 (not all my code is pushed yet).
The problem is that emscripten file systems must be synchronous (there is currently some work in emscripten to make those APIs async, but it's not finished/released yet; see the discussion in the PR mentioned above).

So I think using a service worker the way you describe above is needed in order to turn async file/directory fetches into synchronous tasks.

> 1. The service worker gets a POST request to a special URL with some content (as JSON or something).
> 2. It makes this into a promise and posts it via postMessage to the correct main window (and returns a new, unresolved promise to the fetch request).
> 3. The main app converts that to a Jupyter message and sends it wherever it needs to go.
> 4. The handler extension makes a response and sends it back to the main app, which posts it back to the service worker, which then resolves the promise made in step 2.
> 5. Tada, the XMLHttpRequest in the web worker finishes, we're all good.

This makes perfect sense. I was reading this morning about BroadcastChannel, which I think will be perfect for that:

  • it's bidirectional
  • you can have multiple channels (one for input, one for sleep, one for the file system, etc.); see the sketch below
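For example, the main app could answer file-system requests on a dedicated channel like this (the channel name, message shape, and drive lookup are hypothetical):

```js
// main-app.js (sketch) — reply to file-system requests broadcast by the
// service worker on a dedicated channel.
const fsChannel = new BroadcastChannel('jupyterlite-fs');

fsChannel.onmessage = async (event) => {
  const { id, path, reply } = event.data;
  if (reply) {
    return; // ignore replies echoed back on the same channel
  }
  const content = await drive.get(path); // hypothetical JupyterLab drive lookup
  fsChannel.postMessage({ id, reply: true, content });
};
```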

@joemarshall (Contributor) commented Jun 3, 2022 via email

@jtpio closed this in #686 on Jun 23, 2022
@jtpio (Member) commented Jun 23, 2022

Thanks @dmonad for initially starting this PR!

The commits have been included in #686.

Labels: enhancement, performance