404 and broken link handling #895

pmelab · 2024-09-19T14:49:20Z

The problem

Links that point to non-existent paths will put the router into a faliling state when trying to loading the RSC payload of the target page and throw an exception that is (by default) caught by the global error boundary.

This approach creates some problems for content-focused use cases:

The 404 page can not be a full react page that is managed like all the others.
It's not possible to hand of 404 handling to a different system (e.g. redirects stored in a CMS or CDN).

Solution

I changed the Link component to optionally wait for the prefetch to resolve before navigating and if unsuccessful trigger a native navigation to the same path to trigger the host system 404 behavior.

Based on a discussion with @dai-shi , I attempted to achieve the same with a custom error boundary, but the error boundary gets rendered and there is a flash of the error message before the actual navigation happens, which feels like the website is broken.

Note

When writing tests for this, i found that error handling was broken in itself, which is not related to this PR directly: 4ca39f7

Otherwise the error boundary itself will throw "In HTML, <h1> cannot be a child of <#document>. This will cause a hydration error." which will result in an infinite re-rendering loop.

If the rsc payload for a link is not available, fall back to a standard http navigation, to trigger host system error handling.

vercel · 2024-09-19T14:49:24Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Updated (UTC)
waku	✅ Ready (Inspect)	Visit Preview	Sep 19, 2024 3:10pm

codesandbox-ci · 2024-09-19T14:49:43Z

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.

dai-shi · 2024-09-19T23:55:13Z

Hi, thanks for the investigation and the suggestion!

I would like to understand the problem better as I assume there should be more minimal solution.
One of my questions is is this issue specific to prefetching? As "prefetch" is an additional feature, and it should work without prefetching.

When writing tests for this, i found that error handling was broken in itself, which is not related to this PR directly

Yeah, it's not surprising because this feature isn't very stable yet. Thanks for the test and the fix!

pmelab · 2024-09-20T02:50:45Z

I would like to understand the problem better as I assume there should be more minimal solution.
One of my questions is is this issue specific to prefetching? As "prefetch" is an additional feature, and it should work without prefetching.

I will try my best! Wall of text incoming ...

Aside from prefetching (which I agree is an additional feature), all the Waku Link component does is change the route. Then the router tries to render the route, and if fetching the RSC payload fails, it will trigger the error boundary.

In our concrete case, the content management system maintains a list of redirects for pages that changed their slug over time. Links to those pages are not necessarily immediately updated in all locations, but just redirect to the targets new slug.

That means, a missing RSC payload is an expected case that we need to handle before the actual route change happens. It should not emit an error state to the user, but fall back to the CMS' redirect handler. We are working in a purely static environment, so we can't use a middleware to catch that.

Prefetch is part of the solution because it can tell us in advance if a route exists or not, and the Link component can simply trigger a standard navigation and the host system can decide on a higher level what to do. That's one of the things that still work well in Gatsby right now, and I took the solution from there 😄

And there is the other nice side effect that the 404 page can just be a regular page that is handled by Vercel/Cloudflare/Netlify/Nginx/Apache ... 's built-in fallback mechanisms.

Using an error boundary would be more elegant, but it will always unmount the whole page before doing anything else. Maybe this could be handled on a router level, but that means it would apply to every link, which felt even more invasive.

dai-shi · 2024-09-20T04:26:26Z

So, does your solution require prefetching always? It feels bad in two ways: a) prefetch should be optional, b) prefetch shouldn't be awaited. (otherwise, we don't get the benefit of prefetching.)

I think we need a solution without prefetching.

pmelab · 2024-09-20T06:02:24Z

Yes, it always has to prefetch. I tried to trigger the reload right where the exception is thrown:

waku/packages/waku/src/client.ts

Line 38 in 58d957c

throw err;

... but this is too late. The error boundary gets rendered already, since the new route is already being evaluated.

I think without prefetching, we would have to make the router remember its previous state and render that in case the RSC payload returns a 404. And then trigger the page reload.

dai-shi · 2024-09-20T06:07:51Z

Yeah, I think that's the right solution. It's tedious though. Awaiting prefetch isn't a prefetching any longer, so it's not an option.

pmelab · 2024-09-20T07:43:19Z

Do you have any pointers where or how to do that? I tried to track the application flow, and from what i can tell the promise returned from fetchRSC gets passed as elements here:

waku/packages/waku/src/client.ts

Lines 205 to 209 in 58d957c

    
           return createElement( 
        
             RefetchContext.Provider, 
        
             { value: refetch }, 
        
             createElement(ElementsContext.Provider, { value: elements }, children), 
        
           );

So the error will always be thrown during rendering, and we are back at the error boundary. Maybe we could store the previous result and and make the error boundary render the whole previous page, but this would still mean a full re-render and probably ui elements re-initializing and flashing.

dai-shi · 2024-09-20T14:15:20Z

I need to sit down and think about it more.

As I understand, this is about client side routing, so location.href = ... shouldn't be used, basically.

Meanwhile, would you send a separate PR for 4ca39f7?

pmelab · 2024-09-21T05:27:52Z

There you go: #896

pmelab · 2024-09-21T06:11:49Z

Note

Brainstorming different angles to the problem

Frontend-redirects

Add all possible redirects to the js-bundle.

✅ Independent of hosting environment
✅ No visual break
🛑 List can get long and inflates js-payload

Redirect pages

Render an actual page for each redirect that does a client side redirect on load.

✅ Independent of hosting environment
✅ No changes to waku necessary
🛑 Causes visual break, but less than a full page unmount
🛑 Might produce a lot of files. Some environments (e.g. Cloudflare pages) have a hard limit.

Redirect RSC payloads

When there is a redirect from /a to /b, the host system also redirects /RSC/a.txt to /RSC/b.txt. Waku router could detect that case and adjust history accordingly.

🛑 Depends on the hosting environment
✅ No visual break
✅ No additional files or JS payload size

`createRedirect`

New createRedirect API that places a special RSC payload at that path that causes waku to load a different RSC file instead.

✅ Independent of hosting environment
✅ No visual break
🛑 More files, but at least less than "Redirect pages"

pmelab · 2024-09-23T08:33:58Z

So i had a longer thought about this, and turns out that Option 3 of the brainstorming solves everything:

Redirect RSC payloads
When there is a redirect from /a to /b, the host system also redirects /RSC/a.txt to /RSC/b.txt. Waku router could detect that case and adjust history accordingly.
🛑 Depends on the hosting environment
✅ No visual break
✅ No additional files or JS payload size

Nothing in Waku has to change. As an example, I use these Netlify redirects:

/a /b 301
/RSC/a.txt /RSC/b.txt 301

When a Link component now points to /a and the rendering process tries to load /RSC/a.txt, the server responds with the contents of /RSC/b.txt and the correct page is displayed. In a Gatsby environment that would have failed, due to potentially missing client code. But since the RSC payload also knows which js-bundles are required, it just works 🤯

Adding this rewrite...

/RSC/* /RSC/404.txt

... even (almost) solves the 404 problem. When clicking a broken link, the server will respond with the RSC-Payload of the 404 page. It is also correctly displayed, but Waku also replaces the current browser location with /404. Thats great for any other pages, but here we actually don't want that.
@dai-shi : do you know where this happens and if we can add special handling for 404 somehow?

dai-shi · 2024-09-24T12:24:33Z

Thanks for your time thinking about this.
I really like that Waku is becoming a collaborative project.

Before going into the solution idea, can I make sure I'm on the same page? I'm still not confident if I understand the problem. (like, is it only a hosting issue?)
From my perspective, I can provide some requirements:

We have SSR, but we can opt-out. So, the solution should work without SSR.
We support static sites (with/without SSR). It's a primary use case.

Can the problem happen with static sites and/or without SSR?

pmelab · 2024-09-24T13:08:54Z

Before going into the solution idea, can I make sure I'm on the same page? I'm still not confident if I understand the problem.

The problem has shifted through the course of this PR. I'll try to summarize.

Initial Situation:

Requirements:

We work in a purely static environment, without a JS server runtime. No SSR and no middleware either. Simplest case would be dist/public served by NGINX for example.
404 error pages are regular pages, maintained in a CMS.
There is an externally maintained list of path redirects (e.g. path /a redirects visitors to page /b) that is already applied by the HTTP server.

Problems:

When navigating with Waku's Link component, the HTTP server has no chance to apply it's built-in redirects or 404 fallbacks.
- Waku Link's that point to non-existing paths don't show the 404 page, but the error boundary message.
- Waku Links that point to redirected pages don't redirect but show the error boundary message as well.
This can be reproduced both in SSR and SSG, but in SSR it's easier to solve, since redirect and fallback logic can be applied at time of the request.

After the findings from my "brainstorming"

✅ Adding the same redirects for the RSC directory (e.g. /RSC/a.txt to /RSC/b.txt) to the HTTP server makes Link correctly follow redirects.
✅ Making the HTTP server fall back to /RSC/404.txt for all requests that ask for something in /RSC/* that does not exist, cause Link with broken paths to display the correct custom 404-page.
- 🛑 .... but it also changes the window.location to /404. It should remain the non-existent path that the link pointed to instead.

Note

The Code in this PR does not reflect its state of discussion any more. I was able to solve everything except that last point outside of Waku.

dai-shi · 2024-09-24T13:36:10Z

Thanks for the summary. That helps my understanding.
So, my mental model is that server routing and client routing are different things.

Server routing is done by the HTTP server (NGINX, Waku server, or something). It fetches HTML (and RSC).
Client routing is basically <Link> with JS enabled. It fetches RSC only.

404.html is basically only for server routing.
404.tsx with waku/router is also basically for server routing (but can it be used for client routing? that would be nice.)

e.g. /RSC/a.txt to /RSC/b.txt

That actually depends on an internal spec that /a corresponds to /RSC/a.txt, but it may not hold in the future.

So, I feel like there should be something that waku/router could do.

pmelab · 2024-09-24T14:51:27Z

That actually depends on an internal spec that /a corresponds to /RSC/a.txt, but it may not hold in the future.

Yes, thats a flaw here.

So, I feel like there should be something that waku/router could do.

It could handle 404 responses when trying to fetch RSC:

Test the corresponding "real" path to see if it emits a redirect (from NGINX) and attempt to load the RSC for the redirect target path instead. That would mitigate the dependency on internals above.
Fall back to loading the RSC for 404.tsx if available.

pmelab added 3 commits September 19, 2024 16:07

test: add fixture for testing broken links

1dd41ad

fix: add html and body tags to global error boundary

4ca39f7

Otherwise the error boundary itself will throw "In HTML, <h1> cannot be a child of <#document>. This will cause a hydration error." which will result in an infinite re-rendering loop.

feat: optional fallback behavior for links

869026d

If the rsc payload for a link is not available, fall back to a standard http navigation, to trigger host system error handling.

vercel bot deployed to Preview September 19, 2024 14:50 View deployment

chore: prettier

8e687b1

vercel bot deployed to Preview September 19, 2024 15:10 View deployment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

404 and broken link handling #895

404 and broken link handling #895

pmelab commented Sep 19, 2024 •

edited

Loading

vercel bot commented Sep 19, 2024 •

edited

Loading

codesandbox-ci bot commented Sep 19, 2024 •

edited

Loading

dai-shi commented Sep 19, 2024

pmelab commented Sep 20, 2024

dai-shi commented Sep 20, 2024

pmelab commented Sep 20, 2024

dai-shi commented Sep 20, 2024

pmelab commented Sep 20, 2024

dai-shi commented Sep 20, 2024

pmelab commented Sep 21, 2024

pmelab commented Sep 21, 2024

pmelab commented Sep 23, 2024

dai-shi commented Sep 24, 2024

pmelab commented Sep 24, 2024 •

edited

Loading

dai-shi commented Sep 24, 2024

pmelab commented Sep 24, 2024 •

edited

Loading

404 and broken link handling #895

Are you sure you want to change the base?

404 and broken link handling #895

Conversation

pmelab commented Sep 19, 2024 • edited Loading

The problem

Solution

Note

vercel bot commented Sep 19, 2024 • edited Loading

codesandbox-ci bot commented Sep 19, 2024 • edited Loading

dai-shi commented Sep 19, 2024

pmelab commented Sep 20, 2024

dai-shi commented Sep 20, 2024

pmelab commented Sep 20, 2024

dai-shi commented Sep 20, 2024

pmelab commented Sep 20, 2024

dai-shi commented Sep 20, 2024

pmelab commented Sep 21, 2024

pmelab commented Sep 21, 2024

Frontend-redirects

Redirect pages

Redirect RSC payloads

createRedirect

pmelab commented Sep 23, 2024

dai-shi commented Sep 24, 2024

pmelab commented Sep 24, 2024 • edited Loading

Initial Situation:

After the findings from my "brainstorming"

dai-shi commented Sep 24, 2024

pmelab commented Sep 24, 2024 • edited Loading

pmelab commented Sep 19, 2024 •

edited

Loading

vercel bot commented Sep 19, 2024 •

edited

Loading

codesandbox-ci bot commented Sep 19, 2024 •

edited

Loading

`createRedirect`

pmelab commented Sep 24, 2024 •

edited

Loading

pmelab commented Sep 24, 2024 •

edited

Loading