Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

404 and broken link handling #895

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

pmelab
Copy link
Contributor

@pmelab pmelab commented Sep 19, 2024

The problem

Links that point to non-existent paths will put the router into a faliling state when trying to loading the RSC payload of the target page and throw an exception that is (by default) caught by the global error boundary.

This approach creates some problems for content-focused use cases:

  • The 404 page can not be a full react page that is managed like all the others.
  • It's not possible to hand of 404 handling to a different system (e.g. redirects stored in a CMS or CDN).

Solution

I changed the Link component to optionally wait for the prefetch to resolve before navigating and if unsuccessful trigger a native navigation to the same path to trigger the host system 404 behavior.

Based on a discussion with @dai-shi , I attempted to achieve the same with a custom error boundary, but the error boundary gets rendered and there is a flash of the error message before the actual navigation happens, which feels like the website is broken.

Note

When writing tests for this, i found that error handling was broken in itself, which is not related to this PR directly: 4ca39f7

Otherwise the error boundary itself will throw "In HTML, <h1> cannot be
a child of <#document>. This will cause a hydration error." which will
result in an infinite re-rendering loop.
If the rsc payload for a link is not available, fall back to a standard
http navigation, to trigger host system error handling.
Copy link

vercel bot commented Sep 19, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated (UTC)
waku ✅ Ready (Inspect) Visit Preview Sep 19, 2024 3:10pm

Copy link

codesandbox-ci bot commented Sep 19, 2024

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.

@dai-shi
Copy link
Owner

dai-shi commented Sep 19, 2024

Hi, thanks for the investigation and the suggestion!

I would like to understand the problem better as I assume there should be more minimal solution.
One of my questions is is this issue specific to prefetching? As "prefetch" is an additional feature, and it should work without prefetching.

When writing tests for this, i found that error handling was broken in itself, which is not related to this PR directly

Yeah, it's not surprising because this feature isn't very stable yet. Thanks for the test and the fix!

@pmelab
Copy link
Contributor Author

pmelab commented Sep 20, 2024

I would like to understand the problem better as I assume there should be more minimal solution.
One of my questions is is this issue specific to prefetching? As "prefetch" is an additional feature, and it should work without prefetching.

I will try my best! Wall of text incoming ...

Aside from prefetching (which I agree is an additional feature), all the Waku Link component does is change the route. Then the router tries to render the route, and if fetching the RSC payload fails, it will trigger the error boundary.

In our concrete case, the content management system maintains a list of redirects for pages that changed their slug over time. Links to those pages are not necessarily immediately updated in all locations, but just redirect to the targets new slug.

That means, a missing RSC payload is an expected case that we need to handle before the actual route change happens. It should not emit an error state to the user, but fall back to the CMS' redirect handler. We are working in a purely static environment, so we can't use a middleware to catch that.

Prefetch is part of the solution because it can tell us in advance if a route exists or not, and the Link component can simply trigger a standard navigation and the host system can decide on a higher level what to do. That's one of the things that still work well in Gatsby right now, and I took the solution from there 😄

And there is the other nice side effect that the 404 page can just be a regular page that is handled by Vercel/Cloudflare/Netlify/Nginx/Apache ... 's built-in fallback mechanisms.

Using an error boundary would be more elegant, but it will always unmount the whole page before doing anything else. Maybe this could be handled on a router level, but that means it would apply to every link, which felt even more invasive.

@dai-shi
Copy link
Owner

dai-shi commented Sep 20, 2024

So, does your solution require prefetching always? It feels bad in two ways: a) prefetch should be optional, b) prefetch shouldn't be awaited. (otherwise, we don't get the benefit of prefetching.)

I think we need a solution without prefetching.

@pmelab
Copy link
Contributor Author

pmelab commented Sep 20, 2024

Yes, it always has to prefetch. I tried to trigger the reload right where the exception is thrown:


... but this is too late. The error boundary gets rendered already, since the new route is already being evaluated.

I think without prefetching, we would have to make the router remember its previous state and render that in case the RSC payload returns a 404. And then trigger the page reload.

@dai-shi
Copy link
Owner

dai-shi commented Sep 20, 2024

Yeah, I think that's the right solution. It's tedious though. Awaiting prefetch isn't a prefetching any longer, so it's not an option.

@pmelab
Copy link
Contributor Author

pmelab commented Sep 20, 2024

Do you have any pointers where or how to do that? I tried to track the application flow, and from what i can tell the promise returned from fetchRSC gets passed as elements here:

return createElement(
RefetchContext.Provider,
{ value: refetch },
createElement(ElementsContext.Provider, { value: elements }, children),
);

So the error will always be thrown during rendering, and we are back at the error boundary. Maybe we could store the previous result and and make the error boundary render the whole previous page, but this would still mean a full re-render and probably ui elements re-initializing and flashing.

@dai-shi
Copy link
Owner

dai-shi commented Sep 20, 2024

I need to sit down and think about it more.

As I understand, this is about client side routing, so location.href = ... shouldn't be used, basically.

Meanwhile, would you send a separate PR for 4ca39f7?

@pmelab
Copy link
Contributor Author

pmelab commented Sep 21, 2024

There you go: #896

@pmelab
Copy link
Contributor Author

pmelab commented Sep 21, 2024

Note

Brainstorming different angles to the problem

Frontend-redirects

Add all possible redirects to the js-bundle.

✅ Independent of hosting environment
✅ No visual break
🛑 List can get long and inflates js-payload

Redirect pages

Render an actual page for each redirect that does a client side redirect on load.

✅ Independent of hosting environment
✅ No changes to waku necessary
🛑 Causes visual break, but less than a full page unmount
🛑 Might produce a lot of files. Some environments (e.g. Cloudflare pages) have a hard limit.

Redirect RSC payloads

When there is a redirect from /a to /b, the host system also redirects /RSC/a.txt to /RSC/b.txt. Waku router could detect that case and adjust history accordingly.

🛑 Depends on the hosting environment
✅ No visual break
✅ No additional files or JS payload size

createRedirect

New createRedirect API that places a special RSC payload at that path that causes waku to load a different RSC file instead.

✅ Independent of hosting environment
✅ No visual break
🛑 More files, but at least less than "Redirect pages"

@pmelab
Copy link
Contributor Author

pmelab commented Sep 23, 2024

So i had a longer thought about this, and turns out that Option 3 of the brainstorming solves everything:

Redirect RSC payloads
When there is a redirect from /a to /b, the host system also redirects /RSC/a.txt to /RSC/b.txt. Waku router could detect that case and adjust history accordingly.
🛑 Depends on the hosting environment
✅ No visual break
✅ No additional files or JS payload size

Nothing in Waku has to change. As an example, I use these Netlify redirects:

/a /b 301
/RSC/a.txt /RSC/b.txt 301

When a Link component now points to /a and the rendering process tries to load /RSC/a.txt, the server responds with the contents of /RSC/b.txt and the correct page is displayed. In a Gatsby environment that would have failed, due to potentially missing client code. But since the RSC payload also knows which js-bundles are required, it just works 🤯

Adding this rewrite...

/RSC/* /RSC/404.txt

... even (almost) solves the 404 problem. When clicking a broken link, the server will respond with the RSC-Payload of the 404 page. It is also correctly displayed, but Waku also replaces the current browser location with /404. Thats great for any other pages, but here we actually don't want that.
@dai-shi : do you know where this happens and if we can add special handling for 404 somehow?

@dai-shi
Copy link
Owner

dai-shi commented Sep 24, 2024

Thanks for your time thinking about this.
I really like that Waku is becoming a collaborative project.

Before going into the solution idea, can I make sure I'm on the same page? I'm still not confident if I understand the problem. (like, is it only a hosting issue?)
From my perspective, I can provide some requirements:

  • We have SSR, but we can opt-out. So, the solution should work without SSR.
  • We support static sites (with/without SSR). It's a primary use case.

Can the problem happen with static sites and/or without SSR?

@pmelab
Copy link
Contributor Author

pmelab commented Sep 24, 2024

Before going into the solution idea, can I make sure I'm on the same page? I'm still not confident if I understand the problem.

The problem has shifted through the course of this PR. I'll try to summarize.

Initial Situation:

Requirements:

  • We work in a purely static environment, without a JS server runtime. No SSR and no middleware either. Simplest case would be dist/public served by NGINX for example.
  • 404 error pages are regular pages, maintained in a CMS.
  • There is an externally maintained list of path redirects (e.g. path /a redirects visitors to page /b) that is already applied by the HTTP server.

Problems:

  • When navigating with Waku's Link component, the HTTP server has no chance to apply it's built-in redirects or 404 fallbacks.
    • Waku Link's that point to non-existing paths don't show the 404 page, but the error boundary message.
    • Waku Links that point to redirected pages don't redirect but show the error boundary message as well.
  • This can be reproduced both in SSR and SSG, but in SSR it's easier to solve, since redirect and fallback logic can be applied at time of the request.

After the findings from my "brainstorming"

  • ✅ Adding the same redirects for the RSC directory (e.g. /RSC/a.txt to /RSC/b.txt) to the HTTP server makes Link correctly follow redirects.
  • ✅ Making the HTTP server fall back to /RSC/404.txt for all requests that ask for something in /RSC/* that does not exist, cause Link with broken paths to display the correct custom 404-page.
    • 🛑 .... but it also changes the window.location to /404. It should remain the non-existent path that the link pointed to instead.

Note

The Code in this PR does not reflect its state of discussion any more. I was able to solve everything except that last point outside of Waku.

@dai-shi
Copy link
Owner

dai-shi commented Sep 24, 2024

Thanks for the summary. That helps my understanding.
So, my mental model is that server routing and client routing are different things.

  • Server routing is done by the HTTP server (NGINX, Waku server, or something). It fetches HTML (and RSC).
  • Client routing is basically <Link> with JS enabled. It fetches RSC only.

404.html is basically only for server routing.
404.tsx with waku/router is also basically for server routing (but can it be used for client routing? that would be nice.)

e.g. /RSC/a.txt to /RSC/b.txt

That actually depends on an internal spec that /a corresponds to /RSC/a.txt, but it may not hold in the future.


So, I feel like there should be something that waku/router could do.

@pmelab
Copy link
Contributor Author

pmelab commented Sep 24, 2024

That actually depends on an internal spec that /a corresponds to /RSC/a.txt, but it may not hold in the future.

Yes, thats a flaw here.

So, I feel like there should be something that waku/router could do.

It could handle 404 responses when trying to fetch RSC:

  • Test the corresponding "real" path to see if it emits a redirect (from NGINX) and attempt to load the RSC for the redirect target path instead. That would mitigate the dependency on internals above.
  • Fall back to loading the RSC for 404.tsx if available.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants