Status page #599
Comments
Hi @MortenHofft

As I just put up a status page for the museum and the hosted portal is included in it, I would be interested in how this develops. Currently, I have a simple ping check to the hosted portal homepage, and I also check https://www.gbif.org/api/health to see whether any of the services are reporting a problem.

A few weeks ago, there was an issue with the registry service that caused the dataset summary pages to not load properly. On our end, this could have been detected by monitoring for a keyword on one of the dataset pages ("occurrence", for example). As it actually happened, I posted a quick explanation and then referred to gbif.org/health.

If there were a hosted-portal-specific status page, how would it differ from what is available on the current system health page? Just curious, as I can perhaps improve some checks!

I also see that the issue referenced in this issue is about us! I was not aware that the user had submitted the report, but I did create the status page over the past week to address some emails that mentioned the outage. I have linked our status page at the bottom of our portal and begun to tell our users where they can find it and what to expect from it.
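For illustration, the keyword check mentioned above could look roughly like the following. This is only a minimal sketch: the function names and the example URL are hypothetical, and it assumes the keyword is present in the HTML the server delivers. If the page is rendered client-side, a plain HTTP fetch may not see it and a headless browser check would be needed instead.

```python
from urllib.request import urlopen


def page_is_healthy(status_code: int, body: str, keyword: str) -> bool:
    """Healthy means a 2xx response whose body contains the keyword."""
    return 200 <= status_code < 300 and keyword.lower() in body.lower()


def check_page(url: str, keyword: str, timeout: float = 10.0) -> bool:
    """Fetch the page and apply the keyword check; any network error counts as down."""
    try:
        with urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return page_is_healthy(response.status, body, keyword)
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

A monitor would then call something like `check_page("https://portal.example.org/dataset/some-id", "occurrence")` on a schedule, where the URL is a placeholder for a real dataset page.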
I would like to remove this internal endpoint. I haven't really decided what to do, I must admit, but I imagine it would do the same as what you see on gbif.org/health.

How would you like a status page to be integrated on hosted portals?
Possible ideas: Setting up an endpoint for services that returns either HTTP status codes, or contains keywords upon a GET request.
or
For the hosted portal, perhaps the following services are relevant
The service providing the API endpoint should mimic the actions that the hosted portals perform. For example, if the dataset summary page has to make 4 requests to some service and load 10 React components, the check should also perform those same actions and receive a non-error code for each. Individual countries/nodes can then set up monitors that ping those endpoints, and the idea below could also pull from them.

Replacing the error message with a component that pulls in information from an endpoint

Currently the error message is purely technical. However, it may be possible to load an "error/status component" in the event that the initial component fails. This is similar to how 404 pages often give the user some brief information and then display simple navigation to get them back on track. The risk is that if the React components and the status component share a common infrastructure, the failure of one to be delivered will likely result in the failure of the error message to be delivered as well. It doesn't have to be a complex React component; it just has to dynamically load values by calling an API endpoint.

Having a monitor for the web server itself would not be beneficial in this case: if the entire web server is down, the user can never see this error message.

I put the sentence about the refresh in because currently it will solve the most common error of the components not loading (see #594 and #429).

Flow:

Other thoughts
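To make the "mimic the portal's requests" idea concrete, here is a rough sketch of how such an endpoint could aggregate its sub-checks into a single HTTP status code. Everything here is an assumption for illustration: `run_checks` and the check names are hypothetical, and each check is simply a callable that returns the HTTP status code of one request the portal page would make.

```python
from typing import Callable, Dict, Tuple


def run_checks(checks: Dict[str, Callable[[], int]]) -> Tuple[int, Dict[str, str]]:
    """Run every named sub-check; return an overall status code and a per-check report."""
    report: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            code = check()
        except Exception:
            code = 0  # the request never completed, so treat it as a failure
        # "Non-error code" is taken here to mean anything below 400.
        report[name] = "ok" if 200 <= code < 400 else "failed"
    # 200 only if every sub-check passed, otherwise 503 Service Unavailable.
    overall = 200 if all(state == "ok" for state in report.values()) else 503
    return overall, report
```

An external monitor only needs to look at the overall code, while a status component shown in place of the technical error message could read the per-check report to tell the user which part is affected.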
Both gbif.org and hosted portals need a way to display the status of our services.
Currently hosted portal users report more outages than gbif.org users, presumably because we do not indicate that we have an outage.
Similarly, when pages fail, we could tell the user that we are in fact having infrastructure issues at the moment.
Once again, I think we should explore the option of exposing this status via an external status page service like statuspage.io.