New component: crash report extension #16598

Closed
2 tasks
atoulme opened this issue Dec 3, 2022 · 10 comments

Contributor

atoulme commented Dec 3, 2022

The purpose and use-cases of the new component

When the collector crashes, it prints information to stdout and stderr which might be useful to capture.

The crash report extension would consist of a module that recovers from panics, sends the crash data to a remote location, and then exits as the collector does now.

Example configuration for the component

extensions:
  crashreport:
    endpoint: http://www.example.com

Additionally, this extension can be enabled through a CLI parameter such as --crashreport=http://www.example.com
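As a rough illustration of the configuration above, here is a minimal sketch of a Go config struct with basic endpoint validation. The `Config` type, its `Validate` method, and the `mapstructure` tag name are assumptions made for illustration; the issue only specifies a single `endpoint` setting.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// Config is a hypothetical configuration struct for the proposed extension.
// Only the "endpoint" setting comes from the issue; everything else is assumed.
type Config struct {
	Endpoint string `mapstructure:"endpoint"`
}

// Validate rejects an empty endpoint and anything that is not an absolute
// http or https URL.
func (c *Config) Validate() error {
	if c.Endpoint == "" {
		return errors.New("endpoint must be set")
	}
	u, err := url.Parse(c.Endpoint)
	if err != nil {
		return fmt.Errorf("invalid endpoint: %w", err)
	}
	if (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
		return fmt.Errorf("endpoint %q must be an absolute http or https URL", c.Endpoint)
	}
	return nil
}

func main() {
	cfg := &Config{Endpoint: "http://www.example.com"}
	if err := cfg.Validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("config ok")
}
```

Validating the endpoint at load time would surface a misconfiguration before any crash occurs, rather than at the moment a report needs to be sent.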

Telemetry data types supported

None.

Is this a vendor-specific component?

  • This is a vendor-specific component
  • If this is a vendor-specific component, I am proposing to contribute this as a representative of the vendor.

Sponsor (optional)

No response

Additional context

No response

atoulme added the needs triage label Dec 3, 2022

Member

dmitryax commented Dec 5, 2022

How will the reporting protocol be implemented? Is there a standardized protocol for crash reporting, or will each endpoint provider be added separately as a "plugin" to the extension?

Additionally, this extension can be enabled through a CLI parameter such as --crashreport=http://www.example.com/

Why is this needed? We've been moving away from CLI arguments and have migrated all the CLI args to the config.

Contributor Author

atoulme commented Dec 5, 2022

I'm not aware of a standardized crash report protocol. My intent is to send a report of the error as recovered from a panic, printed out as a block of text. We can iterate on this design as we learn more.

In my mind, this crash report tool specifically addresses panics using a recover listener registered in a goroutine on start of the extension. See https://gobyexample.com/recover for context.
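To make the recover mechanism concrete, here is a minimal sketch using only the Go standard library. The helper name `capturePanic` and the report layout are assumptions for illustration, not part of the proposal. One caveat worth keeping in mind: `recover` only intercepts a panic when it is called from a deferred function in the same goroutine that panicked, so a handler registered in one goroutine does not catch panics raised in others.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// capturePanic runs fn and, if fn panics, returns a text report containing
// the panic value followed by the stack of the panicking goroutine.
// It returns an empty string when fn completes normally.
// The name and report format are hypothetical.
func capturePanic(fn func()) (report string) {
	defer func() {
		// recover must be called directly in a deferred function, and it
		// only sees panics from the goroutine running this function.
		if r := recover(); r != nil {
			report = fmt.Sprintf("panic: %v\n\n%s", r, debug.Stack())
		}
	}()
	fn()
	return ""
}

func main() {
	report := capturePanic(func() {
		panic("boom")
	})
	// The report is the "block of text" that would be sent to the endpoint.
	fmt.Println(report)
}
```

After capturing the report, the extension could send it and then re-panic or exit with a non-zero status so the process still terminates as it does today.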

The only reason you'd want a CLI parameter is if the crash happens during config load. I think we can leave that out of scope for now and revisit it later. Maybe a CLI parameter is not the right approach and env vars or something else would make more sense. For now, let's keep the scope small.

Member

dmitryax commented Dec 5, 2022

Where would you like to send the crash reports initially? Will the endpoint config option alone be enough? I assume some authentication would be required as well.

Contributor Author

atoulme commented Dec 5, 2022

This is very early, and authentication would make sense. Initially, I thought I'd allow sending to a public HTTP server accepting POST requests and leave it that simple for now. The HTTP server would need to build some level of protection.

If we add auth, we could target a Splunk HEC raw endpoint to ingest data next.

Again, I'm proceeding in very small steps and with as small a scope as possible so others get a chance to chime in and offer better solutions. I know there are crash report solutions out there. A quick Google search turns up vendors like backtrace.io, or see this article with quite a few options: https://raygun.com/blog/best-ios-crash-reporting-tools/

Contributor Author

atoulme commented Dec 21, 2022

Now that I've dug further into the utilities offered by the collector helpers, I think we can use HTTPClientSettings to build an HTTP client that supports auth, proxies, and anything else we want supported. This will allow us to set authentication.

Member

mx-psi commented Jan 11, 2023

Who is the sponsor of this component?

Contributor Author

atoulme commented Jan 11, 2023

It could be you! There’s no sponsor yet :)

Contributor

github-actions bot commented Apr 3, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.


This issue has been closed as inactive because it has been stale for 120 days with no activity.

github-actions bot closed this as not planned Aug 23, 2024
3 participants