-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(PanicHandling): Add Options.PanicHandler #2033
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for badger-docs canceled.
|
|
s.kv._go(func() { | ||
s.runCompactor(i, lc) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s.kv
is an instance of DB, so I assumed it was safe to start the Goroutine using this instance. But I am not 100% sure this instance is directly derived from the DB instance the user is using, so tell me if this is a mistake.
This PR has been stale for 60 days and will be closed automatically in 7 days. Comment to keep it open. |
👎 |
✅ Deploy Preview for badger-docs canceled.
|
Problem
This PR fixes #1883.
The issue is that when a device running badger is abruptly powered off it might cause a database corruption, and subsequently badger will continuously panic from inside a separate Goroutine when trying to open the database.
Panicking from inside a Goroutine causes all other Goroutines to die with an error code, and the main isssue is that having a defer calling
recover()
on other Goroutines cannot prevent this from happening, making it extremely difficult to recover from such a failure if the affected developer has no control over the panicking goroutine.Solution
The solution proposed here is to create all Goroutines with a special method called
DB._go()
that will recover from the panic if the user provides a PanicHandler on the Options argument.If a user decides to use this feature he should make sure not to ignore the panic, but this would give a window for developers to work around this issue by for example deleting the offending database and creating a new one, or even just reporting an error to the appropriate API so a human can be notified of this issue.
The main complexity in this PR is the issue regarding the use of closure and Goroutines inside for loops:
https://go.dev/doc/faq#closures_and_goroutines
Before this PR these issues were handled by passing the arguments to the Goroutines as in:
Due to the way the
DB._go()
function was implemented (i.e. receiving a closure as argument) it is not possible anymore to trust this technique to make sure the data is copied and not passed as a reference to the closure, so instead I am doing it like this now wherever necessary: