Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: deadlock in balance shutdown #14125

Merged
merged 2 commits into from
Aug 15, 2024
Merged

fix: deadlock in balance shutdown #14125

merged 2 commits into from
Aug 15, 2024

Conversation

bukata-sa
Copy link
Contributor

@bukata-sa bukata-sa commented Aug 14, 2024

What

Calling bm.sleeperTask.IfStarted -> bm.sleeperTask.WakeUp uses two read locks, while bm.sleeperTask.WakeUpIfStarted uses only one read lock

Why

From time to time balancer monitor can't shutdown. Because there can be exclusive lock (taken on service.Stop) between these two read locks.
Imagine: (g1 and g2 - goroutines)
g1:OnNewLongestChain -> g1:bm.sleeperTask.IfStarted (RLock1 taken) -> g2:bm.sleeperTask.Stop (WLock tried to take, but blocked, waiting for RLock1) -> g1:bm.sleeperTask.WakeUp (RLock2 tried to be taken, but blocked because have pending WLock) => deadlock (RLock2 -(waits)> WLock -(waits)> RLock1)

Extra I had no idea before, but if you have RLock, then pending WLock, then you can't have new RLocks

This test fails

func TestStrangeRWMutex(t *testing.T) {
	mtx := sync.RWMutex{}
	mtx.RLock()
	go func() {
		fmt.Println("before write lock")
		mtx.Lock()
		assert.Fail(t, "should not have been called")
	}()
	time.Sleep(time.Second)
	fmt.Println("before second read lock")
	// mtx.Lock() - uncomment this and test will stuck with deadlock like in balanceMonitor
	assert.True(t, mtx.TryRLock())
}

Requires Dependencies

smartcontractkit/chainlink-common#706

Resolves Dependencies

#13647

jmank88
jmank88 previously approved these changes Aug 14, 2024
@bukata-sa bukata-sa requested a review from a team August 15, 2024 08:53
george-dorin
george-dorin previously approved these changes Aug 15, 2024
@bukata-sa bukata-sa dismissed stale reviews from george-dorin and jmank88 via 695c2e0 August 15, 2024 09:11
@bukata-sa bukata-sa force-pushed the fix/balance_deadlock branch 2 times, most recently from 695c2e0 to 9d774b6 Compare August 15, 2024 09:11
@bukata-sa bukata-sa added this pull request to the merge queue Aug 15, 2024
Merged via the queue into develop with commit 8fa8c3a Aug 15, 2024
138 of 140 checks passed
@bukata-sa bukata-sa deleted the fix/balance_deadlock branch August 15, 2024 12:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants