
core: fix transaction event asynchronicity #16843

Merged 1 commit into ethereum:master on May 30, 2018

Conversation

karalabe
Member

Fixes #16840.

#16720 reworked how transaction events are raised and accidentally removed a goroutine that the transaction pool used when promoting a transaction (old vs new). This unfortunately means that the transaction pool lock is held while the event is being processed. If a subsystem is subscribed to transaction events and is at the same time waiting for the transaction pool lock, this synchronous delivery deadlocks the process. This PR fixes the oversight.
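
For illustration, a minimal sketch of the asynchronous raise being restored; the pool type and the channel-based feed here are simplified stand-ins, not the real core.TxPool or event.Feed API:

	package main

	import "sync"

	type NewTxsEvent struct{ Txs []string }

	// pool is a toy stand-in for core.TxPool.
	type pool struct {
		mu     sync.Mutex
		txFeed chan NewTxsEvent // stand-in for the event feed
	}

	// promote mutates state under the lock, as the real promotion path
	// does. Raising the event in a fresh goroutine means a subscriber
	// that itself waits for pool.mu can no longer deadlock against us.
	func (p *pool) promote(promoted []string) {
		p.mu.Lock()
		defer p.mu.Unlock()
		// ... promotion work under the lock ...
		go func() { p.txFeed <- NewTxsEvent{promoted} }() // async raise
	}

	func main() {
		p := &pool{txFeed: make(chan NewTxsEvent)}
		done := make(chan struct{})
		go func() { // a subscriber that also needs the pool lock
			ev := <-p.txFeed
			p.mu.Lock()
			_ = ev // process the event under the lock
			p.mu.Unlock()
			close(done)
		}()
		p.promote([]string{"tx1"})
		<-done
	}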

@karalabe karalabe requested a review from holiman as a code owner May 30, 2018 07:20
@karalabe karalabe added this to the 1.8.10 milestone May 30, 2018
@karalabe karalabe merged commit ca34e82 into ethereum:master May 30, 2018
quocneo referenced this pull request in haloplatform/go-haloplatform Jul 19, 2018
@fadeAce

fadeAce commented Feb 25, 2019

@karalabe did you mean that with pool.txFeed.Send(NewTxsEvent{promoted}), whoever subscribes to NewTxsEvent would block the Send itself until their processing is done? Literally, I find "// TrySend attempts to send x on the channel v but will not block." at reflect/value.go TrySend():1733, noting that it is not a blocking send when feed.Send() is invoked.
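
For reference, a tiny self-contained demonstration of those non-blocking TrySend semantics:

	package main

	import (
		"fmt"
		"reflect"
	)

	func main() {
		unbuffered := reflect.ValueOf(make(chan int))
		buffered := reflect.ValueOf(make(chan int, 1))

		// No receiver is ready, so TrySend on the unbuffered channel
		// returns false immediately instead of blocking.
		fmt.Println(unbuffered.TrySend(reflect.ValueOf(1))) // false

		// With free buffer space the send succeeds without a receiver.
		fmt.Println(buffered.TrySend(reflect.ValueOf(1))) // true
	}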

@hadv
Contributor

hadv commented Feb 25, 2019

> @karalabe did you mean that with pool.txFeed.Send(NewTxsEvent{promoted}), whoever subscribes to NewTxsEvent would block the Send itself until their processing is done? Literally, I find "// TrySend attempts to send x on the channel v but will not block." at reflect/value.go TrySend():1733, noting that it is not a blocking send when feed.Send() is invoked.

It seems this change causes a goroutine leak, as reported in #17450 (comment).

@fadeAce

fadeAce commented Feb 25, 2019

@hadv I get you, but I'm slightly confused by karalabe saying that "If a subsystem is subscribed to transaction events and is at the same time waiting for the transaction pool lock, this synchronous delivery deadlocks the process." pool.txFeed.Send() is a non-blocking invocation: messages are sent by cases[i].Chan.TrySend(rvalue), the feeder won't wait for a slow consumer but returns false instead, and the send is then retried in the loop until it succeeds:

	for {
		// Fast path: try sending without blocking before adding to the select set.
		// This should usually succeed if subscribers are fast enough and have free
		// buffer space.
		for i := firstSubSendCase; i < len(cases); i++ {
			if cases[i].Chan.TrySend(rvalue) {
				nsent++
				cases = cases.deactivate(i)
				i--
			}
		}
		if len(cases) == firstSubSendCase {
			break
		}
		// Select on all the receivers, waiting for them to unblock.
		chosen, recv, _ := reflect.Select(cases)
		if chosen == 0 /* <-f.removeSub */ {
			index := f.sendCases.find(recv.Interface())
			f.sendCases = f.sendCases.delete(index)
			if index >= 0 && index < len(cases) {
				// Shrink 'cases' too because the removed case was still active.
				cases = f.sendCases[:len(cases)-1]
			}
		} else {
			cases = cases.deactivate(chosen)
			nsent++
		}
	}

In the extreme case, if consumers subscribed to the Feed turn around and query the pool again, they only block themselves; I mean, it shouldn't produce a deadlock.
Having scanned all SubscribeNewTxsEvent() callers, there is no consumer that uses the subscribed message to touch the pool again. So my point is that, perhaps, it is no longer a problem for the eth protocol to call pool.txFeed.Send without opening a new goroutine. A stripped-down model of the loop above (not the real event.Feed, which also handles unsubscription) shows the behaviour under discussion: the fast path never blocks, but reflect.Select then blocks until every remaining subscriber has received the value:
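
	package main

	import (
		"fmt"
		"reflect"
	)

	// sendAll mimics Feed.Send in miniature: try each channel without
	// blocking first, then block in reflect.Select until the remaining
	// subscribers have all received the value.
	func sendAll(chans []chan int, x int) {
		rvalue := reflect.ValueOf(x)
		var cases []reflect.SelectCase
		for _, ch := range chans {
			cases = append(cases, reflect.SelectCase{
				Dir:  reflect.SelectSend,
				Chan: reflect.ValueOf(ch),
				Send: rvalue,
			})
		}
		for len(cases) > 0 {
			for i := 0; i < len(cases); i++ {
				if cases[i].Chan.TrySend(rvalue) { // fast path, never blocks
					cases = append(cases[:i], cases[i+1:]...)
					i--
				}
			}
			if len(cases) == 0 {
				break
			}
			chosen, _, _ := reflect.Select(cases) // blocks until one receives
			cases = append(cases[:chosen], cases[chosen+1:]...)
		}
	}

	func main() {
		a, b := make(chan int), make(chan int)
		done := make(chan struct{}, 2)
		go func() { fmt.Println("a got", <-a); done <- struct{}{} }()
		go func() { fmt.Println("b got", <-b); done <- struct{}{} }()
		sendAll([]chan int{a, b}, 7)
		<-done
		<-done
	}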

@hadv
Contributor

hadv commented Feb 26, 2019

> messages are sent by cases[i].Chan.TrySend(rvalue), the feeder won't wait for a slow consumer but returns false instead, and the send is then retried in the loop until it succeeds

I think that returning false will cause a goroutine leak, because the sender never reaches the code that releases the send lock: f.sendLock <- struct{}{}
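
For context, a simplified sketch of how senders are serialized with a one-element channel used as a mutex, in the style of event.Feed's sendLock (this is a model, not the real feed.go code); a Send that blocks forever before the release wedges every later Send on the acquire:

	package main

	import "fmt"

	// lock is a one-element channel used as a mutex: receiving the
	// token acquires the lock, sending it back releases it.
	type lock chan struct{}

	func newLock() lock {
		l := make(lock, 1)
		l <- struct{}{} // the token starts out available
		return l
	}

	func (l lock) acquire() { <-l }
	func (l lock) release() { l <- struct{}{} }

	func main() {
		l := newLock()
		l.acquire()
		fmt.Println("holding the send lock")
		// If a sender blocked forever here, release would never run
		// and every later acquire (i.e. every later Send) would hang,
		// leaking the blocked goroutines.
		l.release()
		l.acquire()
		fmt.Println("re-acquired after release")
		l.release()
	}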
