Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[service-bus] Make close() work when a connection is in process #8746

Conversation

richardpark-msft
Copy link
Member

@richardpark-msft richardpark-msft commented May 6, 2020

Currently it's possible for a user to call close() while a connection is in process. When this happens you can end up in a bad state where the object thinks it's closed but it still has open resources.

This commit changes it so:

  • _init() and close() now use the same lock to prevent them from overlapping in MessageSession and MessageReceiver
  • _init() respects an "is closed" flag.

Fixes #7986

…process wait for it to complete so we can properly close down the connection.
@ramya-rao-a
Copy link
Contributor

Can we pull this PR out of the draft stage?

@richardpark-msft
Copy link
Member Author

richardpark-msft commented May 7, 2020

@ramya-rao-a - I found some issues and I think I want to look at it with fresh eyes tomorrow morning:

  • BatchingReceiver works ok with the way I've got it now - as far as I can tell there isn't a way to start initializing a connection without having set it up in some way so close() blocks on it properly.
  • SessionReceiver also had an issue but the fix there was just to move the lock logic into the SessionReceiver and out of MessageSession.
  • StreamingReceiver doesn't work with how it's coded right now. StreamingReceiver.create is async so there's a chance the user can call close(), complete, and then we'll resume creating the streaming receiver which is now "uncloseable" again. We don't store the context.streamingReceiver in time for close() to even know that open() is happening. I can fix this by just moving the lock logic into Receiver, which is what I basically did for SessionReceiver.

As it is my tests are failing but I think the ideas make sense. Just need to take a longer look at it.

… - the new rules are:

- If you create a receiver it should be created and set before you do any async calls. This ensures that if the user calls close() it actually routes to the instance that is getting "initialized" so all calls are properly accounted for.
- All the locking logic remains in the lower level classes (messageSesssion or messageReceiver). The only things that change in the higher level Receiver/SessionReceiver classes is that they separate out the assignment of the internal field (this._context.<receiver> or this._messageSession) from it's initialization.
@richardpark-msft
Copy link
Member Author

@ramya-rao-a - okay, it looks correct now.

The basic pattern was:

  • When we create the underlying MessageReceiver/MessageSession make sure the outer object (Receiver/SessionReceiver) has a reference to it. Otherwise, close() doesn't actually get routed and all of our locking gets bypased, resulting in an orphaned receiver.
  • In the lock check inside to make sure that the user hasn't tried to close the object already and that the connection isn't open. There's a "connectionClosing" variable that we don't need to check anymore since the lock takes care of it.

@richardpark-msft richardpark-msft marked this pull request as ready for review May 7, 2020 22:08
@richardpark-msft
Copy link
Member Author

/azp run js - servicebus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

- If the result of the _init() for the streaming receiver is a useless receiver (ie, one that isn't open) we can just remove it from context immediately rather than keep it around.
- Don't overwrite the context.streamingReceiver. It's okay to call _init() multiple times on the same instance. This is just a redundant check since there is an "already receiving" check above it but...it's nice to have a simple check in there as well.
- If _init() is called on a closed MessageReceiver throw the standard "this receiver is closed" non-retryable error - it'll never be valid.
…clear the context.streamingReceiver value if it just comes back invalid" commit.
@richardpark-msft
Copy link
Member Author

/azp run js - servicebus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@@ -146,6 +153,10 @@ export class Receiver {
return;
})
.catch((err) => {
if (this._context.streamingReceiver != null && !this._context.streamingReceiver.isOpen()) {
this._context.streamingReceiver = undefined;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What scenario does this cover?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the "init() failed and we have a streaming receiver that is not open/doesn't own any resources".

There was a test covering this (Streaming - Failed init should not cache recevier) that revealed this behavior which seems sensible. If a streaming receiver is just completely dead (ie, nothing to clean up) it's safe to just not cache it at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this change is because we are now caching first, initializing later?

A cached streaming receiver symbolizes a receiver that needs to be recovered when there is a connection issue. I am slightly concerned on the implications of this change in order on link recovery.

Take the case of init() failing and a connection issue happening at around the same time.
If the connection recovery parts of the code get executed before the catch block here, then this streaming receiver would be on its merry way to being recovered leading to something like #5541

Can we refactor so that we keep the init first, cache later as before, but still have the changes you want?

const sReceiver = StreamingReceiver.create(..);
sReceiver.init().then(() => {
      if (this.isClosed) {
         await sReceiver.close();
         return;
      };
      this._context.streamingReceiver = sReceiver;
      sReceiver.receive(...)
  }).catch (err) { 
  onError(err); 
  }

Also, related issue and PR from the past: #1730 and #2139

@@ -174,7 +185,7 @@ export class Receiver {
this._throwIfReceiverOrConnectionClosed();
this._throwIfAlreadyReceiving();

if (!this._context.batchingReceiver || !this._context.batchingReceiver.isOpen()) {
if (!this._context.batchingReceiver) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is driving this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at it it seemed unnecessary - we can call _init() multiple times on a batchingreceiver and each time will work the same (or early exit if it's already open).

This just made it consistent with the check I was doing for streamingReceiver.

(I can bring it back - I have no strong feelings on it).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On thinking about this more, I think that the way I have it now is just simpler to reason through. We don't have to worry about any overlap/concurrency issues for the field anymore. It's either set or not and any other calls that manipulate it's state are protected by the lock.

If we don't do that then I have to start reasoning about whether it's possible for two concurrent instances of _context.batchingReceiver can be there (ie, we've swapped out an older one for a newer one and we're somehow initializing the older one). So I think this is worth keeping.

Co-authored-by: Ramya Rao <ramya.rao.a@outlook.com>
richardpark-msft and others added 5 commits May 8, 2020 16:47
Co-authored-by: Harsha Nalluru <sanallur@microsoft.com>
…r the truth (and the fact that it was throwing an error because I was connecting to the wrong queue)

- Fixed the .catch's to either eliminate them entirely _or_ to only allow one specific error to avoid this situation in the future.
@richardpark-msft
Copy link
Member Author

/azp run js - servicebus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

return;
}

this.isConnecting = true;
Copy link
Contributor

@ramya-rao-a ramya-rao-a May 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The isConnecting property gets checked in multiple places to make the decision of whether to call onDetached() or not so that there are not multiple attempts being made at recovering the link.

With the change in this PR, isConnecting is now being set to true only after the lock is acquired, resulting in potential multiple calls to onDetached now being a possibility.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though recently in #8401, @chradek did add a new _isDetaching flag to ensure that onDetached() is a no-op if it gets called multiple times..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But multiple calls to onDetached() will add to the noise in the logs

So, I would recommend moving the this.isConnecting = true to before the lock is acquired

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a deeper look at this. Your comments have me realize I need to consider the onDetached/open/close in tandem and I don't think I've done that enough.

…y, that's an explicit step for the caller).

(this is still all internal to the library)
@ramya-rao-a
Copy link
Contributor

There are certain parts of this PR that are straight-forward and can be pulled out into a separate PR while we keep this PR open to think more on the ramifications of the changes being done. I recommend creating a separate PR for the below 2 changes

  • Batching Receiver: Resolve the promise with empty array after init() completes, but batching receiver has been closed. This avoids the TypeError from being thrown
  • Streaming Receiver: If close was initiated while link recovery is in progress, then init() should gracefully exit i.e check for this. wasCloseInitiated inside init()

richardpark-msft added a commit that referenced this pull request May 13, 2020
…t() (#8882)

Some simple fixes for robustness that we found when working on #8746. 

* If the receiver is closed in between .init() and the .then() we'll just exit gracefully and return an empty set of messages.
* If close has already been initiated don't init() anything.
@richardpark-msft
Copy link
Member Author

Closing this PR for now:

  • The simple work that @ramya-rao-a mentioned above closes most of the gaps that we were concerned about. close() can still return early (the locks are needed to make that not an issue) but the subsequent init() will make sure the created receiver ultimately gets cleaned up.
  • The work to add in the locking makes sense for track 2 (maybe even in a different form) but will probably be a bit too complicated for just delivering a hotfix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants