[service-bus] Make close() work when a connection is in process #8746

richardpark-msft · 2020-05-06T20:22:02Z

Currently it's possible for a user to call close() while a connection is in process. When this happens you can end up in a bad state where the object thinks it's closed but it still has open resources.

This commit changes it so:

- _init() and close() now use the same lock to prevent them from overlapping in MessageSession and MessageReceiver
- _init() respects an "is closed" flag.
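The two points above could be sketched roughly as follows. This is only an illustration under assumptions: `SimpleLock`, `SketchReceiver`, and `openLink` are hypothetical names invented here, not the SDK's classes or its locking utility.

```typescript
// Hypothetical sketch: none of these names are the SDK's real API.

/** Minimal promise-chain mutex: queued tasks run one at a time, in order. */
class SimpleLock {
  private tail: Promise<void> = Promise.resolve();

  acquire<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Keep the chain usable even if a task rejects.
    this.tail = result.then(
      () => undefined,
      () => undefined
    );
    return result;
  }
}

class SketchReceiver {
  private readonly lock = new SimpleLock();
  private closed = false;
  isOpen = false;

  /** Stand-in for the real work of opening a link. */
  private async openLink(): Promise<void> {
    await new Promise<void>((resolve) => setTimeout(resolve, 10));
    this.isOpen = true;
  }

  init(): Promise<void> {
    return this.lock.acquire(async () => {
      // Respect the "is closed" flag: never open after close() was called.
      if (this.closed || this.isOpen) {
        return;
      }
      await this.openLink();
    });
  }

  close(): Promise<void> {
    // Set the flag before queueing so an init() that runs later sees it.
    this.closed = true;
    return this.lock.acquire(async () => {
      // Any in-flight init() has finished by now, so nothing is left half-open.
      this.isOpen = false;
    });
  }
}
```

Because both methods queue on the same lock, a close() issued while init() is mid-flight waits for it and then tears the link down, instead of returning while resources are still being created.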

Fixes #7986

sdk/servicebus/service-bus/src/core/messageReceiver.ts

sdk/servicebus/service-bus/src/core/batchingReceiver.ts

ramya-rao-a · 2020-05-07T01:59:18Z

Can we pull this PR out of the draft stage?

richardpark-msft · 2020-05-07T05:21:06Z

@ramya-rao-a - I found some issues and I think I want to look at it with fresh eyes tomorrow morning:

BatchingReceiver works ok with the way I've got it now - as far as I can tell there isn't a way to start initializing a connection without having set it up in some way so close() blocks on it properly.
SessionReceiver also had an issue but the fix there was just to move the lock logic into the SessionReceiver and out of MessageSession.
StreamingReceiver doesn't work with how it's coded right now. StreamingReceiver.create is async so there's a chance the user can call close(), complete, and then we'll resume creating the streaming receiver which is now "uncloseable" again. We don't store the context.streamingReceiver in time for close() to even know that open() is happening. I can fix this by just moving the lock logic into Receiver, which is what I basically did for SessionReceiver.

As it is my tests are failing but I think the ideas make sense. Just need to take a longer look at it.

… - the new rules are:
- If you create a receiver, it should be created and set before you do any async calls. This ensures that if the user calls close(), it actually routes to the instance that is being "initialized", so all calls are properly accounted for.
- All the locking logic remains in the lower-level classes (messageSession or messageReceiver). The only thing that changes in the higher-level Receiver/SessionReceiver classes is that they separate the assignment of the internal field (this._context.<receiver> or this._messageSession) from its initialization.

richardpark-msft · 2020-05-07T22:07:37Z

@ramya-rao-a - okay, it looks correct now.

The basic pattern was:

When we create the underlying MessageReceiver/MessageSession, make sure the outer object (Receiver/SessionReceiver) has a reference to it. Otherwise, close() doesn't actually get routed and all of our locking gets bypassed, resulting in an orphaned receiver.
Inside the lock, check that the user hasn't already tried to close the object and that the connection isn't already open. There's a "connectionClosing" variable that we don't need to check anymore since the lock takes care of it.
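A minimal sketch of the "assign first, initialize later" pattern described above. `InnerReceiver` and `OuterReceiver` are hypothetical stand-ins for MessageReceiver/MessageSession and Receiver/SessionReceiver, and a plain flag takes the place of the lock to keep the example short.

```typescript
// Hypothetical names throughout; this is not the SDK's actual code.
class InnerReceiver {
  isOpen = false;
  private closed = false;

  async init(): Promise<void> {
    // Simulated network work.
    await new Promise<void>((resolve) => setTimeout(resolve, 10));
    if (this.closed) {
      return; // close() arrived while we were connecting
    }
    this.isOpen = true;
  }

  async close(): Promise<void> {
    this.closed = true;
    this.isOpen = false;
  }
}

class OuterReceiver {
  inner?: InnerReceiver;

  async open(): Promise<void> {
    // Assign synchronously, before any await, so a concurrent close()
    // can find the instance that is being initialized...
    const inner = new InnerReceiver();
    this.inner = inner;
    // ...and only then start the asynchronous initialization.
    await inner.init();
  }

  async close(): Promise<void> {
    if (this.inner) {
      await this.inner.close();
      this.inner = undefined;
    }
  }
}
```

If the assignment happened only after `await inner.init()`, a close() issued in between would find nothing to close, and the receiver would finish opening afterwards with nobody holding a reference to it.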

richardpark-msft · 2020-05-07T22:11:00Z

/azp run js - servicebus - tests

azure-pipelines · 2020-05-07T22:11:09Z

Azure Pipelines successfully started running 1 pipeline(s).

- If the result of the _init() for the streaming receiver is a useless receiver (ie, one that isn't open) we can just remove it from context immediately rather than keep it around.
- Don't overwrite the context.streamingReceiver. It's okay to call _init() multiple times on the same instance. This is just a redundant check since there is an "already receiving" check above it but...it's nice to have a simple check in there as well.
- If _init() is called on a closed MessageReceiver throw the standard "this receiver is closed" non-retryable error - it'll never be valid.
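The three rules in this commit message might look something like the sketch below. Everything here is assumed for illustration: `SketchStreamingReceiver`, `SketchContext`, `startStreaming`, the error message, and the `retryable` field are hypothetical and do not reflect the SDK's real types or error shape.

```typescript
// Hypothetical types; the real SDK classes and errors differ.
interface SketchError extends Error {
  retryable: boolean;
}

function receiverClosedError(): SketchError {
  const err = new Error("The receiver is closed and can no longer be used.") as SketchError;
  err.retryable = false; // retrying against a closed receiver can never succeed
  return err;
}

class SketchStreamingReceiver {
  private open = false;
  private closed = false;
  private readonly connect: () => Promise<void>;

  constructor(connect: () => Promise<void>) {
    this.connect = connect;
  }

  isOpen(): boolean {
    return this.open;
  }

  async init(): Promise<void> {
    if (this.closed) {
      throw receiverClosedError();
    }
    if (this.open) {
      return; // calling init() again on the same instance is harmless
    }
    await this.connect();
    this.open = true;
  }

  close(): void {
    this.closed = true;
    this.open = false;
  }
}

interface SketchContext {
  streamingReceiver?: SketchStreamingReceiver;
}

async function startStreaming(
  context: SketchContext,
  connect: () => Promise<void>
): Promise<void> {
  // Reuse the cached instance instead of overwriting it.
  if (!context.streamingReceiver) {
    context.streamingReceiver = new SketchStreamingReceiver(connect);
  }
  const receiver = context.streamingReceiver;
  try {
    await receiver.init();
  } catch (err) {
    // A receiver that never opened owns no resources, so drop it from the cache.
    if (context.streamingReceiver === receiver && !receiver.isOpen()) {
      context.streamingReceiver = undefined;
    }
    throw err;
  }
}
```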

richardpark-msft · 2020-05-08T00:17:40Z

/azp run js - servicebus - tests

azure-pipelines · 2020-05-08T00:17:49Z

Azure Pipelines successfully started running 1 pipeline(s).

sdk/servicebus/service-bus/src/core/batchingReceiver.ts

sdk/servicebus/service-bus/src/core/messageReceiver.ts

ramya-rao-a · 2020-05-08T05:29:22Z

sdk/servicebus/service-bus/src/receiver.ts

@@ -146,6 +153,10 @@ export class Receiver {
        return;
      })
      .catch((err) => {
+        if (this._context.streamingReceiver != null && !this._context.streamingReceiver.isOpen()) {
+          this._context.streamingReceiver = undefined;
+        }


What scenario does this cover?

This is the "init() failed and we have a streaming receiver that is not open/doesn't own any resources".

There was a test covering this (Streaming - Failed init should not cache recevier) that revealed this behavior which seems sensible. If a streaming receiver is just completely dead (ie, nothing to clean up) it's safe to just not cache it at all.

So, this change is because we are now caching first, initializing later?

A cached streaming receiver represents a receiver that needs to be recovered when there is a connection issue. I am slightly concerned about the implications of this change in ordering for link recovery.

Take the case of init() failing and a connection issue happening at around the same time.
If the connection recovery parts of the code get executed before the catch block here, then this streaming receiver would be on its merry way to being recovered, leading to something like #5541.

Can we refactor so that we keep the init first, cache later as before, but still have the changes you want?

const sReceiver = StreamingReceiver.create(..);
sReceiver
  .init()
  .then(async () => {
    if (this.isClosed) {
      await sReceiver.close();
      return;
    }
    this._context.streamingReceiver = sReceiver;
    sReceiver.receive(...);
  })
  .catch((err) => {
    onError(err);
  });

Also, related issue and PR from the past: #1730 and #2139

ramya-rao-a · 2020-05-08T05:30:50Z

sdk/servicebus/service-bus/src/receiver.ts

@@ -174,7 +185,7 @@ export class Receiver {
    this._throwIfReceiverOrConnectionClosed();
    this._throwIfAlreadyReceiving();

-    if (!this._context.batchingReceiver || !this._context.batchingReceiver.isOpen()) {
+    if (!this._context.batchingReceiver) {


What is driving this change?

Looking at it, it seemed unnecessary: we can call _init() multiple times on a BatchingReceiver and each call will behave the same (or exit early if it's already open).

This just made it consistent with the check I was doing for streamingReceiver.

(I can bring it back - I have no strong feelings on it).

On thinking about this more, I think that the way I have it now is just simpler to reason through. We don't have to worry about any overlap/concurrency issues for the field anymore. It's either set or not, and any other calls that manipulate its state are protected by the lock.

If we don't do that, then I have to start reasoning about whether two concurrent instances of _context.batchingReceiver can exist (i.e., we've swapped out an older one for a newer one and we're somehow still initializing the older one). So I think this is worth keeping.

sdk/servicebus/service-bus/src/session/messageSession.ts

Co-authored-by: Ramya Rao <ramya.rao.a@outlook.com>

sdk/servicebus/service-bus/src/core/messageReceiver.ts

Co-authored-by: Harsha Nalluru <sanallur@microsoft.com>

…chardpark-msft/azure-sdk-for-js into richardpark-7986-proper-close

sdk/servicebus/service-bus/src/session/messageSession.ts

sdk/servicebus/service-bus/test/batchReceiver.spec.ts

sdk/servicebus/service-bus/src/session/messageSession.ts

sdk/servicebus/service-bus/test/streamingReceiverSessions.spec.ts

…lock) looks like when we're in init()

Co-authored-by: Harsha Nalluru <sanallur@microsoft.com>

…r the truth (and the fact that it was throwing an error because I was connecting to the wrong queue) - Fixed the .catch's to either eliminate them entirely _or_ to only allow one specific error to avoid this situation in the future.

richardpark-msft · 2020-05-09T00:57:16Z

/azp run js - servicebus - tests

azure-pipelines · 2020-05-09T00:57:25Z

Azure Pipelines successfully started running 1 pipeline(s).

ramya-rao-a · 2020-05-10T04:37:12Z

sdk/servicebus/service-bus/src/core/messageReceiver.ts

+            return;
+          }
+
+          this.isConnecting = true;


The isConnecting property gets checked in multiple places to make the decision of whether to call onDetached() or not so that there are not multiple attempts being made at recovering the link.

With the change in this PR, isConnecting is now set to true only after the lock is acquired, which makes multiple calls to onDetached() possible.

That said, in #8401 @chradek recently added a new _isDetaching flag to ensure that onDetached() is a no-op if it gets called multiple times.

But multiple calls to onDetached() will still add noise to the logs.

So, I would recommend moving the this.isConnecting = true to before the lock is acquired.
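To illustrate the ordering concern, here is a rough sketch in which the "connecting" state is raised before waiting on the lock. `SketchLink` and its members are hypothetical, and a counter stands in for the boolean isConnecting so that overlapping init() calls stay consistent; the SDK's actual fields and recovery logic are not shown.

```typescript
// Hypothetical sketch; not the SDK's MessageReceiver.
class SketchLink {
  detachRecoveries = 0;
  private pendingInits = 0;
  private tail: Promise<void> = Promise.resolve();

  /** True while any init() is queued or running. */
  get isConnecting(): boolean {
    return this.pendingInits > 0;
  }

  init(): Promise<void> {
    // Raise the flag BEFORE waiting on the lock, so onDetached() can see
    // that a connect is pending even while init() is still queued.
    this.pendingInits++;
    const run = this.tail.then(async () => {
      try {
        // Simulated link creation.
        await new Promise<void>((resolve) => setTimeout(resolve, 10));
      } finally {
        this.pendingInits--;
      }
    });
    this.tail = run.then(
      () => undefined,
      () => undefined
    );
    return run;
  }

  onDetached(): void {
    if (this.isConnecting) {
      return; // a connect is already under way; avoid a duplicate recovery
    }
    this.detachRecoveries++;
    void this.init();
  }
}
```

If the flag were raised only inside the locked section, a detach event arriving while init() waits for the lock would start a second recovery attempt.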

I'll take a deeper look at this. Your comments made me realize I need to consider onDetached/open/close in tandem, and I don't think I've done that enough.

sdk/servicebus/service-bus/src/core/streamingReceiver.ts

…ageReceiver as a no-op.

…y, that's an explicit step for the caller). (this is still all internal to the library)

ramya-rao-a · 2020-05-12T00:26:20Z

There are certain parts of this PR that are straightforward and can be pulled out into a separate PR while we keep this PR open to think more about the ramifications of the changes being made. I recommend creating a separate PR for the two changes below:

Batching Receiver: Resolve the promise with an empty array if init() completes but the batching receiver has been closed. This avoids a TypeError being thrown.
Streaming Receiver: If close was initiated while link recovery is in progress, then init() should exit gracefully, i.e., check for this.wasCloseInitiated inside init().

…t() (#8882)

Some simple fixes for robustness that we found when working on #8746.
* If the receiver is closed in between .init() and the .then() we'll just exit gracefully and return an empty set of messages.
* If close has already been initiated don't init() anything.
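A small sketch of those two robustness fixes, assuming a hypothetical `SketchBatchingReceiver` that hands out strings instead of real messages. The `wasCloseInitiated` name comes from the review comment above; the rest is invented for illustration, and the second check inside init() is an extra safeguard of this sketch rather than something the thread describes.

```typescript
// Hypothetical sketch; not the SDK's BatchingReceiver.
class SketchBatchingReceiver {
  wasCloseInitiated = false;
  private open = false;
  private readonly pending: string[];

  constructor(messages: string[]) {
    this.pending = messages.slice();
  }

  async init(): Promise<void> {
    if (this.wasCloseInitiated) {
      return; // close has already started: don't init anything
    }
    // Simulated link creation.
    await new Promise<void>((resolve) => setTimeout(resolve, 10));
    if (this.wasCloseInitiated) {
      return; // sketch-only safeguard: closed while connecting
    }
    this.open = true;
  }

  async close(): Promise<void> {
    this.wasCloseInitiated = true;
    this.open = false;
  }

  async receive(maxMessages: number): Promise<string[]> {
    await this.init();
    if (!this.open) {
      // Closed somewhere between init() and here: exit gracefully
      // with an empty batch instead of touching a missing link.
      return [];
    }
    return this.pending.splice(0, maxMessages);
  }
}
```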

… richardpark-7986-proper-close

richardpark-msft · 2020-05-13T22:14:03Z

Closing this PR for now:

The simple work that @ramya-rao-a mentioned above closes most of the gaps that we were concerned about. close() can still return early (the locks would be needed to prevent that), but the subsequent init() will make sure the created receiver ultimately gets cleaned up.
The work to add in the locking makes sense for track 2 (maybe even in a different form) but will probably be a bit too complicated for just delivering a hotfix.

richardpark-msft added 2 commits May 6, 2020 13:03

Handle the non-session side of making sure that close() will properly…

6d17e13

… close a connection if it's in progress.

Apply the same open/close logic to MessageSession - if an open is in …

203a90d

…process wait for it to complete so we can properly close down the connection.

richardpark-msft requested a review from ramya-rao-a May 6, 2020 20:22

ramya-rao-a reviewed May 6, 2020

sdk/servicebus/service-bus/src/core/messageReceiver.ts

Adding in tests for streaming and batching receivers.

741cb63

ramya-rao-a reviewed May 7, 2020

sdk/servicebus/service-bus/src/core/batchingReceiver.ts (outdated)

richardpark-msft mentioned this pull request May 7, 2020

[service-bus] Track2 - remove receiver and sender caching from track 1 #8143

Closed

3 tasks

Rollback change here - it won't work properly with the way that Sessi…

0d3198a

…onReceiver is now.

richardpark-msft added 2 commits May 7, 2020 11:20

Batch receiver is taken care of.

0d1c8fe

richardpark-msft marked this pull request as ready for review May 7, 2020 22:08

richardpark-msft requested a review from chradek as a code owner May 7, 2020 22:08

richardpark-msft requested review from ramya-rao-a and HarshaNalluru May 7, 2020 22:11

richardpark-msft added 2 commits May 7, 2020 17:13

Whoops, should have committed with the previous commit. This is the "…

3330f4a

…clear the context.streamingReceiver value if it just comes back invalid" commit.

ramya-rao-a reviewed May 8, 2020

sdk/servicebus/service-bus/src/core/batchingReceiver.ts (outdated)

ramya-rao-a reviewed May 8, 2020

sdk/servicebus/service-bus/src/core/messageReceiver.ts (outdated)

ramya-rao-a reviewed May 8, 2020

sdk/servicebus/service-bus/src/session/messageSession.ts (outdated)

Consistency

5933c14

Co-authored-by: Ramya Rao <ramya.rao.a@outlook.com>