Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Event Hubs] Prefetch events #26065

Merged
merged 6 commits into from
Jun 6, 2023
Merged

[Event Hubs] Prefetch events #26065

merged 6 commits into from
Jun 6, 2023

Conversation

deyaaeldeen
Copy link
Member

@deyaaeldeen deyaaeldeen commented May 31, 2023

Updates the partition receiver to use Rhea's prefetch window instead of manually fetching the exact number of events needed. It also adds a subscribe option, prefetchCount, that controls the max number of events to prefetch.

Live run: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=2814550&view=results

Related issue: #26055

@azure-sdk
Copy link
Collaborator

API change check

APIView has identified API level changes in this PR and created following API reviews.

azure-event-hubs

@deyaaeldeen deyaaeldeen changed the title Eventhubs/prefetch [Event Hubs] Prefetch events May 31, 2023

### Other Changes

- Use Rhea's prefetch window to prefetch events from the service. This improves the performance of the receiver by reducing the number of round trips to the service. The default prefetch window is 3 * `maxBatchSize` events. This can be configured by setting the `prefetchCount` option on the `subscribe` method on `EventHubConsumerClient`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens when the user closes the consumer/subscription as soon as the prefetch receives the events?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would be a problem in service-bus since the delivery count would be updated, probably have no effects in event-hubs, would like to understand more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The receiver will be discarded including the prefetched list of events. This is ok in Event Hubs because when you receive from a partition, you start reading from a specific event position and not necessarily from the last event fetched.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If no event position is specified, the reading will start from the earliest event in the partition.

@@ -123,10 +123,12 @@ testWithServiceTypes((serviceVersion) => {
await receiver1.connect({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this prefetch count support related to the user issue?
I thought the user issue was about asking for a minimum number of messages?
Also, I don't think this was a feature in 5.8.0, how was rolling back to 5.8.0 was fixing the issue for the user?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not related. This PR improves performance by prefetching events so chances are the received batch size will be larger than before. We still don't want to wait until we receive maxBatchSize because that is against the library goal of maximizing throughput.

Now there is a question for you, how can we analyze the performance characteristic of this change? More specifically, can we show the average/median batch size before and after the change? can we show how often the user callback was called per minute or second?

Copy link
Member

@HarshaNalluru HarshaNalluru Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, these are good metrics to track, how often the user callback is called is already being tracked in the perf test.
We would need to update the framework/test to track the batch size.

Let's work on testing this, so we don't miss anything obvious. 🙂

Copy link
Member

@HarshaNalluru HarshaNalluru Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a stress test run for the checkpoint store test ..
Red ❤️ is the code from this PR, and Blue 💙 is the 5.10.0 version.

image

@deyaaeldeen and I were discussing...
And we speculate the bigger spikes in the CPU usage to be when the credits are added for the "new events to be received". With the new pre-fetch method, the spikes are a lot smaller and less dense.
The memory usage graph also validates the prefetch's improvement.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @HarshaNalluru!

Also, the performance analysis shows that the size median of the received batches increased as well as max and mean throughput.

@deyaaeldeen deyaaeldeen marked this pull request as ready for review June 1, 2023 01:08
@deyaaeldeen deyaaeldeen requested a review from xirzec as a code owner June 1, 2023 01:08
@deyaaeldeen deyaaeldeen merged commit fc8d141 into main Jun 6, 2023
@deyaaeldeen deyaaeldeen deleted the eventhubs/prefetch branch June 6, 2023 01:30
HarshaNalluru added a commit that referenced this pull request Jun 12, 2023
…ting (#26095)

## Updates to the tests following #26065 
1. default test duration is set to 2 days for stress tests
2. Added a new "log-median-batch-size" option for the perf test
- `log-median-batch-size` is a boolean that logs more information
related to the batch size, such as median, max, average, etc.
   - It can be useful when relevant code is updated. 
- Being introduced in the perf test when the prefetch feature was added
to Event Hubs.

---------

Co-authored-by: Deyaaeldeen Almahallawi <dealmaha@microsoft.com>
minhanh-phan pushed a commit to minhanh-phan/azure-sdk-for-js that referenced this pull request Jun 12, 2023
Updates the partition receiver to use Rhea's prefetch window instead of
manually fetching the exact number of events needed. It also adds a
`subscribe` option, `prefetchCount`, that controls the max number of
events to prefetch.

Live run:
https://dev.azure.com/azure-sdk/internal/_build/results?buildId=2814550&view=results

Related issue: Azure#26055
minhanh-phan pushed a commit to minhanh-phan/azure-sdk-for-js that referenced this pull request Jun 12, 2023
…ting (Azure#26095)

## Updates to the tests following Azure#26065 
1. default test duration is set to 2 days for stress tests
2. Added a new "log-median-batch-size" option for the perf test
- `log-median-batch-size` is a boolean that logs more information
related to the batch size, such as median, max, average, etc.
   - It can be useful when relevant code is updated. 
- Being introduced in the perf test when the prefetch feature was added
to Event Hubs.

---------

Co-authored-by: Deyaaeldeen Almahallawi <dealmaha@microsoft.com>
deyaaeldeen added a commit that referenced this pull request Nov 7, 2023
### Packages impacted by this PR
@Azure/event-hubs

### Issues associated with this PR
#27253

### Describe the problem that is addressed by this PR

#27253 (comment)

### What are the possible designs available to address the problem? If
there are more than one possible design, why was the one in this PR
chosen?


### Are there test cases added in this PR? _(If not, why?)_
To be tested using stress testing framework.

UPDATE: The results are in and it is confirmed there is no more space
leak!

### Provide a list of related PRs _(if any)_
#26065

### Command used to generate this PR:**_(Applicable only to SDK release
request PRs)_

### Checklists
- [x] Added impacted package name to the issue description
- [ ] Does this PR needs any fixes in the SDK Generator?** _(If so,
create an Issue in the
[Autorest/typescript](https://github.com/Azure/autorest.typescript)
repository and link it here)_
- [x] Added a changelog (if necessary)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants