Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[improve][client]Reduce CPU usage when client idle and batch message enable #23188

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

stillerrr
Copy link

@stillerrr stillerrr commented Aug 17, 2024

Reduce CPU usage when client idle and batch message enable

Fixes #23187

Motivation

ProducerImple use EventLoopGroup to send batch message per 1ms by default, it would cause about 14% CPU usage when client is idle and no messaage producing

Modifications

Use JDK ScheduleExecutorService to do this schedule task

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

Reduce CPU usage when client idle and batch message enable
Copy link

@stillerrr Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

@github-actions github-actions bot added doc-label-missing doc-not-needed Your PR changes do not impact docs and removed doc-label-missing labels Aug 17, 2024
@@ -266,6 +271,8 @@ public ProducerImpl(PulsarClientImpl client, String topic, ProducerConfiguration
}
this.batchMessageContainer = (BatchMessageContainerBase) containerBuilder.build();
this.batchMessageContainer.setProducer(this);
ThreadFactory threadFactory = new ExecutorProvider.ExtendedThreadFactory("pulsar-batch", Thread.currentThread().isDaemon());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reuse an existing thread pool? Otherwise each ProducerImpl(partition) will create a thread pool

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resue the scheduleExecutorProvider of PulsarClientImpl

ClientCnx cnx = cnx();
if (cnx != null && isBatchMessagingEnabled()) {
this.batchFlushTask = cnx.ctx().executor().schedule(catchingAndLoggingThrowables(this::batchFlushTask),
if (isBatchMessagingEnabled()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove cnx != null ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

already add back, make sure the judgment is not changed

Reuse PulsarClient scheduleExecutorProvider
Add back the judgment condition
@stillerrr stillerrr changed the title Reduce CPU usage when client idle and batch message enable [improve][client]Reduce CPU usage when client idle and batch message enable Aug 17, 2024
@stillerrr
Copy link
Author

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduce CPU usage when client idle and batch message enable

I don't see how the executor type for the batchFlushTask could be the source of high CPU when the client is idling. The batchFlushTask is a one-time task and it will only be scheduled when there's an ongoing batch.

@stillerrr
Copy link
Author

Reduce CPU usage when client idle and batch message enable

I don't see how the executor type for the batchFlushTask could be the source of high CPU when the client is idling. The batchFlushTask is a one-time task and it will only be scheduled when there's an ongoing batch.

In my test environment, init puslar client, create producers for 100 topics and send some messages to trigger initialization. There would be 14% CPU usage because pulsar-client-io thread
image

I find this thread "pulsar-client-io" is created by EventLoopGroup instance in PulsarClientImpl
image
image
image

this EventLoopGroup instance would be used in ClientCnx and ProducerImpl would use it to send batch message per 1ms by default
image
image
image

I check all the place using this EventLoopGroup instance, and find when I replace cnx.ctx().executor() with a ScheduledExecutorService instance in scheduleBatchFlushTask() method, the cpu usage is less than 1%, here is my test result
image

I think the root cause is EventLoopGroup cannot be idle due to frequent sending task

@hanmz
Copy link
Contributor

hanmz commented Aug 23, 2024

Reduce CPU usage when client idle and batch message enable

I don't see how the executor type for the batchFlushTask could be the source of high CPU when the client is idling. The batchFlushTask is a one-time task and it will only be scheduled when there's an ongoing batch.

In my test environment, init puslar client, create producers for 100 topics and send some messages to trigger initialization. There would be 14% CPU usage because pulsar-client-io thread image

I find this thread "pulsar-client-io" is created by EventLoopGroup instance in PulsarClientImpl image image image

this EventLoopGroup instance would be used in ClientCnx and ProducerImpl would use it to send batch message per 1ms by default image image image

I check all the place using this EventLoopGroup instance, and find when I replace cnx.ctx().executor() with a ScheduledExecutorService instance in scheduleBatchFlushTask() method, the cpu usage is less than 1%, here is my test result image

I think the root cause is EventLoopGroup cannot be idle due to frequent sending task

Reduce CPU usage when client idle and batch message enable

I don't see how the executor type for the batchFlushTask could be the source of high CPU when the client is idling. The batchFlushTask is a one-time task and it will only be scheduled when there's an ongoing batch.

In my test environment, init puslar client, create producers for 100 topics and send some messages to trigger initialization. There would be 14% CPU usage because pulsar-client-io thread image

I find this thread "pulsar-client-io" is created by EventLoopGroup instance in PulsarClientImpl image image image

this EventLoopGroup instance would be used in ClientCnx and ProducerImpl would use it to send batch message per 1ms by default image image image

I check all the place using this EventLoopGroup instance, and find when I replace cnx.ctx().executor() with a ScheduledExecutorService instance in scheduleBatchFlushTask() method, the cpu usage is less than 1%, here is my test result image

I think the root cause is EventLoopGroup cannot be idle due to frequent sending task

Is it because ScheduledExecutorService.schedule has lower cpu usage than EventExecutorGroup.schedule?

Reduce CPU usage when client idle and batch message enable

I don't see how the executor type for the batchFlushTask could be the source of high CPU when the client is idling. The batchFlushTask is a one-time task and it will only be scheduled when there's an ongoing batch.

In my test environment, init puslar client, create producers for 100 topics and send some messages to trigger initialization. There would be 14% CPU usage because pulsar-client-io thread image

I find this thread "pulsar-client-io" is created by EventLoopGroup instance in PulsarClientImpl image image image

this EventLoopGroup instance would be used in ClientCnx and ProducerImpl would use it to send batch message per 1ms by default image image image

I check all the place using this EventLoopGroup instance, and find when I replace cnx.ctx().executor() with a ScheduledExecutorService instance in scheduleBatchFlushTask() method, the cpu usage is less than 1%, here is my test result image

I think the root cause is EventLoopGroup cannot be idle due to frequent sending task

Is it because ScheduledExecutorService.schedule has lower cpu usage than EventExecutorGroup.schedule?

@lhotari
Copy link
Member

lhotari commented Aug 23, 2024

Is it because ScheduledExecutorService.schedule has lower cpu usage than EventExecutorGroup.schedule?

@hanmz In this case, there shouldn't be any active tasks if the client is idle. please check my comment #23188 (review) . That's why this PR doesn't make sense to me.

@lhotari
Copy link
Member

lhotari commented Aug 23, 2024

In my test environment, init puslar client, create producers for 100 topics and send some messages to trigger initialization. There would be 14% CPU usage because pulsar-client-io thread

@stillerrr Have you tried profiling with Async Profiler to find out what's going on?

@lhotari
Copy link
Member

lhotari commented Aug 23, 2024

I think the root cause is EventLoopGroup cannot be idle due to frequent sending task

@stillerrr There shouldn't be any sending tasks at all when the client is idle. please see my previous comment. That's why it's necessary to profile this with Async Profiler.

@stillerrr
Copy link
Author

@hanmz @lhotari The point thing is ScheduledExecutorService.schedule has lower cpu usage than EventExecutorGroup.schedule.
Once batch sending enable, create a cause that sending task 1000 times per 1s, ScheduledExecutorService.schedule woule use less 1% CPU, but EventExecutorGroup.schedule use 14% CPU

In version 2.10.4, pulsar producer do scheduled task once 1ms, the result of cpu usage is what I sent
14% for EventExecutorGroup.schedule
image

less than 1% for ScheduledExecutorService.schedule
image

@lhotari
Copy link
Member

lhotari commented Sep 3, 2024

Reduce CPU usage when client idle and batch message enable

The description is currently misleading. Re: #23187 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc-not-needed Your PR changes do not impact docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[improve][client]pulsar-client-io thread has high CPU usage when client idle and batch message enable
4 participants