Messages from few partitions are getting delayed #18467

Closed
ponsakthi opened this issue Feb 5, 2021 · 5 comments
Labels: Client, customer-reported, Event Hubs, needs-author-feedback, no-recent-activity, question, Service Attention

Comments

@ponsakthi

We are using Microsoft.Azure.EventHubs.Processor version 4.2.0 on .NET Core 3.1, running as pods on an OpenShift cluster.
We have 8 pods listening to an event hub with 32 partitions.

Quite often, a pod is unable to receive messages from a few partitions while it continues to pull messages from the other partitions. At times the delay is as high as 10 minutes. The problem resolves itself and we then receive all the pending messages in a burst, but we have no visibility into why a few partitions are delayed so much. Is there a trace or log that shows what is happening behind the scenes while the event hub is being polled?

Is this the same issue that was fixed in #12691 in the latest version, Microsoft.Azure.EventHubs.Processor 4.3.1?

@ghost added the needs-triage, customer-reported, and question labels Feb 5, 2021
@Mohit-Chakraborty added the Client, Event Hubs, and needs-team-attention labels Feb 5, 2021
@ghost removed the needs-triage label Feb 5, 2021
@Mohit-Chakraborty
Contributor

Thank you for your feedback. Tagging and routing to the team best able to assist.

@jsquire added the Service Attention label Feb 5, 2021
@ghost

ghost commented Feb 5, 2021

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @samuelkoppes.


@ponsakthi
Author

Update:
We upgraded to Microsoft.Azure.EventHubs.Processor 4.3.1 and are still seeing the same issue. Note that we are using EventProcessorHost (v4), not EventProcessorClient (v5).

One more thing we observed: say pod A holds the lease on partition 1. If pod A fails to renew the lease, ownership moves to pod B, but pod B does not pull any messages even though it now owns the partition. Later, pod A reclaims the lease and successfully resumes processing partition 1. For the entire time pod B held the lease, no messages were processed, hence the latency. Is this a known issue in the v4 client, and has it been addressed in the v5 client?
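One way to make the "owned but not receiving" state visible is to record, per partition, when ownership began and when events last arrived, then report partitions that have been owned for a while with no events. The sketch below is language-neutral Python, not the EventProcessorHost API; `OwnershipMonitor`, its hooks, and `idle_after_seconds` are hypothetical names. In a v4 `IEventProcessor` the hooks would be called from `OpenAsync`, `CloseAsync`, and `ProcessEventsAsync`. It assumes a partition with genuinely no traffic will also be reported as idle, so a hit should be checked against publisher-side metrics before treating it as a stuck receiver.

```python
import time


class OwnershipMonitor:
    """Hypothetical helper: tracks ownership start and last event time per partition."""

    def __init__(self, idle_after_seconds=120.0, clock=time.monotonic):
        self.idle_after = idle_after_seconds
        self.clock = clock
        # partition_id -> (owned_since, last_event_time or None)
        self._owned = {}

    def on_open(self, partition_id):
        # Call where this instance acquires the lease (e.g. from OpenAsync).
        self._owned[partition_id] = (self.clock(), None)

    def on_close(self, partition_id):
        # Call where the lease is released or lost (e.g. from CloseAsync).
        self._owned.pop(partition_id, None)

    def on_events(self, partition_id, count):
        # Call at the top of the event handler (e.g. ProcessEventsAsync).
        # Empty batches do not count as progress.
        if count > 0 and partition_id in self._owned:
            since, _ = self._owned[partition_id]
            self._owned[partition_id] = (since, self.clock())

    def idle_owned_partitions(self):
        """Owned partitions with no events for longer than idle_after seconds."""
        now = self.clock()
        idle = []
        for pid, (since, last) in self._owned.items():
            # If no events arrived yet, measure from when ownership began.
            reference = last if last is not None else since
            if now - reference > self.idle_after:
                idle.append(pid)
        return sorted(idle)
```

Logging `idle_owned_partitions()` periodically on every pod would show whether pod B reports partition 1 as owned but idle during the delay window.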

@serkantkaraca
Member

serkantkaraca commented Apr 23, 2021

I have investigated a very similar issue with another customer running on K8s, and it turned out to be a stuck downstream write.

Can you please add "in" and "out" traces around your ProcessEventsAsync code? Check whether there is a blocking call causing the stuck behavior: each "in" should have a corresponding "out" after a reasonable delay.
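A minimal sketch of the in/out tracing suggested above, written in language-neutral Python rather than C#; `HandlerTracer`, `stuck_after_seconds`, and `process_events` are hypothetical names. It logs an "in" and an "out" with the elapsed time for each handler call and can list partitions whose "in" has no matching "out" yet. It assumes at most one in-flight handler call per partition, which is how the host dispatches batches to a partition's processor. In the .NET handler the equivalent would be a log line at the start of `ProcessEventsAsync` and another in a `finally` block at the end.

```python
import logging
import threading
import time
from contextlib import contextmanager

log = logging.getLogger("partition-trace")


class HandlerTracer:
    """Hypothetical helper: logs 'in'/'out' around each handler call, per partition."""

    def __init__(self, stuck_after_seconds=60.0):
        self.stuck_after = stuck_after_seconds
        self._in_flight = {}  # partition_id -> monotonic time of the unmatched 'in'
        self._lock = threading.Lock()

    @contextmanager
    def trace(self, partition_id, batch_size):
        start = time.monotonic()
        with self._lock:
            self._in_flight[partition_id] = start
        log.info("in  partition=%s batch=%d", partition_id, batch_size)
        try:
            yield
        finally:
            # The 'out' is logged even if the handler raises.
            elapsed = time.monotonic() - start
            with self._lock:
                self._in_flight.pop(partition_id, None)
            log.info("out partition=%s elapsed=%.3fs", partition_id, elapsed)

    def stuck_partitions(self, now=None):
        """Partitions whose 'in' has had no 'out' for longer than stuck_after seconds."""
        now = time.monotonic() if now is None else now
        with self._lock:
            return sorted(p for p, t in self._in_flight.items() if now - t > self.stuck_after)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    tracer = HandlerTracer(stuck_after_seconds=60)

    def process_events(partition_id, events):  # hypothetical handler
        with tracer.trace(partition_id, len(events)):
            time.sleep(0.1)  # stands in for the downstream write

    process_events("7", ["a", "b"])
```

A blocking downstream write shows up as an "in" line with no "out", and `stuck_partitions()` can be polled from a background timer to flag it without waiting for the call to return.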

@ramya-rao-a added the needs-author-feedback label and removed the needs-team-attention label Jun 15, 2021
@ghost added the no-recent-activity label Jun 22, 2021
@ghost

ghost commented Jun 22, 2021

Hi, we're sending this friendly reminder because we haven't heard back from you in 7 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

@ghost closed this as completed Jul 7, 2021
@github-actions bot locked and limited conversation to collaborators Mar 28, 2023