Improve the batching algorithm to optimize the size of batches #18634

Mohit-Chakraborty · 2021-02-10T01:03:03Z

No description provided.

ghost · 2021-03-12T10:05:18Z

Hi @Mohit-Chakraborty. There hasn't been recent engagement on this pull request. If this is still an active work stream, please let us know by removing the no-recent-activity label. Otherwise, we'll close this out in 7 days.

ghost · 2021-03-13T16:00:17Z

Hi @Mohit-Chakraborty. There was a mistake and this issue was unintentionally flagged as a stale pull request. The label has been removed and the issue will remain active; no action is needed on your part. Apologies for the inconvenience.

Mohit-Chakraborty · 2021-03-24T00:10:24Z

When submitting documents for indexing, we split batches on the unique key of the documents, so that any exception from the service side can be correctly mapped back to the right document and our document submission retry mechanism stays stable. When designing a solution, we should aim to:

1. Try to fill the entire batch
2. Maintain the order of document submission

For .NET, we modified the batching algorithm via #18469. This helps with 2 above (maintain the order of document submission), but we can do better regarding 1 (try to fill the batch to the maximum extent).
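
A minimal Python sketch of the split-on-duplicate-key idea described above. It is not the actual .NET implementation from #18469. Actions are consumed strictly in submission order, and a batch is closed as soon as it is full or the next action's key already appears in it. The `(key, operation)` tuple shape and the name `split_on_duplicate_keys` are hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical shape of an indexing action: (document key, operation name).
Action = Tuple[str, str]


def split_on_duplicate_keys(actions: Iterable[Action], batch_size: int) -> List[List[Action]]:
    """Split actions into ordered batches in which every document key is unique."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    batches: List[List[Action]] = []
    current: List[Action] = []
    seen = set()  # keys already present in the current batch

    for action in actions:
        key = action[0]
        # Close the batch when it is full or the key would repeat within it.
        if key in seen or len(current) >= batch_size:
            batches.append(current)
            current, seen = [], set()
        current.append(action)
        seen.add(key)

    if current:
        batches.append(current)
    return batches


example = [("a", "upload"), ("b", "upload"), ("a", "merge"), ("c", "upload")]
batches = split_on_duplicate_keys(example, batch_size=3)
```

With a batch size of 3, the sequence a, b, a, c is split into [a, b] and [a, c]. Submission order is preserved, but the first batch goes out with a free slot, which is the gap in batch filling noted above.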

I tried another round of improvement with #18603, but there were concerns about the semantic change to the operation:

The change is adding a “flush duplicate actions immediately after the batch, regardless of size” behavior that might result in us sending a lot of extra batches. Ideally we’d just update the existing pending/retry queues and keep the rest of the logic the same (since it’s already so much to wrap your head around). That’s probably a nontrivial ask with .NET’s Queue though. I think we might need to switch from Queue to something else.

The remaining work is to modify the batching algorithm further, so that the above concerns are alleviated.
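
One way to picture the "update the existing pending queue" idea is sketched below in Python. It is an illustration under stated assumptions, not the algorithm adopted by the SDK. A `collections.deque` stands in for a double-ended replacement for the queue, and the helper name `take_batch` and the `(key, operation)` tuple shape are hypothetical. Duplicate-key actions are skipped while the batch is filled and are then pushed back to the front of the pending queue, so they go out with the next regular batch instead of being flushed on their own. This keeps the relative order of actions on the same key but lets actions on other keys overtake them; whether that satisfies the ordering requirement is an assumption.

```python
from collections import deque
from typing import Deque, List, Tuple

# Hypothetical shape of an indexing action: (document key, operation name).
Action = Tuple[str, str]


def take_batch(pending: Deque[Action], batch_size: int) -> List[Action]:
    """Remove and return the next batch, deferring duplicate keys to the next call."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    batch: List[Action] = []
    deferred: List[Action] = []  # duplicate-key actions, in their original order
    seen = set()                 # keys already present in this batch

    while pending and len(batch) < batch_size:
        action = pending.popleft()
        key = action[0]
        if key in seen:
            deferred.append(action)
        else:
            batch.append(action)
            seen.add(key)

    # extendleft reverses its input, so reverse first to keep the original order.
    pending.extendleft(reversed(deferred))
    return batch


pending = deque([("a", "upload"), ("b", "upload"), ("a", "merge"),
                 ("c", "upload"), ("d", "upload")])
first = take_batch(pending, batch_size=3)
remaining_after_first = list(pending)
second = take_batch(pending, batch_size=3)
```

Here the first batch is filled completely with a, b and c, and the deferred second action on a leads the next batch, so no extra undersized batch is sent just for the duplicate.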

Mohit-Chakraborty added the Client and Search labels Feb 10, 2021

Mohit-Chakraborty added this to the Backlog milestone Feb 10, 2021

Mohit-Chakraborty self-assigned this Feb 10, 2021

ghost added the no-recent-activity label Mar 12, 2021

ghost removed the no-recent-activity label Mar 13, 2021

pallavit assigned ShivangiReja and unassigned Mohit-Chakraborty Dec 7, 2022

ShivangiReja closed this as not planned Dec 12, 2023

github-actions bot locked and limited conversation to collaborators Mar 11, 2024
