[BUG] A potential data corruption in Pandas UDFs #9941

Closed
firestarman opened this issue Dec 4, 2023 · 0 comments · Fixed by #9942
Assignees
Labels
bug Something isn't working

Comments

@firestarman
Collaborator

firestarman commented Dec 4, 2023

Thanks to @GaryShen2008 for finding this.

There is a potential data corruption in Pandas UDFs that use BatchQueue to combine the input batches with the Python output batches.
Currently BatchProducer and BatchQueue use different locks: one protects pulling a batch from the input iterator, and the other protects appending a batch to the queue. Pulling a batch and appending it are therefore not atomic as a pair.
So in a two-threaded Python runner, there is a race when the reader thread and the writer thread both append batches to the batch queue.

One possible case is:

  1. the writer thread gets batch A from the input iterator, but pauses before appending it.
  2. the reader thread then gets the next batch, B, and appends it to the queue.
  3. the writer thread resumes and appends batch A to the queue.

Batches A and B are therefore in reversed order in the queue, which leads to data
corruption when the input batches are combined with the Python output batches.
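
The interleaving above can be reproduced with a minimal Python sketch. This is an illustration only, not the plugin code: the class names mirror BatchProducer and BatchQueue from this issue, but every method, the `after_pull` hook, and the single-lock variant are hypothetical, and the single-lock variant is just one possible remedy that may differ from the actual fix.

```python
import threading
from collections import deque


class BatchQueue:
    """Hypothetical stand-in for the queue; guarded by its own lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = deque()

    def append(self, batch):
        with self._lock:
            self._items.append(batch)

    def snapshot(self):
        with self._lock:
            return list(self._items)


class BatchProducer:
    """Hypothetical stand-in for the producer; guarded by a different lock."""

    def __init__(self, batches, queue):
        self._lock = threading.Lock()
        self._it = iter(batches)
        self._queue = queue

    def pull_and_append_racy(self, after_pull=None):
        # The pull is protected, but the lock is released before the append.
        with self._lock:
            batch = next(self._it)
        if after_pull is not None:
            after_pull()  # test hook: lets the demo pause this thread here
        self._queue.append(batch)

    def pull_and_append_atomic(self):
        # One possible remedy (an assumption): hold one lock across both steps.
        with self._lock:
            batch = next(self._it, None)
            if batch is None:
                return False
            self._queue.append(batch)
            return True


def demo_racy():
    queue = BatchQueue()
    producer = BatchProducer(["A", "B"], queue)
    pulled, resume = threading.Event(), threading.Event()

    def pause_writer():
        pulled.set()   # step 1: the writer holds batch A ...
        resume.wait()  # ... and pauses before appending it

    writer = threading.Thread(
        target=producer.pull_and_append_racy, args=(pause_writer,))
    writer.start()
    pulled.wait()
    producer.pull_and_append_racy()  # step 2: the reader pulls and appends B
    resume.set()                     # step 3: the writer resumes, appends A
    writer.join()
    return queue.snapshot()


def demo_atomic(n=200):
    queue = BatchQueue()
    producer = BatchProducer(range(n), queue)

    def worker():
        while producer.pull_and_append_atomic():
            pass

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return queue.snapshot()


if __name__ == "__main__":
    print(demo_racy())                             # queue order after the race
    print(demo_atomic() == list(range(200)))       # order preserved
```

In `demo_racy` the two events force the exact schedule from the steps above, so the queue ends up holding B before A. In `demo_atomic` the pull and the append happen under the same lock, so the queue order always matches the order in which batches left the iterator, regardless of how the two threads are scheduled.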

@firestarman firestarman added bug Something isn't working ? - Needs Triage Need team to review and classify labels Dec 4, 2023
@firestarman firestarman changed the title [BUG] [BUG] A potential data corruption in Pandas UDFs Dec 4, 2023
@firestarman firestarman self-assigned this Dec 4, 2023
@sameerz sameerz removed the ? - Needs Triage Need team to review and classify label Dec 16, 2023