Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix spurious wake-ups and worker threads initialization in workerpool #339

Merged
merged 2 commits into from
Jun 26, 2024

Conversation

berteauxjb
Copy link

I encountered random segfaults while using this library and tried to figure out what was happening. I isolated the segfault to the function worker_thread() of workerpool.c. Long story short, there is no predicate that prevents worker_thread() from running if it is woken up spuriously and a task has already been added. Since there is no mutex used in workerpool_add_task(), it is possible for the zarray wp->tasks to be reallocated, rendering the volatile pointer task in worker_thread() invalid and causing a segfault.

This PR addresses this issue by adding a predicate and shielding calls that access wp behind a mutex. I also saw issue #70 and agree that the initialization of the worker threads might lead to race conditions, so I addressed that too.

@christian-rauch
Copy link
Collaborator

Thanks for the PR. I approved the CI for this PR. The tests crash on Windows (Exception: SegFault). Any idea what is going on there?

@berteauxjb
Copy link
Author

Thanks for the PR. I approved the CI for this PR. The tests crash on Windows (Exception: SegFault). Any idea what is going on there?

I was able to reproduce on a Windows machine 😄 It turns out that the wp->mutex is only initialized if wp->nthreads > 1. In the test there is only 1 thread, and workerpool_add_task was accessing a non-initialized mutex. I shielded the usage of the mutex with if (wp->nthreads > 1) like it's done in other places in the file.

@christian-rauch christian-rauch self-requested a review June 26, 2024 07:09
Copy link
Collaborator

@christian-rauch christian-rauch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final request: Can you now squash your fixup commit 737ff6b into the original commit be17a2d and force-push again with the clean git history?

Jean-Bernard Berteaux added 2 commits June 26, 2024 10:16
Threads that wait on the condition variable startcond can be woken up
spuriously. This is not a problem if no tasks have been added, but as
soon as there is one, the condition in the while loop of worker_thread
will evaluate to false and the worker thread will run the task. If more
tasks are added, there is a chance that the zarray is reallocated and
that its pointer is changed. Since the woker thread gets a raw pointer
to the zarray, this pointer can become invalid and cause a segfault.

The solution is to add a predicate that is checked when the worker
thread is woken up on the condition variable startcond. The while loop
now has 2 conditions:
- start condition: if wp->start_tasks is true, tasks can be executed
- stop condition: if wp->taskspos == zarray_size(wp->tasks), there are
no more tasks to execute

I also shielded concurrent accesses to the pointer wp with the mutex.
@berteauxjb
Copy link
Author

Final request: Can you now squash your fixup commit 737ff6b into the original commit be17a2d and force-push again with the clean git history?

Done!

@christian-rauch
Copy link
Collaborator

Thanks a lot!

@christian-rauch christian-rauch merged commit 188c0e0 into AprilRobotics:master Jun 26, 2024
39 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants