You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
A nasty race condition exists in Project.get_annotator_task if it is called twice in parallel for the same user (e.g. due to network instability the user's browser believes the first call has failed and retries it - this has happened at least once in the production installation)
The method first checks whether there is already a task in progress for this user, and if so returns it. If not it proceeds to assign a new task and returns that instead. However two parallel calls can both see no existing task, then both attempt to assign a new task, and even though the two calls are both executing inside with transaction.atomic() (in the RPC call), both can validly succeed, leaving the user with multiple pending annotation tasks recorded in the database. This means that all subsequent calls to get_annotator_task will fail with an exception since they enforce the invariant that an annotator can have at most one pending task at any given time.
Analysis
The reason why the atomic transaction is not sufficient protection is, I believe, to do with the way we assign documents to annotators at random. Assigning a task to an annotator does not modify the project or user tables directly, it just inserts a row into the annotation table linking to the user ID and the document ID. Since two calls to assign a new task will generally assign different document IDs, these two inserts do not clash and both transactions can commit successfully.
The simplest fix for this race would be to completely serialize the transactions, i.e. grab some sort of lock before doing the "is there a task already in progress" check and release it only after completing the new task assignment, such that a second simultaneous call is blocked from even checking for pending annotations until the first call is complete and has committed its insert. If it would be too heavy to grab this lock for every call to get_annotator_task then a halfway house might be a kind of "double-checked locking" approach, where it is only in the else case (line 597) that we grab the lock and then check for pending tasks again (this check must happen inside the scope of the lock, even if it has already been checked outside) before assigning a new task and finally releasing the lock.
I don't know enough Django to know how best to implement this, is there a Django concept that maps to the SQL SELECT FOR UPDATE construct? Or a Postgresql-specific API for acquiring locks?
The text was updated successfully, but these errors were encountered:
ianroberts
changed the title
Potential race condition in get_annotator_task assigning multiple docs to same annotator
Race condition in get_annotator_task assigning multiple docs to same annotator
May 16, 2023
If it would be too heavy to grab this lock for every call to get_annotator_task
If we can scope the lock per user (e.g. SELECT FOR UPDATE * FROM service_user WHERE id=:this_user) then it shouldn't be too big a deal as you're only blocking a user against themself.
Describe the bug
A nasty race condition exists in
Project.get_annotator_task
if it is called twice in parallel for the same user (e.g. due to network instability the user's browser believes the first call has failed and retries it - this has happened at least once in the production installation)gate-teamware/backend/models.py
Lines 583 to 602 in f9a1a99
The method first checks whether there is already a task in progress for this user, and if so returns it. If not it proceeds to assign a new task and returns that instead. However two parallel calls can both see no existing task, then both attempt to assign a new task, and even though the two calls are both executing inside
with transaction.atomic()
(in the RPC call), both can validly succeed, leaving the user with multiple pending annotation tasks recorded in the database. This means that all subsequent calls toget_annotator_task
will fail with an exception since they enforce the invariant that an annotator can have at most one pending task at any given time.Analysis
The reason why the atomic transaction is not sufficient protection is, I believe, to do with the way we assign documents to annotators at random. Assigning a task to an annotator does not modify the project or user tables directly, it just inserts a row into the
annotation
table linking to the user ID and the document ID. Since two calls to assign a new task will generally assign different document IDs, these two inserts do not clash and both transactions can commit successfully.The simplest fix for this race would be to completely serialize the transactions, i.e. grab some sort of lock before doing the "is there a task already in progress" check and release it only after completing the new task assignment, such that a second simultaneous call is blocked from even checking for pending annotations until the first call is complete and has committed its insert. If it would be too heavy to grab this lock for every call to
get_annotator_task
then a halfway house might be a kind of "double-checked locking" approach, where it is only in theelse
case (line 597) that we grab the lock and then check for pending tasks again (this check must happen inside the scope of the lock, even if it has already been checked outside) before assigning a new task and finally releasing the lock.I don't know enough Django to know how best to implement this, is there a Django concept that maps to the SQL
SELECT FOR UPDATE
construct? Or a Postgresql-specific API for acquiring locks?The text was updated successfully, but these errors were encountered: