
Taskq locking optimization #32

Closed
behlendorf opened this issue Nov 4, 2010 · 1 comment

@behlendorf
Contributor

Recent oprofile results suggest that tq->tq_lock can be highly contended on a 16-core system when you're attempting to perform a large number of small work items. This results in a significant impact to taskq throughput and lots of wasted CPU cycles, 19% according to oprofile. Additionally, this lock has to disable interrupts, so there may be widespread performance implications. This appears to be much less of an issue on smaller SMP systems.

To address this issue we need to look into refactoring the taskq code to minimize the length of time this lock is held. Currently this single lock protects everything in the taskq structure: all the values, flags, list heads, and wait queues. This was done to keep the implementation simple, but we could benefit from finer-grained locking and clever use of atomic types.

Fundamentally, only the pending and priority lists need to be protected by a spin_lock_irqsave(), which disables interrupts. This is done to ensure that while one of the taskq threads is dequeuing a request with the lock held, we never service an interrupt which dispatches a new item to the same taskq. That would lead to a deadlock. If we can minimize what needs to be locked in taskq_dispatch(), say by making it atomic or by using per-cpu data structures, we can then minimize the need to use spin_lock_irqsave().
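
As a rough sketch of the constraint (simplified, with hypothetical field and type names, not the actual SPL code), the dequeue path in a taskq thread has to look roughly like this; if it took a plain spin_lock() instead, an interrupt handler dispatching to the same taskq on the same CPU would spin on tq_lock and deadlock:

```c
/*
 * Simplified sketch, hypothetical names.  The taskq thread must dequeue
 * with interrupts disabled: if tq_lock were held with interrupts enabled
 * and an interrupt handler called taskq_dispatch() on the same taskq
 * from the same CPU, the handler would spin on tq_lock forever.
 */
static taskq_ent_t *
example_dequeue(taskq_t *tq)
{
	taskq_ent_t *t = NULL;
	unsigned long flags;

	spin_lock_irqsave(&tq->tq_lock, flags);
	if (!list_empty(&tq->tq_pend_list)) {
		t = list_first_entry(&tq->tq_pend_list, taskq_ent_t,
		    tqent_list);
		list_del_init(&t->tqent_list);
	}
	spin_unlock_irqrestore(&tq->tq_lock, flags);

	return (t);
}
```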

In fact, as of 2.6.36 we can take advantage of some new kernel infrastructure. It's in this release that concurrency managed workqueues (cmwq) first appear, and they perform a very similar function to taskqs. So much so that it appears we should be able to layer the taskq API on top of cmwqs. This has a lot of advantages, including keeping the kernel thread count to a minimum. The previous workqueues were not used for the taskqs because the API was far too limited; incidentally, the kernel developers decided the same thing, which is why we now have cmwqs.
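
As a sketch of what that layering might look like (hypothetical wrapper names, not a worked-out design), each dispatched item could be translated into a work_struct handed to a cmwq-backed workqueue created with alloc_workqueue():

```c
#include <linux/workqueue.h>
#include <linux/slab.h>

typedef void (task_func_t)(void *);

/* Hypothetical wrapper: one work_struct per dispatched taskq item. */
struct taskq_work {
	struct work_struct	tw_work;
	task_func_t		*tw_func;
	void			*tw_arg;
};

static void
taskq_work_fn(struct work_struct *work)
{
	struct taskq_work *tw = container_of(work, struct taskq_work, tw_work);

	tw->tw_func(tw->tw_arg);
	kfree(tw);
}

/* Dispatch by handing the item to the shared cmwq worker pool. */
static int
taskq_dispatch_cmwq(struct workqueue_struct *wq, task_func_t func, void *arg)
{
	struct taskq_work *tw;

	tw = kmalloc(sizeof (*tw), GFP_ATOMIC);
	if (tw == NULL)
		return (-ENOMEM);

	INIT_WORK(&tw->tw_work, taskq_work_fn);
	tw->tw_func = func;
	tw->tw_arg = arg;

	queue_work(wq, &tw->tw_work);

	return (0);
}
```

The harder part of the mapping would be the rest of the taskq semantics, e.g. expressing taskq_wait() in terms of flush_workqueue() and cancellation in terms of cancel_work_sync(), which this sketch ignores.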

Anyway, that's great for folks running bleeding-edge kernels, but we may still want to look into optimizing the existing taskq implementation. This support is so new it likely won't be appearing in things like RHEL6, which is what we'll likely be using.

@nedbass
Contributor

nedbass commented Jan 18, 2012

I've found that the large memory allocation in the splat test may fail on some systems. We should probably scale back the number of tasks before this is merged.

behlendorf pushed a commit that referenced this issue Jan 18, 2012
Add a test designed to generate contention on the taskq spinlock by
using a large number of threads (100) to perform a large number (131072)
of trivial work items from a single queue.  This simulates conditions
that may occur with the zio free taskq when a 1TB file is removed from a
ZFS filesystem, for example.  This test should always pass.  Its purpose
is to provide a benchmark to easily measure the effectiveness of taskq
optimizations using statistics from the kernel lock profiler.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #32
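For reference, the shape of such a contention test against the public taskq API would be roughly the following (a simplified sketch, not the actual splat test code; the thread count, item count, and minalloc/maxalloc values here are only illustrative):

```c
#include <sys/taskq.h>

#define	CONTENTION_THREADS	100
#define	CONTENTION_TASKS	131072

/*
 * Each work item does almost nothing, so the cost being measured is
 * the dispatch/dequeue locking itself.
 */
static void
contention_task(void *arg)
{
	atomic_inc_64((uint64_t *)arg);
}

static int
taskq_contention_test(void)
{
	taskq_t *tq;
	uint64_t count = 0;
	int i;

	tq = taskq_create("contention", CONTENTION_THREADS, maxclsyspri,
	    50, INT_MAX, TASKQ_PREPOPULATE);
	if (tq == NULL)
		return (-ENOMEM);

	for (i = 0; i < CONTENTION_TASKS; i++)
		taskq_dispatch(tq, contention_task, &count, TQ_SLEEP);

	taskq_wait(tq);
	taskq_destroy(tq);

	return (count == CONTENTION_TASKS ? 0 : -EINVAL);
}
```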
nedbass added a commit to nedbass/spl that referenced this issue Jan 19, 2012
This reverts commit ec2b410.

A race condition was introduced by which a wake_up() call can be lost
after the taskq thread determines there are no pending work items,
leading to deadlock:

1. taskq thread enables interrupts
2. dispatcher thread runs, queues work item, calls wake_up()
3. taskq thread runs, adds self to waitq, sleeps

This could easily happen if an interrupt for an I/O completion was
outstanding at the point where the taskq thread re-enables interrupts,
just before the call to add_wait_queue_exclusive().  The handler would
run immediately within the race window.

Issue openzfs#32
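The fix, in sketch form (simplified, hypothetical names; using the prepare_to_wait_exclusive()/finish_wait() helpers rather than a bare add_wait_queue_exclusive() call), is to put the thread on the wait queue before rechecking for pending work, so a wake_up() issued in between still finds a waiter:

```c
#include <linux/sched.h>
#include <linux/wait.h>

static void
taskq_thread_wait(taskq_t *tq)
{
	DEFINE_WAIT(wait);

	/* Register on the wait queue *before* rechecking the condition. */
	prepare_to_wait_exclusive(&tq->tq_work_waitq, &wait,
	    TASK_INTERRUPTIBLE);

	/*
	 * A dispatcher that queues an item and calls wake_up() after this
	 * check now finds us on the queue, so the wakeup cannot be lost.
	 */
	if (list_empty(&tq->tq_pend_list))
		schedule();

	finish_wait(&tq->tq_work_waitq, &wait);
}
```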
nedbass added a commit to nedbass/spl that referenced this issue Jan 19, 2012
Testing has shown that tq->tq_lock can be highly contended when a
large number of small work items are dispatched.  The lock hold time
is reduced by the following changes:

1) Use exclusive threads in the work_waitq

When a single work item is dispatched we only need to wake a single
thread to service it.  The current implementation uses non-exclusive
threads so all threads are woken when the dispatcher calls wake_up().
If a large number of threads are in the queue this overhead can become
non-negligible.

2) Conditionally add/remove threads from work waitq

Taskq threads need only add themselves to the work wait queue if
there are no pending work items.

Issue openzfs#32
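A sketch of the dispatcher side under these two changes (simplified, hypothetical names): because the taskq threads register as exclusive waiters, wake_up() wakes at most one sleeping thread per dispatched item, and a thread that finds pending work never joins the wait queue at all:

```c
/* Simplified sketch, not the actual SPL dispatch path. */
static void
example_dispatch_wake(taskq_t *tq, taskq_ent_t *t)
{
	unsigned long flags;

	spin_lock_irqsave(&tq->tq_lock, flags);
	list_add_tail(&t->tqent_list, &tq->tq_pend_list);
	spin_unlock_irqrestore(&tq->tq_lock, flags);

	/* Exclusive waiters: this wakes a single sleeping taskq thread. */
	wake_up(&tq->tq_work_waitq);
}
```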
behlendorf pushed a commit that referenced this issue Jan 20, 2012
This reverts commit ec2b410.

A race condition was introduced by which a wake_up() call can be lost
after the taskq thread determines there are no pending work items,
leading to deadlock:

1. taskq thread enables interrupts
2. dispatcher thread runs, queues work item, calls wake_up()
3. taskq thread runs, adds self to waitq, sleeps

This could easily happen if an interrupt for an I/O completion was
outstanding at the point where the taskq thread re-enables interrupts,
just before the call to add_wait_queue_exclusive().  The handler would
run immediately within the race window.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #32
behlendorf pushed a commit that referenced this issue Jan 20, 2012
Testing has shown that tq->tq_lock can be highly contended when a
large number of small work items are dispatched.  The lock hold time
is reduced by the following changes:

1) Use exclusive threads in the work_waitq

When a single work item is dispatched we only need to wake a single
thread to service it.  The current implementation uses non-exclusive
threads so all threads are woken when the dispatcher calls wake_up().
If a large number of threads are in the queue this overhead can become
non-negligible.

2) Conditionally add/remove threads from work waitq

Taskq threads need only add themselves to the work wait queue if
there are no pending work items.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #32