Possible txg locking improvements #4333

ryao · 2016-02-13T00:00:24Z

This contains a few ideas that I had when looking at code related to txg_quiesce().

We can avoid some overhead by using atomic instructions on ->tc_count rather than protecting it with a mutex. We have fewer points in the code where a mutex would block us and use 1 less atomic instruction in each.
We really do not need to hold ->tc_open_lock when processing a DMU transaction provided that we we have incremented ->tc_count, so we let it go early.
Calling kpreempt_enable() defeats the purpose of using CPU_SEQID to index into a per-CPU structure because we are then preempted, allowing another thread to contend with us on the per-CPU structure we reference. This behavior was introduced by 15a9e03 as a means to work around scary warnings from Linux, but we can workaround them by mapping things to raw_smp_processor_id(), so lets do that to avoid unnecessary preemption.

Using a mutex to protect increment and decrement operations is inefficient. It uses two atomic instructions, will spin when a single waiter is present (under Linux with typical configuration options) and if two waiters occur, will actually deschedule them, which is far slower than waiting. Using atomic instructions allows hardware to handle the blocking, which is much faster than software. Signed-off-by: Richard Yao <ryao@gentoo.org>

There does not appear to be anything protected by ->tc_open_lock that callers use while holding ->tc_open_lock means that anything that preempts us would block on us when the only things that really need protection are ->tx_open_txg and ->tc_count. Signed-off-by: Richard Yao <ryao@gentoo.org>

ryao · 2016-02-17T21:54:55Z

@behlendorf Thinking some more about the "txg visibility code should not execute under tc_open_lock" patch some more, direct reclaim in the memory allocation in spa_txg_history_add() could deadlock when it tries to grab ->tc_open_lock in txg_hold_open() because the thread would likely already be holding it.

Moving it out of the critical section is an improvement, although I suspect it does not go far enough as blocking here means that the sync thread cannot get new transactions to sync. I suspect that the right thing to do might be to change the code to use KM_NOSLEEP and handle NULL return values by allowing the code to skip reporting.

The memory allocation and locking in `spa_txg_history_*()` can potentially block txg_hold_open for arbitrarily long periods of time. Signed-off-by: Richard Yao <ryao@gentoo.org>

…SEQID use in preemptible context" This reverts commit 15a9e03 and commit 8878261. Wrapping CPU_SEQID with kpreempt_disable() and kpreempt_enable() causes us to call preempt_schedule(). This allows the thread to be rescheduled on another CPU, allowing the vacated CPU to begin using the per-CPU structure that we still reference on the CPU we just vacated. This is an unnecessary source of contention that is a regression from the illumos behavior. We can turn off the Linux warnings by calling `raw_smp_processor_id()` instead of wrapping things with kpreempt_disable() and kpreempt_enable(), so lets do that. Requires-spl: refs/pull/532/head Signed-off-by: Richard Yao <ryao@gentoo.org>

The memory allocation and locking in `spa_txg_history_*()` can potentially block txg_hold_open for arbitrarily long periods of time. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#4333

behlendorf · 2016-07-28T00:26:05Z

I've merged f26b4b3 which is straight forward but would prefer not merge the other changes until we have some evidence that they improve performance.

f26b4b3 txg visibility code should not execute under tc_open_lock

ryao force-pushed the locking branch 6 times, most recently from 7c52fff to aafb5d6 Compare February 17, 2016 02:06

ryao force-pushed the locking branch from aafb5d6 to 5e40e83 Compare February 17, 2016 20:55

ryao force-pushed the locking branch 2 times, most recently from 16db03c to 70d70fe Compare February 17, 2016 21:20

ryao and others added 2 commits February 17, 2016 21:26

txg visibility code should not execute under tc_open_lock

3ac9d91

The memory allocation and locking in `spa_txg_history_*()` can potentially block txg_hold_open for arbitrarily long periods of time. Signed-off-by: Richard Yao <ryao@gentoo.org>

ryao force-pushed the locking branch from 70d70fe to 4159569 Compare February 18, 2016 02:26

sempervictus mentioned this pull request Mar 15, 2016

TRIM/Discard support from Nexenta #3656

Closed

behlendorf closed this Jul 28, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Possible txg locking improvements #4333

Possible txg locking improvements #4333

ryao commented Feb 13, 2016

ryao commented Feb 17, 2016

behlendorf commented Jul 28, 2016

Possible txg locking improvements #4333

Possible txg locking improvements #4333

Conversation

ryao commented Feb 13, 2016

ryao commented Feb 17, 2016

behlendorf commented Jul 28, 2016