Fix data race between zil_commit() and zil_suspend() #14514

Merged 1 commit on Mar 1, 2023

Commits on Mar 1, 2023

  1. Fix data race between zil_commit() and zil_suspend()

    openzfsonwindows#206 found that it is possible to trip
    `VERIFY(list_is_empty(&lwb->lwb_itxs))` when a `zil_commit()` is delayed
    by the scheduler long enough for a parallel `zil_suspend()` operation to
    exit `zil_commit_impl()`. This is a data race. To prevent this, we
    introduce a `zilog->zl_suspend_lock` rwlock to ensure that all
    outstanding `zil_commit()` operations finish before `zil_suspend()`
    begins and that subsequent operations fall back to `txg_wait_synced()`
    after `zil_suspend()` has begun.
    
    On `PREEMPT_RT` Linux kernels, the `rw_enter()` implementation suffers
    from writer starvation. This means that a ZIL-intensive system can delay
    `zil_suspend()` indefinitely. This is a pre-existing problem that
    affects everything that uses rw locks, so it needs to be addressed in
    the SPL. However, builds against `PREEMPT_RT` Linux kernels are
    currently broken due to a GPL symbol issue (openzfs#11097), so we can
    safely disregard the writer starvation problem for now.
    
    Reported-by: Arun KV <arun.kv@datacore.com>
    Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
    ryao committed Mar 1, 2023
    Commit 81b8c86