panic: VERIFY3(rc->rc_count == number) failed (FreeBSD 15-CURRENT, ZFS 2.1.14) #15802

Closed
VoxSciurorum opened this issue Jan 22, 2024 · 24 comments · Fixed by #16216
Labels
Type: Defect (Incorrect behavior, e.g. crash, hang)

Comments

@VoxSciurorum

System information

Type                 | Version/Name
---------------------|-------------
Distribution Name    | FreeBSD
Distribution Version | 15-CURRENT
Kernel Version       | 15
Architecture         | arm64, amd64
OpenZFS Version      | 2.1.14

Describe the problem you're observing

Sometimes the system panics during shutdown with an assertion failure.
Four examples from different systems:

panic: VERIFY3(rc->rc_count == number) failed (262144 == 0)
panic: VERIFY3(rc->rc_count == number) failed (59392 == 0)
panic: VERIFY3(rc->rc_count == number) failed (232652800 == 0)
panic: VERIFY3(rc->rc_count == number) failed (872153088 == 0)

Describe how to reproduce the problem

Unload ZFS and get unlucky. It doesn't always happen, but it happens to more
people than just me. See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276341

Include any warning/errors/backtraces from the system logs

panic: VERIFY3(rc->rc_count == number) failed (262144 == 0)

cpuid = 0
time = 1705275564
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
vpanic() at vpanic+0x1a4
spl_panic() at spl_panic+0x44
zfs_refcount_destroy_many() at zfs_refcount_destroy_many+0xdc
arc_fini() at arc_fini+0x274
dmu_fini() at dmu_fini+0xc
spa_fini() at spa_fini+0x34
zfs_kmod_fini() at zfs_kmod_fini+0x78
zfs_shutdown() at zfs_shutdown+0x40
kern_reboot() at kern_reboot+0x574
sys_reboot() at sys_reboot+0x350
do_el0_sync() at do_el0_sync+0x58c
handle_el0_sync() at handle_el0_sync+0x48
--- exception, esr 0x56000000

[...]

#9  0xffff00000003daf8 in spl_panic (file=<optimized out>, 
    func=<optimized out>, line=13227925, 
    fmt=0x12 <error: Cannot access memory at address 0x12>)
    at /usr/src/sys/contrib/openzfs/module/os/freebsd/spl/spl_misc.c:100
        ap = {__stack = 0xffff0000f39d5630, __gr_top = 0xffff0000f39d55e0, 
          __vr_top = 0xffffa000010a92a0, __gr_offs = -32, __vr_offs = 0}
#10 0xffff0000001633e0 in zfs_refcount_destroy_many (rc=<optimized out>, 
    number=18446462598740100184)
    at /usr/src/sys/contrib/openzfs/module/zfs/refcount.c:98
        _verify3_right = 18446462598740100184
        _verify3_left = 115
        cookie = 0x0
        ref = <optimized out>
#11 0xffff0000001633fc in zfs_refcount_destroy (rc=0x12, 
    rc@entry=0xffff0000010e7a80 <ARC_anon+128>)
    at /usr/src/sys/contrib/openzfs/module/zfs/refcount.c:112
No locals.
#12 0xffff0000000c0508 in arc_state_fini ()
    at /usr/src/sys/contrib/openzfs/module/zfs/arc.c:7421
No locals.
#13 arc_fini () at /usr/src/sys/contrib/openzfs/module/zfs/arc.c:7753
        p = <optimized out>
#14 0xffff0000000ee4d0 in dmu_fini ()
    at /usr/src/sys/contrib/openzfs/module/zfs/dmu.c:2549
No locals.
#15 0xffff000000189dd8 in spa_fini ()
    at /usr/src/sys/contrib/openzfs/module/zfs/spa_misc.c:2549
No locals.
#16 0xffff0000001e67ac in zfs_kmod_fini ()
    at /usr/src/sys/contrib/openzfs/module/zfs/zfs_ioctl.c:7933
        zsnext = 0x0
        zs = 0x0
#17 0xffff00000006e654 in zfs__fini ()
    at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/kmod_core.c:284
No locals.
#18 zfs_shutdown (arg=<optimized out>, howto=<optimized out>)
    at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/kmod_core.c:297
No locals.
#19 0xffff000000693158 in kern_reboot (howto=0)
    at /usr/src/sys/kern/kern_shutdown.c:517
        _ep = 0xffffa00002f63c40
        _t = 0xffffa00002f63c40
        _el = 0xffffa00001709c80
        once = 1
#20 0xffff000000692b74 in sys_reboot (td=0xffff000121c2a640, 
    uap=0xffff000121c2aa40) at /usr/src/sys/kern/kern_shutdown.c:312
        error = <optimized out>
#21 0xffff000000a60060 in syscallenter (td=0xffff000121c2a640)
    at /usr/src/sys/arm64/arm64/../../kern/subr_syscall.c:186
        se = 0xffff000000fa1378 <sysent+1760>
        sa = 0xffff000121c2aa30
        p = <optimized out>
        error = <optimized out>
        sy_thr_static = true
        traced = <optimized out>
        _audit_entered = <optimized out>

The call that failed is

zfs_refcount_destroy(&arc_anon->arcs_size[ARC_BUFC_DATA]);

I printed out the object being destroyed and nothing seemed wrong with it, except that the count, which should have been zero, was not.

(kgdb) p ARC_anon.arcs_size[0]
$3 = {rc_count = 262144, rc_mtx = {lock_object = {lo_name = 0xffff000000c8d8f6 "rc->rc_mtx", 
      lo_flags = 577830912, lo_data = 0, lo_witness = 0x0}, sx_lock = 1}, rc_tree = {avl_root = 0x0, 
    avl_compar = 0xffff0000001630f0 <zfs_refcount_compare>, avl_offset = 0, avl_numnodes = 0}, 
  rc_removed = {list_size = 48, list_offset = 0, list_head = {
      list_next = 0xffff0000010e6ad8 <ARC_anon+216>, list_prev = 0xffff0000010e6ad8 <ARC_anon+216>}}, 
  rc_removed_count = 0, rc_tracked = 0}

Note that sx_lock=1 is the correct value for a lock of this type when it is destroyed. The links in the list point back to the list object itself, as expected for an empty list.
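
For readers unfamiliar with the refcount machinery, here is a minimal userspace model of the check that fires here. It is only a sketch: the real code is zfs_refcount_destroy(), which ends up in zfs_refcount_destroy_many() per the backtrace above (module/zfs/refcount.c), and the model_* names below are made up for illustration.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
        uint64_t rc_count;      /* bytes currently held against this counter */
} model_refcount_t;

static void
model_refcount_add_many(model_refcount_t *rc, uint64_t number)
{
        rc->rc_count += number;
}

static void
model_refcount_destroy(model_refcount_t *rc)
{
        /* Stands in for VERIFY3(rc->rc_count == number) with number == 0. */
        assert(rc->rc_count == 0);
}

int
main(void)
{
        model_refcount_t arcs_size = { 0 };

        /* A 256KB anonymous buffer is accounted against the counter... */
        model_refcount_add_many(&arcs_size, 262144);
        /*
         * ...and the matching remove never happens because the buffer is
         * leaked, so tearing the counter down aborts, just as arc_fini()
         * panics above with "262144 == 0".
         */
        model_refcount_destroy(&arcs_size);
        printf("not reached when a buffer was leaked\n");
        return (0);
}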

@VoxSciurorum added the Type: Defect (Incorrect behavior, e.g. crash, hang) label Jan 22, 2024
@VoxSciurorum
Author

On a different FreeBSD system running 2.2.3:

# kldunload zfs
Freed UMA keg (arc_buf_hdr_t_full) was not empty (2 items).  Lost 2 pages of memory.
Freed UMA keg (arc_buf_t) was not empty (2 items).  Lost 2 pages of memory.
Freed UMA keg (zio_buf_comb_131072) was not empty (2 items).  Lost 64 pages of memory.

This may be the same bug. ZFS did not panic because assertions were not enabled, but the OS noticed that memory had not been freed. (The messages came from the kernel, not from kldunload; I was typing the command on the system console.)

@lundman
Contributor

lundman commented May 9, 2024

Old ticket, but came across something similar at openzfsonwindows#364 (comment)

I wonder if failmode=continue is in use here as well, and if there was a zio_wait restarting hung I/O for the pool in this issue.

@markjdb
Contributor

markjdb commented May 10, 2024

I set vfs.zfs.reference_tracking_enable=1 and reproduced the problem. While shutting down, I hit a panic in the dbuf code which may or may not be related:

panic: No such hold 0xffffa7ffccf11000 on refcount ffff0000017a7d80
...
#7  kdb_enter (why=<optimized out>, msg=<optimized out>) at /usr/home/markj/freebsd/sys/kern/subr_kdb.c:556
#8  0xffff0000004a8d50 in vpanic (fmt=<optimized out>, ap=...) at /usr/home/markj/freebsd/sys/kern/kern_shutdown.c:967
#9  0xffff0000004a8b64 in panic (fmt=0x12 <error: Cannot access memory at address 0x12>) at /usr/home/markj/freebsd/sys/kern/kern_shutdown.c:892
#10 0xffff00000163abb8 in zfs_refcount_remove_many (rc=<optimized out>, number=4800, holder=<optimized out>) at /usr/home/markj/freebsd/sys/contrib/openzfs/module/zfs/refcount.c:187
#11 0xffff0000015ac24c in dbuf_evict_user (db=db@entry=0xffffa7ffccf11000) at /usr/home/markj/freebsd/sys/contrib/openzfs/module/zfs/dbuf.c:580
#12 0xffff0000015a2338 in dbuf_clear_data (db=db@entry=0xffffa7ffccf11000) at /usr/home/markj/freebsd/sys/contrib/openzfs/module/zfs/dbuf.c:1232
#13 0xffff0000015a4844 in dbuf_destroy (db=db@entry=0xffffa7ffccf11000) at /usr/home/markj/freebsd/sys/contrib/openzfs/module/zfs/dbuf.c:3040
#14 0xffff0000015e2dbc in dnode_evict_dbufs (dn=0xffffa7ffcdd5c780) at /usr/home/markj/freebsd/sys/contrib/openzfs/module/zfs/dnode_sync.c:509
#15 0xffff0000015c03d4 in dmu_objset_evict_dbufs (os=<optimized out>) at /usr/home/markj/freebsd/sys/contrib/openzfs/module/zfs/dmu_objset.c:981
#16 0xffff000001550c2c in zfsvfs_teardown (zfsvfs=zfsvfs@entry=0xffffa7ffccf1c000, unmounting=<optimized out>, unmounting@entry=1) at /usr/home/markj/freebsd/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vfsops.c:1656
#17 0xffff00000154f24c in zfs_umount (vfsp=0xffff00050f76b280, fflag=524288) at /usr/home/markj/freebsd/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vfsops.c:1710
#18 0xffff000000595bd4 in dounmount (mp=0xffff00050f76b280, flags=<optimized out>, flags@entry=524288, td=<optimized out>) at /usr/home/markj/freebsd/sys/kern/vfs_mount.c:2340
#19 0xffff0000005a2520 in unmount_or_warn (mp=0xffff00050f76b280) at /usr/home/markj/freebsd/sys/kern/vfs_subr.c:4973
#20 vfs_unmountall () at /usr/home/markj/freebsd/sys/kern/vfs_subr.c:5007
#21 0xffff00000057343c in bufshutdown (show_busybufs=<optimized out>) at /usr/home/markj/freebsd/sys/kern/vfs_bio.c:1509
#22 0xffff0000004a853c in kern_reboot (howto=0) at /usr/home/markj/freebsd/sys/kern/kern_shutdown.c:508
#23 0xffff0000004a8228 in sys_reboot (td=0xffff000202b48640, uap=0xffff000202b48a40) at /usr/home/markj/freebsd/sys/kern/kern_shutdown.c:313
#24 0xffff000000864658 in syscallenter (td=0xffff000202b48640) at /usr/home/markj/freebsd/sys/arm64/arm64/../../kern/subr_syscall.c:189
#25 svc_handler (td=0xffff000202b48640, frame=<optimized out>) at /usr/home/markj/freebsd/sys/arm64/arm64/trap.c:198
#26 do_el0_sync (td=0xffff000202b48640, frame=<optimized out>) at /usr/home/markj/freebsd/sys/arm64/arm64/trap.c:648
#27 <signal handler called>

This is happening in a block of code in dbuf_evict_user() which was added recently, in commit 92dc4ad. I can't comment intelligently on this code, but maybe @robn can see what's going on here? I'm happy to provide more debugging info if needed.

Looking at the remaining references in ARC_anon.arcs_size (it's always anonymous buffers that are leaked), I can see that they are two 128KB buffers. (Unsurprisingly, I could see 256KB of "Anon" ARC memory lingering on the idle system in top before I shut down.) The refcount tags are arc_buf_t pointers; both reference an ARC buffer header which looks like this:

(kgdb) p *$23->b_hdr                                                                                                                                           
$27 = {b_dva = {dva_word = {0, 0}}, b_birth = 0, b_type = ARC_BUFC_DATA, b_complevel = 0 '\000', b_reserved1 = 0 '\000', b_reserved2 = 0, b_hash_next = 0x0, b_flags = (ARC_FLAG_HAS_L1HDR | ARC_FLAG_COMPRESSED_ARC | ARC_FLAG_COMPRESS_1), b_psize = 256, b_lsize = 256, b_spa = 2989832975500785283, b_l2hdr = {
    b_dev = 0x0, b_daddr = 0, b_hits = 0, b_arcs_state = ARC_STATE_ANON, b_l2node = {list_next = 0x0, list_prev = 0x0}}, b_l1hdr = {b_state = 0xffff000001795900 <ARC_anon>, b_arc_node = {list_next = 0x0, list_prev = 0x0}, b_arc_access = 0, b_mru_hits = 0, b_mru_ghost_hits = 0, b_mfu_hits = 0, b_mfu_ghost_hits = 0, 
    b_byteswap = 10 '\n', b_buf = 0xffffa815ebe62840, b_refcnt = {rc_count = 1, rc_mtx = {lock_object = {lo_name = 0xffff00000147e117 "rc->rc_mtx", lo_flags = 577830912, lo_data = 0, lo_witness = 0x0}, sx_lock = 1}, rc_tree = {avl_root = 0xffffa82c9c51d0c0, avl_compar = 0xffff00000163a5b8 <zfs_refcount_compare>,     
        avl_offset = 0, avl_numnodes = 1}, rc_removed = {list_offset = 0, list_head = {list_next = 0xffffa82c017f86a8, list_prev = 0xffffa82c017f86a8}}, rc_removed_count = 0, rc_tracked = 1}, b_acb = 0x0, b_pabd = 0x0, b_freeze_cksum = 0x0, b_freeze_lock = {lock_object = {                                           
        lo_name = 0xffff00000143d604 "hdr->b_l1hdr.b_freeze_lock", lo_flags = 577830912, lo_data = 0, lo_witness = 0x0}, sx_lock = 1}}, b_crypt_hdr = {b_rabd = 0x0, b_dsobj = 0, b_ot = DMU_OT_NONE, b_salt = "\000\000\000\000\000\000\000", b_iv = '\000' <repeats 11 times>, b_mac = '\000' <repeats 15 times>}}

I'm out of time to look at this for the moment, but perhaps this is enough for someone smarter than me to spot the problem?

@robn
Member

robn commented May 10, 2024

@markjdb thanks for this. Nothing jumping out at me, but when I did a code read a couple of days ago I didn't go through that codepath, so it gives me something else to go on. I'll start looking later today.

Is there a semi-reliable repro here? From the descriptions it looks like inducing a bunch of ARC churn and then unloading the module is probably enough.

@markjdb
Contributor

markjdb commented May 11, 2024

> @markjdb thanks for this. Nothing jumping out at me, but when I did a code read a couple of days ago I didn't go through that codepath, so it gives me something else to go on. I'll start looking later today.
>
> Is there a semi-reliable repro here? From the descriptions it looks like inducing a bunch of ARC churn and then unloading the module is probably enough.

I messaged you out-of-band with some info. If anyone else has steps to reproduce this reliably, please feel free to add them.

@robn
Member

robn commented May 11, 2024

Right, I think I understand the dbuf refcount bug, and I'll send a patch through for that soon (tonight, I hope). I don't think it's related to the original issue; it's a straight refcounting bug (actually, we're not even miscounting, just holding the refcount API a little bit wrong). It might be related at a distance, but I think not, because the memory leak reported is real, and it is from the ARC, not the dbuf cache.

@robn
Member

robn commented May 11, 2024

#16191 should fix the dbuf refcounting thing. I doubt it'll do much for the original issue, but should at least let reference_tracking_enable=1 get a read on it.

@markjdb
Contributor

markjdb commented May 14, 2024

I tried modifying the refcount code to save a stack trace in the tracker structure at the point where the reference is acquired. If I dump the stacks, I see:

(kgdb) set $_i = 0
(kgdb) while ($_i < 10)
 >x/x $7.ref_st.pcs[$_i]
 >set $_i = $_i + 1
 >end
0xffff000001639a30 <zfs_refcount_add_many+152>: 0x42
0xffff00000158f76c <arc_get_data_impl+464>:     0x8b
0xffff00000157c258 <arc_buf_alloc_impl+980>:    0xd1
0xffff00000157adf0 <arc_alloc_buf+108>: 0x25
0xffff0000015a6954 <dbuf_noread+188>:   0x0c
0xffff0000015a6db8 <dmu_buf_will_fill+284>:     0xb8
0xffff0000015b9040 <dmu_write_uio_dnode+452>:   0x17
0xffff0000015b91e4 <dmu_write_uio_dbuf+84>:     0x26
0xffff0000016d744c <zfs_write+1328>:    0x51
0xffff000001558ce4 <zfs_freebsd_write+60>:      0x8e

Both 128KB buffers have the same stack.
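
For reference, here is a small userspace sketch of the same debugging idea using backtrace(3): record where each hold was taken so a leaked hold can be attributed afterwards. This is not the kernel change (that one stores the stack in the refcount tracker structure and is dumped with the kgdb loop above), and all names here are made up. On FreeBSD, build with -lexecinfo.

#include <execinfo.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAMES      16

typedef struct tracked_ref {
        uint64_t ref_number;            /* size of the hold, in bytes */
        void *ref_pcs[MAX_FRAMES];      /* stack at acquisition time */
        int ref_depth;
} tracked_ref_t;

static tracked_ref_t *
ref_acquire(uint64_t number)
{
        tracked_ref_t *ref = calloc(1, sizeof (*ref));

        if (ref == NULL)
                abort();
        ref->ref_number = number;
        ref->ref_depth = backtrace(ref->ref_pcs, MAX_FRAMES);
        return (ref);
}

static void
ref_report_leak(const tracked_ref_t *ref)
{
        fprintf(stderr, "leaked hold of %llu bytes, acquired at:\n",
            (unsigned long long)ref->ref_number);
        backtrace_symbols_fd(ref->ref_pcs, ref->ref_depth, 2);
}

int
main(void)
{
        /* Take a hold the size of one of the leaked 128KB buffers... */
        tracked_ref_t *ref = ref_acquire(131072);

        /* ...never release it, then dump the acquisition stack. */
        ref_report_leak(ref);
        free(ref);
        return (0);
}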

Reading the relevant code, I see that commit 9b1677f introduced some new cases, and it was committed around the time that this bug was first noticed. Maybe @amotin has some idea?

@amotin
Member

amotin commented May 15, 2024

I don't have any quick ideas, but at some point Brian noticed FreeBSD CI failures with what was supposedly a kernel panic after the tests completed, possibly on module unload. With some luck it may be this assertion. If we could reproduce it on a proper system with a console and find out what it is and what test triggers it, it could help.

@markjdb
Contributor

markjdb commented May 15, 2024

> I don't have any quick ideas, but at some point Brian noticed FreeBSD CI failures with what was supposedly a kernel panic after the tests completed, possibly on module unload. With some luck it may be this assertion. If we could reproduce it on a proper system with a console and find out what it is and what test triggers it, it could help.

I have access to such a system. To reproduce it, I run make -j80 buildworld buildkernel. It consistently leaks 2x128KB arc_anon buffers each time, somewhere late in buildworld. If there is some other useful info I can collect, please let me know what to try.

@amotin
Member

amotin commented May 15, 2024

I hoped for something more specific/synthetic, like a CI test. I'll think about it.

@markjdb
Contributor

markjdb commented May 20, 2024

@amotin Suppose we're in dmu_write_uio_dnode() -> dmu_buf_will_fill() -> dbuf_noread(), and the dbuf's initial state is UNCACHED. Then we allocate a buffer and change the state to FILL. Suppose the fill fails (e.g., because copying from userspace triggered a page fault and FreeBSD's VFS needs to return EFAULT to avoid a deadlock), so dmu_write_uio_dnode() calls dmu_buf_fill_done() to roll back, and the dbuf state is rolled back to UNCACHED (the else if (failed) case). How does the aforementioned ARC buffer get freed?
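
To make that sequence concrete, here is a toy userspace model of the path described above. The names are made up and this is not OpenZFS code; it only shows that rolling the state back to UNCACHED without freeing the buffer allocated for the FILL transition leaves that allocation accounted forever, which is what the refcount assertion later trips over in arc_fini(). (The eventual fix, discussed below, frees the buffer in the failed branch.)

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

enum model_state { MODEL_UNCACHED, MODEL_FILL, MODEL_CACHED };

struct model_dbuf {
        enum model_state state;
        void *buf;                      /* stands in for db->db_buf */
};

static size_t model_anon_bytes;         /* stands in for arc_anon accounting */

static void
model_noread(struct model_dbuf *db, size_t size)
{
        /* The UNCACHED -> FILL transition allocates a buffer. */
        db->buf = malloc(size);
        model_anon_bytes += size;
        db->state = MODEL_FILL;
}

static void
model_fill_done(struct model_dbuf *db, bool failed)
{
        if (failed) {
                /*
                 * Pre-fix behaviour modelled here: roll the state back but
                 * keep the buffer, so model_anon_bytes never returns to 0.
                 * The fix releases the buffer in this branch instead
                 * (arc_buf_destroy() in the real code).
                 */
                db->state = MODEL_UNCACHED;
        } else {
                db->state = MODEL_CACHED;
        }
}

int
main(void)
{
        struct model_dbuf db = { MODEL_UNCACHED, NULL };

        model_noread(&db, 128 * 1024);
        /* The copy from userspace faults, so the fill fails with EFAULT. */
        model_fill_done(&db, true);
        printf("anon bytes still accounted after rollback: %zu\n",
            model_anon_bytes);
        free(db.buf);   /* only so this toy program itself does not leak */
        return (0);
}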

@markjdb
Contributor

markjdb commented May 20, 2024

> I hoped for something more specific/synthetic, like a CI test. I'll think about it.

Here's a reproducer program. Give it a path to a non-existent file on a ZFS filesystem. If run in a loop, I see the number of anon ARC pages grow quickly. Something like while true; do ./test /tmp/foo; done.

#include <sys/mman.h>
#include <err.h>
#include <fcntl.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
        void *addr;
        ssize_t n;
        long ps;
        int fd;

        ps = 128 * 1024; /* has to be the record size? */

        (void)unlink(argv[1]);
        fd = open(argv[1], O_RDWR | O_CREAT, 0644);
        if (fd < 0)
                err(1, "open");
        if (ftruncate(fd, ps) < 0)
                err(1, "ftruncate");

        addr = mmap(NULL, ps, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
        if (addr == MAP_FAILED)
                err(1, "mmap");

        /*
         * Writing a freshly mapped anon page triggers a fault since the kernel
         * hasn't added it to the page tables yet.
         */
        n = write(fd, addr, ps);
        if (n < 0)
                err(1, "write");
}

@amotin
Member

amotin commented May 21, 2024

@markjdb Thank you for the reproducer, it works. Adding arc_buf_destroy(db->db_buf, db); to dmu_buf_fill_done() seems to fix the assertion, but I'll look deeper tomorrow to be sure.

amotin added a commit to amotin/zfs that referenced this issue May 22, 2024
In case of error dmu_buf_fill_done() returns the buffer back into
DB_UNCACHED state.  Since during transition from DB_UNCACHED into
DB_FILL state dbuf_noread() allocates an ARC buffer, we must free
it here, otherwise it will be leaked.

Fixes:	openzfs#15665
Closes:	openzfs#15802
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
lundman pushed a commit to openzfsonwindows/openzfs that referenced this issue May 23, 2024
In case of error dmu_buf_fill_done() returns the buffer back into
DB_UNCACHED state.  Since during transition from DB_UNCACHED into
DB_FILL state dbuf_noread() allocates an ARC buffer, we must free
it here, otherwise it will be leaked.

Fixes:	openzfs#15665
Closes:	openzfs#15802
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
behlendorf pushed a commit that referenced this issue May 25, 2024
In case of error dmu_buf_fill_done() returns the buffer back into
DB_UNCACHED state.  Since during transition from DB_UNCACHED into
DB_FILL state dbuf_noread() allocates an ARC buffer, we must free
it here, otherwise it will be leaked.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15665
Closes #15802
Closes #16216
@lundman
Contributor

lundman commented May 25, 2024

@behlendorf Worth pointing out that the PR did fix the leak problem, but it does not fix the panic in the ticket title: panic: VERIFY3(rc->rc_count == number) failed can still be triggered reliably.

@behlendorf
Contributor

Thanks, I overlooked that. Reopening.

@behlendorf reopened this May 25, 2024
@amotin
Member

amotin commented May 25, 2024

> but it does not fix the panic in the ticket title: panic: VERIFY3(rc->rc_count == number) failed can still be triggered reliably.

@lundman Are you talking about failmode=continue, or do you have some other reproduction? If that is the case, then I would call it a different issue.

@lundman
Contributor

lundman commented May 25, 2024

For me, I can trigger the assert panic: VERIFY3(rc->rc_count == number) failed more easily if I set failmode=continue and lower the timeout.

But it can also be triggered with the default ZFS timeout and zstd-19. My guess is that I/O just takes "too long" for whatever reason (zstd-19, a failing device, etc.), and the code that is supposed to restart the I/O does so incorrectly. Just a guess.

This ticket does say FreeBSD, but it is the same assertion; I do not know whether the FreeBSD reporters had failmode=continue set when they created this ticket.

The memory leak issue was guessed to be related to this ticket, but probably should have been a separate ticket.

@amotin
Member

amotin commented May 25, 2024

This assertion IS about memory leaks. The fact that you see it in other scenarios just means there are other leaks, probably unrelated to the one I fixed.

@markjdb
Contributor

markjdb commented May 25, 2024

> This ticket does say FreeBSD, but it is the same assertion; I do not know whether the FreeBSD reporters had failmode=continue set when they created this ticket.

Probably not, given the number of people upstream who have reported the problem.

> The memory leak issue was guessed to be related to this ticket, but probably should have been a separate ticket.

It wasn't guessed. I hit the problem quite a few times and finally debugged it. Based on the timing of the upstream bug report and the commit which introduced the leak (within a couple of weeks of each other), the fact that the folks reporting the problem track the FreeBSD development branch (where the bug would have first appeared), and the fact that this apparently wasn't observed on Linux (which avoids the code path in question because it prefaults data to be written), I'm quite sure that the original report is about the problem that is now fixed.

@lundman
Contributor

lundman commented May 25, 2024

OK that sounds good. Let's go with that and close it.

@lundman closed this as completed May 25, 2024
lundman pushed a commit to openzfsonwindows/openzfs that referenced this issue Jun 26, 2024
In case of error dmu_buf_fill_done() returns the buffer back into
DB_UNCACHED state.  Since during transition from DB_UNCACHED into
DB_FILL state dbuf_noread() allocates an ARC buffer, we must free
it here, otherwise it will be leaked.

Fixes:	openzfs#15665
Closes:	openzfs#15802
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
robn pushed a commit to robn/zfs that referenced this issue Jul 18, 2024
In case of error dmu_buf_fill_done() returns the buffer back into
DB_UNCACHED state.  Since during transition from DB_UNCACHED into
DB_FILL state dbuf_noread() allocates an ARC buffer, we must free
it here, otherwise it will be leaked.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes openzfs#15665
Closes openzfs#15802
Closes openzfs#16216
@RJVB

RJVB commented Jul 22, 2024

> OK that sounds good. Let's go with that and close it.

An issue originally reported for v2.1.14 got fixed in some version after 2.2.3? :)
Better late than never, but probably not very helpful for the OP and others with reasons to remain on 2.1, unless the fix is simple enough to backport?

@markjdb
Contributor

markjdb commented Jul 22, 2024

> > OK that sounds good. Let's go with that and close it.
>
> An issue originally reported for v2.1.14 got fixed in some version after 2.2.3? :) Better late than never, but probably not very helpful for the OP and others with reasons to remain on 2.1, unless the fix is simple enough to backport?

Looking at the code, I don't believe 2.1.14 is actually affected. In particular, its dmu_buf_fill_done() does not take the third parameter, failed, so the offending path is not there. The bug came in with commit 9b1677f.

@RJVB

RJVB commented Jul 22, 2024 via email

bsdjhb pushed a commit to CTSRD-CHERI/zfs that referenced this issue Aug 1, 2024
In case of error dmu_buf_fill_done() returns the buffer back into
DB_UNCACHED state.  Since during transition from DB_UNCACHED into
DB_FILL state dbuf_noread() allocates an ARC buffer, we must free
it here, otherwise it will be leaked.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes openzfs#15665
Closes openzfs#15802
Closes openzfs#16216
(cherry picked from commit 02c5aa9)