
zfs-2.2.2 patchset #15602

Merged
tonyhutter merged 15 commits into openzfs:zfs-2.2-release on Nov 30, 2023
Conversation

tonyhutter
Contributor

Motivation and Context

Patchset for 2.2.2. This release includes the fix for dirty dbuf corruption: #15526

Description

Includes the fix for data corruption; full details in #15526. Other fixes are also included.

How Has This Been Tested?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

behlendorf and others added 8 commits November 28, 2023 09:03
This reverts commit bd7a02c, which can trigger an unlikely, pre-existing
bio alignment issue on Linux. The change itself is good, but the
underlying issue it exposes needs to be resolved before it can be
re-applied.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#15533
Over its history, the dirty dnode test has alternated between checking
whether a dnode is on `os_dirty_dnodes` (`dn_dirty_link`) and whether
it has `dn_dirty_records`.

  de198f2 Fix lseek(SEEK_DATA/SEEK_HOLE) mmap consistency
  2531ce3 Revert "Report holes when there are only metadata changes"
  ec4f9b8 Report holes when there are only metadata changes
  454365b Fix dirty check in dmu_offset_next()
  66aca24 SEEK_HOLE should not block on txg_wait_synced()

Also illumos/illumos-gate@c543ec060d illumos/illumos-gate@2bcf0248e9

It turns out both are actually required.

In the case of appending data to a newly created file, the dnode proper
is dirtied (at least to change the blocksize) and dirty records are
added.  Thus, a single logical operation is represented by separate
dirty indicators, and must not be separated.

The incorrect dirty check becomes a problem when the first block of a
file is being appended to while another process is calling lseek to skip
holes. There is a small window where the dnode part is undirtied while
there are still dirty records. In this case, `lseek(fd, 0, SEEK_DATA)`
would not know that the file is dirty, and would go to
`dnode_next_offset()`. Since the object has no data blocks yet, it
returns `ESRCH`, indicating no data found, which results in `ENXIO`
being returned to `lseek()`'s caller.

Since coreutils 9.2, `cp` performs sparse copies by default, that is, it
uses `SEEK_DATA` and `SEEK_HOLE` against the source file and attempts to
replicate the holes in the target. When it hits the bug, its initial
search for data fails, and it goes on to call `fallocate()` to create a
hole over the entire destination file.
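For context, the userspace-visible symptom can be probed with a few
lines of C; this is only an illustration of the lseek() semantics
described above, not part of the fix:

  /* Probe a file the way coreutils cp does (illustrative only). */
  #define _GNU_SOURCE
  #include <errno.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int
  main(int argc, char **argv)
  {
          int fd;

          if (argc < 2 || (fd = open(argv[1], O_RDONLY)) == -1)
                  return (1);
          if (lseek(fd, 0, SEEK_DATA) == -1 && errno == ENXIO)
                  /* On an affected system, a freshly appended file can
                   * land here and be copied as one big hole. */
                  printf("no data found\n");
          close(fd);
          return (0);
  }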

This has come up more recently as users upgrade their systems, getting
OpenZFS 2.2 as well as a newer coreutils. However, this problem has been
reproduced against 2.1, as well as on FreeBSD 13 and 14.

This change simply updates the dirty check to check both types of dirty.
If there's anything dirty at all, we immediately go to the "wait for
sync" stage. It doesn't really matter after that; both changes are on
disk, so the dirty fields should be correct.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes openzfs#15571
Closes openzfs#15526
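For illustration, a minimal sketch of the combined check (this mirrors
the fix as described above; treat names and locking details as
approximate rather than the exact diff):

  /*
   * Sketch: a dnode counts as dirty if, for any txg, it is either
   * linked on os_dirty_dnodes (dn_dirty_link) or has outstanding
   * dirty records (dn_dirty_records).
   */
  boolean_t
  dnode_is_dirty(dnode_t *dn)
  {
          mutex_enter(&dn->dn_mtx);
          for (int i = 0; i < TXG_SIZE; i++) {
                  if (multilist_link_active(&dn->dn_dirty_link[i]) ||
                      !list_is_empty(&dn->dn_dirty_records[i])) {
                          mutex_exit(&dn->dn_mtx);
                          return (B_TRUE);
                  }
          }
          mutex_exit(&dn->dn_mtx);
          return (B_FALSE);
  }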
In case of a crash, cloned blocks need to be claimed on pool import.
That is only possible if the block pointers (lr_bps) and their count
(lr_nbps) are not encrypted but only authenticated, similar to the
block pointer in lr_write_t.  The few other fields can be, and still
are, encrypted.

This should fix a panic on ZIL claim after a crash when block cloning
is actively used.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Tom Caputi <caputit1@tcnj.edu>
Reviewed-by: Sean Eric Fagan <sef@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Edmund Nadolski <edmund.nadolski@ixsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes openzfs#15543
Closes openzfs#15513
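As a rough sketch of the record being described, the clone-range layout
below is paraphrased from the ZIL headers; the comments mark the fields
that must stay readable (authenticated but not encrypted) so they can
be claimed on import. The exact encrypt/authenticate split lives in the
ZIO crypt code:

  typedef struct {
          lr_t            lr_common;      /* common log record header */
          uint64_t        lr_foid;        /* file object to clone into */
          uint64_t        lr_offset;      /* offset to clone to */
          uint64_t        lr_length;      /* length of the clone */
          uint64_t        lr_blksz;       /* file block size */
          uint64_t        lr_nbps;        /* BP count: authenticated only */
          blkptr_t        lr_bps[];       /* BPs: authenticated only */
  } lr_clone_range_t;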
The zfs_load-key tests were failing on F39 due to their use of the
deprecated ssl.wrap_socket function.  This commit updates the tests to
use ssl.SSLContext() instead, as described in
https://stackoverflow.com/a/65194957.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes openzfs#15534
Closes openzfs#15550
So that zdb (and others!) can get at the BRT on-disk structures.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes openzfs#15541
Same idea as the dedup stats, but for block cloning.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes openzfs#15541
zdb with '-e' (i.e. against an exported zpool) doesn't work with the
'-O' and '-r' options, because we process them before '-e' has been
processed.

The following errors are seen:

~> zdb -e pool-mds65/mdt65 -O oi.9/0x200000009:0x0:0x0
failed to hold dataset 'pool-mds65/mdt65': No such file or directory

~> zdb -e pool-oss0/ost0 -r file1 /tmp/filecopy1 -p.
failed to hold dataset 'pool-oss0/ost0': No such file or directory
zdb: internal error: No such file or directory

We need to make sure to process the '-O' and '-r' options after the
'-e' option has been processed, which imports the pool into the
namespace if it's not in the cachefile.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes openzfs#15532
Previously, dmu_buf_will_clone() would roll back any dirty record, but
would not clean out the modified data nor reset the state before
releasing the lock. That leaves the last-written data in db_data, but
the dbuf in the wrong state.

This is eventually corrected when the dbuf state is set to NOFILL and
dbuf_noread() is called (which clears out the old data), but by then
it's too late: the lock has already been dropped with the dbuf in that
invalid state.

Any caller acquiring the lock before the call into
dmu_buf_will_not_fill() can find what appears to be a clean, readable
buffer, and would take the wrong state from it: it should be getting the
data from the cloned block, not from earlier (unwritten) dirty data.

Even after the state was switched to NOFILL, the old data was still not
cleaned out until dbuf_noread(), which is another gap for a caller to
take the lock and read the wrong data.

This commit fixes all of this by properly cleaning up the previous
state and then setting the new state before dropping the lock. The
DBUF_VERIFY() calls confirm that the dbuf is in a valid state at the
points where the lock is dropped.

Sponsored-by: Klara, Inc.
Sponsored-By: OpenDrives Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes openzfs#15566
Closes openzfs#15526
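A condensed sketch of the corrected ordering (simplified from the
description above; the exact calls in the diff may differ):

  void
  dmu_buf_will_clone(dmu_buf_t *db_fake, dmu_tx_t *tx)
  {
          dmu_buf_impl_t *db = (dmu_buf_impl_t *)db_fake;

          mutex_enter(&db->db_mtx);
          DBUF_VERIFY(db);
          /* Roll back the dirty record AND free the modified data. */
          VERIFY(!dbuf_undirty(db, tx));
          if (db->db_buf != NULL) {
                  arc_buf_destroy(db->db_buf, db);
                  db->db_buf = NULL;
          }
          /* Set the final state before anyone else can take the lock. */
          db->db_state = DB_NOFILL;
          DBUF_VERIFY(db);
          mutex_exit(&db->db_mtx);
  }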
asomers and others added 3 commits November 28, 2023 15:19
The FreeBSD 12 build was broken for several reasons:
* VOP_UNLOCK lost an argument in 13.0, so OpenZFS should be using
  VOP_UNLOCK1, but a few direct calls to VOP_UNLOCK snuck in (see the
  compat-shim sketch after this commit message).
* The location of the zlib header moved in 13.0 and 12.1.  We can drop
  support for building on 12.0, which is EoL.
* knlist_init lost an argument in 13.0.  OpenZFS change 9d08874
  assumed 13.0 or later.
* FreeBSD 13.0 added copy_file_range, and OpenZFS change 67a1b03
  assumed 13.0 or later.

Sponsored-by: Axcient
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes openzfs#15551
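For the VOP_UNLOCK item, a compat shim of the following shape gives a
single call site that works with both signatures (the __FreeBSD_version
cutoff here is an assumption, not taken from the patch):

  /* Sketch: map VOP_UNLOCK1 onto the old and new signatures. */
  #if __FreeBSD_version >= 1300074
  #define VOP_UNLOCK1(vp)         VOP_UNLOCK(vp)
  #else
  #define VOP_UNLOCK1(vp)         VOP_UNLOCK((vp), 0)
  #endif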
If all zfs dkms modules have been removed, a shell-init error message
may appear, because /var/lib/dkms/zfs no longer exists.
Resolve this by leaving the directory earlier.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mart Frauenlob <AllKind@fastest.cc>
Closes openzfs#15576
With Linux v6.6.x and clang 16, a configure step fails on a warning that
later results in an error while building, due to 'ts' being
uninitialized. Add a trivial initialization to silence the warning.

Signed-off-by: Jaron Kent-Dobias <jaron@kent-dobias.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
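The fix amounts to something of this shape (a sketch; the real change
lives in a configure probe, and the initializer form is assumed):

  struct timespec ts = {0};     /* zero-initialize to silence clang's
                                 * uninitialized-use warning */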
@amotin
Member

amotin left a comment

Looks good to me, but I've just found and fixed one more simple block cloning issue that would be nice to include: #15603.

@AllKind
Contributor

AllKind commented Nov 29, 2023

@tonyhutter Two other things:
Would it be possible to add a note to the release notes, as proposed in #15575?

Regarding:
#15404 and #15586

I just verified it myself: when using the tarball https://github.com/openzfs/zfs/releases/download/zfs-2.2.1/zfs-2.2.1.tar.gz, make native-deb-utils fails:

make native-deb-utils 
cp -r contrib/debian debian; chmod +x debian/rules;
cp contrib/debian/control debian/control; \
dpkg-buildpackage -b -rfakeroot -us -uc;
cp: cannot stat 'contrib/debian/control': No such file or directory
dpkg-buildpackage: error: cannot read debian/control: No such file or directory
make: *** [Makefile:14364: native-deb-utils] Error 255

It would be nice not to carry that problem into the next (now 3rd) release.

@KungFuJesus

GitHub's mention tracking kind of makes this redundant, but it's probably worth including:
acb33ee

@amotin
Member

amotin commented Nov 29, 2023

And we need this one: #15606, to fix the build after e96675a.

mmatuska and others added 4 commits November 29, 2023 13:08
Bug introduced in 213d682.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Warner Losh <imp@FreeBSD.org>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes openzfs#15606
zil_claim_clone_range() takes references on cloned blocks before ZIL
replay.  Later, zil_free_clone_range() drops them after replay or on
dataset destroy.  The total balance is neutral.  It means that on
actual replay we must take additional references, which will stay in
the BRT.

Without this, blocks could be freed prematurely when either the
original file or its clone is destroyed.  I've observed the BRT being
emptied and the feature being deactivated after ZIL replay completion,
which should not have happened.  With the patch I see the expected
stats.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15603
Call vfs_exjail_clone() for mounts created under .zfs/snapshot
to fill in the mnt_exjail field for the mount.  If this is not
done, the snapshots under .zfs/snapshot will not be accessible
over NFS.

This version has the argument name in vfs.h fixed to match that
of the name in spl_vfs.c, although it really does not matter.

External-issue: https://reviews.freebsd.org/D42672
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Closes openzfs#15563
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
@AllKind
Contributor

AllKind commented Nov 30, 2023

8adf2e3 would also be good to include, I guess.

@behlendorf added the Status: Accepted (Ready to integrate: reviewed, tested) label and removed the Status: Code Review Needed (Ready for review and testing) label on Nov 30, 2023
@telsasoft

Just curious if you'll also revert "Disable block cloning by default" here/now/later.

@tonyhutter merged commit 494aaae into openzfs:zfs-2.2-release on Nov 30, 2023.
24 of 26 checks passed
@FL140

FL140 commented Nov 30, 2023

Just curious if you'll also revert "Disable block cloning by default" here/now/later.

I would also be interested in whether this is safe to enable again, since from my understanding block cloning was not responsible for the data-loss problems after all, but only increased the likelihood of hitting the corruption. If I understand it correctly, the real bug got fixed in #15571 (c7fadf2),
which is included in the commits for 2.2.2.
If I got that right, it should be safe to enable block cloning again without problems, right?

@amotin
Member

amotin commented Nov 30, 2023

@telsasoft @FL140 The idea is to re-enable block cloning in one of the following 2.2 releases. 2.2.2 includes a number of block cloning fixes too, making it more robust. But at least #15617, the issue @ixhamza found just yesterday, didn't make it in time. Though we are getting down to rarer and rarer cases and could now benefit from another round of tests.

@FL140

FL140 commented Dec 1, 2023

@amotin Great, I absolutely appreciate testing. I have no issue with having the feature disabled until this is rock solid. But since I upgraded the pools as recommended after the upgrade to Ubuntu 23.10, I have to ask, to make sure this is the correct approach now.

2.2.1 disabled the block cloning feature by simply ignoring the setting all the time, correct? So we DON'T need to set zfs_bclone_enabled=0 in addition? That way the feature will get activated when the hotfix gets removed in an upstream version.

In addition, zfs_dmu_offset_next_sync=0 is no longer required after upgrading to 2.2.2, as that one got fixed in the release.

I am sure about zfs_dmu_offset_next_sync, but can one of the devs please confirm zfs_bclone_enabled? THX

@robn
Member

robn commented Dec 1, 2023

@FL140 yes, the 2.2.2 defaults are zfs_bclone_enabled=0 and zfs_dmu_offset_next_sync=1, and we believe those settings to be totally safe.

@FL140

FL140 commented Dec 1, 2023

@FL140 yes, the 2.2.2 defaults are zfs_bclone_enabled=0 and zfs_dmu_offset_next_sync=1, and we believe those settings to be totally safe.

@robn Thanks for confirming. Unfortunately, I just ran into a really bad "... uncorrectable I/O failure and failure mode property for this pool is set to panic." after using the 2.2.2 branch from @tonyhutter with this patchset!

  • I built an Ubuntu package from that branch without applying additional settings or patches.
  • Then I restarted the computer; everything looked good.
  • Then I ran the script https://github.com/0x5c/zfs-bclonecheck to get an idea which files had been cloned after the feature got enabled.
  • That led to the error above and a crashed system.
  • Brave enough, I restarted the computer and (very unexpectedly) it booted, and I am writing these lines from a running system where zpool status tells me errors: No known data errors.

Honestly, I am really trying to stay positive here after the last two weeks, but this is getting more frustrating by the day. Upgrading to a stable distro should never end in such a f**k*p. Sorry, but I am getting frustrated right now and need a stable system again ASAP.

Trying to be rational, I will investigate this further in ~9h to add as much info as possible, but I will have to dd the disk prior to any further steps; this is getting too dangerous.

So right now I can't tell if this is caused by any of the patches, but my guess would be that, due to the "block cloning" bug (which we now know has a different background), there is a problem with the cloned data(!?) produced since 2.2.0 and prior to 2.2.2. But I can't tell if that is the case. If anyone can point me in the right direction I am happy to investigate. But I really need a working machine again within the next few days.

@FL140

FL140 commented Dec 1, 2023

I ran the script in #15586 (comment) to build the Ubuntu 23.10 packages.

@FL140

FL140 commented Dec 1, 2023

I noted that 2.2.2 just got released while I was running into this. I will compile the packages again with the official 2.2.2, but I guess the result will be the same, as the patchset is identical in the official 2.2.2 branch and in @tonyhutter's 2.2.2 branch, or did I miss something?

@0x5c

0x5c commented Dec 1, 2023

Then I ran the script https://github.com/0x5c/zfs-bclonecheck to get an idea which files had been cloned after the feature got enabled.

Did the script output anything? If so, did it get to print the list of paths? At least that could give an idea which zdb invocation in the script triggered it.

@FL140

FL140 commented Dec 1, 2023

Then I ran the script https://github.com/0x5c/zfs-bclonecheck to get an idea which files had been cloned after the feature got enabled.

Did the script output anything? If so, did it get to print the list of paths? At least that could give an idea which zdb invocation in the script triggered it.

Thanks for the fast reply. No, the script crashed with the above message very early. I made a screenshot, but I think it will not be of much help here. The exact output (retyped from the screenshot, except for the pool name) was:

$ ./bclonecheck.sh xyzpool
loading concrete vdev 0, metaslab 107 of 108 ...
error: error: error: Pool 'xyzpool' has encountered an uncorrectable I/O failure and the failure mode property for this pool is set to panic.
Pool 'xyzpool' has encountered an uncorrectable I/O failure and the failure mode property for this pool is set to panic.

@0x5c Please don't get me wrong: apart from the fact that the script triggered the panic (which is not your fault!), I am really happy that you wrote it, so I can check the pool to some degree. It already showed that there is still a problem, even with 2.2.2, with the data produced between 2.2.0 and 2.2.2.

@0x5c

0x5c commented Dec 1, 2023

loading concrete vdev 0, metaslab 107 of 108 ...

This output is not from the first zdb invocation. Since that one is the only one that touches the BRT (i.e. block cloning), this seems to exclude any bug related to block cloning or the new zdb feature to dump the BRT.

The rest of the script is one long series of pipes that:

  1. Dumps the entire list of blocks in the pool
  2. Uses standard utilities to filter only the blocks that match DVAs dumped from the BRT and extract only the dataset and object IDs
  3. Runs zfs_ids_to_path with the IDs through xargs
  4. Breaks the lines into two columns using column

With that output, the failure could be just before or during 1), or in one of the zfs_ids_to_path invocations in 3). It's hard to tell, since 4) doesn't stream and waits until it has received the full input before making columns.

If you try again, make this change to the script:

-... | xargs -n2 zfs_ids_to_path -v "$zpool" | column -ts: -N DATASET,PATH -l2
+... | xargs -n2 zfs_ids_to_path -v "$zpool"

EDIT: Just to make sure, also add cat "${tempdir}/dvas.txt" on line 46

@FL140

FL140 commented Dec 1, 2023

@0x5c Thanks for the input. Actually, I am in the lucky position that I am quite good with shell scripts, so I WILL investigate this for sure, but after a very long night I will first dd the whole SSD before any further steps. So expect feedback in the next few days.

@robn
Member

robn commented Dec 1, 2023

@FL140 it's unclear to me whether you are seeing errors on the released 2.2.2, please confirm? If so, is it coming only from zdb, or from the kernel on your imported pool?

@FL140

FL140 commented Dec 1, 2023

@FL140 it's unclear to me whether you are seeing errors on the released 2.2.2, please confirm? If so, is it coming only from zdb, or from the kernel on your imported pool?

@robn While that happened with the tonyhutter branch of 2.2.2, it should be the same as the released 2.2.2, which I built and installed right after the tonyhutter branch and am running at the moment; the commits are exactly the same. As written, I will investigate this in depth over the next few days, but I want to dd the 1.7 TB pool before that. Then I will run the script again and report the results back. I will also dissect the script and trace where it happens, if it happens again.

What I really don't get, though, is that I get the message that the pool is in panic mode, yet after a reboot it works happily as if nothing ever happened, and zpool status doesn't report an error either.

@robn
Member

robn commented Dec 1, 2023

What I really don't get, though, is that I get the message that the pool is in panic mode, yet after a reboot it works happily as if nothing ever happened, and zpool status doesn't report an error either.

That's why I asked if you got this from zdb, or from the kernel?

@FL140

FL140 commented Dec 1, 2023

What I really don't get, though, is that I get the message that the pool is in panic mode, yet after a reboot it works happily as if nothing ever happened, and zpool status doesn't report an error either.

That's why I asked if you got this from zdb, or from the kernel?

@robn I can't say where the message came from, BUT a quick grep for zfs, panic, core, and zdb in the syslog doesn't produce any suspicious output. I also quickly walked through the syslog visually and didn't notice anything awkward at first glance.

So I would guess it is not coming from the kernel, but I can't say anything about zdb (zdb was found nowhere in the syslog). Is its output produced anywhere other than the terminal?

But grepping through the zfs source code for the error message should point us in the right direction anyway.

@FL140

FL140 commented Dec 1, 2023

@robn dissecting 0x5c:zfs-bclonecheck.git/bclonecheck.sh:

zdb -TTT rpool > /home/junk/wrk/202311xx_zfs_bclone_bug/zdb_-TTT_rpool.txt 2>&1

runs without an error and produces the file content:

BRT: used 242M; saved 268M; ratio 2.10x
BRT: vdev 0: refcnt 2.96K; used 242M; saved 268M

DVA              REFCNT
0:95c74b6000     1
0:e367d98000     11
0:11290a8a000    8
0:16f11346000    1
0:55678b8000     1
0:12ea7f88000    1
0:19a2eea4000    1
0:135279a0000    1
0:19a2c178000    1
0:13526fe8000    1
0:13e20948000    1
0:135dca20000    1
0:1412ae80000    1
0:16f10ea6000    1
0:126ffe86000    1
0:13156380000    1
0:12e32d30000    1
0:16f1083e000    1
0:12e31d6a000    1
0:b574f5e000     1
0:19710aac000    1
0:19a1fe00000    1 
0:1816f10c000    1
0:135395b8000    1
0:1a32a184000    1
0:19a23486000    1 
0:12e328d6000    1
0:1357e174000    1
0:95c7336000     1
0:12bb6d7a000    1
0:13526ef0000    1
0:19a2d628000    1
0:19a1f4b0000    1
0:12554cca000    1
0:14068298000    1
0:19a2b69c000    1
0:1328b224000    1
0:12d1727e000    1
0:16f0f9f4000    1
0:1352f10a000    1
0:16f08174000    1
0:172ccbda000    1
0:19a2e1c0000    1
0:19a22ffe000    1
0:13c82cc4000    14
0:13526558000    1
0:136a6d3c000    3
0:135eec74000    1
0:16f10926000    1
0:19a1f6d6000    1
...

@0x5c

0x5c commented Dec 1, 2023

runs without an error and produces the file content:

That's the only part we already knew works; it's everything else that isn't clear.

@FL140

FL140 commented Dec 1, 2023

runs without an error and produces the file content:

That's the only part we already knew works; it's everything else that isn't clear.

@0x5c @robn Trying to be brave here, but running, crashing and rebooting takes time... I also ran zdb -bbb -vvv rpool and piped the output into a file. The command runs for a while and produces a lot of output; the last lines are:

objset 2599 object 681556 level 0 offset 0x0 DVA[0]=<0:e5b6312000:4000> [L0 ZFS plain file] fletcher4 uncompressed unencrypted LE contiguous unique single size=2a00L/2a00P birth=3037L/3037P fill=1 cksum=000004b481acec6a:0019be5dce474a76:59995f394f4d41e5:b4a59628ffa2de73
objset 2599 object 681558 level 0 offset 0x0 DVA[0]=<0:e5b6316000:16000> [L0 ZFS plain file] fletcher4 uncompressed unencrypted LE contiguous unique single size=15200L/15200P birth=3037L/3037P fill=1 cksum=000023cb3739a922:056dd8676eb9838c:6fdb243beb3694a2:c9f579861a026a5f
objset 2599 object 681560 level 0 offset 0x0 DVA[0]=<0:dd9a03e000:8000> [L0 ZFS plain file] fletcher4 uncompressed unencrypted LE contiguous unique single size=7a00L/7a00P birth=3037L/3037P fill=1 cksum=00000e91e361d955:00e0247e264f3287:e95ef021a16b879c:b548a146581c7c1a
objset 2599 object 681562 level 0 offset 0x0 DVA[0]=<0:dd9a046000:14000> [L0 ZFS plain file] fletcher4 uncompressed unencrypted LE contiguous unique single size=12e00L/12e00P birth=3037L/3037P fill=1 cksum=00002010b9374856:044749cdc30778c4:1d32532cc28b4b9e:90f3ced384692d70
objset 2599 object 681564 level 0 offset 0x0 DVA[0]=<0:c8d7f3e000:2000> [L0 ZFS plain file] fletcher4 lz4 unencrypted LE contiguous unique single size=3200L/2000P birth=3037L/3037P fill=1 cksum=000001b2fc6fcf73:000a38b09c05f58d:1fc5dd9fe340c1b1:d19da9daf072e7b5
objset 2599 object 681566 level 0 offset 0x0 DVA[0]=<0:d069fd2000:c000> [L0 ZFS plain file] fletcher4 lz4 unencrypted LE contiguous unique single size=d800L/c000P birth=3037L/3037P fill=1 cksum=00001669f83a735d:0232dd1493502c0f:723e8ba7ad602441:403bfa38ee32afde
objset 2599 object 0 level 0 offset 0x14ccc000 DVA[0]=<0:e5b7ece000:2000> DVA[1]=<0:4817ca0000:2000> [L0 DMU dnode] fletcher4 lz4 unencrypted LE contiguous unique double size=4000L/2000P birth=3037L/3037P fill=16 cksum=00000063de0f58e8:0002d884acc610f1:0a6973e77175480b:797dbe666cc2f7a2
objset 2599 object 681568 level 0 offset 0x0 DVA[0]=<0:e5b632c000:4000> [L0 ZFS plain file] fletcher4 lz4 unencrypted LE contiguous unique single size=5200L/4000P birth=3037L/3037P fill=1 cksum=0000062a7f582e76:003c12084343a468:4ca8e764b6c23efb:b46a9922e0803acd
objset 2599 object 681570 level 0 offset 0x0 DVA[0]=<0:dd9a034000:2000> [L0 ZFS plain file] fletcher4 lz4 unencrypted LE contiguous unique single size=3600L/2000P birth=3037L/3037P fill=1 cksum=000002cff59dfb73:000eda27835750ca:2b07894ba7d25ea4:3016558a82029269
objset 2599 object 681572 level 0 offset 0x0 DVA[0]=<0:dd9a05a000:8000> [L0 ZFS plain file] fletcher4 uncompressed unencrypted LE contiguous unique single size=6a00L/6a00P birth=3037L/3037P fill=1 cksum=00000c91ceefa2c8:00a7fd51175d0128:cb0f19abbc77c062:990da7cbed68e69f
objset 2599 object 681574 level 0 offset 0x0 DVA[0]=<0:e5b6330000:6000> [L0 ZFS plain file] fletcher4 lz4 unencrypted LE contiguous unique single size=7800L/6000P birth=3037L/3037P fill=1 cksum=00000a91bf829d7f:0089e2ab8c5afdc8:5c884fcd9ae81bb7:bbec7afdde761e9b
objset 2599 object 681576 level 0 offset 0x0 DVA[0]=<0:c8d7f40000:4000> [L0 ZFS plain file] fletcher4 lz4 unencrypted LE contiguous unique single size=4400L/4000P birth=3037L/3037P fill=1 cksum=00000435125513bd:003122a98aeec693:2c129628b122d35e:1b70c74f66d30b4c
objset 2599 object 681578 level 0 offset 0x0 DVA[0]=<0:c8d7f44000:2000> [L0 ZFS plain file] fletcher4 lz4 unencrypted LE contiguous unique single size=3a00L/2000P birth=3037L/3037P fill=1 cksum=000002a8c0b7a71d:000d5d8a44e1df03:259df52eda052b5a:95e5799d2f115763
objset 2599 object 681580 level 0 offset 0x0 DVA[0]=<0:e5b6336000:2000> [L0 ZFS plain file] fletcher4 lz4 unencrypted LE contiguous unique single size=3200L/2000P birth=3037L/3037P fill=1 cksum=0000021059d2b65f:000c09f6e28ebe02:24c4347119c75ab3:f16e1db5e6b19ef6
objset 2599 object 681582 level 0 offset 0x0 DVA[0]=<0:dd9a062000:6000> [L0 ZFS plain file] fletcher4 uncompressed unencrypted LE contiguous unique single size=4600L/4600P birth=3037L/3037P fill=1 cksum=00000851b46907fe:00491ee9c1d4b5c5:a94a0bc07f738bb5:fe9d40a2890f89b3
objset 2599 object 681584 level 0 offset 0x0 DVA[0]=<0:dd9a068000:2000> [L0 ZFS plain file] fletcher4 lz4 unencrypted LE contiguous unique single size=3800L/2000P birth=3037L/3037P fill=1 cksum=00000269ab7edeb1:000cd46398c2786c:25372348cb655b47:df4acbb17f83db8f
objset 2599 object 681586 level 0 offset 0x0 DVA[0]=<0:d069fde000:2000> [L0 ZFS plain file] fletcher4 lz4 unencrypted LE contiguous unique single size=2600L/2000P birth=3037L/3037P fill=1 cksum=0000008c92beae36:00040e16fe3fb8f4:0f035d3cf3ae2a6a:27d6553641ca00f0
objset 2599 object 681588 level 0 offset 0x0 DVA[0]=<0:d069fe0000:2000> [L0 ZFS plain file] fletcher4 lz4 unencrypted LE contiguous unique single size=4000L/2000P birth=3037L/3037P fill=1 cksum=000003b571189581:000f9eea523970f1:2a4511d60f7e494c:47930f96ac19e121
objset 2599 object 681590 level 0 offset 0x0 DVA[0]=<0:e5b6338000:c000> [L0 ZFS plain file] fletcher4 lz4 unencrypted LE contiguous unique single size=ca00L/c000P birth=3037L/3037P fill=1 cksum=

Looking at that, my first guess would be a bad checksum!? But then I don't know how often the output is flushed and whether we are missing stuff at the end...

All of the above was produced with the release 2.2.2 branch, built and installed as Ubuntu packages.

FYI: The output file has 2746022 lines, so it doesn't crash fast.
I would love to upload the file here, but it is 130 MB compressed and I don't know if it would be helpful.

Any ideas on how to proceed from here?

@0x5c

0x5c commented Dec 1, 2023

Does it panic the system, exit, or stall after those last lines? If it doesn't trigger a proper panic, it's hard to tell, without running the modified script, whether it is zdb -bbb -vvv itself crashing intermittently or zfs_ids_to_path choking on one of those objset/object ID pairs.

Also before you test the modified script, can you run zdb -TTT "$zpool" | grep -Po '^[0-9a-f]+:[0-9a-f]+(?=\s)' > <somepath>/dvas.txt in /bin/sh and check it only has the DVAs? We've already seen it won't crash, but I wonder if the DVAs are properly extracted.

@FL140

FL140 commented Dec 1, 2023

Does it panic the system, exit, or stall after those last lines? If it doesn't trigger a proper panic, it's hard to tell, without running the modified script, whether it is zdb -bbb -vvv itself crashing intermittently or zfs_ids_to_path choking on one of those objset/object ID pairs.

@0x5c I didn't run the script as a whole; I took the relevant commands one after the other and analyzed the output. So I can confirm that zdb -bbb -vvv itself leads to the pool panic and then a system stall which can only be handled by a reboot. The output of dvas.txt looks correct to me. When running cat zdb_-bbb_-vvv_rpool.txt | grep 'level 0' | grep -wf dvas.txt | cut -d\ -f2,4 | xargs -n2 zfs_ids_to_path -v rpool | uniq -c | column -ts: -N COUNT,DATASET,PATH -l2 (tweaked a bit to remove duplicated lines and get a count of them instead), I get a list of affected files (86). So it is definitely zdb -bbb -vvv that triggers the zpool panic.

zfs_ids_to_path runs just fine.

Also before you test the modified script, can you run zdb -TTT "$zpool" | grep -Po '^[0-9a-f]+:[0-9a-f]+(?=\s)' > <somepath>/dvas.txt in /bin/sh and check it only has the DVAs? We've already seen it won't crash, but I wonder if the DVAs are properly extracted.

yes, they are as far as I can tell.

It boils down to the zdb -bbb -vvv command, which encounters something that makes it trigger the pool panic. So I guess it is zdb source-code-patching time, but I really have to take a break now. THX to everyone for the help so far in tracking this down.
