btrfs: zoned: handle broken write pointer on zones
Btrfs refuses to mount a filesystem if it finds a block group with a broken
write pointer (e.g., unequal write pointers on two zones of a RAID1 block
group). Since such a case can easily happen after a power loss or a system
crash, we need to handle it more gracefully.

Handle such a block group by making it unallocatable, so that there will be
no writes into it. That can be done by setting the allocation pointer to
the end of the allocatable region (= block_group->zone_capacity). Then, the
existing code handles zone_unusable properly.
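
As a rough illustration of why moving the allocation pointer to zone_capacity stops all further writes, here is a minimal user-space sketch. The struct and the bg_model_* helpers are hypothetical stand-ins invented for this example; they are not the kernel's struct btrfs_block_group or its allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a zoned block group. */
struct bg_model {
	uint64_t alloc_offset;  /* next byte to allocate, relative to the group start */
	uint64_t zone_capacity; /* usable bytes in the zone */
};

/* Bytes a sequential allocator could still hand out. */
static uint64_t bg_model_free_bytes(const struct bg_model *bg)
{
	assert(bg->alloc_offset <= bg->zone_capacity);
	return bg->zone_capacity - bg->alloc_offset;
}

/* The idea from the commit message: park the pointer at the end. */
static void bg_model_make_unallocatable(struct bg_model *bg)
{
	bg->alloc_offset = bg->zone_capacity;
}

/* Sequential allocation: succeeds only if the request still fits. */
static bool bg_model_alloc(struct bg_model *bg, uint64_t len, uint64_t *start)
{
	if (len > bg_model_free_bytes(bg))
		return false;
	*start = bg->alloc_offset;
	bg->alloc_offset += len;
	return true;
}
```

In this model, once bg_model_make_unallocatable() has run, bg_model_free_bytes() returns 0 and every non-empty bg_model_alloc() request fails, which corresponds to the "no writes into it" behavior described above.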

Having a proper zone_capacity is necessary for this change, so set it as
early as possible.
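
The diff computes zone_capacity with min_not_zero(), because a missing device reports a capacity of 0. A user-space sketch of that semantics follows; min_not_zero_u64 is a hypothetical helper written for this example, whereas the kernel's min_not_zero() is a type-generic macro.

```c
#include <stdint.h>

/*
 * Smaller of two values, ignoring zeros: if one argument is 0 the other is
 * returned, and the result is 0 only when both arguments are 0.
 */
static uint64_t min_not_zero_u64(uint64_t a, uint64_t b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}
```

With a plain min(), a missing stripe (capacity 0) would produce a zone_capacity of 0, and the later assignment of alloc_offset = zone_capacity would not mark the real end of the zone.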

We cannot handle the RAID0 and RAID10 cases like this. But they cannot be
read anyway because of a missing stripe.
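
The condition the diff uses to decide whether a broken write pointer can be tolerated can be sketched as a small predicate. The SKETCH_PROFILE_* constants and can_make_unallocatable() are hypothetical names for this example; the bit values are illustrative assumptions and should not be read as the kernel header definitions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative profile bits (assumed values, for this sketch only). */
enum {
	SKETCH_PROFILE_SINGLE = 0,
	SKETCH_PROFILE_RAID0  = 1 << 3,
	SKETCH_PROFILE_RAID1  = 1 << 4,
	SKETCH_PROFILE_DUP    = 1 << 5,
	SKETCH_PROFILE_RAID10 = 1 << 6,
};

/*
 * True when a load error may be downgraded to "block group is full":
 * only for -EIO, and never for single, RAID0, or RAID10.
 */
static bool can_make_unallocatable(int ret, uint64_t profile)
{
	return ret == -EIO &&
	       profile != SKETCH_PROFILE_SINGLE &&
	       profile != SKETCH_PROFILE_RAID0 &&
	       profile != SKETCH_PROFILE_RAID10;
}
```

Other errors such as -EINVAL are left untouched in this sketch, matching the diff, which only rewrites ret when it equals -EIO.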

Fixes: 265f723 ("btrfs: zoned: allow DUP on meta-data block groups")
Fixes: 568220f ("btrfs: zoned: support RAID0/1/10 on top of raid stripe tree")
CC: stable@vger.kernel.org # 6.1+
Reported-by: HAN Yuwei <hrx@bupt.moe>
Cc: Xuefer <xuefer@gmail.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
naota authored and kdave committed Sep 2, 2024
1 parent c346c62 commit b1934cd
Showing 1 changed file with 25 additions and 5 deletions.
30 changes: 25 additions & 5 deletions fs/btrfs/zoned.c
@@ -1406,6 +1406,8 @@ static int btrfs_load_block_group_dup(struct btrfs_block_group *bg,
 		return -EINVAL;
 	}
 
+	bg->zone_capacity = min_not_zero(zone_info[0].capacity, zone_info[1].capacity);
+
 	if (zone_info[0].alloc_offset == WP_MISSING_DEV) {
 		btrfs_err(bg->fs_info,
 			  "zoned: cannot recover write pointer for zone %llu",
@@ -1432,7 +1434,6 @@ static int btrfs_load_block_group_dup(struct btrfs_block_group *bg,
 	}
 
 	bg->alloc_offset = zone_info[0].alloc_offset;
-	bg->zone_capacity = min(zone_info[0].capacity, zone_info[1].capacity);
 	return 0;
 }
 
@@ -1450,6 +1451,9 @@ static int btrfs_load_block_group_raid1(struct btrfs_block_group *bg,
 		return -EINVAL;
 	}
 
+	/* In case a device is missing we have a cap of 0, so don't use it. */
+	bg->zone_capacity = min_not_zero(zone_info[0].capacity, zone_info[1].capacity);
+
 	for (i = 0; i < map->num_stripes; i++) {
 		if (zone_info[i].alloc_offset == WP_MISSING_DEV ||
 		    zone_info[i].alloc_offset == WP_CONVENTIONAL)
@@ -1471,9 +1475,6 @@ static int btrfs_load_block_group_raid1(struct btrfs_block_group *bg,
 			if (test_bit(0, active))
 				set_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &bg->runtime_flags);
 		}
-		/* In case a device is missing we have a cap of 0, so don't use it. */
-		bg->zone_capacity = min_not_zero(zone_info[0].capacity,
-						 zone_info[1].capacity);
 	}
 
 	if (zone_info[0].alloc_offset != WP_MISSING_DEV)
@@ -1563,6 +1564,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 	unsigned long *active = NULL;
 	u64 last_alloc = 0;
 	u32 num_sequential = 0, num_conventional = 0;
+	u64 profile;
 
 	if (!btrfs_is_zoned(fs_info))
 		return 0;
@@ -1623,7 +1625,8 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 		}
 	}
 
-	switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
+	profile = map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK;
+	switch (profile) {
 	case 0: /* single */
 		ret = btrfs_load_block_group_single(cache, &zone_info[0], active);
 		break;
@@ -1650,6 +1653,23 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 		goto out;
 	}
 
+	if (ret == -EIO && profile != 0 && profile != BTRFS_BLOCK_GROUP_RAID0 &&
+	    profile != BTRFS_BLOCK_GROUP_RAID10) {
+		/*
+		 * Detected broken write pointer. Make this block group
+		 * unallocatable by setting the allocation pointer at the end of
+		 * allocatable region. Relocating this block group will fix the
+		 * mismatch.
+		 *
+		 * Currently, we cannot handle RAID0 or RAID10 case like this
+		 * because we don't have a proper zone_capacity value. But,
+		 * reading from this block group won't work anyway by a missing
+		 * stripe.
+		 */
+		cache->alloc_offset = cache->zone_capacity;
+		ret = 0;
+	}
+
 out:
 	/* Reject non SINGLE data profiles without RST */
 	if ((map->type & BTRFS_BLOCK_GROUP_DATA) &&