Skip to content

Commit

Permalink
net/mlx5: Fix missing lock on sync reset reload
Browse files Browse the repository at this point in the history
On sync reset reload work, when remote host updates devlink on reload
actions performed on that host, it misses taking devlink lock before
calling devlink_remote_reload_actions_performed() which results in
triggering lock assert like the following:

WARNING: CPU: 4 PID: 1164 at net/devlink/core.c:261 devl_assert_locked+0x3e/0x50
…
 CPU: 4 PID: 1164 Comm: kworker/u96:6 Tainted: G S      W          6.10.0-rc2+ torvalds#116
 Hardware name: Supermicro SYS-2028TP-DECTR/X10DRT-PT, BIOS 2.0 12/18/2015
 Workqueue: mlx5_fw_reset_events mlx5_sync_reset_reload_work [mlx5_core]
 RIP: 0010:devl_assert_locked+0x3e/0x50
…
 Call Trace:
  <TASK>
  ? __warn+0xa4/0x210
  ? devl_assert_locked+0x3e/0x50
  ? report_bug+0x160/0x280
  ? handle_bug+0x3f/0x80
  ? exc_invalid_op+0x17/0x40
  ? asm_exc_invalid_op+0x1a/0x20
  ? devl_assert_locked+0x3e/0x50
  devlink_notify+0x88/0x2b0
  ? mlx5_attach_device+0x20c/0x230 [mlx5_core]
  ? __pfx_devlink_notify+0x10/0x10
  ? process_one_work+0x4b6/0xbb0
  process_one_work+0x4b6/0xbb0
[…]

Fixes: 84a433a ("net/mlx5: Lock mlx5 devlink reload callbacks")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: NipaLocal <nipa@local>
  • Loading branch information
mosheshemesh2 authored and NipaLocal committed Jul 31, 2024
1 parent 1f9cf00 commit 8fef540
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
Original file line number Diff line number Diff line change
Expand Up @@ -207,6 +207,7 @@ int mlx5_fw_reset_set_live_patch(struct mlx5_core_dev *dev)
static void mlx5_fw_reset_complete_reload(struct mlx5_core_dev *dev, bool unloaded)
{
struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
struct devlink *devlink = priv_to_devlink(dev);

/* if this is the driver that initiated the fw reset, devlink completed the reload */
if (test_bit(MLX5_FW_RESET_FLAGS_PENDING_COMP, &fw_reset->reset_flags)) {
Expand All @@ -218,9 +219,11 @@ static void mlx5_fw_reset_complete_reload(struct mlx5_core_dev *dev, bool unload
mlx5_core_err(dev, "reset reload flow aborted, PCI reads still not working\n");
else
mlx5_load_one(dev, true);
devlink_remote_reload_actions_performed(priv_to_devlink(dev), 0,
devl_lock(devlink);
devlink_remote_reload_actions_performed(devlink, 0,
BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT) |
BIT(DEVLINK_RELOAD_ACTION_FW_ACTIVATE));
devl_unlock(devlink);
}
}

Expand Down

0 comments on commit 8fef540

Please sign in to comment.