Commit 24f8b57
md/md-cluster: handle REMOVE message earlier
JIRA: https://issues.redhat.com/browse/RHEL-94433
commit 948b1fe
Author: Heming Zhao <heming.zhao@suse.com>
Date: Mon Jul 28 12:21:40 2025 +0800
md/md-cluster: handle REMOVE message earlier
Commit a1fd37f ("md: Don't wait for MD_RECOVERY_NEEDED for
HOT_REMOVE_DISK ioctl") introduced a regression in the md_cluster
module. (Failed cases 02r1_Manage_re-add & 02r10_Manage_re-add)
Consider a 2-node cluster:
- node1 runs the set-faulty and remove commands on a disk.
- node2 must correctly update the array metadata.
Before a1fd37f, on node1, the delay between msg:METADATA_UPDATED
(triggered by faulty) and msg:REMOVE was sufficient for node2 to
reload the disk info (written by node1).
After a1fd37f, node1 no longer waits between faulty and remove,
causing it to send msg:REMOVE while node2 is still reloading disk info.
This often results in node2 failing to remove the faulty disk.
== how to trigger ==
set up a 2-node cluster (node1 & node2) with disks vdc & vdd.
on node1:
mdadm -CR /dev/md0 -l1 -b clustered -n2 /dev/vdc /dev/vdd --assume-clean
ssh node2-ip mdadm -A /dev/md0 /dev/vdc /dev/vdd
mdadm --manage /dev/md0 --fail /dev/vdc --remove /dev/vdc
check array status on both nodes with "mdadm -D /dev/md0".
node1 output:
    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       1     254       48        1      active sync   /dev/vdd
node2 output:
    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       1     254       48        1      active sync   /dev/vdd
       0     254       32        -      faulty   /dev/vdc
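The symptom above (node2 still listing /dev/vdc as faulty after node1 has removed it) can be checked mechanically. The following is a minimal sketch, not part of the patch: `faulty_devs` is a hypothetical helper name, and it assumes the device table printed by "mdadm -D" keeps the layout shown above, with the state words before the device path in the last column.

```shell
#!/bin/sh
# Hypothetical helper (not part of this commit): read "mdadm -D" output
# on stdin and print the path of every device whose state includes "faulty".
# Assumes the device path is the last field of each device row.
faulty_devs() {
    awk '$NF ~ /^\/dev\// {
        # Scan the fields before the device path for the "faulty" state word.
        for (i = 1; i < NF; i++)
            if ($i == "faulty") { print $NF; break }
    }'
}

# Example usage on the cluster from the reproduction steps (not run here):
#   mdadm -D /dev/md0 | faulty_devs                  # on node1
#   ssh node2-ip mdadm -D /dev/md0 | faulty_devs     # on node2
```

With the outputs quoted above, the helper would print nothing for node1 and /dev/vdc for node2; after the fix, both nodes would be expected to print nothing.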
Fixes: a1fd37f ("md: Don't wait for MD_RECOVERY_NEEDED for HOT_REMOVE_DISK ioctl")
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
Reviewed-by: Su Yue <glass.su@suse.com>
Link: https://lore.kernel.org/linux-raid/20250728042145.9989-1-heming.zhao@suse.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
1 parent 46b1d8b commit 24f8b57
1 file changed, 6 insertions(+), 3 deletions(-)
(Diff content not preserved. Two hunks: original lines 9803-9810, with 2 lines replaced by 2; and original lines 10110-10117, with 1 line removed and 4 added.)