Shi Jin
2013-11-06 16:25:23 UTC
Hi there,
I have set up a RAID-1 between two PVs on two different hard disks in a
RHEL-6 environment.
I am having a problem restoring the broken mirror without rebooting the OS.
Here is how to reproduce my problem.
1. First of all, we have set up a raid-1 LV that looks like this:
[***@shi-rhel63 home]# lvs -a -o +seg_pe_ranges|grep home
lv_home            vg_root rwi-aom-  1.50g 100.00 lv_home_rimage_0:0-47 lv_home_rimage_1:0-47
[lv_home_rimage_0] vg_root iwi-aor-  1.50g        /dev/sda2:192-239
[lv_home_rimage_1] vg_root iwi-aor-  1.50g        /dev/sdb2:34-81
[lv_home_rmeta_0]  vg_root ewi-aor- 32.00m        /dev/sda2:177-177
[lv_home_rmeta_1]  vg_root ewi-aor- 32.00m        /dev/sdb2:33-33
[***@shi-rhel63 home]# pvs
PV VG Fmt Attr PSize PFree
/dev/sda2 vg_root lvm2 a-- 17.84g 224.00m
/dev/sdb2 vg_root lvm2 a-- 18.84g 1.22g
I can test the mirror by performing a dd write and watching iostat on both
physical disks:
[***@shi-rhel63 home]# dd if=/dev/zero of=test bs=1M count=10000 &
[1] 23388
[***@shi-rhel63 home]# iostat -x 1 sda sdb -m
Linux 2.6.32-279.el6.x86_64 (shi-rhel63) 06/11/13 _x86_64_ (1 CPU)
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.06    0.07    0.35    0.32    0.00   99.20
Device:  rrqm/s   wrqm/s    r/s    w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz    await  svctm   %util
sda        0.05     0.78   0.38   0.14   0.00   0.00     22.14      0.01    11.07   5.63    0.29
sdb        0.11     0.79   0.14   0.23   0.00   0.00     27.94      0.00     5.60   3.71    0.14
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    7.14   92.86    0.00    0.00
Device:  rrqm/s   wrqm/s    r/s    w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz    await  svctm   %util
sda        0.00  5702.04   1.02  56.12   0.01  22.46    805.25     50.75   911.66  17.86  102.04
sdb        0.00  5173.47   0.00  71.43   0.00  20.42    585.43      1.21    16.97   3.44   24.59
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    8.08   91.92    0.00    0.00
Device:  rrqm/s   wrqm/s    r/s    w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz    await  svctm   %util
sda        0.00  5131.31   0.00  52.53   0.00  19.71    768.50     51.39   894.83  19.23  101.01
sdb        0.00  5121.21   0.00  69.70   0.00  18.20    534.70      1.15    15.96   3.41   23.74
As you can see, sda and sdb receive similar write I/O, so the mirror is in
fact working.
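To make this comparison less error-prone than reading the raw table, the wMB/s column can be pulled out per device. This is only a minimal sketch: the function name write_rates is my own, and it assumes the sysstat column layout shown above, with each device row on a single line (as iostat prints it, before any mail wrapping).

```shell
# Print "<device> <wMB/s>" for every device row of `iostat -x -m` output on stdin.
# The wMB/s column is located by its header name, not by a fixed position.
write_rates() {
    awk '
        $1 ~ /^Device/ {
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "wMB/s") col = i
            next
        }
        # Device rows start with a lowercase name (sda, sdb, ...); skip avg-cpu lines.
        col && NF >= col && $1 ~ /^[a-z]/ && $1 !~ /^avg-cpu/ { print $1, $col }
    '
}
# Usage while the dd test is running:
#   iostat -x -m sda sdb 1 3 | write_rates
```

On a healthy mirror both devices should report similar non-zero values while dd is writing.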
2. Now I simulate a disk failure on sdb by removing the disk (I use VMware,
so this is very easy). As expected, the mirror is now broken, as shown
below:
[***@shi-rhel63 home]# lvs -a -o +seg_pe_ranges|grep home
Couldn't find device with uuid pnsMYs-Ce4t-9KYR-3Zfs-GItC-k5SZ-VidQ30.
lv_home            vg_root rwi-aom-  1.50g 100.00 lv_home_rimage_0:0-47 lv_home_rimage_1:0-47
[lv_home_rimage_0] vg_root iwi-aor-  1.50g        /dev/sda2:192-239
[lv_home_rimage_1] vg_root iwi-aor-  1.50g        unknown device:34-81
[lv_home_rmeta_0]  vg_root ewi-aor- 32.00m        /dev/sda2:177-177
[lv_home_rmeta_1]  vg_root ewi-aor- 32.00m        unknown device:33-33
[***@shi-rhel63 home]# pvs
Couldn't find device with uuid pnsMYs-Ce4t-9KYR-3Zfs-GItC-k5SZ-VidQ30.
PV VG Fmt Attr PSize PFree
/dev/sda2 vg_root lvm2 a-- 17.84g 224.00m
unknown device vg_root lvm2 a-m 18.84g 1.22g
I already have a slight problem here, since the lvs output above still
shows a Copy% of 100.00, but I accept that as a minor presentation issue.
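Since Copy% alone does not reveal the missing leg, the kernel's own view may be more telling. As far as I understand the dm-raid target, its status line has the form "<start> <len> raid <type> <#devs> <health> <sync ratio>", with one health character per device: A for alive and in sync, a for alive but not in sync, D for dead or failed. The sketch below decodes that field. The function name raid_health and the device-mapper name vg_root-lv_home are my assumptions, and the exact status format can differ between kernel versions.

```shell
# Decode the health characters of a dm-raid status line read from stdin.
# Expects the single-device form printed by `dmsetup status <name>`.
raid_health() {
    awk '$3 == "raid" {
        n = length($6)
        for (i = 1; i <= n; i++) {
            c = substr($6, i, 1)
            if (c == "A")      s = "in-sync"
            else if (c == "a") s = "alive, not in-sync"
            else if (c == "D") s = "failed"
            else               s = "unknown (" c ")"
            printf "image %d: %s\n", i - 1, s
        }
        print "sync ratio: " $7
    }'
}
# Usage (as root; the dm name is assumed from the VG and LV names):
#   dmsetup status vg_root-lv_home | raid_health
```

If the second image is reported as failed here while lvs still shows 100.00, that would confirm the Copy% column is only a presentation issue.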
3. Now I put the removed disk back, and I would like a way to
incrementally resync only the changes made since the mirror was
broken. Note that the same disk now shows up as sdc:
[***@shi-rhel63 home]# pvs
PV VG Fmt Attr PSize PFree
/dev/sda2 vg_root lvm2 a-- 17.84g 224.00m
/dev/sdc2 vg_root lvm2 a-- 18.84g 1.22g
[***@shi-rhel63 home]# lvs -a -o +seg_pe_ranges|grep home
lv_home            vg_root rwi-aom-  1.50g 100.00 lv_home_rimage_0:0-47 lv_home_rimage_1:0-47
[lv_home_rimage_0] vg_root iwi-aor-  1.50g        /dev/sda2:192-239
[lv_home_rimage_1] vg_root iwi-aor-  1.50g        /dev/sdc2:34-81
[lv_home_rmeta_0]  vg_root ewi-aor- 32.00m        /dev/sda2:177-177
[lv_home_rmeta_1]  vg_root ewi-aor- 32.00m        /dev/sdc2:33-33
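To confirm that /dev/sdc2 really is the PV that went missing, and not some other device, its UUID can be compared with the one from the "Couldn't find device" warning in step 2. A minimal sketch, with pv_for_uuid being a helper name of my own:

```shell
# Print the name of the PV whose UUID equals $1, reading the output of
# `pvs --noheadings -o pv_name,pv_uuid` from stdin.
pv_for_uuid() {
    awk -v want="$1" '$2 == want { print $1 }'
}
# Usage (as root), with the UUID taken from the earlier warning:
#   pvs --noheadings -o pv_name,pv_uuid | pv_for_uuid pnsMYs-Ce4t-9KYR-3Zfs-GItC-k5SZ-VidQ30
```

If this prints /dev/sdc2, LVM has recognised the returning disk as the same PV under its new device name.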
So everything looks perfect, but if I perform the same dd write test, here
is what I get:
[***@shi-rhel63 home]# iostat -x 1 sda sdc -m
Linux 2.6.32-279.el6.x86_64 (shi-rhel63) 06/11/13 _x86_64_ (1 CPU)
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.06    0.07    0.36    0.42    0.00   99.10
Device:  rrqm/s   wrqm/s    r/s    w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz    await  svctm   %util
sda        0.05     7.70   0.39   0.21   0.00   0.03    109.24      0.11   174.10   6.77    0.40
sdc        0.02     0.00   0.01   0.00   0.00   0.00      8.30      0.00     3.07   2.85    0.00
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    5.15   94.85    0.00    0.00
Device:  rrqm/s   wrqm/s    r/s    w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz    await  svctm   %util
sda        0.00  6281.44   0.00  49.48   0.00  22.14    916.38    138.42  2932.19  20.83  103.09
sdc        0.00     0.00   0.00   0.00   0.00   0.00      0.00      0.00     0.00   0.00    0.00
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    5.10   94.90    0.00    0.00
Device:  rrqm/s   wrqm/s    r/s    w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz    await  svctm   %util
sda        0.00  6213.27   0.00  54.08   0.00  23.99    908.30    136.34  2812.40  18.87  102.04
sdc        0.00     0.00   0.00   0.00   0.00   0.00      0.00      0.00     0.00   0.00    0.00
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    6.06   93.94    0.00    0.00
Device:  rrqm/s   wrqm/s    r/s    w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz    await  svctm   %util
sda        0.00  5141.41   0.00  50.51   0.00  22.23    901.36    133.47  2612.74  20.00  101.01
sdc        0.00     0.00   0.00   0.00   0.00   0.00      0.00      0.00     0.00   0.00    0.00
So it is clear that the re-added mirror leg is not being written to at all.
What is really interesting is that after a reboot it works properly. Is
there a way to restore the mirror without rebooting?
Thanks a lot,
Shi
PS. My OS info
[***@shi-rhel63 home]# uname -a
Linux shi-rhel63 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
[***@shi-rhel63 home]# lvm version
LVM version: 2.02.95(2)-RHEL6 (2012-05-16)
Library version: 1.02.74-RHEL6 (2012-05-16)
Driver version: 4.22.6