Discussion: [linux-lvm] LVM RAID5 out-of-sync recovery
Slava Prisivko
2016-10-03 23:49:35 UTC
Hi,

In order to mitigate cross-posting, here's the original question on
Serverfault.SE: LVM RAID5 out-of-sync recovery
<https://serverfault.com/questions/806805/lvm-raid5-out-of-sync-recovery>,
but feel free to answer wherever you deem appropriate.

How can one recover from an out-of-sync LVM RAID5 array?

I have an LVM RAID5 configuration (RAID5 using the LVM tools).

However, because of a technical problem the mirrors went out of sync. You can
reproduce this as explained in this Unix & Linux question:
Playing with my Jessie VM, I disconnected (virtually) one disk. That
worked, the machine stayed running. lvs, though, gave no indication the
arrays were degraded. I re-attached the disk, and removed a second. Stayed
running (this is raid6). Re-attached, still no indication from lvs. I ran
lvconvert --repair on the volume, it told me it was OK. Then I pulled a
third disk... and the machine died. Re-inserted it, rebooted, and am now
unsure how to fix.

If I had been using mdadm, I could have probably recovered the data using
`mdadm --force --assemble`, but I was not able to achieve the same using
the LVM tools.

I have tried to concatenate rmeta and rimage for each mirror and put them
on three linear devices in order to feed them to mdadm (because LVM
leverages MD), but without success (`mdadm --examine` does not recognize
the superblock): it appears that the mdadm superblock format
<https://raid.wiki.kernel.org/index.php/RAID_superblock_formats> differs
from the dm_raid superblock format (search for "dm_raid_superblock")
<https://github.com/torvalds/linux/blob/43f4d36cbf82428374966568ea57a0bc0d664a20/drivers/md/dm-raid.c>
.
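
For reference, the concatenation attempt looked roughly like this (a
sketch only; the table sizes match the 8192-sector rmeta and
65536-sector rimage sub LVs seen later in the dm tables, and the order
of the pieces is a guess):

# dmsetup create md_leg0 <<EOF
0 8192 linear /dev/mapper/vg-test_rmeta_0 0
8192 65536 linear /dev/mapper/vg-test_rimage_0 0
EOF
# mdadm --examine /dev/mapper/md_leg0

As noted, mdadm finds no superblock it recognizes on such a device,
because the dm-raid on-disk metadata is not one of the MD superblock
formats.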

I tried to understand how device-mapper RAID
<https://www.kernel.org/doc/Documentation/device-mapper/dm-raid.txt>
leverages MD, but was unable to find any documentation, and the kernel code
<https://github.com/torvalds/linux/blob/43f4d36cbf82428374966568ea57a0bc0d664a20/drivers/md/dm-raid.c>
is quite complicated.

I also tried to rebuild the mirror directly by using `dmsetup`, but it
cannot rebuild while the metadata is out of sync.

Overall, almost the only useful information I could find is the "RAIDing
with LVM vs MDRAID - pros and cons?" question on Unix & Linux SE
<http://unix.stackexchange.com/a/182503>.

The output of various commands is provided below.

# lvs -a -o +devices

test vg rwi---r--- 64.00m test_rimage_0(0),test_rimage_1(0),test_rimage_2(0)
[test_rimage_0] vg Iwi-a-r-r- 32.00m /dev/sdc2(1)
[test_rimage_1] vg Iwi-a-r-r- 32.00m /dev/sda2(238244)
[test_rimage_2] vg Iwi-a-r-r- 32.00m /dev/sdb2(148612)
[test_rmeta_0] vg ewi-a-r-r- 4.00m /dev/sdc2(0)
[test_rmeta_1] vg ewi-a-r-r- 4.00m /dev/sda2(238243)
[test_rmeta_2] vg ewi-a-r-r- 4.00m /dev/sdb2(148611)

I cannot activate the LV:

# lvchange -ay vg/test -v
Activating logical volume "test" exclusively.
activation/volume_list configuration setting not defined: Checking
only host tags for vg/test.
Loading vg-test_rmeta_0 table (253:35)
Suppressed vg-test_rmeta_0 (253:35) identical table reload.
Loading vg-test_rimage_0 table (253:36)
Suppressed vg-test_rimage_0 (253:36) identical table reload.
Loading vg-test_rmeta_1 table (253:37)
Suppressed vg-test_rmeta_1 (253:37) identical table reload.
Loading vg-test_rimage_1 table (253:38)
Suppressed vg-test_rimage_1 (253:38) identical table reload.
Loading vg-test_rmeta_2 table (253:39)
Suppressed vg-test_rmeta_2 (253:39) identical table reload.
Loading vg-test_rimage_2 table (253:40)
Suppressed vg-test_rimage_2 (253:40) identical table reload.
Creating vg-test
Loading vg-test table (253:87)
device-mapper: reload ioctl on (253:87) failed: Invalid argument
Removing vg-test (253:87)

While trying to activate I'm getting the following in the dmesg:

device-mapper: table: 253:87: raid: Cannot change device positions in
RAID array
device-mapper: ioctl: error adding target to table

lvconvert only works on active LVs:
# lvconvert --repair vg/test

vg/test must be active to perform this operation.

I have the following LVM version:

# lvm version
LVM version: 2.02.145(2) (2016-03-04)
Library version: 1.02.119 (2016-03-04)
Driver version: 4.34.0

And the following kernel version:

Linux server 4.4.8-hardened-r1-1 #1 SMP

--
Best regards,
Slava Prisivko.
Giuliano Procida
2016-10-04 09:45:14 UTC
Before anything else, I would have suggested backing up the image and
meta sub LVs, but it looks like you are just testing.
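
In case it helps, backing up the sub LVs can be as simple as copying
each activated rmeta/rimage device into a file (a sketch; adjust names
and destination paths to taste):

for lv in test_rmeta_0 test_rmeta_1 test_rmeta_2 \
          test_rimage_0 test_rimage_1 test_rimage_2; do
    dd if=/dev/mapper/vg-$lv of=/root/backup-$lv.img bs=1M
done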

Clear down any odd state with dmsetup remove /dev/vg/... and then run:

vgextend --restoremissing

Actually, always run LVM commands with -v -t before really running them.
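
For example, try the activation as a dry run first (the -t/--test flag
stops LVM from committing any changes):

# lvchange -ay -v -t vg/test
# lvchange -ay -v vg/test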
Post by Slava Prisivko
In order to mitigate cross-posting, here's the original question on
Serverfault.SE: LVM RAID5 out-of-sync recovery, but feel free to answer
wherever you deem appropriate.
How can one recover from an LVM RAID5 out-of-sync?
I suppose it's supposed to recover mostly automatically.
*If* your array is assembled (or whatever the LVM-equivalent
terminology is) then you can force a given subset of PVs to be
resynced.
http://man7.org/linux/man-pages/man8/lvchange.8.html - look for rebuild
However, this does not seem to be your problem.
Post by Slava Prisivko
I have an LVM RAID5 configuration (RAID5 using the LVM tools).
However, because of a technical problem mirrors went out of sync. You can
Playing with my Jessie VM, I disconnected (virtually) one disk. That
worked, the machine stayed running. lvs, though, gave no indication the
arrays were degraded.
You should have noticed something in the kernel logs. Also, lvs should
have reported that the array was now (p)artial.
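
One way to check for this explicitly (a sketch; the health field needs
LVM >= 2.02.108) is to ask lvs for it, and to watch the ninth character
of the attr column, which shows (p)artial or (r)efresh needed:

# lvs -a -o +lv_health_status vg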
Post by Slava Prisivko
I re-attached the disk, and removed a second. Stayed
running (this is raid6). Re-attached, still no indication from lvs. I ran
lvconvert --repair on the volume, it told me it was OK. Then I pulled a
third disk... and the machine died. Re-inserted it, rebooted, and am now
unsure how to fix.
So this is RAID6 rather than RAID5?
And you killed 3 disks in a RAID 6 array?
Post by Slava Prisivko
If I had been using mdadm, I could have probably recovered the data using
`mdadm --force --assemble`, but I was not able to achieve the same using the
LVM tools.
LVM is very different. :-(
Post by Slava Prisivko
I have tried to concatenate rmeta and rimage for each mirror and put them on
three linear devices in order to feed them to the mdadm (because LVM
leverages MD), but without success (`mdadm --examine` does not recognize the
superblock), because it appears that the mdadm superblock format differs
from the dm_raid superblock format (search for the "dm_raid_superblock").
Not only that, but (as far as I can tell), LVM RAID 6 parity (well,
syndrome) is calculated in a different manner to the older mdadm RAID;
it uses an industry-standard layout instead of the (more obvious?) md
layout.
I wrote a utility to parity-check the default LVM RAID6 layout with
the usual stripe size (easily adjusted) here:
https://drive.google.com/open?id=0B8dHrWSoVcaDbkY3WmkxSmpfSVE

You can use this to see to what degree the data in the image LVs are
in fact in/out of sync. I've not attempted to add sync functionality
to this.
Post by Slava Prisivko
I tried to understand how device-mapper RAID leverages MD, but was unable to
find any documentation while the kernel code is quite complicated.
I also tried to rebuild the mirror directly by using `dmsetup`, but it can't
rebuild if metadata is out of sync.
Overall, almost the only useful information I could find is RAIDing with LVM
vs MDRAID - pros and cons? question on Unix & Linux SE.
Well, I would read through this as well (versions 6 and 7 also available):
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7-Beta/html/Logical_Volume_Manager_Administration/index.html
Post by Slava Prisivko
The output of various commands is provided below.
# lvs -a -o +devices
test vg rwi---r--- 64.00m
test_rimage_0(0),test_rimage_1(0),test_rimage_2(0)
[test_rimage_0] vg Iwi-a-r-r- 32.00m /dev/sdc2(1)
[test_rimage_1] vg Iwi-a-r-r- 32.00m /dev/sda2(238244)
[test_rimage_2] vg Iwi-a-r-r- 32.00m /dev/sdb2(148612)
[test_rmeta_0] vg ewi-a-r-r- 4.00m /dev/sdc2(0)
[test_rmeta_1] vg ewi-a-r-r- 4.00m /dev/sda2(238243)
[test_rmeta_2] vg ewi-a-r-r- 4.00m /dev/sdb2(148611)
# lvchange -ay vg/test -v
Activating logical volume "test" exclusively.
activation/volume_list configuration setting not defined: Checking
only host tags for vg/test.
Loading vg-test_rmeta_0 table (253:35)
Suppressed vg-test_rmeta_0 (253:35) identical table reload.
Loading vg-test_rimage_0 table (253:36)
Suppressed vg-test_rimage_0 (253:36) identical table reload.
Loading vg-test_rmeta_1 table (253:37)
Suppressed vg-test_rmeta_1 (253:37) identical table reload.
Loading vg-test_rimage_1 table (253:38)
Suppressed vg-test_rimage_1 (253:38) identical table reload.
Loading vg-test_rmeta_2 table (253:39)
Suppressed vg-test_rmeta_2 (253:39) identical table reload.
Loading vg-test_rimage_2 table (253:40)
Suppressed vg-test_rimage_2 (253:40) identical table reload.
Creating vg-test
Loading vg-test table (253:87)
device-mapper: reload ioctl on (253:87) failed: Invalid argument
Removing vg-test (253:87)
device-mapper: table: 253:87: raid: Cannot change device positions in
RAID array
device-mapper: ioctl: error adding target to table
That's a new error message to me. I would try clearing out the dm
table (dmsetup remove /dev/vg/test_*) before trying again (-v -t,
first).
Post by Slava Prisivko
# lvconvert --repair vg/test
vg/test must be active to perform this operation.
And it requires new PVs ("replacement drives") to put the subLVs on.
It's probably not what you want.
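
For completeness, if you did want to go the repair route with a spare
disk, the PV to allocate the replacement sub LVs from can be named on
the command line (a sketch; /dev/sdd1 is a hypothetical spare):

# vgextend vg /dev/sdd1
# lvconvert --repair vg/test /dev/sdd1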
Post by Slava Prisivko
# lvm version
LVM version: 2.02.145(2) (2016-03-04)
Library version: 1.02.119 (2016-03-04)
Driver version: 4.34.0
I would update LVM to whatever is in Debian testing as there has been
a fair bit of change this year.
Post by Slava Prisivko
Linux server 4.4.8-hardened-r1-1 #1 SMP
More useful would be the contents of /etc/lvm/backup/vg and the output
of vgs and pvs.
Slava Prisivko
2016-10-04 22:14:33 UTC
Thanks!

On Tue, Oct 4, 2016 at 12:49 PM Giuliano Procida <***@gmail.com>
wrote:

Before anything else, I would have suggested backing up the image and
meta sub LVs, but it looks like you are just testing.

Already did. I'm not testing, I just renamed the LVs to "test_*" because
the previous name doesn't matter.

There is nothing particularly important there, but I would like to
understand whether I would be able to recover should something similar
happen in the future.


Clear down any odd state with dmsetup remove /dev/vg/... and then run:

vgextend --restoremissing

I didn't have to, because all the PVs are present:

# pvs
PV VG Fmt Attr PSize PFree
/dev/sda2 vg lvm2 a-- 1.82t 1.10t
/dev/sdb2 vg lvm2 a-- 3.64t 1.42t
/dev/sdc2 vg lvm2 a-- 931.51g 195.18g


Actually, always run LVM commands with -v -t before really running them.

Thanks! I had backed up the rmeta* and rimage*, so I didn't feel the need
for using -t. Am I wrong?
Post by Slava Prisivko
In order to mitigate cross-posting, here's the original question on
Serverfault.SE: LVM RAID5 out-of-sync recovery, but feel free to answer
wherever you deem appropriate.
How can one recover from an LVM RAID5 out-of-sync?
I suppose it's supposed to recover mostly automatically.
*If* your array is assembled (or whatever the LVM-equivalent
termiology is) then you can force a given subset of PVs to be
resynced.
http://man7.org/linux/man-pages/man8/lvchange.8.html - look for rebuild
However, this does not seem to be your problem.

Yeah, I tried, but in vain:
# lvchange --rebuild /dev/sda2 vg/test -v
Archiving volume group "vg" metadata (seqno 518).
Do you really want to rebuild 1 PVs of logical volume vg/test [y/n]: y
Accepted input: [y]
vg/test must be active to perform this operation.
Post by Slava Prisivko
I have an LVM RAID5 configuration (RAID5 using the LVM tools).
However, because of a technical problem mirrors went out of sync. You can
Playing with my Jessie VM, I disconnected (virtually) one disk. That
worked, the machine stayed running. lvs, though, gave no indication the
arrays were degraded.
You should have noticed something in the kernel logs. Also, lvs should
have reported that the array was now (p)artial.

Yes, I've noticed it. The problem was a faulty SATA cable (as I learned
later), so when I switched the computer on for the first time, /dev/sda was
missing (in the current device allocation). I switched off the computer,
swapped the /dev/sda and /dev/sdb SATA cable (without thinking about the
consequences) and switched it on. This time the /dev/sdb was missing. I
replaced the faulty cable with a new one and switched the machine back on.
This time sda, sdb and sdc were all present, but the RAID went out-of-sync.

I'm pretty sure there were very few (if any) write operations during the
degraded operating mode, so I could recover by rebuilding the old
mirror (sda) using the more recent ones (sdb and sdc).
Post by Slava Prisivko
I re-attached the disk, and removed a second. Stayed
running (this is raid6). Re-attached, still no indication from lvs. I ran
lvconvert --repair on the volume, it told me it was OK. Then I pulled a
third disk... and the machine died. Re-inserted it, rebooted, and am now
unsure how to fix.
So this is RAID6 rather than RAID5?
And you killed 3 disks in a RAID 6 array?

I have RAID5, not RAID6, but the principle is the same (as I
explained in the previous paragraph).
Post by Slava Prisivko
If I had been using mdadm, I could have probably recovered the data using
`mdadm --force --assemble`, but I was not able to achieve the same using the
LVM tools.
LVM is very different. :-(
Post by Slava Prisivko
I have tried to concatenate rmeta and rimage for each mirror and put them on
three linear devices in order to feed them to the mdadm (because LVM
leverages MD), but without success (`mdadm --examine` does not recognize the
superblock), because it appears that the mdadm superblock format differs
from the dm_raid superblock format (search for the "dm_raid_superblock").
Not only that, but (as far as I can tell), LVM RAID 6 parity (well,
syndrome) is calculated in a different manner to the older mdadm RAID;
it uses an industry-standard layout instead of the (more obvious?) md
layout.
I wrote a utility to parity-check the default LVM RAID6 layout with
the usual stripe size (easily adjusted) here:
https://drive.google.com/open?id=0B8dHrWSoVcaDbkY3WmkxSmpfSVE

You can use this to see to what degree the data in the image LVs are
in fact in/out of sync. I've not attempted to add sync functionality
to this.

Thanks, I used your raid5_parity_check.cc utility with the default stripe
size (64 * 1024), but it actually doesn't matter since you're just
calculating the total xor and the stripe size acts as a buffer size for
that.
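
In other words, for a three-device RAID5 the byte-wise XOR of the two
data chunks and the parity chunk in each stripe must be zero, and
because XOR is commutative the parity rotation does not matter for a
pure consistency check. A crude stand-in for the utility (far too slow
for anything big; it assumes gawk for its bitwise xor() function):

paste <(od -An -v -tu1 -w1 /dev/mapper/vg-test_rimage_0) \
      <(od -An -v -tu1 -w1 /dev/mapper/vg-test_rimage_1) \
      <(od -An -v -tu1 -w1 /dev/mapper/vg-test_rimage_2) |
gawk '{ if (xor(xor($1, $2), $3) != 0) print "mismatch at byte", NR - 1 }'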

I get three unsynced stripes out of 512 (32 MiB / 64 KiB), but I would like
to try to reconstruct the test_rimage_1 using the other two. Just in
case, here are the bad stripe numbers: 16, 48, 49.
Post by Slava Prisivko
I tried to understand how device-mapper RAID leverages MD, but was unable to
find any documentation while the kernel code is quite complicated.
I also tried to rebuild the mirror directly by using `dmsetup`, but it can't
rebuild if metadata is out of sync.
Overall, almost the only useful information I could find is RAIDing with LVM
vs MDRAID - pros and cons? question on Unix & Linux SE.
Well, I would read through this as well (versions 6 and 7 also available):
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7-Beta/html/Logical_Volume_Manager_Administration/index.html

Thanks, but nothing particularly relevant to my case there.
Post by Slava Prisivko
The output of various commands is provided below.
# lvs -a -o +devices
test vg rwi---r--- 64.00m test_rimage_0(0),test_rimage_1(0),test_rimage_2(0)
[test_rimage_0] vg Iwi-a-r-r- 32.00m /dev/sdc2(1)
[test_rimage_1] vg Iwi-a-r-r- 32.00m /dev/sda2(238244)
[test_rimage_2] vg Iwi-a-r-r- 32.00m /dev/sdb2(148612)
[test_rmeta_0] vg ewi-a-r-r- 4.00m /dev/sdc2(0)
[test_rmeta_1] vg ewi-a-r-r- 4.00m /dev/sda2(238243)
[test_rmeta_2] vg ewi-a-r-r- 4.00m /dev/sdb2(148611)
# lvchange -ay vg/test -v
Activating logical volume "test" exclusively.
activation/volume_list configuration setting not defined: Checking
only host tags for vg/test.
Loading vg-test_rmeta_0 table (253:35)
Suppressed vg-test_rmeta_0 (253:35) identical table reload.
Loading vg-test_rimage_0 table (253:36)
Suppressed vg-test_rimage_0 (253:36) identical table reload.
Loading vg-test_rmeta_1 table (253:37)
Suppressed vg-test_rmeta_1 (253:37) identical table reload.
Loading vg-test_rimage_1 table (253:38)
Suppressed vg-test_rimage_1 (253:38) identical table reload.
Loading vg-test_rmeta_2 table (253:39)
Suppressed vg-test_rmeta_2 (253:39) identical table reload.
Loading vg-test_rimage_2 table (253:40)
Suppressed vg-test_rimage_2 (253:40) identical table reload.
Creating vg-test
Loading vg-test table (253:87)
device-mapper: reload ioctl on (253:87) failed: Invalid argument
Removing vg-test (253:87)
device-mapper: table: 253:87: raid: Cannot change device positions in
RAID array
device-mapper: ioctl: error adding target to table
That's a new error message to me. I would try clearing out the dm
table (dmsetup remove /dev/vg/test_*) before trying again (-v -t,
first).

After cleaning the dmsetup table of test_* and trying to lvchange -ay I get
practically the same:
# lvchange -ay vg/test -v
Activating logical volume vg/test exclusively.
activation/volume_list configuration setting not defined: Checking only
host tags for vg/test.
Creating vg-test_rmeta_0
Loading vg-test_rmeta_0 table (253:35)
Resuming vg-test_rmeta_0 (253:35)
Creating vg-test_rimage_0
Loading vg-test_rimage_0 table (253:36)
Resuming vg-test_rimage_0 (253:36)
Creating vg-test_rmeta_1
Loading vg-test_rmeta_1 table (253:37)
Resuming vg-test_rmeta_1 (253:37)
Creating vg-test_rimage_1
Loading vg-test_rimage_1 table (253:38)
Resuming vg-test_rimage_1 (253:38)
Creating vg-test_rmeta_2
Loading vg-test_rmeta_2 table (253:39)
Resuming vg-test_rmeta_2 (253:39)
Creating vg-test_rimage_2
Loading vg-test_rimage_2 table (253:40)
Resuming vg-test_rimage_2 (253:40)
Creating vg-test
Loading vg-test table (253:87)
device-mapper: reload ioctl on (253:87) failed: Invalid argument
Removing vg-test (253:87)

device-mapper: table: 253:87: raid: Cannot change device positions in RAID
array
device-mapper: ioctl: error adding target to table
Post by Slava Prisivko
# lvconvert --repair vg/test
vg/test must be active to perform this operation.
And it requires new PVs ("replacement drives") to put the subLVs on.
It's probably not what you want.
Post by Slava Prisivko
# lvm version
LVM version: 2.02.145(2) (2016-03-04)
Library version: 1.02.119 (2016-03-04)
Driver version: 4.34.0
I would update LVM to whatever is in Debian testing as there has been
a fair bit of change this year.

I've updated to 2.02.166 (the latest version):

# lvm version
LVM version: 2.02.166(2) (2016-09-26)
Library version: 1.02.135 (2016-09-26)
Driver version: 4.34.0
Post by Slava Prisivko
Linux server 4.4.8-hardened-r1-1 #1 SMP
More useful would be the contents of /etc/lvm/backup/vg and the output
of vgs and pvs.

# pvs
PV VG Fmt Attr PSize PFree
/dev/sda2 vg lvm2 a-- 1.82t 1.10t
/dev/sdb2 vg lvm2 a-- 3.64t 1.42t
/dev/sdc2 vg lvm2 a-- 931.51g 195.18g

# vgs
VG #PV #LV #SN Attr VSize VFree
vg 3 18 0 wz--n- 6.37t 2.71t

Here is the relevant /etc/lvm/archive (archive is more recent than
backup) content:
test {
id = "JjiPmi-esfx-vdeF-5zMv-TsJC-6vFf-qNgNnZ"
status = ["READ", "WRITE", "VISIBLE"]
flags = []
creation_time = 18446744073709551615 # 1970-01-01 02:59:59 +0300
creation_host = "server"
segment_count = 1

segment1 {
start_extent = 0
extent_count = 16 # 64 Megabytes

type = "raid5"
device_count = 3
stripe_size = 128
region_size = 1024

raids = [
"test_rmeta_0", "test_rimage_0",
"test_rmeta_1", "test_rimage_1",
"test_rmeta_2", "test_rimage_2"
]
}
}

test_rmeta_0 {
id = "WE3CUg-ayo8-lp1Y-9S2v-zRGi-mV1s-DWYoST"
status = ["READ", "WRITE", "VISIBLE"]
flags = []
creation_time = 18446744073709551615 # 1970-01-01 02:59:59 +0300
creation_host = "server"
segment_count = 1

segment1 {
start_extent = 0
extent_count = 1 # 4 Megabytes

type = "striped"
stripe_count = 1 # linear

stripes = [
"pv0", 0
]
}
}

test_rmeta_1 {
id = "Apk3mc-zy4q-c05I-hiIO-1Kae-9yB6-Cl5lfJ"
status = ["READ", "WRITE", "VISIBLE"]
flags = []
creation_time = 18446744073709551615 # 1970-01-01 02:59:59 +0300
creation_host = "server"
segment_count = 1

segment1 {
start_extent = 0
extent_count = 1 # 4 Megabytes

type = "striped"
stripe_count = 1 # linear

stripes = [
"pv1", 238243
]
}
}

test_rmeta_2 {
id = "j2Waf3-A77y-pvfd-foGK-Hq7B-rHe8-YKzQY0"
status = ["READ", "WRITE", "VISIBLE"]
flags = []
creation_time = 18446744073709551615 # 1970-01-01 02:59:59 +0300
creation_host = "server"
segment_count = 1

segment1 {
start_extent = 0
extent_count = 1 # 4 Megabytes

type = "striped"
stripe_count = 1 # linear

stripes = [
"pv2", 148611
]
}
}

test_rimage_0 {
id = "zaGgJx-YSIl-o2oq-UN9l-02Q8-IS5u-sz4RhQ"
status = ["READ", "WRITE"]
flags = []
creation_time = 18446744073709551615 # 1970-01-01 02:59:59 +0300
creation_host = "server"
segment_count = 1

segment1 {
start_extent = 0
extent_count = 8 # 32 Megabytes

type = "striped"
stripe_count = 1 # linear

stripes = [
"pv0", 1
]
}
}

test_rimage_1 {
id = "0mD5AL-GKj3-siFz-xQmO-ZtQo-L3MM-Ro2SG2"
status = ["READ", "WRITE"]
flags = []
creation_time = 18446744073709551615 # 1970-01-01 02:59:59 +0300
creation_host = "server"
segment_count = 1

segment1 {
start_extent = 0
extent_count = 8 # 32 Megabytes

type = "striped"
stripe_count = 1 # linear

stripes = [
"pv1", 238244
]
}
}

test_rimage_2 {
id = "4FxiHV-j637-ENml-Okm3-uL1p-fuZ0-y9dE8Y"
status = ["READ", "WRITE"]
flags = []
creation_time = 18446744073709551615 # 1970-01-01 02:59:59 +0300
creation_host = "server"
segment_count = 1

segment1 {
start_extent = 0
extent_count = 8 # 32 Megabytes

type = "striped"
stripe_count = 1 # linear

stripes = [
"pv2", 148612
]
}
}

--
Best regards,
Slava Prisivko.
Giuliano Procida
2016-10-05 12:48:44 UTC
(This message is not displayed in the archive.)
Giuliano Procida
2016-10-07 06:43:11 UTC
Slava, the main problem I had was that LVM forbade many operations
while I had a PV missing.

In your case, apparently, all PVs are present. So I suggest the following:

1. examine the recent history in /etc/lvm/archive
2. diff each transition and see if you can understand what has
happened at each stage
3. vgcfgrestore the most recent version that you think will allow you
to activate your array; you can work backwards incrementally (a rough
sketch follows below)
4. check kernel logs!
5. scrub (resync) the array if needed
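
A rough sketch of what steps 1-3 and 5 might look like (the archive
file names and sequence numbers below are made up; deactivate the LV
before restoring, add -t first for a dry run if your vgcfgrestore
accepts it, and --syncaction needs a reasonably recent LVM):

# ls -rt /etc/lvm/archive/vg_*.vg | tail
# diff -u /etc/lvm/archive/vg_00517-*.vg /etc/lvm/archive/vg_00518-*.vg
# vgcfgrestore -f /etc/lvm/archive/vg_00517-1234567890.vg vg
# lvchange -ay vg/test
# lvchange --syncaction repair vg/test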

Hope this helps,
Giuliano.
Slava Prisivko
2016-10-09 19:00:46 UTC
Hi!
Post by Giuliano Procida
Slava, the main problem I had was that LVM forbade many operations
while I had a PV missing.
1. examine the recent history in /etc/lvm/archive
2. diff each transition and see if you can understand what has
happened at each stage
3. vgcfgrestore the most recent version that you think will allow you
to activate your array, you can work backwards incrementally
4. check kernel logs!
5. scrub (resync) the array if needed
Since I cannot resync manually using your code, there are no MISSING
flags in /etc/lvm/archive, and the event counts differ for the three
rmetas, it seems this would not be helpful, would it?

I diffed the current state of /etc/lvm/archive against the state before
the trouble, and there is no significant difference: no difference
in IDs and no MISSING flags.
Post by Giuliano Procida
Hope this helps,
Giuliano.
Slava Prisivko
2016-10-09 19:00:44 UTC
Hi!
Post by Giuliano Procida
Post by Slava Prisivko
Post by Giuliano Procida
vgextend --restoremissing
# pvs
PV VG Fmt Attr PSize PFree
/dev/sda2 vg lvm2 a-- 1.82t 1.10t
/dev/sdb2 vg lvm2 a-- 3.64t 1.42t
/dev/sdc2 vg lvm2 a-- 931.51g 195.18g
Double-check in the metadata for MISSING. This is what I was hoping
might be in your /etc/lvm/backup file.
Post by Slava Prisivko
Post by Giuliano Procida
Actually, always run LVM commands with -v -t before really running them.
Thanks! I had backed up the rmeta* and rimage*, so I didn't feel the need
for using -t. Am I wrong?
Well, some nasty surprises may be avoidable (particularly if also using -f).
Post by Slava Prisivko
Yes, I've noticed it. The problem was a faulty SATA cable (as I learned
later), so when I switched the computer on for the first time, /dev/sda was
missing (in the current device allocation). I switched off the computer,
swapped the /dev/sda and /dev/sdb SATA cable (without thinking about the
consequences) and switched it on. This time the /dev/sdb was missing. I
replaced the faulty cable with a new one and switched the machine back on.
This time sda, sdb and sdc were all present, but the RAID went out-of-sync.
In swapping the cables, you may have changed the sd{a,b,c} enumeration
but this will have no impact on the UUIDs that LVM uses to identify
the PVs.
That's right, but the images went out-of-sync because during the first boot
only sdb and sdc were present (so the content of sda should have been
implied), during the second boot only sda and sdc were present (so the
content of sdb should have been implied), but when I replaced the cable
there was a conflict between these three.
Post by Giuliano Procida
Post by Slava Prisivko
I'm pretty sure there were very few (if any) write operations during the
degraded operating mode, so I could recover by rebuilding the old mirror
(sda) using the more recent ones (sdb and sdc).
Agreed, based on your check below.
Post by Slava Prisivko
Thanks, I used your raid5_parity_check.cc utility with the default stripe
size (64 * 1024), but it actually doesn't matter since you're just
calculating the total xor and the stripe size acts as a buffer size for
that.
[I was a little surprised to discover that RAID 6 works as a byte erasure code.]
The stripe size and layout matter once you want to adapt the code
to extract or repair the data.
Post by Slava Prisivko
I get three unsynced stripes out of 512 (32 MiB / 64 KiB), but I would like
to try to reconstruct the test_rimage_1 using the other two. Just in case,
here are the bad stripe numbers: 16, 48, 49.
I've updated the utility (this is for raid5 = raid5_ls). Warning: not
tested on out-of-sync data.
https://drive.google.com/open?id=0B8dHrWSoVcaDYXlUWXEtZEMwX0E
# Assume the first sub LV has the out-of-date data and dump the
correct(ed) LV content.
./foo stripe $((64*1024)) repair 0 /dev/${lv}_rimage_* | cmp - /dev/${lv}
Thanks!

I tried to reassemble the array using 3 different pairs of correct LV
images, but it doesn't work (I am sure because I cannot luksOpen a LUKS
image which is in the LV, which is almost surely uncorrectable).
Post by Giuliano Procida
Post by Slava Prisivko
Post by Giuliano Procida
Post by Slava Prisivko
The output of various commands is provided below.
# lvs -a -o +devices
test vg rwi---r--- 64.00m test_rimage_0(0),test_rimage_1(0),test_rimage_2(0)
[test_rimage_0] vg Iwi-a-r-r- 32.00m /dev/sdc2(1)
[test_rimage_1] vg Iwi-a-r-r- 32.00m /dev/sda2(238244)
[test_rimage_2] vg Iwi-a-r-r- 32.00m /dev/sdb2(148612)
[test_rmeta_0] vg ewi-a-r-r- 4.00m /dev/sdc2(0)
[test_rmeta_1] vg ewi-a-r-r- 4.00m /dev/sda2(238243)
[test_rmeta_2] vg ewi-a-r-r- 4.00m /dev/sdb2(148611)
The extra r(efresh) attributes suggest trying a resync operation, which
may not be possible on an inactive LV.
I missed that the RAID device is actually in the list.
Post by Slava Prisivko
After cleaning the dmsetup table of test_* and trying to lvchange -ay I get
# lvchange -ay vg/test -v
[snip]
device-mapper: reload ioctl on (253:87) failed: Invalid argument
Removing vg-test (253:87)
device-mapper: table: 253:87: raid: Cannot change device positions in RAID array
device-mapper: ioctl: error adding target to table
This error occurs when the sub LV metadata says "I am device X in this
array" but dmsetup is being asked to put the sub LV at different
position Y (alas, neither are logged). With lots of -v and -d flags
you can get lvchange to include the dm table entries in the
diagnostics.
This is as useful as it gets (-vvvv -dddd):
Loading vg-test_rmeta_0 table (253:35)
Adding target to (253:35): 0 8192 linear 8:34 2048
dm table (253:35) [ opencount flush ] [16384] (*1)
Suppressed vg-test_rmeta_0 (253:35) identical table reload.
Loading vg-test_rimage_0 table (253:36)
Adding target to (253:36): 0 65536 linear 8:34 10240
dm table (253:36) [ opencount flush ] [16384] (*1)
Suppressed vg-test_rimage_0 (253:36) identical table reload.
Loading vg-test_rmeta_1 table (253:37)
Adding target to (253:37): 0 8192 linear 8:2 1951688704
dm table (253:37) [ opencount flush ] [16384] (*1)
Suppressed vg-test_rmeta_1 (253:37) identical table reload.
Loading vg-test_rimage_1 table (253:38)
Adding target to (253:38): 0 65536 linear 8:2 1951696896
dm table (253:38) [ opencount flush ] [16384] (*1)
Suppressed vg-test_rimage_1 (253:38) identical table reload.
Loading vg-test_rmeta_2 table (253:39)
Adding target to (253:39): 0 8192 linear 8:18 1217423360
dm table (253:39) [ opencount flush ] [16384] (*1)
Suppressed vg-test_rmeta_2 (253:39) identical table reload.
Loading vg-test_rimage_2 table (253:40)
Adding target to (253:40): 0 65536 linear 8:18 1217431552
dm table (253:40) [ opencount flush ] [16384] (*1)
Suppressed vg-test_rimage_2 (253:40) identical table reload.
Creating vg-test
dm create vg-test
LVM-Pgjp5f2PRJipxvoNdsYmq0olg9iWwY5pJjiPmiesfxvdeF5zMvTsJC6vFfqNgNnZ [
noopencount flush ] [16384] (*1)
Loading vg-test table (253:84)
Adding target to (253:84): 0 131072 raid raid5_ls 3 128 region_size
1024 3 253:35 253:36 253:37 253:38 253:39 253:40
dm table (253:84) [ opencount flush ] [16384] (*1)
dm reload (253:84) [ noopencount flush ] [16384] (*1)
device-mapper: reload ioctl on (253:84) failed: Invalid argument

I don't see any problems here.
Post by Giuliano Procida
You can check the rmeta superblocks with
https://drive.google.com/open?id=0B8dHrWSoVcaDUk0wbHQzSEY3LTg
Thanks, it's very useful!

/dev/mapper/vg-test_rmeta_0
found RAID superblock at offset 0
magic=1683123524
features=0
num_devices=3
array_position=0
events=56
failed_devices=0
disk_recovery_offset=18446744073709551615
array_resync_offset=18446744073709551615
level=5
layout=2
stripe_sectors=128
found bitmap file superblock at offset 4096:
magic: 6d746962
version: 4
uuid: 00000000.00000000.00000000.00000000
events: 56
events cleared: 33
state: 00000000
chunksize: 524288 B
daemon sleep: 5s
sync size: 32768 KB
max write behind: 0

/dev/mapper/vg-test_rmeta_1
found RAID superblock at offset 0
magic=1683123524
features=0
num_devices=3
array_position=4294967295
events=62
failed_devices=1
disk_recovery_offset=0
array_resync_offset=18446744073709551615
level=5
layout=2
stripe_sectors=128
found bitmap file superblock at offset 4096:
magic: 6d746962
version: 4
uuid: 00000000.00000000.00000000.00000000
events: 60
events cleared: 33
state: 00000000
chunksize: 524288 B
daemon sleep: 5s
sync size: 32768 KB
max write behind: 0

/dev/mapper/vg-test_rmeta_2
found RAID superblock at offset 0
magic=1683123524
features=0
num_devices=3
array_position=2
events=62
failed_devices=1
disk_recovery_offset=18446744073709551615
array_resync_offset=18446744073709551615
level=5
layout=2
stripe_sectors=128
found bitmap file superblock at offset 4096:
magic: 6d746962
version: 4
uuid: 00000000.00000000.00000000.00000000
events: 62
events cleared: 33
state: 00000000
chunksize: 524288 B
daemon sleep: 5s
sync size: 32768 KB
max write behind: 0

The problem I see here is that the event count is different for the three
rmetas.
Post by Giuliano Procida
Post by Slava Prisivko
Here is the relevant /etc/lvm/archive (archive is more recent that
backup)
That looks sane, but you omitted the physical volumes section so there
is no way to cross-check UUIDs and devices or see if there are MISSING
flags.
The ids are the same and there are no MISSING flags.
Post by Giuliano Procida
If you use
https://drive.google.com/open?id=0B8dHrWSoVcaDQkU5NG1sLWc5cjg
directly, you can get metadata that LVM is reading off the PVs and
double-check for discrepancies.
Giuliano Procida
2016-10-12 06:57:36 UTC
Permalink
Post by Slava Prisivko
I tried to reassemble the array using 3 different pairs of correct LV
images, but it doesn't work (I am sure because I cannot luksOpen a LUKS
image which is in the LV, which is almost surely uncorrectable).
I would hope that a luks volume would at least be recognisable using
file -s. If you extract the image data into a regular file you should
be able to losetup that and then luksOpen the loop device.
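
Concretely, something like this (a sketch; the file name is made up,
/dev/loop0 is whatever losetup prints back, and the "repair 1" argument
assumes the number is the index of the stale leg, rimage_1 here, as the
earlier usage example suggests):

# ./foo stripe $((64*1024)) repair 1 /dev/mapper/vg-test_rimage_* > /root/test.img
# losetup --find --show /root/test.img
# file -s /dev/loop0
# cryptsetup luksDump /dev/loop0
# cryptsetup luksOpen /dev/loop0 test_recovered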
Post by Slava Prisivko
Loading vg-test_rmeta_0 table (253:35)
Adding target to (253:35): 0 8192 linear 8:34 2048
dm table (253:35) [ opencount flush ] [16384] (*1)
Suppressed vg-test_rmeta_0 (253:35) identical table reload.
Loading vg-test_rimage_0 table (253:36)
Adding target to (253:36): 0 65536 linear 8:34 10240
dm table (253:36) [ opencount flush ] [16384] (*1)
Suppressed vg-test_rimage_0 (253:36) identical table reload.
Loading vg-test_rmeta_1 table (253:37)
Adding target to (253:37): 0 8192 linear 8:2 1951688704
dm table (253:37) [ opencount flush ] [16384] (*1)
Suppressed vg-test_rmeta_1 (253:37) identical table reload.
Loading vg-test_rimage_1 table (253:38)
Adding target to (253:38): 0 65536 linear 8:2 1951696896
dm table (253:38) [ opencount flush ] [16384] (*1)
Suppressed vg-test_rimage_1 (253:38) identical table reload.
Loading vg-test_rmeta_2 table (253:39)
Adding target to (253:39): 0 8192 linear 8:18 1217423360
dm table (253:39) [ opencount flush ] [16384] (*1)
Suppressed vg-test_rmeta_2 (253:39) identical table reload.
Loading vg-test_rimage_2 table (253:40)
Adding target to (253:40): 0 65536 linear 8:18 1217431552
dm table (253:40) [ opencount flush ] [16384] (*1)
Suppressed vg-test_rimage_2 (253:40) identical table reload.
Creating vg-test
dm create vg-test
LVM-Pgjp5f2PRJipxvoNdsYmq0olg9iWwY5pJjiPmiesfxvdeF5zMvTsJC6vFfqNgNnZ [
noopencount flush ] [16384] (*1)
Loading vg-test table (253:84)
Adding target to (253:84): 0 131072 raid raid5_ls 3 128 region_size
1024 3 253:35 253:36 253:37 253:38 253:39 253:40
dm table (253:84) [ opencount flush ] [16384] (*1)
dm reload (253:84) [ noopencount flush ] [16384] (*1)
device-mapper: reload ioctl on (253:84) failed: Invalid argument
I don't see any problems here.
In my case I got (for example, and Gmail is going to fold the lines, sorry):

[...]
Loading vg0-photos table (254:45)
Adding target to (254:45): 0 1258291200 raid raid6_zr 3 128
region_size 1024 5 254:73 254:74 254:37 254:38 254:39 254:40 254:41
254:42 254:43 254:44
dm table (254:45) [ opencount flush ] [16384] (*1)
dm reload (254:45) [ noopencount flush ] [16384] (*1)
device-mapper: reload ioctl on (254:45) failed: Invalid argument

The actual errors are in the kernel logs:

[...]
[144855.931712] device-mapper: raid: New device injected into existing
array without 'rebuild' parameter specified
[144855.935523] device-mapper: table: 254:45: raid: Unable to assemble
array: Invalid superblocks
[144855.939290] device-mapper: ioctl: error adding target to table

128 means 128*512 so this is 64k as in your case. I was able to verify
that my extracted images matched the RAID device. My problem was not
assembling the array, it was that the array would be rebuilt on every
subsequent use:

Loading vg0-var table (254:21)
Adding target to (254:21): 0 52428800 raid raid5_ls 5 128
region_size 1024 rebuild 0 5 254:11 254:12 254:13 254:14 254:15 254:16
254:17 254:18 254:19 254:20
dm table (254:21) [ opencount flush ] [16384] (*1)
dm reload (254:21) [ noopencount flush ] [16384] (*1)
Table size changed from 0 to 52428800 for vg0-var (254:21).
Post by Slava Prisivko
Post by Giuliano Procida
You can check the rmeta superblocks with
https://drive.google.com/open?id=0B8dHrWSoVcaDUk0wbHQzSEY3LTg
Thanks, it's very useful!
/dev/mapper/vg-test_rmeta_0
found RAID superblock at offset 0
magic=1683123524
features=0
num_devices=3
array_position=0
events=56
failed_devices=0
disk_recovery_offset=18446744073709551615
array_resync_offset=18446744073709551615
level=5
layout=2
stripe_sectors=128
magic: 6d746962
version: 4
uuid: 00000000.00000000.00000000.00000000
events: 56
events cleared: 33
state: 00000000
chunksize: 524288 B
daemon sleep: 5s
sync size: 32768 KB
max write behind: 0
/dev/mapper/vg-test_rmeta_1
found RAID superblock at offset 0
magic=1683123524
features=0
num_devices=3
array_position=4294967295
events=62
failed_devices=1
disk_recovery_offset=0
array_resync_offset=18446744073709551615
level=5
layout=2
stripe_sectors=128
magic: 6d746962
version: 4
uuid: 00000000.00000000.00000000.00000000
events: 60
events cleared: 33
state: 00000000
chunksize: 524288 B
daemon sleep: 5s
sync size: 32768 KB
max write behind: 0
/dev/mapper/vg-test_rmeta_2
found RAID superblock at offset 0
magic=1683123524
features=0
num_devices=3
array_position=2
events=62
failed_devices=1
disk_recovery_offset=18446744073709551615
array_resync_offset=18446744073709551615
level=5
layout=2
stripe_sectors=128
magic: 6d746962
version: 4
uuid: 00000000.00000000.00000000.00000000
events: 62
events cleared: 33
state: 00000000
chunksize: 524288 B
daemon sleep: 5s
sync size: 32768 KB
max write behind: 0
The problem I see here is that events count is different for the three
rmetas.
The event counts relate to the intent bitmap (I believe).

That looks OK, because failed devices is 1, meaning 0b0...01; i.e.,
device 0 of the array is "failed". The real problem is device 1 which
has
Post by Slava Prisivko
array_position=4294967295
This should be 1 instead. This is 32-bit unsigned 0xf...f. It may be
that it has special significance in kernel or LVM code. I've not
checked beyond noticing one test: role < 0.

I recommend using diff3 or pairwise diff on the metadata dumps to
ensure you have not missed any other differences.

One possible way forward:

(Optionally) adapt my resync code so it writes back to the original
files instead of outputting corrected linear data.

Modify the rmeta data to remove the failed flag and reset the bad
position to the correct value. sync and power off (or otherwise
prevent the device mapper from writing back bad data).

It's possible the RAID volume will fail to sync due to bitmap
inconsistencies. I don't know how to re-write the superblocks to say
"trust me, all data are in sync".
Giuliano Procida
2016-10-12 07:15:15 UTC
On 12 October 2016 at 07:57, Giuliano Procida
Post by Giuliano Procida
Post by Slava Prisivko
array_position=4294967295
This should be 1 instead. This is 32-bit unsigned 0xf...f. It may be
that it has special significance in kernel or LVM code. I've not
checked beyond noticing one test: role < 0.
http://lxr.free-electrons.com/source/drivers/md/dm-raid.c

Now role is an int and the RHS of the assignment is le32_to_cpu(...)
which returns a u32. Testing < 0 will *never* succeed on a 64-bit
architecture. This is a kernel bug. If the code is changed so that
role is also u32 and the test is against ~0, it's possible that
different, better things will happen. Please try reporting this to the
dm-devel people.

I still don't know what wrote that value to the superblock though.
Heinz Mauelshagen
2016-10-14 23:19:24 UTC
Post by Giuliano Procida
On 12 October 2016 at 07:57, Giuliano Procida
Post by Giuliano Procida
Post by Slava Prisivko
array_position=4294967295
This should be 1 instead. This is 32-bit unsigned 0xf...f. It may be
that it has special significance in kernel or LVM code. I've not
checked beyond noticing one test: role < 0.
http://lxr.free-electrons.com/source/drivers/md/dm-raid.c
Now role is an int and the RHS of the assignment is le32_to_cpu(...)
which returns a u32. Testing < 0 will *never* succeed on a 64-bit
architecture.
Well, the result of le32_to_cpu is assigned to a 32-bit int even on a 64-bit arch.

The 4294967295 reported by parse_rmeta will thus result in a signed int
role = -1, which is used for the comparison.

The tool should rather report array_position as signed to avoid irritation.
Post by Giuliano Procida
This is a kernel bug. If the code is changed so that
role is also u32 and the test is against ~0, it's possible that
different, better things will happen. Please try reporting this to the
dm-devel people.
I still don't know what wrote that value to the superblock though.
The position must have gotten set to -1, indicating a failure to hot-add
the image LV back in, and MD then wrote it to that superblock.

Did you spot any "Faulty.*device #.*has readable super block.\n
Attempting to revive it."
messages from dm-raid in the kernel log by chance?

Heinz
Post by Giuliano Procida
Giuliano Procida
2016-10-15 07:21:59 UTC
Post by Heinz Mauelshagen
Post by Giuliano Procida
On 12 October 2016 at 07:57, Giuliano Procida
[lies]
Well, the result of le32_to_cpu is assigned to a 32 bit int on 64 bit arch.
The 4294967295 reported by parse_rmeta will thus result in signed int role =
-1
which is used for comparision.
The tool should rather report the array_position signed to avoid iritation.
Oops, yes. sizeof(int) == 4 in the kernel, even on a 64-bit architecture.
I'd forgotten this. It seemed surprising anyway, since the compiler would
likely have warned and the warning would have had to be ignored.
Post by Heinz Mauelshagen
Post by Giuliano Procida
I still don't know what wrote that value to the superblock though.
Must have been the position gotten set to -1 indicating failure to hot add
the image LV back in and thus MD has written it to that superblock.
I didn't find the code that did this when I last looked.
Post by Heinz Mauelshagen
Did you spot any "Faulty.*device #.*has readable super block.\n Attempting
to revive it."
messages from dm-raid in the kernel log by chance?
That's a question for Slava.
Slava Prisivko
2016-10-13 20:44:04 UTC
On Wed, Oct 12, 2016 at 10:02 AM Giuliano Procida <
Post by Giuliano Procida
Post by Slava Prisivko
I tried to reassemble the array using 3 different pairs of correct LV
images, but it doesn't work (I am sure because I cannot luksOpen a LUKS
image which is in the LV, which is almost surely uncorrectable).
I would hope that a luks volume would at least be recognisable using
file -s. If you extract the image data into a regular file you should
be able to losetup that and then luksOpen the loop device.
Yes, it's recognizable. I can perform luksDump and luksOpen, but for the
latter command the password just doesn't work. Well, cryptsetup works with
files just as well as with devices, so the loop device isn't needed. But I
tried it just to be sure and, quite naturally, it doesn't work either.
Post by Giuliano Procida
Post by Slava Prisivko
Loading vg-test_rmeta_0 table (253:35)
Adding target to (253:35): 0 8192 linear 8:34 2048
dm table (253:35) [ opencount flush ] [16384] (*1)
Suppressed vg-test_rmeta_0 (253:35) identical table reload.
Loading vg-test_rimage_0 table (253:36)
Adding target to (253:36): 0 65536 linear 8:34 10240
dm table (253:36) [ opencount flush ] [16384] (*1)
Suppressed vg-test_rimage_0 (253:36) identical table reload.
Loading vg-test_rmeta_1 table (253:37)
Adding target to (253:37): 0 8192 linear 8:2 1951688704
dm table (253:37) [ opencount flush ] [16384] (*1)
Suppressed vg-test_rmeta_1 (253:37) identical table reload.
Loading vg-test_rimage_1 table (253:38)
Adding target to (253:38): 0 65536 linear 8:2 1951696896
dm table (253:38) [ opencount flush ] [16384] (*1)
Suppressed vg-test_rimage_1 (253:38) identical table reload.
Loading vg-test_rmeta_2 table (253:39)
Adding target to (253:39): 0 8192 linear 8:18 1217423360
dm table (253:39) [ opencount flush ] [16384] (*1)
Suppressed vg-test_rmeta_2 (253:39) identical table reload.
Loading vg-test_rimage_2 table (253:40)
Adding target to (253:40): 0 65536 linear 8:18 1217431552
dm table (253:40) [ opencount flush ] [16384] (*1)
Suppressed vg-test_rimage_2 (253:40) identical table reload.
Creating vg-test
dm create vg-test
LVM-Pgjp5f2PRJipxvoNdsYmq0olg9iWwY5pJjiPmiesfxvdeF5zMvTsJC6vFfqNgNnZ [
noopencount flush ] [16384] (*1)
Loading vg-test table (253:84)
Adding target to (253:84): 0 131072 raid raid5_ls 3 128
region_size
Post by Slava Prisivko
1024 3 253:35 253:36 253:37 253:38 253:39 253:40
dm table (253:84) [ opencount flush ] [16384] (*1)
dm reload (253:84) [ noopencount flush ] [16384] (*1)
device-mapper: reload ioctl on (253:84) failed: Invalid argument
I don't see any problems here.
[...]
Loading vg0-photos table (254:45)
Adding target to (254:45): 0 1258291200 raid raid6_zr 3 128
region_size 1024 5 254:73 254:74 254:37 254:38 254:39 254:40 254:41
254:42 254:43 254:44
dm table (254:45) [ opencount flush ] [16384] (*1)
dm reload (254:45) [ noopencount flush ] [16384] (*1)
device-mapper: reload ioctl on (254:45) failed: Invalid argument
[...]
[144855.931712] device-mapper: raid: New device injected into existing
array without 'rebuild' parameter specified
[144855.935523] device-mapper: table: 254:45: raid: Unable to assemble
array: Invalid superblocks
[144855.939290] device-mapper: ioctl: error adding target to table
I had the following the first time:
[ 74.743051] device-mapper: raid: Failed to read superblock of device at
position 1
[ 74.761094] md/raid:mdX: device dm-73 operational as raid disk 2
[ 74.765707] md/raid:mdX: device dm-67 operational as raid disk 0
[ 74.770911] md/raid:mdX: allocated 3219kB
[ 74.773571] md/raid:mdX: raid level 5 active with 2 out of 3 devices,
algorithm 2
[ 74.775964] RAID conf printout:
[ 74.775968] --- level:5 rd:3 wd:2
[ 74.775971] disk 0, o:1, dev:dm-67
[ 74.775973] disk 2, o:1, dev:dm-73
[ 74.793120] created bitmap (1 pages) for device mdX
[ 74.822333] mdX: bitmap initialized from disk: read 1 pages, set 2 of 64
bits

After that I had only the previously mentioned errors in the kernel log:

device-mapper: table: 253:84: raid: Cannot change device positions in RAID
array
device-mapper: ioctl: error adding target to table
Post by Giuliano Procida
128 means 128*512 so this is 64k as in your case. I was able to verify
that my extracted images matched the RAID device. My problem was not
assembling the array, it was that the array would be rebuilt on every
Loading vg0-var table (254:21)
Adding target to (254:21): 0 52428800 raid raid5_ls 5 128
region_size 1024 rebuild 0 5 254:11 254:12 254:13 254:14 254:15 254:16
254:17 254:18 254:19 254:20
dm table (254:21) [ opencount flush ] [16384] (*1)
dm reload (254:21) [ noopencount flush ] [16384] (*1)
Table size changed from 0 to 52428800 for vg0-var (254:21).
Post by Slava Prisivko
Post by Giuliano Procida
You can check the rmeta superblocks with
https://drive.google.com/open?id=0B8dHrWSoVcaDUk0wbHQzSEY3LTg
Thanks, it's very useful!
/dev/mapper/vg-test_rmeta_0
found RAID superblock at offset 0
magic=1683123524
features=0
num_devices=3
array_position=0
events=56
failed_devices=0
disk_recovery_offset=18446744073709551615
array_resync_offset=18446744073709551615
level=5
layout=2
stripe_sectors=128
magic: 6d746962
version: 4
uuid: 00000000.00000000.00000000.00000000
events: 56
events cleared: 33
state: 00000000
chunksize: 524288 B
daemon sleep: 5s
sync size: 32768 KB
max write behind: 0
/dev/mapper/vg-test_rmeta_1
found RAID superblock at offset 0
magic=1683123524
features=0
num_devices=3
array_position=4294967295
events=62
failed_devices=1
disk_recovery_offset=0
array_resync_offset=18446744073709551615
level=5
layout=2
stripe_sectors=128
magic: 6d746962
version: 4
uuid: 00000000.00000000.00000000.00000000
events: 60
events cleared: 33
state: 00000000
chunksize: 524288 B
daemon sleep: 5s
sync size: 32768 KB
max write behind: 0
/dev/mapper/vg-test_rmeta_2
found RAID superblock at offset 0
magic=1683123524
features=0
num_devices=3
array_position=2
events=62
failed_devices=1
disk_recovery_offset=18446744073709551615
array_resync_offset=18446744073709551615
level=5
layout=2
stripe_sectors=128
magic: 6d746962
version: 4
uuid: 00000000.00000000.00000000.00000000
events: 62
events cleared: 33
state: 00000000
chunksize: 524288 B
daemon sleep: 5s
sync size: 32768 KB
max write behind: 0
The problem I see here is that events count is different for the three
rmetas.
The event counts relate to the intent bitmap (I believe).
That looks OK, because failed devices is 1, meaning 0b0...01; i.e.,
device 0 of the array is "failed". The real problem is device 1 which
has
Post by Slava Prisivko
array_position=4294967295
This should be 1 instead. This is 32-bit unsigned 0xf...f. It may be
that it has special significance in kernel or LVM code. I've not
checked beyond noticing one test: role < 0.
I recommend using diff3 or pairwise diff on the metadata dumps to
ensure you have not missed any other differences.
(Optionally) adapt my resync code so it writes back to the original
files instead instead of outputting corrected linear data.
Modify the rmeta data to remove the failed flag and reset the bad
position to the correct value. sync and power off (or otherwise
prevent the device mapper from writing back bad data).
It's possible the RAID volume will fail to sync due to bitmap
inconsistencies. I don't know how to re-write the superblocks to say
"trust me, all data are in sync".
Thanks for the tip! But would that help, given that the manual data
reassembly using your code doesn't work? I don't understand how fixing the
metadata could fix that.
Post by Giuliano Procida