Discussion:
[linux-lvm] LVM issues after replacing linux mdadm RAID5 drive
L.M.J
2014-04-17 10:23:15 UTC
Hi,

For the third time, I had to replace a failed drive in my home Linux RAID5
box. The previous times went fine, but this time I don't know what I did wrong:
I broke my RAID5. At least, it won't start.
/dev/sdb was the failed drive
/dev/sdc and /dev/sdd are OK.

I tried to reassemble the RAID with this command after I replaced sdb and
created a new partition:
~# mdadm -Cv /dev/md0 --assume-clean --level=5
--raid-devices=3 /dev/sdc1 /dev/sdd1 /dev/sdb1

Well, I guess I made a mistake here; I should have done this instead:
~# mdadm -Cv /dev/md0 --assume-clean --level=5
--raid-devices=3 /dev/sdc1 /dev/sdd1 missing

Maybe this wiped out my data...

Let's go further, then: pvdisplay, pvscan and vgdisplay all return empty
information :-(

Google helped me, and I did this:
~# dd if=/dev/md0 bs=512 count=255 skip=1 of=/tmp/md0.txt

[..]
physical_volumes {
pv0 {
id = "5DZit9-6o5V-a1vu-1D1q-fnc0-syEj-kVwAnW"
device = "/dev/md0"
status = ["ALLOCATABLE"]
flags = []
dev_size = 7814047360
pe_start = 384
pe_count = 953863
}
}
logical_volumes {

lvdata {
id = "JiwAjc-qkvI-58Ru-RO8n-r63Z-ll3E-SJazO7"
status = ["READ", "WRITE", "VISIBLE"]
flags = []
segment_count = 1
[..]



Since I can still see LVM metadata, I guess I haven't lost everything yet...
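
For reference, LVM normally keeps plain-text copies of exactly this metadata on the root filesystem, which is usually easier to read than a raw dd dump - a sketch, assuming the default /etc/lvm paths:

~# ls -l /etc/lvm/backup/ /etc/lvm/archive/
~# less /etc/lvm/backup/lvm-raid     # most recent plain-text metadata backup for this VG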

I tried a long-shot command:
~# pvcreate --uuid "5DZit9-6o5V-a1vu-1D1q-fnc0-syEj-kVwAnW" --restorefile /etc/lvm/archive/lvm-raid_00302.vg /dev/md0

Then,
~# vgcfgrestore lvm-raid
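
(For the record, the archived metadata versions can be listed first to pick the one that matches the pre-failure state - a sketch; the file name below is a placeholder:)

~# vgcfgrestore --list lvm-raid                          # shows each archive file with its timestamp and the command that created it
~# vgcfgrestore -f /etc/lvm/archive/<chosen-file>.vg lvm-raid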

~# lvs -a -o +devices
LV VG Attr LSize Origin Snap% Move Log Copy% Convert Devices
lvdata lvm-raid -wi-a- 450,00g /dev/md0(148480)
lvmp lvm-raid -wi-a- 80,00g /dev/md0(263680)

Then :
~# lvchange -ay /dev/lvm-raid/lv*

I was quite happy up to this point.
The problem appears when I try to mount those 2 LVs (lvdata & lvmp) as ext4 partitions:
~# mount /home/foo/RAID_mp/

~# mount | grep -i mp
/dev/mapper/lvm--raid-lvmp on /home/foo/RAID_mp type ext4 (rw)

~# df -h /home/foo/RAID_mp
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/lvm--raid-lvmp 79G 61G 19G 77% /home/foo/RAID_mp


Here is the big problem:
~# ls -la /home/foo/RAID_mp
total 0

Worse on the other LV:
~# mount /home/foo/RAID_data
mount: wrong fs type, bad option, bad superblock on /dev/mapper/lvm--raid-lvdata,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so


I bet I recovered the LVM structure but the data is wiped out, don't you think?

~# fsck -n -v /dev/mapper/lvm--raid-lvdata
fsck from util-linux-ng 2.17.2
e2fsck 1.41.11 (14-Mar-2010)
fsck.ext4: Group descriptors look bad... trying backup blocks...
fsck.ext4: Bad magic number in super-block when using the backup blocks
fsck.ext4: going back to original superblock
fsck.ext4: Device or resource busy while trying to open /dev/mapper/lvm--raid-lvdata
Filesystem mounted or opened exclusively by another program?
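
"Device or resource busy" usually means something still holds the mapped device open; a quick way to check - a sketch:

~# mount | grep lvdata                      # still mounted somewhere?
~# lsof /dev/mapper/lvm--raid-lvdata        # any process holding the device open?
~# dmsetup info lvm--raid-lvdata            # device-mapper's open count for the LV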



Any help is welcome if you have any idea how to rescue me, please!
--
LMJ
"May the source be with you my young padawan"
Stuart Gathman
2014-04-17 19:33:48 UTC
Post by L.M.J
For the third time, I had to replace a failed drive in my home Linux RAID5
box. The previous times went fine, but this time I don't know what I did wrong:
I broke my RAID5. At least, it won't start.
/dev/sdb was the failed drive
/dev/sdc and /dev/sdd are OK.
I tried to reassemble the RAID with this command after I replaced sdb:
~# mdadm -Cv /dev/md0 --assume-clean --level=5
--raid-devices=3 /dev/sdc1 /dev/sdd1 /dev/sdb1
~# mdadm -Cv /dev/md0 --assume-clean --level=5
--raid-devices=3 /dev/sdc1 /dev/sdd1 missing
Maybe this wiped out my data...
This is not an LVM problem, but an mdadm usage problem.

You told mdadm to create a new empty md device! (-C means create a new
array!) You should have just started the old degraded md array, removed
the failed drive, and added the new drive.

But I don't think your data is gone yet... (because of assume-clean).
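
For reference, the normal flow after a member fails is to keep running the existing array degraded rather than creating a new one - a sketch, using the device names from this setup:

~# mdadm --assemble /dev/md0 /dev/sdc1 /dev/sdd1   # start the old array degraded from the two good members (add --force if the metadata is marked unclean)
~# cat /proc/mdstat                                # should show something like [3/2] [UU_]
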
Post by L.M.J
Let's go further, then: pvdisplay, pvscan and vgdisplay all return empty
information :-(
~# dd if=/dev/md0 bs=512 count=255 skip=1 of=/tmp/md0.txt
[..]
physical_volumes {
pv0 {
id = "5DZit9-6o5V-a1vu-1D1q-fnc0-syEj-kVwAnW"
device = "/dev/md0"
status = ["ALLOCATABLE"]
flags = []
dev_size = 7814047360
pe_start = 384
pe_count = 953863
}
}
logical_volumes {
lvdata {
id = "JiwAjc-qkvI-58Ru-RO8n-r63Z-ll3E-SJazO7"
status = ["READ", "WRITE", "VISIBLE"]
flags = []
segment_count = 1
[..]
Since I can still see LVM metadata, I guess I haven't lost everything yet...
nothing is lost ... yet

What you needed to do was REMOVE the blank drive before anything wrote
to the RAID5! You didn't add it as a missing drive to be rebuilt,
as you noted.
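
In the state described above, getting the blank drive back out before anything writes to the array would look roughly like this - a sketch; it only helps as long as nothing has resynced or been written, and only if the re-created array used the same device order and chunk size as the original:

~# mdadm /dev/md0 --fail /dev/sdb1     # mark the blank member as failed
~# mdadm /dev/md0 --remove /dev/sdb1   # drop it, leaving md0 degraded on sdc1 + sdd1
~# cat /proc/mdstat
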
Post by L.M.J
~# pvcreate --uuid "5DZit9-6o5V-a1vu-1D1q-fnc0-syEj-kVwAnW" --restorefile /etc/lvm/archive/lvm-raid_00302.vg /dev/md0
*Now* you are writing to the md and destroying your data!
Post by L.M.J
Then,
~# vgcfgrestore lvm-raid
Overwriting your LVM metadata. But maybe not the end of the world YET...
Post by L.M.J
~# lvs -a -o +devices
LV VG Attr LSize Origin Snap% Move Log Copy% Convert Devices
lvdata lvm-raid -wi-a- 450,00g /dev/md0(148480)
lvmp lvm-raid -wi-a- 80,00g /dev/md0(263680)
~# lvchange -ay /dev/lvm-raid/lv*
I was quite happy until now.
~# mount /home/foo/RAID_mp/
~# mount | grep -i mp
/dev/mapper/lvm--raid-lvmp on /home/foo/RAID_mp type ext4 (rw)
~# df -h /home/foo/RAID_mp
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/lvm--raid-lvmp 79G 61G 19G 77% /home/foo/RAID_mp
Here is the big problem
~# ls -la /home/foo/RAID_mp
total 0
~# mount /home/foo/RAID_data
mount: wrong fs type, bad option, bad superblock on /dev/mapper/lvm--raid-lvdata,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
Yes, you told md that the drive with random/blank data was good data!
If ONLY you had mounted those filesystems
READ ONLY while checking things out, you would still be ok. But now,
you have overwritten stuff!
Post by L.M.J
I bet I recovered the LVM structure but the data is wiped out, don't you think?
~# fsck -n -v /dev/mapper/lvm--raid-lvdata
fsck from util-linux-ng 2.17.2
e2fsck 1.41.11 (14-Mar-2010)
fsck.ext4: Group descriptors look bad... trying backup blocks...
fsck.ext4: Bad magic number in super-block when using the backup blocks
fsck.ext4: going back to original superblock
fsck.ext4: Device or resource busy while trying to open /dev/mapper/lvm--raid-lvdata
Filesystem mounted or opened exclusively by another program?
Any help is welcome if you have any idea how to rescue me, please!
Fortunately, your fsck was read only. At this point, you need to
crash/halt your system with no shutdown (to avoid further writes to the
mounted filesystems).
Then REMOVE the new drive. Start up again, and add the new drive properly.

You should check stuff out READ ONLY. You will need fsck (READ ONLY at
first), and at least some data has been destroyed.
If the data is really important, you need to copy the two old drives
somewhere before you do ANYTHING else. Buy two more drives! That will
let you recover from any more mistakes typing Create instead of Assemble
or Manage. (Note that --assume-clean warns you that you really need to
know what you are doing!)
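
A sketch of what "copy the two old drives somewhere" could look like, assuming a big enough filesystem is mounted on /mnt/backup (GNU ddrescue, if installed, is a common alternative that copes better with read errors):

~# dd if=/dev/sdc1 of=/mnt/backup/sdc1.img bs=1M conv=noerror,sync
~# dd if=/dev/sdd1 of=/mnt/backup/sdd1.img bs=1M conv=noerror,sync
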
L.M.J
2014-04-18 21:14:17 UTC
On Thu, 17 Apr 2014 15:33:48 -0400,
Stuart Gathman <***@gathman.org> wrote:

Thanks for your answer
Post by Stuart Gathman
Fortunately, your fsck was read only. At this point, you need to
crash/halt your system with no shutdown (to avoid further writes to the
mounted filesystems).
Then REMOVE the new drive. Start up again, and add the new drive properly.
RAID5 reassembly: started with the 2 original drives

~# mdadm --assemble --force /dev/md0 /dev/sdc1 /dev/sdd1

md0 status is normal, with the new drive (sdb1) missing:
~# cat /proc/mdstat
md0 : active raid5 sdc1[0] sdd1[1]
3907023872 blocks level 5, 64k chunk, algorithm 2 [3/2] [UU_]
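
Before touching the filesystems, it may be worth confirming that the re-assembled array has the geometry the LVM metadata expects - a sketch:

~# mdadm --detail /dev/md0                 # chunk size, layout, device order, array size
~# mdadm --examine /dev/sdc1 /dev/sdd1     # per-member superblock information
~# blockdev --getsz /dev/md0               # size in 512-byte sectors; should match dev_size in the recovered metadata
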
Post by Stuart Gathman
You should check stuff out READ ONLY. You will need fsck (READ ONLY at
first), and at least some data has been destroyed.
If the data is really important, you need to copy the two old drives
somewhere before you do ANYTHING else. Buy two more drives! That will
let you recover from any more mistakes typing Create instead of Assemble
or Manage. (Note that --assume-clean warns you that you really need to
know what you are doing!)
I tried a read-only fsck:

~# fsck -n -v -f /dev/lvm-raid/lvmp3
fsck from util-linux-ng 2.17.2
e2fsck 1.41.11 (14-Mar-2010)
Resize inode not valid. Recreate? no
Pass 1: Checking inodes, blocks, and sizes
Inode 7, i_blocks is 114136, should be 8. Fix? no
Inode 786433 is in use, but has dtime set. Fix? no
Inode 786433 has imagic flag set. Clear? no
Inode 786433 has compression flag set on filesystem without compression
support. Clear? no
Inode 786433 has INDEX_FL flag set but is not a directory.
Clear HTree index? no
HTREE directory inode 786433 has an invalid root node.
Clear HTree index? no
Inode 786433, i_blocks is 4294967295, should be 0. Fix? no
[...]
Directory entry for '.' in ... (11) is big.
Split? no
Missing '.' in directory inode 262145.
Fix? no
Invalid inode number for '.' in directory inode 262145.
Fix? no
Directory entry for '.' in ... (262145) is big.
Split? no
Directory inode 917506, block #0, offset 0: directory corrupted
Salvage? no
e2fsck: aborted
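
One more read-only thing worth trying is e2fsck against a backup superblock - a sketch; the block numbers below are only guesses and depend on the original mkfs parameters:

~# dumpe2fs /dev/lvm-raid/lvmp3 | grep -i superblock   # list primary and backup superblock locations, if readable
~# fsck.ext4 -n -b 32768 -B 4096 /dev/lvm-raid/lvmp3   # read-only check using a backup superblock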


Sounds bad. What should I do now?
L.M.J
2014-04-26 18:47:44 UTC
What do you think about this command and its output:


~# dd if=/dev/md0 bs=512 count=255 skip=1 of=/tmp/md0.txt

[..]
physical_volumes {
pv0 {
id = "5DZit9-6o5V-a1vu-1D1q-fnc0-syEj-kVwAnW"
device = "/dev/md0"
status = ["ALLOCATABLE"]
flags = []
dev_size = 7814047360
pe_start = 384
pe_count = 953863
}
}
logical_volumes {

lvdata {
id = "JiwAjc-qkvI-58Ru-RO8n-r63Z-ll3E-SJazO7"
status = ["READ", "WRITE", "VISIBLE"]
flags = []
segment_count = 1
[..]

I presume I still have data on my broken RAID5.

I did a pvcreate --restorefile and a vgcfgrestore.
I can now see my 2 LVs, but my ext4 filesystems are empty: df reports realistic
disk usage, and a read-only fsck finds tons of errors.

Is there a way to recover my data on the ext4 filesystems?
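
If anything gets mounted at all while experimenting, a strictly read-only mount that also skips journal replay avoids making things worse - a sketch, /mnt/check being just an example mount point:

~# mount -o ro,noload /dev/mapper/lvm--raid-lvdata /mnt/check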




L. M. J
2014-04-30 20:57:28 UTC
Help please :-(
--
May the open source be with you my young padawan.

Sent from my phone; please excuse the brevity.
Stuart Gathman
2014-05-02 04:02:25 UTC
Long ago, Nostradamus foresaw that on 04/26/2014 02:47 PM, L.M.J would write:
Post by L.M.J
I presume I still have data on my broken RAID5.
I did a pvcreate --restorefile and vgcfgrestore.
Did you restore the RAID5, taking out the new drive and adding it
back in properly? I can't take the time right now to give you a step-by-step.
The big warning on what you did wasn't a joke. You should get
help from the md driver people. You do not have an LVM problem.
Post by L.M.J
I can now see my 2 LVs, but my ext4 filesystems are empty: df reports
realistic disk usage, and a read-only fsck finds tons of errors. Is
there a way to recover my data on the ext4 filesystems?

None of that is relevant until you restore your RAID5. Your problem
was that you told md that your new blank drive held good, in-sync data,
which it obviously didn't. You need to degrade your RAID back to the one
missing drive, then add the new drive normally - NOT using that experts-only
shortcut with the big warning, which you used incorrectly.
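
The "normal" way, once the array is running degraded on the two good drives, is simply - a sketch:

~# mdadm --manage /dev/md0 --add /dev/sdb1   # rebuild the replacement from parity; writes go only to sdb1
~# cat /proc/mdstat                          # follow the recovery progress
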
L.M.J
2014-05-04 07:57:33 UTC
On Fri, 02 May 2014 00:02:25 -0400,
Post by Stuart Gathman
Post by L.M.J
I presume I still have data on my broken RAID5.
I did a pvcreate --restorefile and vgcfgrestore.
Did you restore the RAID5, taking out the new drive and adding it
back in properly? I can't take the time right now to give you a step by
step. The big warning on what you did wasn't a joke. You should get
help from md driver people. You do not have an LVM problem.
Yes, I did that; I added it back properly, but I guess the problem comes from the first sync, which was bad...