Discussion:
[linux-lvm] Uncache a LV when a cache PV is gone, bug?
Dragan Milivojević
2015-08-20 16:09:34 UTC
Hi all

I'm testing a recovery scenario for a NAS server which uses an SSD as
a PV for the LVM cache (dm-cache).
When I remove the SSD and try to uncache the LV I get this:

[***@storage ~]# lvconvert -v --force --uncache /dev/total_storage/test
WARNING: Device for PV yJvPgB-aPlc-wFG2-DL9U-MOKI-2F93-XlzHyf not
found or rejected by a filter.
There are 1 physical volumes missing.
Cannot change VG total_storage while PVs are missing.
Consider vgreduce --removemissing.
There are 1 physical volumes missing.

[***@storage ~]# vgreduce -v --force --removemissing total_storage
Finding volume group "total_storage"
WARNING: Device for PV yJvPgB-aPlc-wFG2-DL9U-MOKI-2F93-XlzHyf not
found or rejected by a filter.
There are 1 physical volumes missing.
There are 1 physical volumes missing.
Trying to open VG total_storage for recovery...
WARNING: Device for PV yJvPgB-aPlc-wFG2-DL9U-MOKI-2F93-XlzHyf not
found or rejected by a filter.
There are 1 physical volumes missing.
There are 1 physical volumes missing.
Archiving volume group "total_storage" metadata (seqno 9).
Removing partial LV test.
activation/volume_list configuration setting not defined: Checking
only host tags for total_storage/test
Executing: /usr/sbin/modprobe dm-cache
Creating total_storage-cache_pool00_cdata-missing_0_0
Loading total_storage-cache_pool00_cdata-missing_0_0 table (253:3)
Resuming total_storage-cache_pool00_cdata-missing_0_0 (253:3)
Creating total_storage-cache_pool00_cdata
Loading total_storage-cache_pool00_cdata table (253:4)
Resuming total_storage-cache_pool00_cdata (253:4)
Creating total_storage-cache_pool00_cmeta-missing_0_0
Loading total_storage-cache_pool00_cmeta-missing_0_0 table (253:5)
Resuming total_storage-cache_pool00_cmeta-missing_0_0 (253:5)
Creating total_storage-cache_pool00_cmeta
Loading total_storage-cache_pool00_cmeta table (253:6)
Resuming total_storage-cache_pool00_cmeta (253:6)
Creating total_storage-test_corig
Loading total_storage-test_corig table (253:7)
Resuming total_storage-test_corig (253:7)
Executing: /usr/sbin/cache_check -q
/dev/mapper/total_storage-cache_pool00_cmeta

vgreduce gets stuck at the last step: /usr/sbin/cache_check

If I run cache_check manually I get this:

[***@storage ~]# /usr/sbin/cache_check
/dev/mapper/total_storage-cache_pool00_cmeta
examining superblock
superblock is corrupt
incomplete io for block 0, e.res = 18446744073709551611, e.res2 =
0, offset = 0, nbytes = 4096

and it waits indefinitely.
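
For what it's worth, a quick way to see what cache_check is actually
reading is to dump the device-mapper tables that vgreduce activated.
This is only a sketch using the device names from the output above; my
assumption is that the segments which lived on the missing PV are backed
by a dm error target, which would explain the failed reads:

# Sketch, not part of the session above: inspect what backs the cmeta LV.
dmsetup table total_storage-cache_pool00_cmeta
dmsetup table total_storage-cache_pool00_cmeta-missing_0_0
# Live state of each target in the mapping:
dmsetup status total_storage-cache_pool00_cmeta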

If I replace /usr/sbin/cache_check with a shell script that returns 0 or 1,
vgreduce just errors out. It seems there is no way to uncache the LV
without replacing the missing PV (which could pose a problem in
production use).
The origin LV (test_corig) is fine: I can mount it and use it, and there
are no filesystem issues.

Is this an intended behaviour or a bug?
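
For completeness, the "replace the missing PV" route would presumably
look something like the following. This is only a sketch, not something
I have run here: /dev/sdX and the archive file name are placeholders,
and the cached data itself is of course lost along with the SSD.

# Sketch only: recreate the missing PV with its original UUID from a
# metadata archive so the VG becomes consistent again.
pvcreate --uuid yJvPgB-aPlc-wFG2-DL9U-MOKI-2F93-XlzHyf \
         --restorefile /etc/lvm/archive/total_storage_XXXXX.vg /dev/sdX
vgcfgrestore total_storage
vgchange -ay total_storage
# Then retry the uncache; whether this flushes cleanly with an empty
# replacement device is exactly what I am unsure about.
lvconvert --uncache total_storage/test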

This is on CentOS 7, kernel-3.10.0-229.11.1.el7.x86_64.
lvdisplay --version:
LVM version: 2.02.115(2)-RHEL7 (2015-01-28)
Library version: 1.02.93-RHEL7 (2015-01-28)
Driver version: 4.29.0

Thanks
Dragan
Zdenek Kabelac
2015-08-21 07:21:39 UTC
Post by Dragan Milivojević
I'm testing a recovery scenario for a NAS server which uses an SSD as
a PV for the LVM cache (dm-cache).
[...]
It seems there is no way to uncache the LV without replacing the
missing PV (which could pose a problem in production use).
Is this an intended behaviour or a bug?
It's an as-yet-unhandled scenario.

Feel free to open a BZ at bugzilla.redhat.com.


Zdenek
Dragan Milivojević
2015-08-21 12:26:42 UTC
That's odd; the RHEL 7.1 release notes state that LVM cache is production-ready.
Thanks for the clarification, I will post a bug report upstream.

Dragan

On Fri, Aug 21, 2015 at 9:21 AM, Zdenek Kabelac
Post by Zdenek Kabelac
Post by Dragan Milivojević
[...]
Is this an intended behaviour or a bug?
It's an as-yet-unhandled scenario.
Feel free to open a BZ at bugzilla.redhat.com.
Zdenek
matthew patton
2015-08-21 12:22:46 UTC
Post by Zdenek Kabelac
Post by Dragan Milivojević
and it waits indefinitely.
If I replace /usr/sbin/cache_check with a shell script that returns 0 or 1,
vgreduce just errors out. It seems there is no way to uncache the LV
without replacing the missing PV (which could pose a problem in production use).
It's an as-yet-unhandled scenario.
Huh? It's a bloody OBVIOUS scenario and frankly the most important one! How does something this fundamental get 'missed'?