Ryan Launchbury
2018-06-20 09:18:56 UTC
Hello,
I'm having a problem uncaching logical volumes when the cache data
chunk size is over 1MiB.
The process I'm using to uncache is: lvconvert --uncache vg/lv
The issue occurs across multiple systems with different hardware and
different versions of LVM.
Steps to reproduce:
1. Create the origin VG & LV (rough commands for steps 1 and 2 are
sketched after the list)
2. Add a cache device of over 1TB to the origin VG
3. Create the cache data lv:
lvcreate -n cachedata -L 1770GB cached_vg /dev/nvme0n1
4. Create the cache metadata lv:
lvcreate -n cachemeta -L 1770MB cached_vg /dev/nvme0n1
5. Convert to a cache pool:
lvconvert --type cache-pool --cachemode writethrough --poolmetadata
cached_vg/cachemeta cached_vg/cachedata
6. Enable caching on the origin LV:
lvconvert --type cache --cachepool cached_vg/cachedata
cached_vg/filestore01
7. Write some data to the main LV so that the cache device is used:
dd if=/dev/zero of=/mnt/filestore01/test.dat bs=1M count=10000
8. Check the cache stats:
lvs -a -o +cache_total_blocks,cache_used_blocks,cache_dirty_blocks
9. Repeating step 8 over time will show that the dirty blocks are not
being written back at all
10. Try to uncache the device:
lvconvert --uncache cached_vg/filestore01
11. The following message repeats indefinitely; the block count never
decreases and the operation never completes:
Flushing x blocks for cache cached_vg/filestore01.
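For reference, steps 1 and 2 on my systems were roughly the following
(the device names, filesystem and sizes below are only placeholders):
  # Origin VG and LV on the RAID array, plus a filesystem for the dd test
  vgcreate cached_vg /dev/sdb
  lvcreate -n filestore01 -l 100%FREE cached_vg
  mkfs.xfs /dev/cached_vg/filestore01
  mount /dev/cached_vg/filestore01 /mnt/filestore01
  # Add the NVMe cache device (over 1TB) to the origin VG
  vgextend cached_vg /dev/nvme0n1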
After testing multiple times, the issue seems to be tied to the chunk
size that is selected automatically during the cache-pool conversion in
step 5. The LVM man page says the chunk size must be a multiple of
32KiB; however, the next chunk size automatically assigned above 1MiB
is usually 1.03MiB. With a chunk size of 1.03MiB or higher the cache is
not able to flush, while a cache created with a chunk size of 1MiB or
less flushes normally.
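For anyone trying to reproduce or avoid this: the chunk size in use can
be reported with lvs, and as far as I can tell it can also be pinned
explicitly during the conversion in step 5 (the 512KiB value below is
just an example; it must still be a multiple of 32KiB):
  # Show the chunk size the cache pool ended up with
  lvs -a -o +chunk_size cached_vg
  # Pin the chunk size explicitly when creating the cache pool
  lvconvert --type cache-pool --cachemode writethrough --chunksize 512k \
            --poolmetadata cached_vg/cachemeta cached_vg/cachedata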
Now that I know how to avoid the issue, I just need a way to safely
un-cache the systems that already have a cache which will not flush.
Details:
Version info from lvm version:
LVM version: 2.02.171(2)-RHEL7 (2017-05-03)
Library version: 1.02.140-RHEL7 (2017-05-03)
Driver version: 4.35.0
Configuration: ./configure --build=x86_64-redhat-linux-gnu
--host=x86_64-redhat-linux-gnu --program-prefix=
--disable-dependency-tracking --prefix=/usr --exec-prefix=/usr
--bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc
--datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64
--libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib
--mandir=/usr/share/man --infodir=/usr/share/info
--with-default-dm-run-dir=/run --with-default-run-dir=/run/lvm
--with-default-pid-dir=/run --with-default-locking-dir=/run/lock/lvm
--with-usrlibdir=/usr/lib64 --enable-lvm1_fallback --enable-fsadm
--with-pool=internal --enable-write_install --with-user= --with-group=
--with-device-uid=0 --with-device-gid=6 --with-device-mode=0660
--enable-pkgconfig --enable-applib --enable-cmdlib --enable-dmeventd
--enable-blkid_wiping --enable-python2-bindings --with-cluster=internal
--with-clvmd=corosync --enable-cmirrord
--with-udevdir=/usr/lib/udev/rules.d --enable-udev_sync
--with-thin=internal --enable-lvmetad --with-cache=internal
--enable-lvmpolld --enable-lvmlockd-dlm --enable-lvmlockd-sanlock
--enable-dmfilemapd
System info:
Systems 1, 2, 3:
- Dell R730XD server
- 12x disks in RAID 6 on the onboard PERC/MegaRAID controller
System 4:
- Dell R630 server
- 60x disks (6 LUNs) in RAID 6 on a PCI MegaRAID controller
The systems are currently in production, so it's quite hard for me to
change the configuration to enable logging.
Any assistance would be much appreciated! If any more info is needed
please let me know.
Best regards,
Ryan