Discussion:
[linux-lvm] Recovering from lvm thin metadata exhaustion
Dean Hamstead
2018-07-08 21:36:28 UTC
Hi All,

I hope someone with very high LVM wizardry can save me from a pickle...

OK, so this happened:
====
Jul  3 13:16:24 saito kernel: [131695.910332] device-mapper: space map
metadata: unable to allocate new metadata block
Jul  3 13:16:24 saito kernel: [131695.910762] device-mapper: thin:
253:4: metadata operation 'dm_thin_remove_range' failed: error = -28
Jul  3 13:16:24 saito kernel: [131695.911019] device-mapper: thin:
253:4: aborting current metadata transaction
Jul  3 13:16:24 saito kernel: [131695.974977] device-mapper: thin:
253:4: switching pool to read-only mode
Jul  3 13:16:33 saito kernel: [131705.274889] device-mapper: thin:
dm_thin_get_highest_mapped_block returned -61
Jul  3 13:16:43 saito kernel: [131715.351896] device-mapper: thin:
dm_thin_get_highest_mapped_block returned -61
Jul  3 13:16:53 saito kernel: [131725.446482] device-mapper: thin:
dm_thin_get_highest_mapped_block returned -61
====

And sure enough:
====
***@saito:/var/log# lvs -a
  Failed to parse thin params: Error.
  LV              VG  Attr       LSize   Pool Origin Data%  Meta% Move
Log Cpy%Sync Convert
  data            pve twi-cotzM- 500.00g             37.28 96.39
  [data_tdata]    pve Twi-ao---- 500.00g
  [data_tmeta]    pve ewi-ao---- 100.00m
  [lvol0_pmspare] pve ewi------- 100.00m
  root            pve -wi-ao---- 93.13g
  swap            pve -wi-ao---- 14.90g
  vm-100-disk-1   pve Vwi-XXtzX- 200.00g data
  vm-100-disk-2   pve Vwi-a-tz-- 100.00g data        23.25
====
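
(Error -28 is ENOSPC and -61 is ENODATA, so the pool's metadata device has
filled up, which matches the 96.39 Meta% above.) For what it's worth, I
suspect this could have been avoided by letting dmeventd auto-extend the
pool; a minimal sketch of the relevant lvm.conf settings, assuming the VG
actually has free space to grow into:
====
# /etc/lvm/lvm.conf, activation section (sketch only, not what this host runs)
activation {
    # auto-extend the thin pool once data or metadata usage crosses 80%
    thin_pool_autoextend_threshold = 80
    # grow it by 20% of its current size each time the threshold is hit
    thin_pool_autoextend_percent = 20
}
====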

So I added more metadata space:
====
***@saito:/var/log# lvextend --poolmetadatasize +1G pve/data
  Size of logical volume pve/data_tmeta changed from 100.00 MiB (25
extents) to 1.10 GiB (281 extents).
  Logical volume pve/data_tmeta successfully resized.
====
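
To confirm the resize actually registered, something like this should show
the new size and fill level (data_percent and metadata_percent are standard
lvs fields):
====
lvs -a -o lv_name,lv_size,data_percent,metadata_percent pve
====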

I killed off the stuck qemu processes, then deactivated the volumes:

====
***@saito:/var/log# lvchange -an -v /dev/pve/vm-100-disk-1
    Deactivating logical volume pve/vm-100-disk-1.
    Removing pve-vm--100--disk--1 (253:6)
***@saito:/var/log# lvchange -an -v /dev/pve/vm-100-disk-2
    Deactivating logical volume pve/vm-100-disk-2.
    Removing pve-vm--100--disk--2 (253:7)
***@saito:/var/log# lvchange -an -v /dev/pve/data
    Deactivating logical volume pve/data.
    Not monitoring pve/data with libdevmapper-event-lvm2thin.so
    Removing pve-data (253:5)
    Removing pve-data-tpool (253:4)
    Executing: /usr/sbin/thin_check -q --clear-needs-check-flag
/dev/mapper/pve-data_tmeta
    /usr/sbin/thin_check failed: 1
  WARNING: Integrity check of metadata for pool pve/data failed.
    Removing pve-data_tdata (253:3)
    Removing pve-data_tmeta (253:2)
====

Then I ran the repair:
====
***@saito:/var/log# lvconvert --repair pve/data
  Using default stripesize 64.00 KiB.
  WARNING: recovery of pools without pool metadata spare LV is not
automated.
  WARNING: If everything works, remove pve/data_meta0 volume.
  WARNING: Use pvmove command to move pve/data_tmeta on the best
fitting PV.
====
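
As I understand it, --repair is roughly equivalent to running thin_repair by
hand against the preserved metadata and swapping the result into the pool,
so it might be worth retrying manually (perhaps with a newer
thin-provisioning-tools build) if the automatic pass misses something. A
sketch only, with repairmeta being a made-up name:
====
# pve/data_meta0 holds the old (damaged) metadata kept by lvconvert --repair
lvcreate -L 2G -n repairmeta pve
lvchange -ay pve/data_meta0
thin_repair -i /dev/pve/data_meta0 -o /dev/pve/repairmeta
# with the pool still deactivated, swap the hand-repaired metadata in
lvconvert --thinpool pve/data --poolmetadata pve/repairmeta
====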

Looks good, I guess. Bring it back up and check the metadata state:
====
***@saito:/var/log# lvchange -ay -v /dev/pve/data
    Activating logical volume pve/data exclusively.
    activation/volume_list configuration setting not defined: Checking
only host tags for pve/data.
    Creating pve-data_tmeta
    Loading pve-data_tmeta table (253:2)
    Resuming pve-data_tmeta (253:2)
    Creating pve-data_tdata
    Loading pve-data_tdata table (253:3)
    Resuming pve-data_tdata (253:3)
    Executing: /usr/sbin/thin_check -q --clear-needs-check-flag
/dev/mapper/pve-data_tmeta
    Creating pve-data-tpool
    Loading pve-data-tpool table (253:4)
    Resuming pve-data-tpool (253:4)
    Creating pve-data
    Loading pve-data table (253:5)
    Resuming pve-data (253:5)
    Monitoring pve/data
***@saito:/var/log# lvs -a
  LV            VG  Attr       LSize   Pool Origin Data%  Meta% Move
Log Cpy%Sync Convert
  data          pve twi-aotz-- 500.00g             4.65 1.19
  data_meta0    pve -wi------- 1.15g
  [data_tdata]  pve Twi-ao---- 500.00g
  [data_tmeta]  pve ewi-ao---- 1.15g
  root          pve -wi-ao---- 93.13g
  swap          pve -wi-ao---- 14.90g
  vm-100-disk-1 pve Vwi---tz-- 200.00g data
  vm-100-disk-2 pve Vwi---tz-- 100.00g data
***@saito:/var/log# pvdisplay
====

Good news for disk 2...
====
***@saito:/var/log# lvchange -ay -v /dev/pve/vm-100-disk-2
    Activating logical volume pve/vm-100-disk-2 exclusively.
    activation/volume_list configuration setting not defined: Checking
only host tags for pve/vm-100-disk-2.
    Loading pve-data_tdata table (253:3)
    Suppressed pve-data_tdata (253:3) identical table reload.
    Loading pve-data_tmeta table (253:2)
    Suppressed pve-data_tmeta (253:2) identical table reload.
    Loading pve-data-tpool table (253:4)
    Suppressed pve-data-tpool (253:4) identical table reload.
    Creating pve-vm--100--disk--2
    Loading pve-vm--100--disk--2 table (253:6)
    Resuming pve-vm--100--disk--2 (253:6)
    pve/data already monitored.
====
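
With disk 2 activating cleanly, it seems prudent to copy it off before
poking at the pool any further; a sketch, where the destination path is just
an example:
====
# raw copy of the surviving thin volume to somewhere outside the pool
dd if=/dev/pve/vm-100-disk-2 of=/mnt/backup/vm-100-disk-2.raw bs=1M conv=sparse
====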

Now the bad news for disk 1...
====
***@saito:/var/log# lvchange -ay -v /dev/pve/vm-100-disk-1
    Activating logical volume pve/vm-100-disk-1 exclusively.
    activation/volume_list configuration setting not defined: Checking
only host tags for pve/vm-100-disk-1.
    Loading pve-data_tdata table (253:3)
    Suppressed pve-data_tdata (253:3) identical table reload.
    Loading pve-data_tmeta table (253:2)
    Suppressed pve-data_tmeta (253:2) identical table reload.
    Loading pve-data-tpool table (253:4)
    Suppressed pve-data-tpool (253:4) identical table reload.
    Creating pve-vm--100--disk--1
    Loading pve-vm--100--disk--1 table (253:7)
  device-mapper: reload ioctl on (253:7) failed: No data available
    Removing pve-vm--100--disk--1 (253:7)
====

And from dmesg regarding disk 1:
====
[481216.385943] device-mapper: table: 253:7: thin: Couldn't open thin
internal device
[481216.386433] device-mapper: ioctl: error adding target to table
====

I think I may be in serious trouble:
====
***@saito:/var/log# thin_dump /dev/pve/data_meta0 > /tmp/foo.txt
***@saito:/var/log# grep superblock /tmp/foo.txt
<superblock uuid="" time="0" transaction="6" data_block_size="128"
nr_data_blocks="8192000">
</superblock>
====
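
That superblock looks essentially empty: uuid="" and time="0", and there are
no <device> entries between the tags, which suggests the dump of the old
metadata contains no mappings for either thin volume. Two things that might
be worth checking, as a sketch (thin_dump's --repair option attempts its own
salvage pass while dumping):
====
# any thin device entries left in the old metadata?
thin_dump /dev/pve/data_meta0 | grep '<device'
# let thin_dump try to salvage what it can while dumping
thin_dump --repair /dev/pve/data_meta0 > /tmp/meta_repaired.xml
grep '<device' /tmp/meta_repaired.xml
====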


Any thoughts on how to bring this disk back? I would be delighted if someone
could point me at how this disk might be saved.



Kind Regards

Dean Hamstead
Zdenek Kabelac
2018-07-15 19:47:38 UTC
Post by Dean Hamstead
Hi All,
I hope someone with very high LVM wizardry can save me from a pickle...
====
Jul  3 13:16:24 saito kernel: [131695.910332] device-mapper: space map
metadata: unable to allocate new metadata block
metadata operation 'dm_thin_remove_range' failed: error = -28
aborting current metadata transaction
switching pool to read-only mode
dm_thin_get_highest_mapped_block returned -61
dm_thin_get_highest_mapped_block returned -61
dm_thin_get_highest_mapped_block returned -61
====
And sure enough
====
  Failed to parse thin params: Error.
  LV              VG  Attr       LSize   Pool Origin Data%  Meta% Move Log
Cpy%Sync Convert
  data            pve twi-cotzM- 500.00g             37.28 96.39
  [data_tdata]    pve Twi-ao---- 500.00g
  [data_tmeta]    pve ewi-ao---- 100.00m
  [lvol0_pmspare] pve ewi------- 100.00m
  root            pve -wi-ao---- 93.13g
  swap            pve -wi-ao---- 14.90g
  vm-100-disk-1   pve Vwi-XXtzX- 200.00g data
  vm-100-disk-2   pve Vwi-a-tz-- 100.00g data        23.25
====
====
  Size of logical volume pve/data_tmeta changed from 100.00 MiB (25 extents)
to 1.10 GiB (281 extents).
  Logical volume pve/data_tmeta successfully resized.
====
killed off the stuck qemu processes then
====
    Deactivating logical volume pve/vm-100-disk-1.
    Removing pve-vm--100--disk--1 (253:6)
    Deactivating logical volume pve/vm-100-disk-2.
    Removing pve-vm--100--disk--2 (253:7)
    Deactivating logical volume pve/data.
    Not monitoring pve/data with libdevmapper-event-lvm2thin.so
    Removing pve-data (253:5)
    Removing pve-data-tpool (253:4)
    Executing: /usr/sbin/thin_check -q --clear-needs-check-flag
/dev/mapper/pve-data_tmeta
    /usr/sbin/thin_check failed: 1
  WARNING: Integrity check of metadata for pool pve/data failed.
    Removing pve-data_tdata (253:3)
    Removing pve-data_tmeta (253:2)
====
then do repair
====
  Using default stripesize 64.00 KiB.
  WARNING: recovery of pools without pool metadata spare LV is not automated.
  WARNING: If everything works, remove pve/data_meta0 volume.
  WARNING: Use pvmove command to move pve/data_tmeta on the best fitting PV.
====
====
    Activating logical volume pve/data exclusively.
    activation/volume_list configuration setting not defined: Checking only
host tags for pve/data.
    Creating pve-data_tmeta
    Loading pve-data_tmeta table (253:2)
    Resuming pve-data_tmeta (253:2)
    Creating pve-data_tdata
    Loading pve-data_tdata table (253:3)
    Resuming pve-data_tdata (253:3)
    Executing: /usr/sbin/thin_check -q --clear-needs-check-flag
/dev/mapper/pve-data_tmeta
    Creating pve-data-tpool
    Loading pve-data-tpool table (253:4)
    Resuming pve-data-tpool (253:4)
    Creating pve-data
    Loading pve-data table (253:5)
    Resuming pve-data (253:5)
    Monitoring pve/data
  LV            VG  Attr       LSize   Pool Origin Data%  Meta% Move Log
Cpy%Sync Convert
  data          pve twi-aotz-- 500.00g             4.65 1.19
  data_meta0    pve -wi------- 1.15g
  [data_tdata]  pve Twi-ao---- 500.00g
  [data_tmeta]  pve ewi-ao---- 1.15g
  root          pve -wi-ao---- 93.13g
  swap          pve -wi-ao---- 14.90g
  vm-100-disk-1 pve Vwi---tz-- 200.00g data
  vm-100-disk-2 pve Vwi---tz-- 100.00g data
====
good news for disk 2...
====
    Activating logical volume pve/vm-100-disk-2 exclusively.
    activation/volume_list configuration setting not defined: Checking only
host tags for pve/vm-100-disk-2.
    Loading pve-data_tdata table (253:3)
    Suppressed pve-data_tdata (253:3) identical table reload.
    Loading pve-data_tmeta table (253:2)
    Suppressed pve-data_tmeta (253:2) identical table reload.
    Loading pve-data-tpool table (253:4)
    Suppressed pve-data-tpool (253:4) identical table reload.
    Creating pve-vm--100--disk--2
    Loading pve-vm--100--disk--2 table (253:6)
    Resuming pve-vm--100--disk--2 (253:6)
    pve/data already monitored.
====
now bad news for disk 1...
====
    Activating logical volume pve/vm-100-disk-1 exclusively.
    activation/volume_list configuration setting not defined: Checking only
host tags for pve/vm-100-disk-1.
    Loading pve-data_tdata table (253:3)
    Suppressed pve-data_tdata (253:3) identical table reload.
    Loading pve-data_tmeta table (253:2)
    Suppressed pve-data_tmeta (253:2) identical table reload.
    Loading pve-data-tpool table (253:4)
    Suppressed pve-data-tpool (253:4) identical table reload.
    Creating pve-vm--100--disk--1
    Loading pve-vm--100--disk--1 table (253:7)
  device-mapper: reload ioctl on (253:7) failed: No data available
    Removing pve-vm--100--disk--1 (253:7)
====
====
[481216.385943] device-mapper: table: 253:7: thin: Couldn't open thin internal
device
[481216.386433] device-mapper: ioctl: error adding target to table
====
I think I may be in serious trouble
====
<superblock uuid="" time="0" transaction="6" data_block_size="128"
nr_data_blocks="8192000">
</superblock>
====
Any thoughts on how to bring this disk back? I would be delighted if someone
could point me at how this disk might be saved.
Hi

Try opening a BZ like this one, for example:

https://bugzilla.redhat.com/show_bug.cgi?id=1532071

Add all possible details (lvm2 version, kernel version, lvm2 metadata, ...)
and an xz-compressed dump of the _meta0 device.
There is a small hope that some of the metadata can be restored with a
'hand-extension' of the thin_repair tool.
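
Something like this should produce a dump suitable for attaching, assuming
the data_meta0 LV can still be activated:
====
lvchange -ay pve/data_meta0
dd if=/dev/pve/data_meta0 bs=1M | xz -9 > data_meta0.img.xz
====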

Regards

Zdenek
Gionatan Danti
2018-07-15 21:10:05 UTC
On 15-07-2018 21:47, Zdenek Kabelac wrote:
Hi Zdenek,
Post by Zdenek Kabelac
Hi
https://bugzilla.redhat.com/show_bug.cgi?id=1532071
This is quite scary, especially considering there have been no updates on the
ticket in recent months. How did the OP solve the issue?
Post by Zdenek Kabelac
Add all possible details (lvm2 version, kernel version, lvm2
metadata,...)
and xz compressed dump of _meta0 device.
There is small hope that some metadata can be possibly restored with
'hand-extension' of thin_repair tool.
Any insight on what went wrong? (Yes, I know, metadata space was exhausted,
but it should not result in total volume loss, right?)

Thanks.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: ***@assyoma.it - ***@assyoma.it
GPG public key ID: FF5F32A8
Dean Hamstead
2018-07-15 23:57:36 UTC
I gave up and rebuilt, but thankfully the data was sufficiently backed up.

Dean
Post by Gionatan Danti
Hi Zdenek,
Post by Zdenek Kabelac
Hi
https://bugzilla.redhat.com/show_bug.cgi?id=1532071
this is quite scary, especially considering no updates on the ticket
in recent months. How did the OP solve the issue?
Post by Zdenek Kabelac
Add all possible details (lvm2 version, kernel version, lvm2
metadata,...)
and xz compressed dump of _meta0 device.
There is small hope that some metadata can be possibly restored with
'hand-extension' of thin_repair tool.
Any insight on what went wrong (yes, I know, metadata space was
exausted - but it should not result in total volume loss, right)?
Thanks.