Discussion:
[linux-lvm] thin provisioned volume failed - perhaps issues extending metadata?
Sean Brisbane
2014-12-09 09:52:53 UTC
Hi,

Last night a thin provisioned volume with snapshots failed with errors. I am not sure how to proceed with debugging this.

Errors such as this written to the console:

Buffer I/O error on device dm-6, logical block 153616394
lost page write due to I/O error on dm-6

I can't see what happened initially, as the logs were not preserved after the hard reboot. Now, when I try to mount (full logs at the bottom of this mail):

Dec 9 08:52:22 pplxfs13 kernel: device-mapper: thin: 253:4: reached low water mark for metadata device: sending event.

So, I suspect I need to extend the metadata on the pool. The pool itself has plenty of space, and I, perhaps naively, assumed that combining metadata and data into one volume would avoid any metadata space issues.

So, I tried:

> ls /dev/mapper/VGData-thinpool1*
/dev/mapper/VGData-thinpool1
/dev/mapper/VGData-thinpool1_tdata
/dev/mapper/VGData-thinpool1_tmeta
/dev/mapper/VGData-thinpool1-tpool

> lvresize --poolmetadata +12M /dev/mapper/VGData-thinpool1

But when I try to mount:
Dec 9 09:37:04 pplxfs13 lvm[1698]: Thin metadata VGData-thinpool1-tpool is now 99% full.

The lvresize operation had some effect:

diff dmsetup_table_post dmsetup_table_pre
11c11
< VGData-thinpool1_tmeta: 163840 98304 linear 8:16 76229298176
---
> VGData-thinpool1_tmeta: 163840 73728 linear 8:16 76229298176
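As a cross-check on the diff above: dmsetup table lines read `<name>: <start> <length> <target> ...`, with start and length in 512-byte sectors, and a table's segments are contiguous from sector 0. A minimal sketch of the arithmetic, assuming the changed line is the last `_tmeta` segment (it is the only line the diff reports); `sectors_to_mib` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: convert a count of 512-byte sectors to whole MiB.
sectors_to_mib() {
    echo $(( $1 * 512 / 1024 / 1024 ))
}

seg_start=163840 # sectors mapped by the earlier _tmeta segment(s)
pre_len=73728    # length of the last segment before lvresize
post_len=98304   # length of the same segment after lvresize

# Growth of the segment: should match the requested +12M.
echo "grown by:    $(sectors_to_mib $(( post_len - pre_len ))) MiB"
# Total _tmeta size = start of the last segment + its length.
echo "size before: $(sectors_to_mib $(( seg_start + pre_len ))) MiB"
echo "size after:  $(sectors_to_mib $(( seg_start + post_len ))) MiB"
```

It prints 12, 116 and 128 MiB: the lvresize did grow `_tmeta` by exactly the requested 12M, yet lvm still reported the metadata as 99% full afterwards.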


In addition, the snapshots of this volume refuse to activate, so I appear to be unable to delete any of the 40 or so snapshots.
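Every snapshot in the lvs listing below has an attribute string ending in `k`. That is the activation-skip flag, which lvm2 sets on new thin snapshots by default (`auto_set_activation_skip` in lvm.conf), so they are not activated by a plain `lvchange -a y` or `vgchange -a y` unless `-K` (`--ignoreactivationskip`) is added. Whether that, rather than the read-only pool, is why they refused to activate here is not established in this thread. A minimal sketch that lists such LVs, assuming input from `lvs --noheadings -o lv_name,lv_attr`; `skipped_inactive` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read "name attr" pairs, as printed by
#   lvs --noheadings -o lv_name,lv_attr VGData
# and print LVs that are not active (5th attr char is not 'a') and
# carry the activation-skip flag ('k' as the 10th attr char).
skipped_inactive() {
    awk 'NF >= 2 {
        attr = $2
        if (substr(attr, 5, 1) != "a" && substr(attr, 10, 1) == "k") print $1
    }'
}
```

For example, `lvs --noheadings -o lv_name,lv_attr VGData | skipped_inactive` would list them, and each could then be tried with `lvchange -a y -K VGData/<name>`.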

Is there anything I can do to recover from this or other things I can try to help debug the issue?

Thanks in advance,
Sean

Full logs

messages:

Dec 9 08:52:22 pplxfs13 kernel: device-mapper: thin: 253:4: reached low water mark for metadata device: sending event.
Dec 9 08:52:22 pplxfs13 kernel: device-mapper: space map metadata: unable to allocate new metadata block
Dec 9 08:52:22 pplxfs13 kernel: device-mapper: thin: 253:4: metadata operation 'dm_thin_insert_block' failed: error = -28
Dec 9 08:52:22 pplxfs13 kernel: device-mapper: thin: 253:4: aborting current metadata transaction
Dec 9 08:52:22 pplxfs13 kernel: device-mapper: thin: 253:4: switching pool to read-only mode
Dec 9 08:52:22 pplxfs13 kernel: Buffer I/O error on device dm-6, logical block 153616394
Dec 9 08:52:22 pplxfs13 kernel: lost page write due to I/O error on dm-6
Dec 9 08:52:22 pplxfs13 kernel: Buffer I/O error on device dm-6, logical block 153616395
Dec 9 08:52:22 pplxfs13 kernel: lost page write due to I/O error on dm-6
[...more of these...]
Dec 9 08:52:22 pplxfs13 kernel: Buffer I/O error on device dm-6, logical block 154140676
Dec 9 08:52:22 pplxfs13 kernel: lost page write due to I/O error on dm-6
Dec 9 08:52:22 pplxfs13 kernel: JBD: recovery failed
Dec 9 08:52:22 pplxfs13 kernel: EXT4-fs (dm-6): error loading journal
Dec 9 08:52:22 pplxfs13 kernel: EXT4-fs (dm-14): recovery complete
Dec 9 08:52:22 pplxfs13 kernel: EXT4-fs (dm-14): mounted filesystem with ordered data mode. Opts:
Dec 9 08:52:22 pplxfs13 kernel: JBD: recovery failed
Dec 9 08:52:22 pplxfs13 kernel: EXT4-fs (dm-6): error loading journal


lvscan | grep thin
thin-lwfa VGData Vwi-a-tz-- 9.20t thinpool1 99.10
thin-lwfa-141025 VGData Vri---tz-k 9.00t thinpool1 thin-lwfa
thin-lwfa-141026 VGData Vri---tz-k 9.00t thinpool1 thin-lwfa
thin-lwfa-141027 VGData Vri---tz-k 9.00t thinpool1 thin-lwfa
thin-lwfa-141028 VGData Vri---tz-k 9.00t thinpool1 thin-lwfa
thin-lwfa-141029 VGData Vri---tz-k 9.00t thinpool1 thin-lwfa
thin-lwfa-141030 VGData Vri---tz-k 9.00t thinpool1 thin-lwfa
thin-lwfa-141031 VGData Vri---tz-k 9.00t thinpool1 thin-lwfa
thin-lwfa-141101 VGData Vri---tz-k 9.00t thinpool1 thin-lwfa
thin-lwfa-141102 VGData Vri---tz-k 9.00t thinpool1 thin-lwfa
thin-lwfa-141103 VGData Vri---tz-k 9.00t thinpool1 thin-lwfa
thin-lwfa-141104 VGData Vri---tz-k 9.10t thinpool1 thin-lwfa
thin-lwfa-141105 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141106 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141107 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141108 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141109 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141110 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141111 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141112 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141113 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141114 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141115 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141116 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141117 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141118 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141119 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141120 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141121 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141122 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141123 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141124 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141125 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141126 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141127 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141128 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141129 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141130 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141201 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141202 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141203 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141204 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141205 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141206 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141207 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141208 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thin-lwfa-141209 VGData Vri---tz-k 9.20t thinpool1 thin-lwfa
thinpool1 VGData twi-a-tz-- 20.00t 64.07
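The listing above only reports data usage, and by that measure `thinpool1` looks healthy at 64.07%. What ran out was the pool's metadata, which `lvs` reports separately as Meta% (field `metadata_percent`). A minimal sketch of a check on both columns, assuming input from `lvs --noheadings -o lv_name,data_percent,metadata_percent`; the function name `check_pool` and the default 80% limit are hypothetical choices, not lvm2 settings.

```shell
#!/bin/sh
# Hypothetical check: read "name data% meta%" lines, as printed by
#   lvs --noheadings -o lv_name,data_percent,metadata_percent VGData/thinpool1
# and report any column at or above LIMIT percent.
LIMIT=${LIMIT:-80}

check_pool() {
    awk -v limit="$LIMIT" '
        BEGIN { lim = limit + 0 }
        # Only pools print all three fields; thin LVs have no Meta% value.
        NF == 3 {
            if ($2 + 0 >= lim) printf "%s: data %s%% >= %s%%\n", $1, $2, lim
            if ($3 + 0 >= lim) printf "%s: metadata %s%% >= %s%%\n", $1, $3, lim
        }'
}
```

For example, `lvs --noheadings -o lv_name,data_percent,metadata_percent VGData/thinpool1 | check_pool` prints one line per column at or above the limit and nothing otherwise.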
Mike Snitzer
2014-12-09 15:05:14 UTC
On Tue, Dec 09 2014 at 4:52am -0500,
Sean Brisbane <***@physics.ox.ac.uk> wrote:

You definitely ran out of metadata space. Which version of the kernel
and lvm2 userspace are you using?

See the "Metadata space exhaustion" section of the lvmthin manpage in a
recent lvm2 release to guide you on how to recover.
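The line that pins this down in the log is `metadata operation 'dm_thin_insert_block' failed: error = -28`. The kernel reports errors as negative errno values, and errno 28 on Linux is ENOSPC ("No space left on device"): the metadata device had no free block for the new mapping, and the pool then switched itself to read-only mode. A small sketch for finding such failures in a saved log; the helper name `thin_enospc` is hypothetical and it only matches the message format shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the names of dm-thin metadata operations that
# failed with -28 (-ENOSPC) in the given log files (or stdin).
thin_enospc() {
    grep -h -o "metadata operation '[a-z_]*' failed: error = -28" "$@" |
        sed "s/.*operation '\([a-z_]*\)'.*/\1/"
}
```

For example, `thin_enospc /var/log/messages` would print `dm_thin_insert_block` for the failure above.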

Also, once you've gotten past this, you really should configure lvm2 to
autoextend the thin-pool (both data and metadata) as needed in response
to low watermark, etc. See "Automatically extend thin pool LV" in
lvmthin manpage.
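Autoextension is controlled from the `activation` section of `/etc/lvm/lvm.conf` by `thin_pool_autoextend_threshold` (a value of 100 disables it) and `thin_pool_autoextend_percent`, and relies on dmeventd monitoring being enabled. The sketch shows such a fragment as comments, with illustrative values that nobody in this thread recommended, and works through one extension step for a hypothetical 128 MiB metadata LV. `next_size_mib` is a hypothetical helper; real extensions are also rounded up to whole extents, which it ignores.

```shell
#!/bin/sh
# Illustrative /etc/lvm/lvm.conf fragment (example values only):
#   activation {
#       monitoring = 1
#       thin_pool_autoextend_threshold = 70
#       thin_pool_autoextend_percent = 20
#   }
# Once usage is above the threshold, lvm2 (via dmeventd) grows the pool by
# autoextend_percent of its current size. This hypothetical helper does that
# arithmetic in whole MiB with integer percentages, ignoring extent rounding.
next_size_mib() {
    size=$1 used_pct=$2 threshold=$3 percent=$4
    if [ "$used_pct" -gt "$threshold" ]; then
        echo $(( size + size * percent / 100 ))
    else
        echo "$size"
    fi
}

# Example: 128 MiB of metadata at 99% used, threshold 70%, grow by 20%.
next_size_mib 128 99 70 20
```

With these example values it prints 153: one step adds 20% of 128 MiB, truncated to 25 MiB.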
Sean Brisbane
2014-12-09 21:48:31 UTC
Dear All,

Mike's suggestion to essentially RTFM was spot on. The transcript is below, with one issue that I don't understand: the intermediate LV apparently needed to be larger than the metadata volume I was repairing.

This was all performed with Red Hat's EL6 kernel 2.6.32-504 from October this year.

# At the start of the restore process, the metadata volume in the thin pool was 140M. In my previous email, I noted that this was already a large extension compared to its size when the error initially occurred.
lvcreate -L 140M VGData -n metarestore
lvchange -a y VGData/thinpool1
thin_dump --repair /dev/mapper/VGData-thinpool1_tmeta > meta.xml
thin_restore -o /dev/mapper/VGData-metarestore -i meta.xml
## Complains that a block can't be allocated in metarestore
lvextend -L+1g VGData/metarestore
thin_restore -o /dev/mapper/VGData-metarestore -i meta.xml
#works fine now
lvchange -a n /dev/mapper/VGData-thinpool1
lvconvert --poolmetadata VGData/metarestore --thinpool VGData/thinpool1
lvchange -a y VGData/thinpool1
##then activate the thin volumes too
mount -a #all working
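Sean's steps can be collected into a script that prints every command before anything is touched. A minimal sketch, assuming the names from this thread and creating the temporary LV at 1164M up front (the 140M he started with plus the 1g he had to add); `run`, `repair_thin_metadata` and `DRY_RUN` are hypothetical names, and the default is to print only. It follows his transcript, not necessarily the procedure in the lvmthin manpage for a given lvm2 release, so the two are worth comparing before setting `DRY_RUN=0`.

```shell
#!/bin/sh
# Sketch of the metadata swap from this thread; prints commands only unless
# DRY_RUN=0. VG, pool and LV names are the ones used in the thread.
VG=${VG:-VGData}
POOL=${POOL:-thinpool1}
META_SIZE=${META_SIZE:-1164M}   # 140M + 1g: the size that finally worked
XML=${XML:-meta.xml}

# Hypothetical wrapper: echo the command, and execute it only if DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# Subshell body so that "set -e" (stop at the first failure) stays local.
repair_thin_metadata() (
    set -e
    run lvcreate -L "$META_SIZE" "$VG" -n metarestore
    run lvchange -a y "$VG/$POOL"
    # thin_dump writes XML to stdout, so the redirect is handled here.
    echo "+ thin_dump --repair /dev/mapper/$VG-${POOL}_tmeta > $XML"
    if [ "${DRY_RUN:-1}" = 0 ]; then
        thin_dump --repair "/dev/mapper/$VG-${POOL}_tmeta" > "$XML"
    fi
    run thin_restore -o "/dev/mapper/$VG-metarestore" -i "$XML"
    run lvchange -a n "$VG/$POOL"
    # Swap the restored LV in as the pool's metadata.
    run lvconvert --poolmetadata "$VG/metarestore" --thinpool "$VG/$POOL"
    run lvchange -a y "$VG/$POOL"
    run lvchange -a y "$VG/thin-lwfa"
)
```

Sourcing the file and calling `repair_thin_metadata` prints the sequence; it only acts on the volumes once `DRY_RUN=0` is set.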

Thanks for your help. I'll investigate auto-extending the metadata.

Cheers,
Sean