[linux-lvm] Thin provisioned pool errors
Timur Alperovich
2014-10-01 00:43:29 UTC
Hi all,

We are using LVM thin in EC2 and recently bumped into an error that seems
to indicate metadata corruption. I was hoping someone on the list could
clarify what likely happened and point to what we could do to avoid this in
the future (any recent patches or other work).

The error is the following:
Sep 27 20:09:37 magtest-ferenc-307-820-3 kernel: [2076275.947135] device-mapper: thin: dm_thin_insert_block() failed
Sep 27 20:09:37 magtest-ferenc-307-820-3 kernel: [2076275.947153] Buffer I/O error on device dm-315, logical block 2430188
Sep 27 20:09:37 magtest-ferenc-307-820-3 kernel: [2076275.947164] Buffer I/O error on device dm-315, logical block 2430189
Sep 27 20:09:37 magtest-ferenc-307-820-3 kernel: [2076275.947169] Buffer I/O error on device dm-315, logical block 2430190
Sep 27 20:09:37 magtest-ferenc-307-820-3 kernel: [2076275.947174] Buffer I/O error on device dm-315, logical block 2430191
Sep 27 20:09:37 magtest-ferenc-307-820-3 kernel: [2076275.947182] EXT4-fs warning (device dm-315): ext4_end_bio:317: I/O error writing to inode 1305621 (offset 2429468672 size 16384 starting block 2430188)
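As a quick sanity check on the log above, the ext4 warning and the four buffer errors appear to describe the same failed write. A minimal sketch, assuming a 4 KiB filesystem block size (the log does not state it, but the numbers are consistent with it):

```python
# Values copied from the ext4_end_bio warning in the log.
write_size_bytes = 16384
starting_block = 2430188

# Assumption: the filesystem uses 4 KiB blocks (not stated in the log).
BLOCK_SIZE = 4096

def failed_blocks(start, size_bytes, block_size=BLOCK_SIZE):
    """Return the logical block numbers covered by a write of size_bytes."""
    count = size_bytes // block_size
    return list(range(start, start + count))

blocks = failed_blocks(starting_block, write_size_bytes)
print(blocks)  # [2430188, 2430189, 2430190, 2430191]
```

Under that assumption, the 16384-byte write spans exactly the four logical blocks reported in the "Buffer I/O error" lines, so those lines likely belong to one failed write rather than four separate ones.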

The failure in dm_thin_insert_block() repeats 10 times and is followed by
the following message:
Sep 27 20:09:41 magtest-ferenc-307-820-3 kernel: [2076279.600905] device-mapper: thin: dm_thin_get_highest_mapped_block returned -61
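For readers decoding the return value: kernel functions conventionally return negative errno values, and on Linux errno 61 is ENODATA ("No data available"). A small sketch of that lookup follows; the helper name decode_kernel_errno is made up for illustration, and the numeric mapping is platform-specific, so it should be run on Linux:

```python
import errno
import os

def decode_kernel_errno(code):
    """Map a negative kernel return code to its symbolic errno name."""
    return errno.errorcode.get(abs(code), "UNKNOWN")

# On Linux this prints the symbolic name and message for errno 61.
print(decode_kernel_errno(-61), "/", os.strerror(61))
```

This only names the error code; it does not by itself say why the metadata lookup returned no data.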

The device mapper error repeated until the VM was taken offline and the
EBS volumes were snapshotted. At this point, running thin_check produces
the following:
examining superblock
examining devices tree
missing devices: [159, 277]
bad checksum in btree node
examining mapping tree
missing all mappings for devices: [229, 229]
bad checksum in btree node
unknown node type
[the "unknown node type" message appears 62 times in total, printed back to
back without line breaks]
thin device 230 is missing mappings [57251, -]
invalid key

The missing mappings error is repeated for many devices.

At this point, would going through the steps of thin_dump/thin_repair
remedy this?

Is there any additional information I can get from the system to understand
what happened? I wonder if there was an issue with the underlying EBS
storage, but have no way of confirming that at the moment.

Lastly, this is Ubuntu 12.04 LTS. I pulled in version 0.3.1 of the thin
provisioning tools, but the kernel is 3.8.0 (3.8.0-32-generic) and the LVM2
package is:
LVM version: 2.02.98(2) (2012-10-15)
Library version: 1.02.77 (2012-10-15)

Are there known patches that we may be missing that would remedy some of
these issues?

Thank you,
Timur
Zdenek Kabelac
2014-10-01 07:45:22 UTC
I'm afraid kernel 3.8 is very old for thinp usage.
I'd strongly recommend using a 3.15 or newer kernel with thinp
(or a kernel to which the thinp patches from that kernel have been backported).
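A check against that recommendation could be sketched as below. The function names are hypothetical, and the check only looks at the upstream version number, so it cannot tell whether a distribution kernel carries backported thinp patches:

```python
import re

MIN_THINP_KERNEL = (3, 15)  # the minimum recommended in this thread

def kernel_tuple(release):
    """Extract (major, minor) from a release string such as '3.8.0-32-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))

def meets_recommendation(release, minimum=MIN_THINP_KERNEL):
    """Compare numerically; a plain string comparison would rank '3.8' above '3.15'."""
    return kernel_tuple(release) >= minimum

print(meets_recommendation("3.8.0-32-generic"))  # False
```

On a live system the release string could be taken from platform.release() instead of being hard-coded.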

Metadata on these newer kernels carries many more protective checksums, which
prevent major damage to it, and it also contains more hints for repair.

The 3.8 kernel is from the 'early' days of thinp, when it had not yet matured enough.

You could try the latest thin repair tools from the git repo, but I have already
seen metadata from older kernels that is simply too broken to be repaired.
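A repair attempt with those tools might look like the sketch below. It only builds and prints the commands rather than running them. The device paths are hypothetical placeholders, and the sketch assumes the pool is inactive, that the work is done on a copy of the snapshotted metadata, and that the installed thin_repair accepts -i (input) and -o (output):

```python
import shlex

def repair_plan(damaged_meta, spare_meta):
    """Build (not run) the commands for a check / repair / re-check pass."""
    return [
        ["thin_check", damaged_meta],                           # record the damage
        ["thin_repair", "-i", damaged_meta, "-o", spare_meta],  # write repaired copy
        ["thin_check", spare_meta],                             # verify the result
    ]

# Hypothetical device paths; substitute the real metadata and spare devices.
for argv in repair_plan("/dev/mapper/vg-pool_tmeta", "/dev/mapper/vg-meta_spare"):
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Writing the repaired metadata to a separate device leaves the damaged original untouched, which matters if the first attempt fails and a newer tool version has to be tried.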

Zdenek
