[linux-lvm] Check of pool ernie/cache failed (status:1). Manual repair required!
Dennis Schridde
2018-05-12 11:30:36 UTC
Hello!
# lvchange -ay ernie/system
Check of pool ernie/cache failed (status:1). Manual repair required!
# lvconvert --repair ernie/system
bad checksum in superblock
Repair of cache metadata volume of cache ernie/system failed (status:1).
Manual repair required!
How do I recover from this situation and repair the volume group?
For a full log, including the output of pvdisplay, vgdisplay and
lvdisplay -a, please see the attached log file. If more information is
necessary, please ask.
I am using one of the infamous AMD Ryzen 2400G CPUs with an AMD B350 chipset,
which suffers from random lockups related to CPU C-states [1]. With the
recent AGESA 1.0.0.2a firmware update and the introduction of the "Power
Supply Idle Control = Typical Current Idle" setting, the system is stable,
if it boots at all. But it often takes several attempts to boot -- the
failed attempts ending in weird firmware / EFI or CPU / idle related Linux
kernel stack traces, which sometimes even require a hard reset, since
soft-reboot (ctrl+alt+del) sometimes has no effect, because init dies.
This was such a situation, where I had to reboot (ctrl+alt+del) and reset
(hard) the system several times, at the end of which everything seemed
fine, except that dracut was timing out when activating the disks.
Debugging the situation using a Fedora 28 live system resulted in the
attached log file.
--Dennis
[1]: https://bugzilla.kernel.org/show_bug.cgi?id=196683
P.S. I already found older posts [2,3] describing a similar scenario. At the
time a recovery was impossible for the user, but it seems that the situation
improved somewhat since then. However, I am still stuck, with
`lvconvert --repair` asking me to repair "manually" (whatever that means),
`cache_dump --repair` not being able to operate on non-active LVs and LVM
refusing to activate the LV as long as it has not been repaired.

[2]: https://www.redhat.com/archives/linux-lvm/2016-December/msg00013.html
[3]: https://www.redhat.com/archives/linux-lvm/2015-August/msg00008.html
Dennis Schridde
2018-05-15 08:11:36 UTC
Hello!

In case the question comes up: Fedora 28 (the live system I am trying to use
for recovery) is using Linux 4.16.3-301.fc28 and `lvm version`:
LVM version 2.02.177(2)
Library version 1.02.146
Driver version 4.37.0

The system I was originally using to break the cache was using Linux
4.16.7-gentoo, and `LD_LIBRARY_PATH=$PWD/lib64 ./bin/lvm version` from its
initrd reports (running on the Fedora 28 live system):
LVM version: 2.02.173(2)
Library version: 1.02.142
Driver version: 4.37.0

Could someone please give me a hint what is actually broken here? Is it just
the metadata? If so, how can that break -- shouldn't that metadata never be
modified after creating the LV? And could it not be recovered from
/etc/lvm/archive? What does "manual repair" mean in detail? Is there some
way to recover the cache? Or is it at least possible to uncache the LV
forcibly, to hopefully recover the data on the origin LV? What is your
recommendation to minimise data loss?

Best regards,
Dennis
Zdenek Kabelac
2018-05-15 11:15:55 UTC
Hi


To get direct access to the metadata, you could use your latest lvm2 2.02.177
build (or even 'git master'). These recent versions add support for
activating these 'subLVs' directly. So with the latest lvm2 you can:

lvchange -ay vg/lv_cmeta

You can then capture the content of this LV into a file with 'dd',
compress it with 'xz', and attach the compressed file to a BZ.

(dd if=/dev/vg/lv_cmeta of=/tmp/file)
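
The capture step above could be scripted roughly as follows. This is only a sketch: the function name `capture_meta` and the example paths are made up for illustration, and it assumes the _cmeta LV has already been activated and is readable.

```shell
#!/bin/sh
# capture_meta SRC IMG   (hypothetical helper, not part of lvm2)
# Copy SRC (for example an activated /dev/vg/lv_cmeta; placeholder name)
# into IMG with dd, then write an xz-compressed copy to IMG.xz.
# SRC is only ever read.
capture_meta() {
    src=$1
    img=$2
    # Raw copy of the device (or file) content; dd reports to stderr.
    dd if="$src" of="$img" bs=4M || return 1
    # Compress into IMG.xz and leave the uncompressed IMG in place.
    xz -c "$img" > "$img.xz" || return 1
    # Print the path of the file that would be attached to the BZ.
    printf '%s\n' "$img.xz"
}
```

For example, once `lvchange -ay vg/lv_cmeta` has succeeded, `capture_meta /dev/vg/lv_cmeta /tmp/file` would leave /tmp/file and /tmp/file.xz behind.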

----

With older lvm2 you would need to 'swap in' a temporary LV as metadata, which
has also been documented several times on the list and in the man pages for
thin-pool. But since you already have 2.02.177, you should be able to
activate the _cmeta LV directly.

Regards


Zdenek
Dennis Schridde
2018-05-16 18:31:15 UTC
Post by Zdenek Kabelac
To get direct access to metadata - you could use your latest lvm2 2.02.177
build (or even 'git master'). In these recent versions there is added
lvchange -ay vg/lv_cmeta
I tried that (using 2.02.177), but it does not work:
# lvchange -ay ernie/cache_cmeta
Operation not permitted on hidden LV ernie/cache_cmeta.

--Dennis
Dennis Schridde
2018-05-23 19:39:47 UTC
Post by Dennis Schridde
Post by Zdenek Kabelac
To get direct access to metadata - you could use your latest lvm2 2.02.177
build (or even 'git master'). In these recent versions there is added
lvchange -ay vg/lv_cmeta
# lvchange -ay ernie/cache_cmeta
Operation not permitted on hidden LV ernie/cache_cmeta.
Does anyone have a suggestion how I can activate this volume in order to
extract the information Zdenek asked for?

For more context please see the start of this thread. The gist is that my
cached LV cannot be activated anymore, and `lvconvert --repair` reports:
bad checksum in superblock
Repair of cache metadata volume of cache ernie/system failed (status:1).
Manual repair required!

My most important questions are:
* What is broken?
- What information does the superblock carry / what is its purpose?
- Where is it located / which part of my disk was damaged?
- What will be the consequence of it being irrecoverably lost?
* What does "manual repair" mean in detail?
- Using a specific tool?
- Flipping bits using a hex editor?
* Is there some way to recover the cache? Or is it at least possible to
uncache the LV forcibly, to hopefully recover the data on the origin LV?
* What is your recommendation to minimise data loss?

--Dennis
Dennis Schridde
2018-05-26 10:15:35 UTC
How would I create a backup of the affected LVs, so that dangerous commands
would not destroy the data and get me into an unrecoverable state? Is that
possible without activating the volumes? Is it possible to force-activate the
volumes, without changing the bits on the disk -- just so that I can read from
the LV and create the backup?
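
Once a volume can be read at all, a byte-for-byte image could be taken before trying anything dangerous. The sketch below does not answer whether the LV can be activated; it only assumes that some readable source path exists. `backup_image` and its arguments are hypothetical names.

```shell
#!/bin/sh
# backup_image SRC DST   (hypothetical helper)
# Copy SRC to DST, then confirm that the copy is byte-identical.
# SRC is only read; an existing DST is never overwritten.
backup_image() {
    src=$1
    dst=$2
    # Refuse to clobber an earlier backup.
    if [ -e "$dst" ]; then
        echo "refusing to overwrite $dst" >&2
        return 1
    fi
    # Raw copy; dd reports its statistics on stderr.
    dd if="$src" of="$dst" bs=4M || return 1
    # cmp exits non-zero if source and copy differ anywhere.
    cmp -s "$src" "$dst" || return 1
    # Drop write permission so later experiments do not modify the image.
    chmod a-w "$dst"
}
```

Any repair attempts could then be made against a copy of the image rather than against the original disks.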

--Dennis
Dennis Schridde
2018-06-05 10:08:27 UTC
Post by Zdenek Kabelac
To get direct access to metadata - you could use your latest lvm2
2.02.177
build (or even 'git master'). In these recent versions there is added
support to activate directly these 'subLVs'. So with latest lvm2 you
lvchange -ay vg/lv_cmeta
This command was only successful in LVM 2.02.178-rc1, but failed in
2.02.177.
Post by Zdenek Kabelac
and then you can capture content of this LV via 'dd' into file
and compress and attach 'xz' compressed to BZ.
I created a report in bugzilla [4], but since the `cache_cmeta` LV appears
to contain parts of my personal files, I cannot attach it. However, I
gained
It definitely should not. _cdata is where fragments of your data are
stored, and _cmeta contains only metadata (e.g. counters and references
to cdata.)
If there really are fragments of data in _cmeta LV, something must have
gone wrong elsewhere.
Thanks for the insight. I attached the first 4890783 bytes (i.e. those before
the blocks of seemingly unstructured data, which I saw in `hexdump -C`) of
that LV to the bug report [4]. Hopefully it contains enough information to
figure out what is broken.
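
Cutting off the leading part of a captured image can be done along these lines. A sketch only: `first_bytes` is a made-up helper name and the file names in the note below are placeholders; the byte count is the one quoted above.

```shell
#!/bin/sh
# first_bytes SRC COUNT DST   (hypothetical helper)
# Write the first COUNT bytes of SRC to DST, leaving SRC untouched.
first_bytes() {
    head -c "$2" "$1" > "$3"
}
```

For instance, `first_bytes cmeta.img 4890783 cmeta-head.img` keeps only the leading region, which can then be inspected with `hexdump -C cmeta-head.img` before attaching it.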

[4]: https://bugzilla.redhat.com/show_bug.cgi?id=1585670
