Discussion:
[linux-lvm] Detaching a failed cache pool
Liwei
2018-10-24 06:36:04 UTC
Hi list,
I have an LV with a corrupted cache. Looking at the cmeta using
cache_dump, I get this:

<superblock uuid="" block_size="128" nr_cache_blocks="0" policy=""
hint_width="4">
<mappings>
metadata contains errors (run cache_check for details).
perhaps you wanted to run with --repair ?

Running with --repair just produces metadata with no mappings or
hints, which is also incorrect.
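For reference, the inspection went roughly like this (the device path is a placeholder; the cmeta sub-LV first has to be made accessible, e.g. by swapping the pool metadata out into a spare LV):

```shell
# Placeholder device path for the cache pool's metadata sub-LV.
cache_check /dev/vg/lv_cache_cmeta            # reports the errors
cache_dump /dev/vg/lv_cache_cmeta             # prints the truncated xml above
cache_dump --repair /dev/vg/lv_cache_cmeta    # yields empty mappings/hints
```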

Obviously the metadata volume is corrupted beyond recovery, but I
am okay with that, as the origin is a read-only LV and we are only
using the cache for read acceleration. However, there seems to be no
way for us to get out of this state into something usable. I can't
seem to remove the cache from the LV, and I've read through the
documentation for lvmcache without finding anything about this.

At the moment I can't access the original LV anymore. Whatever I
try to do always results in "Check of pool vg/lv-cache failed
(status:1). Manual repair required!" What am I missing?

Thanks!
Liwei
Liwei
2018-11-07 17:19:54 UTC
Post by Liwei
Hi list,
I have an LV with a corrupted cache. Looking at the cmeta using
<superblock uuid="" block_size="128" nr_cache_blocks="0" policy=""
hint_width="4">
<mappings>
metadata contains errors (run cache_check for details).
perhaps you wanted to run with --repair ?
Running with --repair just produces metadata with no mappings and
hints, which is incorrect too.
Obviously the metadata volume is corrupted beyond recovery, but I
am okay with that as it is a read-only LV and we are using the cache
for read acceleration. However, there seems to be no way for us to get
out of this state into something usable? I can't seem to remove the
cache from the LV. I've read through the documentation for lvmcache
and can't find any information about this.
At the moment I can't access the original LV anymore. Whatever I
try to do always results in "Check of pool vg/lv-cache failed
(status:1). Manual repair required!" What am I missing?
Thanks!
Liwei
Okay, just a quick response to close out this issue. There is a simple solution.

Note that this only makes sense for a writethrough cache, since
everything in the cache should then be safe to discard. I'm not sure
it makes sense to try this for other cache modes. Please do make sure
you understand what's happening before trying the following.

Also, this is based on very limited understanding I have obtained from
trying to search for a solution. Those who are more experienced please
do correct me if what I'm doing is superfluous, stupid or just
outright dangerously wrong!

Assuming the block_size of 128 sectors is correct, I simply have to
calculate the correct nr_cache_blocks (the size of the cdata LV in
sectors divided by the block_size) and edit the generated cmeta xml
file accordingly.
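Concretely, the fix-up can be sketched like this. The 1 GiB cdata size is a made-up example; in practice the sector count comes from `blockdev --getsz` on the cdata sub-LV, and the superblock line stands in for the real dump:

```shell
# Stand-in for the dumped metadata (example file; real dump comes from cache_dump).
cat > cmeta.xml <<'EOF'
<superblock uuid="" block_size="128" nr_cache_blocks="0" policy="" hint_width="4">
<mappings>
</mappings>
</superblock>
EOF

# Hypothetical 1 GiB cdata LV = 2097152 512-byte sectors;
# block_size of 128 sectors (64 KiB) as reported in the superblock.
cdata_sectors=2097152          # in practice: blockdev --getsz /dev/vg/lv_cache_cdata
block_size=128
nr_cache_blocks=$(( cdata_sectors / block_size ))

# Patch the superblock so it covers the whole cdata LV.
sed -i "s/nr_cache_blocks=\"0\"/nr_cache_blocks=\"${nr_cache_blocks}\"/" cmeta.xml
grep -o 'nr_cache_blocks="[0-9]*"' cmeta.xml
```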

Once edited, run cache_restore with the modified xml file to generate
valid metadata (albeit for an empty cache). Swap the cmeta LV back
into the cache pool and lvm should stop complaining. At this point, I
could split off the cache pool from the origin LV and work with my
origin LV as a normal uncached LV.
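The restore-and-swap sequence looks roughly like the following. All VG/LV names and the temporary LV size are placeholders; treat this as a sketch rather than a tested recipe:

```shell
# 1. Write the edited dump into a spare LV big enough for the metadata.
lvcreate -L 64M -n tmp_meta vg
cache_restore -i cmeta.xml -o /dev/vg/tmp_meta

# 2. Swap the rebuilt metadata into the (inactive) cache pool.
lvconvert --swapmetadata vg/lv_cache --poolmetadata vg/tmp_meta

# 3. Detach the cache pool; the origin becomes a plain uncached LV.
lvconvert --splitcache vg/lv
```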

To be sure of the block_size and nr_cache_blocks, you can try creating
a dummy cache pool LV of the exact size and dump its cmeta as xml.
Assuming you're using the same kernel and lvm tools versions, the
block_size should match the corrupted one.
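One way to run that check, with illustrative names and sizes:

```shell
# Create a throwaway cache pool whose data LV matches the corrupted one's size,
# then dump its freshly initialised metadata for comparison.
lvcreate --type cache-pool -L 10G -n dummy vg
lvchange -ay vg/dummy_cmeta     # component activation; may need a recent lvm
cache_dump /dev/vg/dummy_cmeta > dummy.xml
```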

(Then again, I realise that since we're going to throw away the cache
contents anyway, the block_size doesn't need to match the original; we
just have to make sure that block_size and nr_cache_blocks together
cover the entire cdata LV.)

Hope this helps!
Liwei
