[linux-lvm] LVM thin pool advice

Xen

2017-02-15 09:25:49 UTC

Post by David Shaw
Is there some way to cap the amount of data that the snapshot can
allocate from the pool? Also, is there some way to allocate enough
metadata space that it can't run out? By way of analogy, using the
old snapshot system, if the COW is sufficiently large (larger than the
volume being snapshotted), it cannot overflow because even if every
block of the original volume is dirtied, the COW can handle all of it.
Is there some similar way to size the metadata space of a thin pool
such that overflow is "impossible"?

Personally I do not know the current state of affairs but the response
I've often got here is that there is no such mechanic and it is up to
the administrator to find out.

Maybe it is a bit ghastly to put it like this, my apologies.

I would very much like to be called wrong here.
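
On the metadata half of the question, the closest thing to a sizing rule I am aware of is the guide in the kernel's thin-provisioning documentation: roughly 48 bytes of metadata per data block, with a 2 MiB floor. Below is a minimal sketch of that arithmetic. The function name and the example pool are my own; the documentation itself warns that many heavily diverging snapshots can need more, so treat this as an estimate and not as a guarantee against overflow.

```python
MIB = 1024 * 1024

def suggested_metadata_bytes(data_dev_bytes, chunk_bytes):
    """Rule-of-thumb metadata size: 48 * data size / chunk size, min 2 MiB.

    Hypothetical helper; it only restates the kernel doc's guide figure.
    """
    estimate = 48 * data_dev_bytes // chunk_bytes
    return max(estimate, 2 * MIB)

# Example (made-up pool): 1 TiB of data with 64 KiB chunks.
size = suggested_metadata_bytes(1024**4, 64 * 1024)
print(size // MIB, "MiB")  # prints: 768 MiB
```

Smaller chunks mean more mappings and therefore more metadata, which is why the chunk size matters as much as the pool size here.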

The problem is that although the LVM monitor (dmeventd, I think) does
respond, or can be configured to respond, to a thin pool filling up, it
does so as a daemon, a kind of watchdog; it is not an in-system guard.
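
To make the watchdog idea concrete: lvm.conf has `thin_pool_autoextend_threshold` and `thin_pool_autoextend_percent` in its activation section, and a threshold of 100 disables autoextension. The sketch below is only my toy model of one monitoring pass, not the daemon's actual code; the function name and the 70/20 values are assumptions chosen for illustration.

```python
def autoextend(size, used, threshold_pct=70, extend_pct=20):
    """Return the pool size after one watchdog pass (toy model).

    size and used are in the same unit (extents, bytes, whatever).
    """
    if threshold_pct >= 100:
        # A threshold of 100 means autoextension is switched off.
        return size
    used_pct = 100 * used / size
    if used_pct <= threshold_pct:
        # Usage has not exceeded the threshold, nothing to do.
        return size
    # Grow the pool by extend_pct of its current size.
    return size + size * extend_pct // 100

print(autoextend(100, 80))  # prints: 120
print(autoextend(100, 50))  # prints: 100
```

The weakness is visible even in the toy: the check only runs when the watchdog gets around to it, and it only helps if the volume group still has free space to extend into. Writes that arrive between two passes are not held back by anything.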

Typically what I've found in the past is that a fill-up will just hang
your system.

I am probably wrong about some things, so I would rather let the
developers answer.

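But as you've found, a snapshot of a thin volume always has the same
virtual size as its origin. As far as I understand it, the snapshot
shares blocks with the origin and only takes space from the pool as the
two diverge, so in the worst case it can grow to the full size of the
origin. That means that unless the pool has that much space to spare, it
can fill up and your system can crash.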

I have personally once ventured -- but I am just some by-stander right
-- that a proper solution would have to involve inter-layer
communication between filesystems and block devices, but that is even
outside of the problem here. The problem as far as I can see it is that
there is very unexpected behaviour when the thin pool fills up.

Zdenek once pointed out that the allocator does not have a full map of
what is available. For efficiency reasons, it goes "in search" of the
next block to allocate. (Next extent).

It does so in response to a filesystem request (a write, presumably).
The filesystem knows nothing of the limits of the thin pool and simply
expects the space to be there. The block layer (in this case LVM) can
respond with failure or success, but I do not know how a failure is
handled or what results it produces when the thin pool is full and no
new blocks can be allocated.

However I expect your system to freeze when the snapshot allocates more
space than is available. I think the designated behaviour is for the
snapshot to be dropped but I doubt this happens?

After all, the snapshot might be mounted, etc.?

It seems to me the first thing to do is to create safety margins, but
then... I do not develop this thing right now :p.

I think what is required is advance allocation, where each (individual)
volume allocates a pre-defined number of blocks in advance. Then any
out-of-space message from the thin volume manager would concern the
pre-allocation and not the actual allocation for the filesystem.

You create a bit of a buffer, in time. Once the allocator of an
individual volume knows the thin pool is having problems, but still has
extents available to itself that it pre-allocated, it can already start
informing the filesystem -- ideally -- that there is mayhem coming.
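
A toy model of what I mean, to be clear about the idea and nothing more. This is not how the thin pool allocator works today; the `Pool` and `Volume` classes, the `low_space` flag and the numbers are all made up for illustration.

```python
class Pool:
    """Central manager that knows how many extents are still free."""

    def __init__(self, free_extents):
        self.free = free_extents

    def grab(self, wanted):
        # Hand out up to `wanted` extents; may return fewer, or zero.
        given = min(wanted, self.free)
        self.free -= given
        return given


class Volume:
    """Volume that keeps a private reserve of pre-allocated extents."""

    def __init__(self, pool, reserve_target):
        self.pool = pool
        self.reserve_target = reserve_target
        self.reserve = pool.grab(reserve_target)
        self.low_space = self.reserve < reserve_target

    def allocate_extent(self):
        if self.reserve == 0:
            raise OSError("thin pool exhausted")
        self.reserve -= 1  # serve the write from the private reserve
        # Try to top the reserve back up from the shared pool.
        self.reserve += self.pool.grab(self.reserve_target - self.reserve)
        if self.reserve < self.reserve_target:
            # The pool could not refill us: warn while extents remain.
            self.low_space = True


pool = Pool(6)
vol = Volume(pool, reserve_target=4)
for _ in range(3):
    vol.allocate_extent()
print(vol.low_space, vol.reserve)  # prints: True 3
```

In this run the warning goes up on the third allocation, while the volume can still satisfy three more writes. That gap is the buffer in time: the failure shows up in the refill first, and only later in a write the filesystem actually cares about.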

But also it means that a snapshot could recognise problems ahead of time
and be told that it needs to start failing if a certain minimum of free
space is not to be found.

But also, all of this requires that the central thin volume manager
knows ahead of time, or in any case, at any single moment, how many
extents are available. If this is concurrently done and there are many
such allocators operating, all of them would need to operate on
synchronized numbers of available space. Particularly when space is
running out I feel there should be some sort of emergency mode where
restrictions start to apply.

It is just unacceptable to me that the system will crash when space runs
out. In case of a depleted thin pool, any snapshot should really be
discarded by default I feel. Otherwise the entire thin pool should be
readily frozen. But why the system should crash on this is beyond me.

My apologies for this perhaps petulant message. I just think it should
not be understated how important it is that a system does not crash,
and I was just indicating that in the past the message has often been
that it is _your_ job to create safety.

But this is slightly impossible. This would indicate... well whatever.

The failure case of a filled-up thin pool should not be relegated to the
shadows.

I hope to be made wrong here and good luck with your endeavour. I would
suggest that a thin pool is very sexy ;-). But thus far there are no
safeguards.

Please be advised that I do not know whether the limits you ask about
currently exist. I have just been told here that the thin snapshot is of
equal size to the origin volume and there is nothing you can do about it?

Regards.