Discussion:
[linux-lvm] about the lying nature of thin
Xen
2016-04-28 22:37:23 UTC
You know, Mr. Patton made the interesting claim that thin provisioning
is designed to lie and is meant to lie, and I beg to differ.

Under normal operating conditions any thin volume should be allowed to
grow to its maximum V-size provided not everyone is doing that at the
same time.


Nowhere in the thin contract does it say "this space I have made
available to you, you don't have to share it".

That is like saying the basement container room used as bike and motor
space in my apartment complex is a lie because if I were to fill it up,
other people couldn't use it anymore.

The visuals clearly indicate available physical space, but I know that
if I use it, others won't be able to. It's called sharing.

In practical matters a thin volume only starts to lie when "real space"
< "virtual space" -- a condition you are normally trying to avoid.

So I would not even say that by definition a thin volume or thin volume
manager lies.

It only starts "lying" the moment real available space goes below
virtual available space, something you would normally be trying to
avoid.

Since your guarantee to your customers (for instance) is that this space
IS going to be available, you're actually lying to them by not informing
them that this guarantee cannot actually be met at some point in time.

Thin pools do not lie by default. They lie when they cannot fulfill
their obligations, and this is precisely the reason for the idea I
suggested: to stop the lie, to be honest.

It was said (by Marek Podmaka) that you don't want customers / users to
know about the reality behind the thin pool, in some or many use cases
(liberally interpreted). That there are use cases where you don't want
the client to know about the thin nature.

But if you don't do your job right and the thin pool does start to fill
up, that starts to sound like lying to your client and saying
"everything is all right" while behind the scenes everyone is in
calamity mode.

"Is something wrong? No no, not at all".

You're usually aware that you're being lied to ;-) if you are talking to
a real human.

So basically:
* either you do your job right and nothing is the matter
* you don't do your job right but you don't tell anyone
* you don't do your job right and you own up.

Saying that thin pools habitually lie is not right. The question is not
what happens or what you do while the system is functioning as intended.
The question is what you do when that is no longer the case:

* do you inform the guest system?
* do you keep silent until shit breaks loose?

IF you had an autoextend mechanism present you could also equally well
decide to not "inform" clients as long as that was the case. After all,
if you have automatic extending configured and it is operational, then
the "real size" is actually larger than what you currently have.

In that case "virtual size < real size" does not hold or does not
happen, and there is no need to communicate anything. This is also a
question about ethics, perhaps.
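
For reference, the autoextend mechanism I mean is the one dmeventd
drives from lvm.conf; something like this (the values are only an
example):

    # /etc/lvm/lvm.conf, in the activation section:
    thin_pool_autoextend_threshold = 80   # act when the pool is 80% full
    thin_pool_autoextend_percent = 20     # then grow it by 20%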

Personally I like to be informed. I don't know what you do or want.

But I can think of any number of analogies or life situations where I
would definitely choose to be informed instead of being lied to.

Thin LVM does not lie by default. It may only start to lie when
conditions are no longer met.

Regards, Xen.
Marek Podmaka
2016-04-29 08:44:20 UTC
Hello Xen,
Post by Xen
In practical matters a thin volume only starts to lie when "real space"
< "virtual space" -- a condition you are normally trying to avoid.
Thin pools do not lie by default. They lie when they cannot fulfill
their obligations, and this is precisely the reason for the idea I
suggested: to stop the lie, to be honest.
I would say that thin provisioning is designed to lie about the
available space. This is what it was invented for. As long as the used
space (not virtual space) is not greater than real space, everything
is ok. Your analogy with customers still applies and the whole IT business
is based on it (over-provisioning home internet connection speed,
"guaranteed" webhosting disk space). It seems to me that disk space
was the last thing to get over- (or thin-) provisioned :)

Now I'm not sure what your use-case for thin pools is.

I don't see it as very useful if the presented space is smaller than
the available physical space. In that case I can just use plain LVM with
PV/VG/LV. For snapshots you don't care much: if a snapshot
overfills, it just becomes invalid, but it won't influence the original
LV.

But their main use case is to simplify the complexity of adding storage.
Traditionally you need to add new physical disks to the storage /
server, add them to LVM as a new PV, add this PV to the VG, extend the
LV and finally extend the filesystem. Usually the storage part and the
server (LVM) part are done by different people / teams. By using thinp,
you create a big enough VG, LV and filesystem. Then as it is needed you
just add physical disks and you're done.
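
In commands, that whole dance then reduces to roughly this (device, VG
and pool names assumed):

    pvcreate /dev/sdX                  # the newly added disk
    vgextend bigvg /dev/sdX
    lvextend -L +1T bigvg/thinpool     # the filesystems on top need no change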

Another benefit is disk space saving. Traditionally you need to have
some reserve as free space in each filesystem for growth. With many
filesystems you just waste a lot of space. With thinp, this free
space is "shared".

And regarding your other mail about presenting parts / chunks of
blocks from block layer... This is what device mapper (and LVM built
on top of it) does - it takes many parts of many block devices and
creates a new linear block device out of them (whether it is a striped
LV, mirrored LV, dm-crypt or just concatenation of 2 disks).
--
bYE, Marki
Gionatan Danti
2016-04-29 10:06:34 UTC
Post by Marek Podmaka
Hello Xen,
Now I'm not sure what your use-case for thin pools is.
I don't see it as very useful if the presented space is smaller than
the available physical space. In that case I can just use plain LVM with
PV/VG/LV. For snapshots you don't care much: if a snapshot
overfills, it just becomes invalid, but it won't influence the original
LV.
Let me add one important use case: having fast, flexible snapshots.

In the past I used classic LVM to build our virtualization servers, but
this meant I was basically forced to use a separate volume for each VM:
using a single big volume and filesystem for all the VMs meant that,
while snapshotting it for backup purposes, I/O became VERY slow on ALL
virtual machines.

On the other hand, thin pools provide much faster snapshots. On the
latest builds, I began using a single large thin volume, on top of a
single large thin pool, to host a single filesystem that can be
snapshotted with no big slowdown on the I/O side.
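
For the record, such a thin snapshot is a single command and needs no
pre-reserved space (the names here are illustrative):

    lvcreate -s -n backup-snap vg0/thinvol   # thin snapshot, no size argument
    lvchange -ay -K vg0/backup-snap          # -K activates despite the skip flag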

I understand that it is a tradeoff - classic LVM mostly provides
contiguous blocks, so fragmentation remains quite low, while thin
pools/volumes are much more prone to fragment, but with large enough
chunks it is not such a big problem.

Regards.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: ***@assyoma.it - ***@assyoma.it
GPG public key ID: FF5F32A8
Xen
2016-04-29 13:16:12 UTC
Post by Gionatan Danti
Let me add one important use case: have fast, flexible snapshots.
One more huge reason for using it in a desktop system.

I didn't know about the performance benefits.

I just know that providing snapshot space in advance, by registering
LVs for that purpose ahead of time, is not a good way of working (for
me, or anyone).

Although the idea of using LVM thin to provide only a single thin volume
might be rather odd ;-).

Still, the snapshotting is clearly superior to that of traditional LVM,
right?

Regards.
Xen
2016-04-29 22:32:21 UTC
I guess this Patton guy knows everything about everything, but I'm not
responding to him anymore.

As he sets up his business empire he leaves us all in the dust anyway.

So I guess I will just keep it to the thing I know something about,
which is talking to real people.
Mark Mielke
2016-04-30 04:46:08 UTC
Lots of interesting ideas in this thread.

But the practical side of things is that there is a need for thin volumes that
are over provisioned. Call it a lie if you must, but I want to have
multiple snapshots, and not be forced to have 10X the storage, just so that
I can *guarantee* that I will have the technical capability to fully
allocate every snapshot without running out of space. This is for my
requirements, where I am not being naive or irresponsible. I'm not
misrepresenting the situation to myself. I know exactly what to expect, and I
know that it isn't only important to monitor, but it is also important to
understand the usage patterns. For example, in some of our use cases, files
will only normally be extended or created as new, at which point the
overhead of a snapshot is close to zero.

If people find this model unacceptable, then I think they should not use
thin volumes. It's a technology choice.

We have many systems like this beyond LVM... For example, the NetApp FAS
devices we have are set up with this type of model, and IT normally
allocates 10% or more for "snapshots", and when we get this wrong, it does
hurt in various ways, usually requiring that the snapshots get dumped, and
that we figure out why the monitoring failed. Normally, IT adds to the
aggregate as it passes a threshold. In the particular case that is
important for me - we have a fixed size local SSD for maximum performance,
and we still want to take frequent snapshots (and prune them behind),
similar to what we do on NetApp, but all in the context of local storage. I
don't use the word "lie" to IT in these cases. It's a partnership, and
an attempt to make the most use of the storage and the technology.

There was some discussion about how data is presented to the higher layers.
I didn't follow the suggestion exactly (communicating layout information?),
but I did have these thoughts:

1. When the storage runs out, it clearly communicates layout information
to the caller in the form of a boolean "does it work or not?"
2. There are other ways that information does get communicated, such as
if a device becomes read only. For example, an iSCSI LUN.

I didn't follow communication of specific layout information as this didn't
really make sense to me when it comes to dynamic allocation. But, if the
intent is to provide early warning of the likelihood of failure, compared
to waiting to the very last minute where it has already failed, it seems
like early warning would be useful. I did have a question about the
performance of this type of communication, however, as I wouldn't want the
host to be constantly polling the storage to recalculate the up-to-date
storage space available.
Xen
2016-05-03 13:03:37 UTC
Post by Mark Mielke
Lots of interesting ideas in this thread.
Thank you for your sane response.
Post by Mark Mielke
There was some discussion about how data is presented to the higher
layers. I didn't follow the suggestion exactly (communicating layout
information?), but I did have these thoughts:
* When the storage runs out, it clearly communicates layout
information to the caller in the form of a boolean "does it work or
not?"
* There are other ways that information does get communicated, such
as if a device becomes read only. For example, an iSCSI LUN.
I didn't follow communication of specific layout information as this
didn't really make sense to me when it comes to dynamic allocation.
But, if the intent is to provide early warning of the likelihood of
failure, compared to waiting to the very last minute where it has
already failed, it seems like early warning would be useful. I did
have a question about the performance of this type of communication,
however, as I wouldn't want the host to be constantly polling the
storage to recalculate the up-to-date storage space available.
Zdenek alluded to the idea and fact that this continuous polling would
either be required or be deeply ungrateful to the hardware, in the sense
of being hugely expensive. Of course I do not know everything about a
system before I start thinking. If I have an idea it is usually possible
to implement it but I only find out later down the road if this is
actually so and if it needs amending. I could not progress with life if
every idea needed to be 100% sure before I could commence with it,
because in that sense the commencing and the learning would never
happen.

I didn't know thin (or LVM) doesn't maintain maps of used blocks.

Of course for regular LVM it makes no sense if the usage of the blocks
you have allocated to a system is none of your concern at all.

The recent DISCARD improvements apparently just signal some special case
(?) but SSDs DO maintain maps or it wouldn't even work (?).

I don't know, it would seem that having a map of used extents in a thin
pool is in some way deeply important in being able to allocate unused
ones?

I would have to dig into it of course but I am sure I would be able to
find some information (and not lies ;-))).

I guess continuous polling would be deeply disrespectful of the hardware
and software resources.

In the theoretical system I proposed it would be a constant
communication between systems bogging down resources. But we must agree
we are typically talking about 4MB blocks here (and mutations to them).
In a sense you could easily increase that to 16MB, or 32MB, or whatever.

You could even update a filesystem when mutations of a thousand
gigabytes have happened.

We are talking about a map of regions and these regions can be as large
as you want.

It would say to a filesystem: these regions are currently unavailable.

You would even get more flags:

- this region is entirely unavailable
- this region is now more expensive to allocate to
- this region is the preferred place

When you allocate memory in the kernel (like with kmalloc) you specify
what kind of requirements you have.

This is more of the same kind, I guess.

Typically a thin system is a system of extent allocation, the way we
have it.

It is the thin volume that allocates this space, but the filesystem that
causes it.

The thin volume would be able to say "don't use these parts".

Or "all parts are equal, but don't use more than X currently".

Actually the latter is a false statement, you need real information.

I know in ext filesystems the inodes are scattered everywhere (and the
tables) so the blocks are already getting used, in that sense. And if
you had very large blocks that you would want to make totally
unavailable, you would get weird issues. "That's funny, I'm already
using it".

So in order to make sense they would have to be contiguous regions (in
the virtual space) that are really not used yet.

I don't know, it seems fun to make something like that. Maybe I'll do it
some day.
Xen
2016-04-29 11:53:00 UTC
Post by Marek Podmaka
I would say that thin provisioning is designed to lie about the
available space. This is what it was invented for. As long as the used
space (not virtual space) is not greater than real space, everything
is ok. Your analogy with customers still applies and the whole IT business
is based on it (over-provisioning home internet connection speed,
"guaranteed" webhosting disk space). It seems to me that disk space
was the last thing to get over- (or thin-) provisioned :)
But you see if my landlord tells me I can use the entire container room,
except that I have to share it with others, does he lie?

I *can* use the entire container room. I just have to ensure it is empty
again by the end of the day (or even sooner).

Those ISPs do not say "Every client can use the full bandwidth all at
the same time." They don't say that. They say "Fair use policies apply".
That's what they say. And they mean that no, you can't do that stuff
24/7/365.

So let's talk then about two things you can lie about:
* available space
* the thought that all of the space is available to everyone at all
times.

In a normal use case, only the latter would be a lie. But that's not
what companies tell their clients. Maybe implicitly, at times. But not
explicitly at all (hence fair use policy).

The former is not a lie. If you have 1000 customers, and each has 50GB
available total, and the average use at this point is 25GB, and you have
provisioned for ~35GB each, meaning 35000 GB is available and 25000 is
in use, then it is not a lie to say to any individual customer: you can
use 50GB if you want.

The guarantee that everyone can do it all at the same time, just doesn't
hold, but that is never communicated.

As a customer you are not aware of how many other clients there are, or
how many other thin volumes (ordinarily) or what the max capacity is
across all the volumes. So you are not being lied to.

For it to be a lie, you would have to be concerned about the total
picture. You would have to have an awareness of other clients and then
you would need to make the assumption that all of these clients at the
same time can use all of that bandwidth/data/space.

But your personal scenario doesn't extend that far.

Just as a funny example: nearby there was a supermarket that advertised
with that (to my mind) stupid idea "if there are more than 4
customers in line, and you are the 5th, you get your groceries for
free".

What did a local student house do? They went to the supermarket with
about 20 people and got a lot of stuff free.

I mean in statistics you have queue calculations too but it gets
defeated if people start doing that stuff (thwarting the mechanism on
purpose). For example, the traditional statistics example is that of
customers at a hair salon. Based on a certain distribution and an average
number of new arrivals, a conclusion is reached and certain data is
found.

But this data is thwarted the moment customers on purpose start to pile
up just to thwart this data, you get what I mean?

Any /intentional/ purpose to thwart the average, means it is no longer
the average.

Normal people wanting a haircut do not show up at a salon to thwart the
salons calculations. Ordinary use cases do not apply to this.

If you can expect a common, normal amount of use, then there is no
"intent" with those clients to be doing anything out of the ordinary.

Just like that "hair salon" can normally depend on those "calculations"
(you could, you know) and provision for that (number of employees
present) so too can a thin provisioning setup depend on expected
averages (in a distribution, the "expected" value of a random variable
is the expected average) (as a prediction in that sense).

There's no lying in that. If this hair salon now says "You can get a
haircut within 10 minutes without an appointment" then yes, people could thwart
that by suddenly all showing up at the same time.

Doesn't work like that in reality when people do not have such
intentions.

We call that "innocence" ;-) not doing something on purpose.

That hair salon is not lying if it guarantees a 10 minute wait time in
general. It just cannot guarantee it if people start to mess with it.

Statistics is all about averages and large numbers.

"A "law of large numbers" is one of several theorems expressing the idea
that as the number of trials of a random process increases, the
percentage difference between the expected and actual values goes to
zero."

That means that if you have enough numbers (enough thin volumes), the
difference between what you promise and what you can actually deliver
goes to zero, and in effect you are always speaking the truth.

Remember: you are speaking the truth given normal expected reality.
You are no longer speaking the truth if people start to mess with you on
purpose.

If you have 10,000 clients and 5,000 of them are one person intending
to bug you out, just like in the supermarket example, well, then you've
lost. But, that is an intentional devious thing to do just in order to
make use of some monetary loophole in the system, so to speak.

And in general your terms of use could guard against that (and many
companies do, I'm sure).
Post by Marek Podmaka
Now I'm not sure what your use-case for thin pools is.
Presently maximizing space efficiency across a small number of volumes,
as well as access to superior snapshotting ability.
Post by Marek Podmaka
I don't see it as very useful if the presented space is smaller than
the available physical space. In that case I can just use plain LVM with
PV/VG/LV. For snapshots you don't care much: if a snapshot
overfills, it just becomes invalid, but it won't influence the original
LV.
You mean there'd not be any use for thin, right. I agree. The whole idea
is to be more efficient with space.

If the presented space is smaller than the physical space, you HAVE
room for those snapshots. But with thin, you don't need to care.

Space is always there.
Post by Marek Podmaka
But their main use case is to simplify the complexity of adding storage.
Traditionally you need to add new physical disks to the storage /
server, add them to LVM as a new PV, add this PV to the VG, extend the
LV and finally extend the filesystem. Usually the storage part and the
server (LVM) part are done by different people / teams. By using thinp,
you create a big enough VG, LV and filesystem. Then as it is needed you
just add physical disks and you're done.
True but let's call it "sharing" resources.

Sharing resources is the whole idea of any advanced society.

Our western mindset doesn't work in the sense of everyone needing to be
able to possess everything.

The example was given that everyone owns a car, that they may not use
every day, a washing machine, that they may use 5 hours a week, a vacuum
cleaner, that they may use 1 hour a week, and so on and so on. The
example was given that a commercial airliner could *never* do something
like that.

Commercial airplanes are in operation pretty much 24/7. Disuse is way
too costly. They cannot afford to not use their machines 24/7.

Our society cannot either, but the way we live and operate with each
other currently ensures vast amounts of wasted materials, energy and so
on.

Resource sharing is an advanced concept in that sense. Let's just call
thin pools an advanced concept :p.

And let's not call it a lie just like that :) :P.
Post by Marek Podmaka
Another benefit is disk space saving. Traditionally you need to have
some reserve as free space in each filesystem for growth. With many
filesystems you just waste a lot of space. With thinp, this free
space is "shared".
My reason exactly.
Post by Marek Podmaka
And regarding your other mail about presenting parts / chunks of
blocks from block layer... This is what device mapper (and LVM built
on top of it) does - it takes many parts of many block devices and
creates a new linear block device out of them (whether it is a striped
LV, mirrored LV, dm-crypt or just concatenation of 2 disks).
I know. But that is the reverse thing.

DM/LVM takes dispersed stuff and presents a whole.

In this case we were talking about presenting holes.

That's because in this case .....

Say you are that barber/haircutter and suddenly you get an influx of
clients you cannot handle.

Are you going to put up a sign saying "sorry, too busy" or are you going
to try to keep your "promise" to each and every one of them? I hope you
didn't offer financial compensation in that sense ;-).

Personally I think that as a client you making use of such "financial
promises" is very intolerant and unforgiving and greedy and even
avaricious ;-).

So what if your thin pool does fill up and you have no measure in place
to handle it?

Are you going to be honest?

This question is not whether thin is currently lying. This is about
whether you will continue to choose for it to lie.

It is not about the present. It is about the choice you are going to
make.

Do you choose to lie or not?

Traditionally companies have always tried to keep up the pretense until
all hell broke loose so badly that it spilled out like a tidal wave.

You can find any number of examples in the history of our world. I am
currently thinking of the Exxon Valdez, and Enron. I don't know if that
is applicable. Also thinking of BP's platform in recent times,
Deepwater Horizon, which was said to have been deeply undermaintained.

I mean you can keep pretending everything is going just perfect, or you
can own up a little sooner. That is a choice to make for each individual
I guess.
Chris Friesen
2016-04-29 20:37:00 UTC
Post by Marek Podmaka
Now I'm not sure what your use-case for thin pools is.
I don't see it as very useful if the presented space is smaller than
the available physical space. In that case I can just use plain LVM with
PV/VG/LV. For snapshots you don't care much: if a snapshot
overfills, it just becomes invalid, but it won't influence the original
LV.
One useful case for "presented space equal to physical space" with thin volumes
is that it simplifies security issues.

With raw LVM volumes I generally need to zero out the whole volume prior to
deleting it (to avoid leaking the contents to other users). This takes time,
and also seriously hammers the disks when you have multiple volumes being zeroed
in parallel.

With thin, deletion is essentially instantaneous, and the zeroing penalty is
paid when the disk block is actually written. Any disk blocks which have not
been written are simply read as all-zeros.
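
A quick way to see that behaviour, as a sketch (names assumed): a
freshly created thin volume reads back as zeros even though nothing was
ever written to the underlying disk:

    lvcreate -V 10G -T vg0/pool -n fresh
    dd if=/dev/vg0/fresh bs=1M count=1 | hexdump -C | head -n 2
    # prints only 00 bytes: unprovisioned thin blocks read as zeros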

Chris
matthew patton
2016-04-29 15:45:31 UTC
Post by Xen
~35GB each, meaning 35000 GB is available and 25000 is
in use, then it is not a lie to say to any individual customer: you can
use 50GB if you want.
If enough of your so-called customers decide to use the space you promised them AND THAT THEY PAID FOR and instead they get massive data loss and outages, you can bet your hiney they'll sue you silly.

If you want to play fast and loose in your basement that's one thing - Thin-away. If you try to pull a similar stunt in a commercial setting you either do your homework and put all necessary safeguards in place to prevent customer demand from overwhelming your cheap-sh*t corner cutting, or better have an attorney on retainer and budgeted for breach of contract settlements.
Post by Xen
hold, but that is never communicated.
Then you, sir, will no doubt find yourself in front of a magistrate for no less than false representation, if the storage capacity you SOLD is not also explained in the terms of service as not really existing, and that they (or anyone they are unlucky enough to be co-located with) may well lose their data if they just so happen to write too fast to their storage.
Post by Xen
As a customer you are not aware of how many other clients there are, or
how many other thin volumes (ordinarily) or what the max capacity is across all the
volumes. So you are not being lied to.
I strongly suggest you go take a class on contract law (since OS basics is apparently beyond your grasp) and familiarize yourself with your country's prison conditions. At the very least, go talk to an attorney and pay him a consultation fee.

As to the rest of your message, perhaps you'd get more insight and traction by doing your own blog to wax philosophical over cheating paying customers, engaging in data-losing computing practices, and being too cheap, lazy, or opinionated to run a responsible service. And for trotting out example after example of non-computer social conditions as if they had any relevance to the matter at hand.

Now if you have something useful to say/ask about LVM, please continue.
Mark H. Wood
2016-05-02 13:18:12 UTC
Post by matthew patton
Post by Xen
~35GB each, meaning 35000 GB is available and 25000 is
in use, then it is not a lie to say to any individual customer: you can
use 50GB if you want.
If enough of your so-called customers decide to use the space you promised them AND THAT THEY PAID FOR and instead they get massive data loss and outages, you can bet your hiney they'll sue you silly.
Executive summary: you shouldn't just take a wild guess and then turn
your back on a thin-provisioned setup; you must understand your
consumers and monitor your resources.

It's reasonable in certain circumstances for a service provider to
over-subscribe his hardware. He would be well advised to monitor
actual allocation closely, to keep some cash or ready credit on hand
for quick expansion of his real hardware, and to respond promptly by
adding capacity when usage nears real hardware limits. He is taking a
risk, betting that most customers won't max out their promised
storage, and should manage that risk. Indeed, he should first gather
statistics to understand the behavior of typical customers and
determine whether he would be taking a *foolish* risk.
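
The monitoring itself can be as simple as a cron job around lvs; a
sketch (the threshold and names are just an illustration):

    lvs --noheadings -o data_percent vg0/pool | \
      awk '$1 > 80 { print "thin pool above 80% -- extend it now" }'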

Failure to adequately manage resources to redeem contracted promises
is the provider's lie, not LVM's. Failure to plan is planning to
fail.

If that's too scary, don't use thin provisioning.
--
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu
Xen
2016-05-03 11:57:26 UTC
Post by Mark H. Wood
Failure to adequately manage resources to redeem contracted promises
is the provider's lie, not LVM's. Failure to plan is planning to
fail.
Exactly. And it starts being a lie when resources don't outlast use, and
in some way the provider doesn't own up to that but lets it happen.

That is separate, however, from the fact that choosing whether or not
to communicate any part of that, when it does happen or would happen, is
a choice you can make, and it doesn't take away from thin provisioning
at all.

If you feel you can always meet your expectations and those of your
clients and work hard to achieve that, you may never run into the
situation. However if you do run into the situation the choice becomes
how to deal with that.

You can also make a proactive choice in advance to either then be open,
or to stick your head in the sand, as they proverbially say.

I bet many contingency plans used in business everywhere have choices
surrounding this being made in advance. When do we alert the public.
When do we open up. When does it go so far that we cannot hide it
anymore.

In Dutch we call this "keeping the dirty laundry in" -- you only take
the clean laundry out to dry (on a line). It is quite customary and
usual for a human being not to want to give insight into private matters
that might only confuse the other person.

At the same time there is also the question of when to own up to stuff
that is actually important to another person and I think this is a
question of ethics.

Sometimes people are not harmed by not knowing things, but you would be
harmed by them knowing it.
Sometimes people are harmed by not knowing things, and you are not
harmed by them knowing it.

I think that if we are talking about a business setting where you have
promised a certain thing to people who are now depending on it, that the
thing shifts in the direction of the second statement.

If you have a contractual responsibility to deliver, you also have a
contractual responsibility to inform. That is my opinion on the subject,
at least.
matthew patton
2016-05-03 13:43:57 UTC
Post by Xen
I didn't know thin (or LVM) doesn't maintain maps of used blocks.
Right, so you're ignorant of basics like how the various subsystems work. Like I said, go find a text on OS and filesystem design. Hell, read the EXT and LVM code or even just the design docs.
Post by Xen
The recent DISCARD improvements apparently just signal some special case
(?) but SSDs DO maintain maps or it wouldn't even work (?).
Again, read up on the inner workings of SSDs. To over-simplify, SSDs have their own "LVM". No different really than a hardware RAID controller does - admittedly most raid controllers don't do anything particularly advanced.
Post by Xen
I don't know, it would seem that having a map of used extents in a thin
pool is in some way deeply important in being able to allocate unused
ones?
clearly you are in need of much more studying. LVM knows exactly, out of all of its defined extents, which ones are free and which ones have been assigned to an LV - aka written to. What individual blocks (aka range of bytes) inside those extents have FS-managed data in them it knows not, nor does it care.
Post by Xen
I guess continuous polling would be deeply disrespectful of the hardware
and software resources.
Not to mention instantaneously invalid. So you poll LVM, "what is your allocation map and do you have any free extents?" You get the results. Then the FS having been assured there is free space issues writes. But oh no, in the round-trip some other LV has grabbed the extent you had intended to use! IO=FAIL.

The ONLY way for a FS to "reserve" a set of blocks (aka extent) to itself is to write to it - but mind the FS has NO IDEA if it needs to do a reservation in the first place, nor if this IO just so happens to fit inside the allocated range but the next IO at offset +1 will require a new extent to be allocated from the THINP.

I haven't checked, but it's perfectly possible for LVM THINP to respond to FS-issued DISCARD notices and thus build an allocation map of an extent, and, should an extent be fully empty, to return the extent to the thin pool. Only to have to allocate a new extent if any IO hits the same block range in the future. This kind of extent churn is probably not very useful unless your workload is in the habit of writing tons of data, freeing it, waiting a reasonable amount of time and potentially doing it again. SSDs resort to it because they must - it's the nature of the silicon device itself.
Post by Xen
It would say to a filesystem: these regions are currently unavailable.
- this region is entirely unavailable
- this region is now more expensive to allocate to
- this region is the preferred place
All of this "inside knowledge" and "coordination" you so desperately seem to want is called integration. And again spelled BTRFS and ZFS, et al.
Post by Xen
In the theoretical system I proposed it would be a constant ...
yeah, have fun with that theoretical system.

...

Xen, dude seriously. Go do a LOT more reading.
Xen
2016-05-03 17:42:03 UTC
Post by matthew patton
Post by Xen
I didn't know thin (or LVM) doesn't maintain maps of used blocks.
Right, so you're ignorant of basics like how the various subsystems
work. Like I said, go find a text on OS and filesystem design. Hell,
read the EXT and LVM code or even just the design docs.
Why don't you do it for me and then report back? I could use a slave
like you are trying to make me.
Post by matthew patton
Post by Xen
The recent DISCARD improvements apparently just signal some special case
(?) but SSDs DO maintain maps or it wouldn't even work (?).
Again, read up on the inner workings of SSDs. To over-simplify, SSDs
have their own "LVM". No different really than a hardware RAID
controller does - admittedly most raid controllers don't do anything
particularly advanced.
It almost seems like you want me to succeed.
Post by matthew patton
clearly you are in need of much more studying. LVM knows exactly out
of all of its defined extents which ones are free and which ones have
been assigned to an LV - aka written to. What individual blocks (aka
range of bytes) inside those extents have FS-managed data in them it
knows not nor does it care.
Then what is the issue here? That means my assumptions were all entirely
correct, and that what Zdenek has said must have been false.

But what you are saying now is about extent assignments to LVs; do you
imply this is also true of assignment to thin volumes?

Yes, when you say "written to" you clearly mean thin pools.

I never claimed that it needed to know or care about the actual usage
of its blocks (extents).

If a filesystem DISCARDs blocks, then with enough blocks it could
discard an extent.

I don't even know what will happen if a filesystem stops using the data
that's on it, but I will test that now. And of course it should just
free those blocks. It didn't work with mkswap just now, but creating a
new filesystem causes lvs to report a lower thin pool usage.

Of course, common and commonsensical. So these extents are being
liberated, right? And it knows exactly how many are in use?
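
That liberation can also be triggered by hand on a mounted filesystem;
a hedged sketch (mount point and names assumed, and the pool must pass
discards down, which is the default):

    fstrim -v /mnt/thinfs                # tell the stack which ranges are unused
    lvs -o lv_name,data_percent vg0      # the pool's Data% should drop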
Post by matthew patton
Thin pool is not constructing 'free-maps' for each LV all the time -
that's why tools like 'thin_ls' are meant to be used from the
user-space.
It IS very EXPENSIVE operation.
It is saying that e.g. lvs creates this free-map.

But LVM needs to know at every moment in time what extents are
available. It also needs to liberate them at runtime.

So it needs to be able to at least search for free ones and if none is
found, to report that or do something with it. Of course that is
different from having a map.

But in-the-moment update operations to filesystems would not require a
map. They would require mutations being communicated. Mutations that LVM
already knows about.

So it is nothing special. You don't need those "maps". You need to
communicate (to other thin volumes) which extents have become
unavailable. And which have become available once more.

Then the thin volume translates this (possibly) to whatever block
system the filesystem on top of it uses.

Logical blocks, physical blocks.

The main organisation principle is the extent. It is not the LVM that
needs to maintain a map. It is the filesystem.

It needs to know about its potential for further allocation of the block
space.
Post by matthew patton
Post by Xen
I guess continuous polling would be deeply disrespectful of the hardware
and software resources.
Not to mention instantaneously invalid. So you poll LVM, "what is your
allocation map and do you have any free extents?" You get the results.
Then the FS having been assured there is free space issues writes. But
oh no, in the round-trip some other LV has grabbed the extent you had
intended to use! IO=FAIL.
You know, those contention issues are everywhere, in the kernel too,
and they are always taken care of.

Don't confront me with a situation that has already been solved by
numerous other people.

You forget, for one, that real software systems running on the
filesystem would be aware of the lack of space to begin with. You are
now approaching a corner case where the last free extent is being
contended for. I am sure there would be an elegant solution to that.

This corner case is not what it's all about. What it's about is that the
filesystem has the means to predict what is going to happen, or at least
the software running on it.

If the situation you are describing is really an issue, you could simply
reserve a last block (extent) for this scenario that is only written to
if all other blocks are taken, and each filesystem (volume) has this
free block of its own.

PROBLEM SOLVED.

You sound like Einstein when he tried to disprove Bohr's theory at that
convention. In the end Bohr refuted everything and he (Einstein) had to
accept that Bohr was right.

A filesystem will simply reserve the equivalent of an extent. More
importantly, the thin volume (logical volume) will. The thin LV will
reserve one last extent in advance from the thin pool that is only
really given to the filesystem under conditions that the entire thin
pool is already taken now and the filesystem is still issuing a write
to a new block because of a race condition that prevented it from
knowing about the space issue.

These are not difficult engineering problems.
Post by matthew patton
The ONLY way for a FS to "reserve" a set of blocks (aka extent) to
itself is to write to it - but mind the FS has NO IDEA if it needs to
do a reservation in the first place nor if this IO just so happens to
fit inside the allocated range but the next IO at offset +1 will
require a new extent to be allocated from the THINP.
If you write to a full extent, you are guaranteed to get a new one. It's
not more difficult than that. Don't make everything so difficult.

I have not talked about reservations myself (prior to this). As we just
said, if it is only about the very last block of the entire thin pool?
Reserve it in advance and don't let the FS do it?

If the race condition is such that larger amounts are needed for safety,
do it? Reserve 200MB in advance if you need it?

You could configure a thin pool / volume to reserve a certain amount of
free space that is only going to be used if the thin pool is 100% filled
and it wasn't possible to inform the file systems fast enough.

Proportional to the size of the volume (LV). Who cares if you reserve 1%
in each volume for this. Or less. A 2TB volume with 1GB of reserved
space is not so bad, is it?

That's just 0.05% give or take.

If then free space is reported to the filesystem, it can:

1) simply inform programs by way of its normal operation
2) stop writing when the space known to it is gone
3) not have to worry about anything else because race conditions are
taken care of.

In the event that a filesystem starts randomly writing a single byte to
every possible block in order to defeat this system: the filesystem can
redirect these writes to other blocks when the LVM starts reporting "no
block for you" and the filesystem still has space in the blocks it has.

It will just have to invalidate some of its own blocks (extents). IT
needs to maintain a map, not LVM.

It can deduce its own free space from its own map.

It would be like allocating a thin (sparse) file but then writing to
every possible address of it along the range. Yes the system is going to
bug but you can take care of it. Some writes will just fail when out of
blocks, but the filesystem can redirect it, or in the end just fail
writing / allocating.

Any block being invalidated would instantly update its free space
calculations.

You don't need to communicate full maps unless you were creating a new
filesystem or trying to recover from corruption. You would query "is
this block available" for instance. That would require a new command. It
would take a while but that way the filesystem could reconstruct the
blockmap.

Or it could query about ranges of blocks.

This querying is the first thing you'd introduce. Blocks N to M, are
they available? Yes or not. Or a list of the ones that are and the ones
that aren't (a bitmap).

To query 2000 extents you only need 2000 bits. That's 250 bytes, not a
whole lot. A 2 TB volume (524288 extents of 4MB) would have a free map
of 64k bytes. Can you imagine how small this is?

How would maintaining free maps be an expensive operation, really?

You need a fucking 64k field with a xor operation. That fits inside a
16-bit 8086 segment.

I mean don't bullshit me here. There is no way it could be hard to
maintain free maps.

I'm a programmer too, you know. And I have been doing it since 1989 too.

I have programmed in Pascal and assembler and I have studied Java's
BitSet class for instance. It can be done very elegantly.

Any free map the thin LV would conjure up would be a lie in that sense,
a choice. Because you would arbitrarily invalidate blocks at the end of
the space.

At the end of the virtual space.

The pool communicates to the volume the acquisition and release of new
and old extents.

The volume at that point doesn't care which they are. It only needs to
know the number.

With every mutation it randomly invalidates a single block if it needs
to (or enables it again).

It sets a bit flag in a 64k field. So let's assume we have a 1PB volume,
a petabyte. That's 2^50 / 2^22 = 2^28 extents, hence 2^28 bits =
2^25 bytes = 32MB worth of data.

A volume of 1125899906842624 bytes needs just 33554432 bytes to
maintain a map, if done in 4MB extents.

If done in 4KB blocks the extent communication remains the same, but the
map could take 1024x that number of bytes: 32GB for a PB volume.

That is still only 1/32768 of the available address space, so to speak.

But the filesystem could maintain maps of extents and not individual
'blocks'.

Maybe 32GB is hard to communicate, but 32MB is not. And there are
systems that have a terabyte of ram.
Post by matthew patton
I haven't checked, but it's perfectly possible for LVM THINP to
respond to FS issued DISCARD notices and thus build an allocation map
of an extent. And should an extent be fully empty to return the extent
to the thin pool.
I don't know how it is done currently, because clearly the system knows,
right?

As you say, this is perfectly possible.
Post by matthew patton
Only to have to allocate a new extent if any IO hits
the same block range in the future. This kind of extent churn is
probably not very useful unless your workload is in the habit of
writing tons of data, freeing it and waiting a reasonable amount of
time and potentially doing it again. SSDs resort to it because they
must - it's the nature of the silicon device itself.
Unused blocks need to be made available anyway. A filesystem on which
80% of data is deleted, and still using all those blocks in the thin
pool? Please tell me this isn't reality (I know it isn't).


So I ran this test, just curious what would happen:

1. Create thin pool on other hard disk 400M
2. Create 3 thin volumes totalling 600M
3. Create filesystems (ext3) and mount them.
4. Copy 90MB file to them. After 4 files 360MB of pool is used.
5. Copy 5th file. Nothing happens. No errors, nothing.
6. Copy 6th file. Nothing happens. No errors, nothing.

7. I check the volumes. Nothing seems the matter; lvdisplay shows nothing unusual.

df works and appears as though everything is normal. All volumes now 97%
filled and pool 100% filled.
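
(For reference, the setup above in LVM commands was roughly the
following; the VG name is assumed:)

    lvcreate -L 400M -T testvg/pool
    for i in 1 2 3; do
      lvcreate -V 200M -T testvg/pool -n thin$i   # 3 x 200M virtual on 400M real
      mkfs.ext3 /dev/testvg/thin$i
    done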

Can't last, right? I see kernel block device page errors come by.

I go to one of the files that should have been successfully written (the
4th file). I try to copy it to my main disk.

cp hangs. Terminal (tty) switching still works. Vim (I had vim open in 2
ttys or 3) stops responding. Alt-7 (should open KDE) nothing happens.
Cannot switch back, i.e. cannot switch TTYs anymore. System hangs
completely.

Mind you this was on a harddisk with no used volumes. No other volumes
were mounted other than those 3 although of course they were loaded in
LVM.

There are no dropped volumes. There are no frozen volumes. The system
just crashes. Very graceful I must say.

I mean if this is the best you can do?

No wonder you are suggesting every admin needs to hire a drill
instructor to get him through the day.
Post by matthew patton
Post by Xen
It would say to a filesystem: these regions are currently unavailable.
- this region is entirely unavailable
- this region is now more expensive to allocate to
- this region is the preferred place
All of this "inside knowledge" and "coordination" you so desperately
seem to want is called integration. And again spelled BTRFS and ZFS.
et. al.
BTRFS is spelled "monopoly" and "wants to be all" and "I'm friends with
SystemD" ;-).

ZFS I don't know, I haven't cared about it. All I see on IRC is people
talking about it like some new toy they desperately can't live without
even though it doesn't serve them any real purpose.

A bit like a toy drone worth 4k dollars.

The only thing that changes is that filesystems maintain bitmaps of
available sectors/blocks or of extents, and are capable of intelligently
allocating to the ones they have and that are available.

That's it!

You can still choose what filesystem to use. You could even choose what
volume manager to use.

We have seen how little data it costs if the extent size is at least
4MB.
We have seen how easy it would be to query again with the underlying
layer in case you're not sure.

If you want a block to have more bits, easy too! If you have only 4
possible states, you can put it in 2 bits.

That would probably be enough for any probable use case. A 2TB volume
costs 128k bytes for this bitmap with 4 states. That's something you can
achieve on a 286 if you are crazy enough.
Post by matthew patton
yeah, have fun with that theoretical system.
Why won't you?
Post by matthew patton
Xen, dude seriously. Go do a LOT more reading.
I am being called by name :O! I think she likes me.
Linda A. Walsh
2016-05-10 21:47:38 UTC
Post by Xen
You know, Mr. Patton made the interesting claim that thin
provisioning is designed to lie and is meant to lie, and I beg to differ.
----
Isn't using a thin memory pool for disk space similar to using
a virtual memory/swap space that is smaller than the combined sizes of all
processes?

I.e. Administrators can choose to decide whether to over-allocate
swap or paging file space or to have it be a hard limit -- and forgive me
if I'm wrong, but isn't this a configurable in /proc/sys/vm with the
over-commit parms (among others)?

Doesn't over-commit in the LVM space have similar checks and balances
as over-commit in the VM space? Whether it does or doesn't, shouldn't
the reasoning be similar in how they can be controlled?
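
For concreteness, the knobs I mean are these (just a sketch):

    sysctl vm.overcommit_memory   # 0 = heuristic, 1 = always allow, 2 = never
    sysctl vm.overcommit_ratio    # with mode 2: percent of RAM counted toward the limit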

In regards to LVM overcommit -- does it matter (at least in the short
term) if that over-committed space is filled with "SPARSE" data files?
I mean, suppose I allocate space for astronomical bodies -- in some
areas/directions, I might have very SPARSE usage, vs. towards the core
of a galaxy, I might expect less sparse usage.

If a file system can be successfully closed with 'no errors' --
doesn't that still mean it is "integrous" -- even if its sparse files
don't all have enough room to be expanded?

Does it make sense to think about an OOTS (OutOfThinSpace) daemon that
can be set up with priorities to reclaim space?

I see 2 types of "quota" here. And I can see the metaphor of these
types being extended into disk space: direct space, that which is
physically present, and "indirect or *temporary* space" -- which you
might try to
reserve at the beginning of a job. Your job could be configured to wait
until the indirect space is available, or die immediately. But
conceivably indirect space is space on a robot-cartridge retrieval
system that has a huge amount of virtual space, but at the cost of needing
to be loaded before your job can run.

Extending that idea -- the indirect space could be configured as
"high priority space" -- meaning once it is allocated, it stays
allocated *until* the job completes (in other words the job would have a
low chance of being "evicted" by an OOTS daemon), vs. most "extended
space" would have the priority of "temporary space" -- with processes
using large amounts of such indirect space and having a low expectation
of quick completion being high on the oots-daemon's list?

Processes could also be willing to "give up memory and suspend" --
where, when called, a handler could give back giga- or terabytes of
memory and save its state as needing to restart the last pass.

Lots of possibilities -- if LVM-thin space is managed like
memory-virtual space. That means some outfits might choose to never
over-allocate, while others might allow some fraction.

From how it sounds -- when you run out of thin space, what happens
now is that the OS keeps allocating more virtual space that has no
backing store (in memory or on disk)... with a notification buried in a
system log somewhere.

On my own machine, I've seen >50% of memory returned after
sending a '3' to /proc/sys/vm/drop_caches -- maybe similar emergency
measures could help in the short term, with long term handling being as
similarly flexible as VM policies.
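
That is, roughly:

    sync                                  # flush dirty pages first
    echo 3 > /proc/sys/vm/drop_caches     # drop pagecache, dentries and inodes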

Does any of this sound sensible or desirable? How much effort is
needed for how much 'bang'?
Xen
2016-05-10 23:58:27 UTC
Hey sweet Linda,

this is beyond me at the moment. You go very far with this.
Post by Linda A. Walsh
Isn't using a thin memory pool for disk space similar to using
a virtual memory/swap space that is smaller than the combined sizes of all
processes?
I think there is a point to that, but for me the concordance is in the
idea that filesystems should perhaps have different modes of requesting
memory (space) as you detail below.

Virtual memory typically cannot be expanded (automatically), although
you could do it by hand.

Even with virtual memory there is normally a hard limit, and unless you
include shared memory, there is not really any relation with
overprovisioned space, unless you started talking about prior allotment,
and promises being given to processes (programs) that a certain amount
of (disk) space is going to be available when it is needed.

So what you are talking about here I think is expectation and
reservation.

A process or application claims a certain amount of space in advance.
The system agrees to it. Maybe the total amount of claimed space is
greater than what is available.

Now processes (through the filesystem) are notified whether the space
they have reserved is actually going to be there, or whether they need to
wait for that "robot cartridge retrieval system" and whether they want
to wait or will quit.

They knew they needed space and they reserved it in advance. The system
had a way of knowing whether the promises could be met and the requests
could be met.

So the concept that keeps recurring here seems to be reservation of
space in advance.

That seems to be the holy grail now.

Now I don't know but I assume you could develop a good model for this
like you are trying here.

Sparse files are difficult for me, I have never used them.

I assume they could be considered sparse by nature and not likely to
fill up.

Filling up is of the same nature as expanding.

The space they require is virtual space, their real space is the
condensed space they actually take up.

It is a different concept. You really need two measures for reporting on
these files: real and virtual.

So your filesystem might have 20G real space.
Your sparse file is the only file. It uses 10G actual space.
Its virtual file size is 2T.

Free space is reported as 10G.

Used space is given two measures: actual used space, and virtual used
space.
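
You can already see those two measures on any sparse file today, for
example (a sketch):

    truncate -s 2T sparse.img           # virtual (apparent) size: 2T
    du -h --apparent-size sparse.img    # reports 2.0T
    du -h sparse.img                    # reports the real, allocated size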

The question is how you store these. I think you should store them
condensed.

As such only the condensed blocks are given to the underlying block
layer / LVM.

I doubt you would want to create a virtual space from LVM such that your
sparse files can use a huge filesystem in a non-condensed state sitting
on that virtual space?

But you can?

Then the filesystem doesn't need to maintain blocklists or whatever, but
keep in mind that normally a filesystem will take up a lot of space in
inode structures and the like, when the filesystem is huge but the actual
volume is not.

If you create one thin pool, and a bunch of filesystems (thin volumes)
of the same size, with default parameters, your entire thin pool will
quickly fill up with just metadata structures.

I don't know. I feel that sparse files are weird anyway, but if you use
them, you'd want them to be condensed in the first place and existing in
a sort of mapped state where virtual blocks are mapped to actual blocks.
That doesn't need to be LVM and would feel odd there. That's not its
purpose, right?

So for sparse you need a mapping at some point but I wouldn't abuse LVM
for that primarily. I would say that is 80% filesystem and 20% LVM, or
maybe even 60% custom system, 20% filesystem and 20% LVM.

Many games pack their own filesystems, like we talked about earlier
(when you discussed inefficiency of many small files in relation to 4k
block sizes).

If I really wanted sparse personally, as an application data storage
model, I would first develop this model myself. I would probably want to
map it myself. Maybe I'd want a custom filesystem for that. Maybe a
loopback mounted custom filesystem, provided that its actual block file
could grow.

I would imagine allocating containers for it, and I would want the
"real" filesystem to expand my containers or to create new instances of
them. So instead of mapping my sectors directly, I would want to map
them myself first, in a tiered system, and the filesystem to map the
higher hierarchy level for me. E.g. I might have containers of 10G each
allocated in advance, and when I need more, the filesystem allocates
another one. So I map the virtual sectors to another virtual space, such
that my containers work like this:

container virtual space / container size = outer container addressing
container virtual space % container size = inner container addressing

outer container addressing goes to filesystem structure telling me (or
it) where to write my data to.

inner container addressing follows normal procedure, and writes "within
a file".

so you would have an overflow where the most significant bits cause
container change.

At that point I've already mapped my "real" sparse space to container
space, its just that the filesystem allows me to address it without
breaking a beat.

What's the difference with a regular file that grows? You can attribute
even more significant bits to filesystem change as well. You can have as
many tiers as you want. You would get "falls outside of my jurisdiction"
behaviour, "passing it on to someone else".

LVM thin? Hardly relates to it.

You could have addressing bits that reach to another planet ;-) :).
Post by Linda A. Walsh
If a file system can be successfully closed with 'no errors' --
doesn't that still mean it is "integrous" -- even if its sparse files
don't all have enough room to be expanded?
Well that makes sense. But that's the same as saying that a thin pool is
still "integrous" even though it is over-allocated. You are saying the
same thing here, almost.

You are basically saying: v-space > r-space == ok?

Which is the basic premise of overprovisioning to begin with.

With the added distinction of "assumed possible intent to go and fill up
that space".

Which comes down to:

"I have a total real space of 2GB, but my filesystem is already 8GB.
It's a bit deceitful, but I expect to be able to add more real space
when required."

There are two distinct cases:

- total allotment > real space, but individual allotments < real space
- total allotment > real space, AND individual allotments > real space

I consider the first acceptable. The second is spending money you don't
have.
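
In toy numbers (all invented), the two cases look like this:

    real_space = 100                  # e.g. GiB of actual pool space
    allotments = [60, 50, 40]         # virtual sizes of the volumes

    total_over = sum(allotments) > real_space
    single_over = any(a > real_space for a in allotments)

    if total_over and not single_over:
        print("case 1: total > real, each volume < real")  # acceptable
    elif single_over:
        print("case 2: an individual volume > real")       # living on debt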

I would consider not ever creating an individual filesystem (volume)
that is actually bigger (ON ITS OWN) than all the space that exists.

I would never consider that. I think it is like living on debt.

You borrow money to buy a house. It is that system.

You borrow future time.

You get something today but you will have to work for it for a long
time, paying for something you bought years ago.

So how do we deal with future time? That is the question. Is it
acceptable to borrow money from the future?

Is it acceptable to use space now, that you will only have tomorrow?
Post by Linda A. Walsh
If a file system can be successfully closed with 'no errors' --
doesn't that still mean it is "integrous" -- even if its sparse files
don't all have enough room to be expanded?
If your sparse file has no intent to become non-sparse, then it is no
issue.

If your sparse file already tells you it is going to get you in trouble,
it is different.

This system is integrous depending on planned actions.

The same is true for LVM now. The system is safe until some program
decides to allocate the entire filesystem.

And since there are no checks and balances, the system will just crash.

The peculiar condition is that you have built a floor. You have a floor,
like a circular area of a certain surface area. But 1/3 of the floor is
not actually there.

You keep telling yourself not to go there.

The entire circle appears to be there. But you know some parts are
missing.

That is the current nature of LVM thin.

You know that if you step on certain zones, you will fall through and
crash to the ground below.

(I have had that happen as a kid. We were in the attic and we had
covered the ladder gap with cardboard. Then, we (or at least I) forgot
that the floor was not actually real and I walked on it, instantly
falling through and ending on a step on the ladder below.)

[ People here keep saying that a real admin would not walk on that
ladder gap. A real admin would know where the gap was at all times. He
would not step on it, and not fall through.

But I've had it happen that I forgot where the gap was and I stepped on
it anyway. ]
Post by Linda A. Walsh
Does it make sense to think about a OOTS (OutOfThinSpace) daemon that
can be setup with priorities to reclaim space?
It does make some sense, certainly, to me at least, even if I
understand little and am of no real importance here. But I don't really
understand the implications at this point.
Post by Linda A. Walsh
Processes could also be willing to "give up memory and suspend" --
where, when called, a handler could give back Giga-or Tera bytes of memory
and save it's state as needing to restart the last pass.
That is almost a calamity mode. I need to shower, but I was actually
just painting the shower's walls. I need to stop painting that shower
so I can use it for something else.

I think it makes sense to lay a claim to some uncovered land, but when
someone else also claims it, you discuss who needs it most: whether you
feel like letting the other one have it, whose turn it is now, and
whether it will hurt you to let go of it.

It is almost the same as reserving classrooms.

So like I said, reservation. And like you say, only temporary space that
you need for jobs. In a normal user system that is not computationally
heavy, these things do not really arise, except maybe for video editing
and the like.

If you have large data jobs like you are talking about, I think you
would need a different kind of scheduling system anyway, though not so
much an automatic one. Running out of space is not a serious issue if an
administrating system allots space to jobs. It doesn't have to be a
filesystem doing that.

But I guess your proposed daemon is just a layer above that, knowing
about space constraints, and then allotting space to jobs based on
priority queues. Again, that doesn't really have much to do with thin, unless
every "job" would have its own "thin volume". And the "thin pool-volume
system" would get used to "allot space" (the V-size of the volume) but
if too much space was allotted, the system would get in trouble
(overprovisioning) if all jobs run. Again, borrowing money from the
future.

The premise of LVM is not that every volume is going to be able to use
all its space. It's not that it should, has to, or is going to fill up
as a matter of course, as an expected and normal thing.

You see, thin LVM only works if the volumes are independent.

In that job system they are not independent: the expected growth happens
on purpose. Thin provisioning instead assumes a probability distribution
in which the average expected space usage is less than the maximum.

LVM thin is really a statistical thing, basing itself on the law of
large numbers, averaging, and the expectation that if ONE volume is
going to be at max, another one won't be.

If you are going to allot jobs that are expected to completely fill up
the reserved space, you are talking about an entirely different thing.

You should provision based on average, but if average is max, it makes
no sense anymore and you should just proportion according to available
real space. You do not need thin volumes or a thin pool to do that sort
of thing: just regular fixed-size filesystem with jobs and space
requests.

In other words, the amount of sane overprovisioning you can do is
related to the difference between max and average.

The difference (max - average) is the amount you can safely
overprovision under normal circumstances.

You do not "on purpose" and willfully provision less than the average
you expect. Average is your criterion. Max is the individual max size.
Overprovisioning is the ability of an individual volume to grow beyond
average towards max. If the calculations hold, some other volume will be
below average.
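
In numbers, a minimal sketch of that rule (all the figures are
invented):

    volumes = [
        # (expected average use, maximum/virtual size)
        (20, 50),
        (30, 50),
        (10, 50),
    ]

    pool_size = sum(avg for avg, _ in volumes)        # 60: plan on average
    total_virtual = sum(vmax for _, vmax in volumes)  # 150: promised space
    overprovision = total_virtual - pool_size         # 90: sum of (max - avg)
    print(pool_size, total_virtual, overprovision)

If the averages hold, a volume running toward its max of 50 is paid for
by others sitting below their averages.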

However, if your numbers are smaller (not 1000s of volumes, but just a
few), the relative variance grows enormously. And with the growth in
variance you can no longer predict what is going to happen. But the real
question is whether there is going to be any covariance, and in a real
thin system there should be none (the volumes are independent).

For instance, if there is some hype and all your clients suddenly start
downloading the next big 200G movie from television, you already have
covariance.
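
A small simulation sketch of that effect (the distributions are
invented; only the contrast matters). Ten volumes each average 25
units; the pool holds 350, comfortably above the expected total of 250:

    import random

    N, POOL = 10, 350

    def overflow_rate(correlated, trials=10000):
        overflows = 0
        for _ in range(trials):
            if correlated:
                # A shared "hype" factor drives every volume at once.
                hype = random.uniform(0, 30)
                usage = [random.uniform(0, 20) + hype for _ in range(N)]
            else:
                usage = [random.uniform(0, 50) for _ in range(N)]
            overflows += sum(usage) > POOL
        return overflows / trials

    print("independent:", overflow_rate(False))  # rare overflow
    print("correlated: ", overflow_rate(True))   # roughly an order of magnitude worse

Both cases have the same per-volume average; only the covariance
differs.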

Social unrest always indicates covariance. People stop making their own
choices, and your predictions and business-as-usual no longer hold true.
Not because your values weren't sane, but because people don't act
naturally in those circumstances.

Covariance indicates that there is a tertiary factor, causing (for
instance) growth in (volumes) across the line.

John buys a car, and Mary buys a house, but actually it is because they
are getting married.

Or, John buys a car, and Mary buys a house, but the common element is
that they have both been brainwashed by contemporary economists working
at the World Bank.

All in all the insanity happens when you start to borrow from the
future, which causes you to have to work your ass off to meet the
demands you placed on yourself earlier, always having to rush, panic,
and be under pressure.

Better not to overprovision beyond your average, in the sense of not
even having enough for what you expect to happen.
Post by Linda A. Walsh
From how it sounds -- when you run out of thin space, what happens
now is that the OS keeps allocating more Virtual space that has no
backing store (in memory or on disk)...with a notification buried in a
system log
somewhere.
Sounds like leaving the gold standard: having money that has no gold,
or anything else of value, behind it.
