Discussion:
[linux-lvm] Possible bug in thin metadata size with Linux MDRAID
Gionatan Danti
2017-03-08 16:14:05 UTC
Hi list,
I would like to understand if this is an lvmthin metadata size bug or if
I am simply missing something.

These are my system specs:
- CentOS 7.3 64 bit with kernel 3.10.0-514.6.1.el7
- LVM version 2.02.166-1.el7_3.2
- two Linux software RAID devices, md127 (root) and md126 (storage)

MD array specs (the interesting one is md126)
Personalities : [raid10]
md126 : active raid10 sdd2[3] sda3[0] sdb2[1] sdc2[2]
557632000 blocks super 1.2 128K chunks 2 near-copies [4/4] [UUUU]
bitmap: 1/5 pages [4KB], 65536KB chunk

md127 : active raid10 sdc1[2] sda2[0] sdd1[3] sdb1[1]
67178496 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
bitmap: 0/1 pages [0KB], 65536KB chunk

As you can see, /dev/md126 has a 128KB chunk size. I used this device to
host a physical volume and volume group on which I created a thinpool of
512GB. Then, I created a thin logical volume of the same size (512 GB)
and started to fill it. Somewhere near (but not at) full capacity, the
volume went offline due to metadata exhaustion.

Let's see how the logical volume was created and how it appears:
[***@blackhole ]# lvcreate --thin vg_kvm/thinpool -L 512G; lvs -a -o +chunk_size
Using default stripesize 64.00 KiB.
Logical volume "thinpool" created.
LV               VG        Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
[lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                               0
thinpool         vg_kvm    twi-a-tz-- 512.00g                  0.00   0.83                            128.00k
[thinpool_tdata] vg_kvm    Twi-ao---- 512.00g                                                               0
[thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                               0
root             vg_system -wi-ao----  50.00g                                                               0
swap             vg_system -wi-ao----   7.62g                                                               0

The metadata volume is about 2x smaller than I expected, and not big
enough to reach 100% data utilization. Indeed, thin_metadata_size shows a
minimum metadata volume size of over 130 MB:

[***@blackhole ]# thin_metadata_size -b 128k -s 512g -m 1 -u m
thin_metadata_size - 130.04 mebibytes estimated metadata area size for "--block-size=128kibibytes --pool-size=512gibibytes --max-thins=1"

Now, the interesting thing: by explicitly setting --chunksize=128, the
metadata volume is 2x bigger (and in line with my expectations):
[***@blackhole ]# lvcreate --thin vg_kvm/thinpool -L 512G --chunksize=128; lvs -a -o +chunk_size
Using default stripesize 64.00 KiB.
Logical volume "thinpool" created.
LV               VG        Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
[lvol0_pmspare]  vg_kvm    ewi------- 256.00m                                                               0
thinpool         vg_kvm    twi-a-tz-- 512.00g                  0.00   0.42                            128.00k
[thinpool_tdata] vg_kvm    Twi-ao---- 512.00g                                                               0
[thinpool_tmeta] vg_kvm    ewi-ao---- 256.00m                                                               0
root             vg_system -wi-ao----  50.00g                                                               0
swap             vg_system -wi-ao----   7.62g                                                               0

Why did I see two very different metadata volume sizes? The chunk size was
128 KB in both cases; the only difference is that I explicitly specified it
on the command line...

Thanks.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: ***@assyoma.it - ***@assyoma.it
GPG public key ID: FF5F32A8
Zdenek Kabelac
2017-03-08 18:55:00 UTC
Post by Gionatan Danti
Hi list,
I would like to understand if this is a lvmthin metadata size bug of if I am
simply missing something.
- CentOS 7.3 64 bit with kernel 3.10.0-514.6.1.el7
- LVM version 2.02.166-1.el7_3.2
- two linux software RAID device, md127 (root) and md126 (storage)
MD array specs (the interesting one is md126)
Personalities : [raid10]
md126 : active raid10 sdd2[3] sda3[0] sdb2[1] sdc2[2]
557632000 blocks super 1.2 128K chunks 2 near-copies [4/4] [UUUU]
bitmap: 1/5 pages [4KB], 65536KB chunk
md127 : active raid10 sdc1[2] sda2[0] sdd1[3] sdb1[1]
67178496 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
bitmap: 0/1 pages [0KB], 65536KB chunk
As you can see, /dev/md126 has a 128KB chunk size. I used this device to host
a physical volume and volume group on which I created a thinpool of 512GB.
Then, I create a thin logical volume of the same size (512 GB) and started to
fill it. Somewhere near (but not at) the full capacity, I saw the volume
offline due to metadata exhaustion.
Using default stripesize 64.00 KiB.
Logical volume "thinpool" created.
LV VG Attr LSize Pool Origin Data% Meta% Move
Log Cpy%Sync Convert Chunk
[lvol0_pmspare] vg_kvm ewi------- 128.00m
0
thinpool vg_kvm twi-a-tz-- 512.00g 0.00 0.83
128.00k
[thinpool_tdata] vg_kvm Twi-ao---- 512.00g
0
[thinpool_tmeta] vg_kvm ewi-ao---- 128.00m
0
root vg_system -wi-ao---- 50.00g
0
swap vg_system -wi-ao---- 7.62g
0
The metadata volume is quite smaller (~2x) than I expected, and not big enough
to reach 100% data utilization. Indeed, thin_metadata_size show a minimum
thin_metadata_size - 130.04 mebibytes estimated metadata area size for
"--block-size=128kibibytes --pool-size=512gibibytes --max-thins=1"
Now, the interesting thing: by explicitly setting --chunksize=128, the
Hi

If you do NOT specify any setting - lvm2 targets 128M metadata size.

If you specify '--chunksize', lvm2 tries to find a better fit, and it happens
to end up slightly better with a 256M metadata size.

Basically - you could specify everything down to the last bit - and if you
don't, lvm2 does a little 'magic' and tries to come up with 'reasonable'
defaults for the given kernel and time.

That said - I have some rework of this code in my git tree - mainly for
better support of metadata profiles.
(And my git calculation gives me a 256K chunk size + 128M metadata size - so
there was possibly something not completely right in version 166.)
Post by Gionatan Danti
Why I saw two very different metadata volume sizes? Chunksize was 128 KB in
both cases; the only difference is that I explicitly specified it on the
command line...
You should NOT forget that using a 'thin-pool' without any monitoring and
automatic resize is somewhat 'dangerous'.

So while lvm2 is not (ATM) enforcing automatic resize when data or metadata
space has reached a predefined threshold - I'd highly recommend using it.
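
The usual way to opt in is via lvm.conf - a minimal sketch of the relevant
knobs in the activation section (the values here are just an example, defaults
depend on your distro/profile):

  thin_pool_autoextend_threshold = 70   # start extending once the pool is 70% full
  thin_pool_autoextend_percent   = 20   # grow it by 20% of its size each time
  monitoring = 1                        # dmeventd must be monitoring the pool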

The upcoming version 169 will even provide support for an 'external tool' to
be called when threshold levels are surpassed, for even more advanced
configurations.


Regards

Zdenek


NB. metadata size is not related to mdraid in any way.
Gionatan Danti
2017-03-09 11:24:04 UTC
Post by Zdenek Kabelac
Hi
If you do NOT specify any setting - lvm2 targets 128M metadata size.
If you specify '--chunksize' lvm2 tries to find better fit and it happens
to be slightly better with 256M metadata size.
Basically - you could specify anything to the last bit - and if you
don't lvm2 does a little 'magic' and tries to come with 'reasonable'
defaults for given kernel and time.
That said - I've in my git tree some rework of this code - mainly for
better support of metadata profiles.
(And my git calculation gives me 256K chunksize + 128M metadata size -
so there was possibly something not completely right in version 166)
256 KB chunk size would be perfectly reasonable.
Post by Zdenek Kabelac
Post by Gionatan Danti
Why I saw two very different metadata volume sizes? Chunksize was 128 KB in
both cases; the only difference is that I explicitly specified it on the
command line...
You should NOT forget - that using 'thin-pool' without any monitoring
and automatic resize is somewhat 'dangerous'.
True, but I should have no problem if I am not using snapshots or
overprovisioning - i.e. when all data chunks are allocated (filesystem
full) but nothing is overprovisioned. This time, however, the created
metadata pool was *insufficient* to even address the provisioned data
chunks.
Post by Zdenek Kabelac
So while lvm2 is not (ATM) enforcing automatic resize when data or
metadata space has reached predefined threshold - I'd highly recommnend
to use it.
Upcoming version 169 will provide even support for 'external tool' to be
called when threshold levels are surpassed for even more advanced
configuration options.
Regards
Zdenek
NB. metadata size is not related to mdraid in any way.
I am under the impression that the 128 KB size was chosen because this was
the MD chunk size. Indeed, further tests seem to confirm this (test setup
sketched below).
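
For reference, the loop devices in the tests below were prepared with
something along these lines (sizes are illustrative, not necessarily the
exact ones I used):

  truncate -s 260G disk0.img disk1.img disk2.img disk3.img
  for i in 0 1 2 3; do losetup /dev/loop$i disk$i.img; done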

WITH 128 KB MD CHUNK SIZE:
[***@gdanti-laptop test]# mdadm --create md127 --level=raid10 --assume-clean --chunk=128 --raid-devices=4 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3

[***@gdanti-laptop test]# pvcreate /dev/md127; vgcreate vg_kvm /dev/md127; lvcreate --thin vg_kvm --name thinpool -L 500G

[***@gdanti-laptop test]# lvs -a -o +chunk_size
LV               VG        Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
[lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                               0
thinpool         vg_kvm    twi-a-tz-- 500.00g                  0.00   0.80                            128.00k
[thinpool_tdata] vg_kvm    Twi-ao---- 500.00g                                                               0
[thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                               0
root             vg_system -wi-ao----  50.00g                                                               0
swap             vg_system -wi-ao----   3.75g                                                               0


WITH 256 KB MD CHUNK SIZE:
[***@gdanti-laptop test]# mdadm --create md127 --level=raid10 --assume-clean --chunk=256 --raid-devices=4 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3

[***@gdanti-laptop test]# pvcreate /dev/md127; vgcreate vg_kvm /dev/md127; lvcreate --thin vg_kvm --name thinpool -L 500G

[***@gdanti-laptop test]# lvs -a -o +chunk_size
LV               VG        Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
[lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                               0
thinpool         vg_kvm    twi-a-tz-- 500.00g                  0.00   0.42                            256.00k
[thinpool_tdata] vg_kvm    Twi-ao---- 500.00g                                                               0
[thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                               0
root             vg_system -wi-ao----  50.00g                                                               0
swap             vg_system -wi-ao----   3.75g                                                               0


So it seems MD chunk size has a strong influence on LVM thin chunk choice.

Thanks.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: ***@assyoma.it - ***@assyoma.it
GPG public key ID: FF5F32A8
Zdenek Kabelac
2017-03-09 11:53:57 UTC
Post by Gionatan Danti
Post by Zdenek Kabelac
Hi
If you do NOT specify any setting - lvm2 targets 128M metadata size.
If you specify '--chunksize' lvm2 tries to find better fit and it happens
to be slightly better with 256M metadata size.
Basically - you could specify anything to the last bit - and if you
don't lvm2 does a little 'magic' and tries to come with 'reasonable'
defaults for given kernel and time.
That said - I've in my git tree some rework of this code - mainly for
better support of metadata profiles.
(And my git calculation gives me 256K chunksize + 128M metadata size -
so there was possibly something not completely right in version 166)
256 KB chunksize would be perfectly reasonable
Post by Zdenek Kabelac
Post by Gionatan Danti
Why I saw two very different metadata volume sizes? Chunksize was 128 KB in
both cases; the only difference is that I explicitly specified it on the
command line...
You should NOT forget - that using 'thin-pool' without any monitoring
and automatic resize is somewhat 'dangerous'.
True, but I should have no problem if not using snapshot or overprovisioning -
ie when all data chunks are allocated (filesystem full) but no
overprovisioned. This time, however, the created metadata pool was
*insufficient* to even address the provisioned data chunks.
Hmm - it would be interesting to see your 'metadata' - 128M of metadata
should still be quite a good fit for 512G when you are not using snapshots.

What's been your actual test scenario ?? (Lots of LVs??)
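
If you want to grab the metadata from a live pool, a rough sketch (device
names here follow lvm2's usual -tpool/_tmeta naming and may differ on your
system):

  dmsetup message /dev/mapper/vg_kvm-thinpool-tpool 0 reserve_metadata_snap
  thin_dump --metadata-snap /dev/mapper/vg_kvm-thinpool_tmeta > /tmp/thinpool-metadata.xml
  dmsetup message /dev/mapper/vg_kvm-thinpool-tpool 0 release_metadata_snap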

But as said - there is no guarantee that the size fits every possible use
case - the user is supposed to understand what kind of technology he is
using, and when he 'opts out' of automatic resize - he needs to deploy his
own monitoring.

Otherwise you would simply have to always create a 16G metadata LV if you do
not want to run out of metadata space.
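
With recent releases that would be something along the lines of (the size
here is just an example):

  lvcreate --thinpool vg_kvm/thinpool -L 500G --poolmetadatasize 16G
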
Post by Gionatan Danti
I am under impression that 128 KB size was chosen because this was MD chunk
size. Indeed further tests seem to confirm this.
Ahh yeah - there was a small issue - when the 'hint' for device geometry was
used, it started from the 'default' 64K size instead of the already-computed
256K chunk size.


Regards

Zdenek
Gionatan Danti
2017-03-09 15:33:45 UTC
Post by Zdenek Kabelac
Hmm - it would be interesting to see your 'metadata' - it should be still
quite good fit 128M of metadata for 512G when you are not using snapshots.
What's been your actual test scenario ?? (Lots of LVs??)
Nothing unusual - I had a single thinvol with an XFS filesystem used to
store an HDD image gathered using ddrescue.

Anyway, are you sure that a 128 MB metadata volume is "quite good" for a
512GB volume with 128 KB chunks? My testing suggests otherwise. For example,
take a look at this empty thinpool/thinvol:

[***@gdanti-laptop test]# lvs -a -o +chunk_size
LV               VG        Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
[lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                               0
thinpool         vg_kvm    twi-aotz-- 500.00g                  0.00   0.81                            128.00k
[thinpool_tdata] vg_kvm    Twi-ao---- 500.00g                                                               0
[thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                               0
thinvol          vg_kvm    Vwi-a-tz-- 500.00g thinpool         0.00                                         0
root             vg_system -wi-ao----  50.00g                                                               0
swap             vg_system -wi-ao----   3.75g                                                               0

As you can see, since it is an empty volume, metadata is at only 0.81%.
Let's write 5 GB (1% of the thin data volume):

[***@gdanti-laptop test]# lvs -a -o +chunk_size
LV               VG        Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
[lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                               0
thinpool         vg_kvm    twi-aotz-- 500.00g                  1.00   1.80                            128.00k
[thinpool_tdata] vg_kvm    Twi-ao---- 500.00g                                                               0
[thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                               0
thinvol          vg_kvm    Vwi-a-tz-- 500.00g thinpool         1.00                                         0
root             vg_system -wi-ao----  50.00g                                                               0
swap             vg_system -wi-ao----   3.75g                                                               0

Metadata grew by the same 1%. Accounting for the initial 0.81%
utilization, this means that a nearly full data volume (with *no*
overprovisioning nor snapshots) will exhaust its metadata *before* really
becoming 100% full.

While I can absolutely understand that this is expected behavior when
using snapshots and/or overprovisioning, in this extremely simple case
metadata should not be exhausted before data. In other words, the initial
metadata sizing should *at least* consider that a plain volume can be 100%
full, and allocate accordingly (see the quick check below).
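
A quick way to watch both figures without the lvs rounding is to query the
pool target directly (assuming lvm2's usual -tpool device naming):

  dmsetup status /dev/mapper/vg_kvm-thinpool-tpool
  # the thin-pool status line reports used/total metadata blocks followed by
  # used/total data blocks, so the headroom on both sides is easy to compare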

The interesting part is that when not using MD, everything works properly:
metadata is about 2x its minimal value (as reported by thin_metadata_size),
and this provides an ample buffer for snapshotting/overprovisioning. When
using MD, the bad interaction between the RAID chunk size and the thin
chunk/metadata sizing ends with a too-small metadata volume.

This can become very bad. Take a look at what happens when creating a
thin pool on an MD RAID whose chunk size is 64 KB:

[***@gdanti-laptop test]# lvs -a -o +chunk_size
LV               VG        Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
[lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                               0
thinpool         vg_kvm    twi-a-tz-- 500.00g                  0.00   1.58                             64.00k
[thinpool_tdata] vg_kvm    Twi-ao---- 500.00g                                                               0
[thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                               0
root             vg_system -wi-ao----  50.00g                                                               0
swap             vg_system -wi-ao----   3.75g                                                               0

The thin pool chunks are now at 64 KB - with the *same* 128 MB metadata
volume size. Now the metadata can only address ~50% of the thin volume space.
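
Rerunning the estimator for 64 KB chunks should make the shortfall explicit -
halving the chunk size roughly doubles the number of mappings, so the minimum
lands far above the 128 MB that was actually allocated:

  thin_metadata_size -b 64k -s 500g -m 1 -u m
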
Post by Zdenek Kabelac
But as said - there is no guarantee of the size to fit for any possible
use case - user is supposed to understand what kind of technology he is
using,
and when he 'opt-out' from automatic resize - he needs to deploy his own
monitoring.
True, but this trivial case should really work without tuning/monitoring.
In short, it should fail gracefully in such a simple case...
Post by Zdenek Kabelac
Otherwise you would have to simply always create 16G metadata LV if you
do not want to run out of metadata space.
Absolutely true. I've written this email to report a bug, indeed ;)
Thank you all for this outstanding work.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: ***@assyoma.it - ***@assyoma.it
GPG public key ID: FF5F32A8
Gionatan Danti
2017-03-20 09:47:16 UTC
Hi all,
any comments on the report below?

Thanks.
Post by Gionatan Danti
Post by Zdenek Kabelac
Hmm - it would be interesting to see your 'metadata' - it should be still
quite good fit 128M of metadata for 512G when you are not using snapshots.
What's been your actual test scenario ?? (Lots of LVs??)
Nothing unusual - I had a single thinvol with an XFS filesystem used to
store an HDD image gathered using ddrescue.
Anyway, are you sure that a 128 MB metadata volume is "quite good" for a
512GB volume with 128 KB chunks? My testing suggests something
LV VG Attr LSize Pool Origin Data%
Meta% Move Log Cpy%Sync Convert Chunk
[lvol0_pmspare] vg_kvm ewi------- 128.00m
0
thinpool vg_kvm twi-aotz-- 500.00g 0.00
0.81 128.00k
[thinpool_tdata] vg_kvm Twi-ao---- 500.00g
0
[thinpool_tmeta] vg_kvm ewi-ao---- 128.00m
0
thinvol vg_kvm Vwi-a-tz-- 500.00g thinpool 0.00
0
root vg_system -wi-ao---- 50.00g
0
swap vg_system -wi-ao---- 3.75g
0
As you can see, as it is a empty volume, metadata is at only 0.81% Let
LV VG Attr LSize Pool Origin Data%
Meta% Move Log Cpy%Sync Convert Chunk
[lvol0_pmspare] vg_kvm ewi------- 128.00m
0
thinpool vg_kvm twi-aotz-- 500.00g 1.00
1.80 128.00k
[thinpool_tdata] vg_kvm Twi-ao---- 500.00g
0
[thinpool_tmeta] vg_kvm ewi-ao---- 128.00m
0
thinvol vg_kvm Vwi-a-tz-- 500.00g thinpool 1.00
0
root vg_system -wi-ao---- 50.00g
0
swap vg_system -wi-ao---- 3.75g
0
Metadata grown by the same 1%. Accounting for the initial 0.81
utilization, this means that a near full data volume (with *no*
overprovisionig nor snapshots) will exhaust its metadata *before* really
becoming 100% full.
While I can absolutely understand that this is expected behavior when
using snapshots and/or overprovisioning, in this extremely simple case
metadata should not be exhausted before data. In other words, the
initial metadata creation process should be *at least* consider that a
plain volume can be 100% full, and allocate according.
metadata are about 2x their minimal value (as reported by
thin_metadata_size), and this provide ample buffer for
snapshotting/overprovisioning. When using MD, the bad iteration between
RAID chunks and thin metadata chunks ends with a too small metadata volume.
This can become very bad. Give a look at what happens when creating a
LV VG Attr LSize Pool Origin Data% Meta%
Move Log Cpy%Sync Convert Chunk
[lvol0_pmspare] vg_kvm ewi------- 128.00m
0
thinpool vg_kvm twi-a-tz-- 500.00g 0.00 1.58
64.00k
[thinpool_tdata] vg_kvm Twi-ao---- 500.00g
0
[thinpool_tmeta] vg_kvm ewi-ao---- 128.00m
0
root vg_system -wi-ao---- 50.00g
0
swap vg_system -wi-ao---- 3.75g
0
Thin metadata chunks are now at 64 KB - with the *same* 128 MB metadata
volume size. Now metadata can only address ~50% of thin volume space.
Post by Zdenek Kabelac
But as said - there is no guarantee of the size to fit for any possible
use case - user is supposed to understand what kind of technology he is
using,
and when he 'opt-out' from automatic resize - he needs to deploy his own
monitoring.
True, but this trivial case should really works without
tuning/monitoring. In short, let fail gracefully on a simple case...
Post by Zdenek Kabelac
Otherwise you would have to simply always create 16G metadata LV if you
do not want to run out of metadata space.
Absolutely true. I've written this email to report a bug, indeed ;)
Thank you all for this outstanding work.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: ***@assyoma.it - ***@assyoma.it
GPG public key ID: FF5F32A8
Zdenek Kabelac
2017-03-20 09:51:55 UTC
Post by Gionatan Danti
Hi all,
any comments on the report below?
Thanks.
Please check the upstream behavior (git HEAD).
It will still take a while before the final release, so do not use it for
regular work yet (as a few things may still change).

Not sure which other comments you are looking for.

Zdenek
Post by Gionatan Danti
Post by Gionatan Danti
Post by Zdenek Kabelac
Hmm - it would be interesting to see your 'metadata' - it should be still
quite good fit 128M of metadata for 512G when you are not using snapshots.
What's been your actual test scenario ?? (Lots of LVs??)
Nothing unusual - I had a single thinvol with an XFS filesystem used to
store an HDD image gathered using ddrescue.
Anyway, are you sure that a 128 MB metadata volume is "quite good" for a
512GB volume with 128 KB chunks? My testing suggests something
LV VG Attr LSize Pool Origin Data%
Meta% Move Log Cpy%Sync Convert Chunk
[lvol0_pmspare] vg_kvm ewi------- 128.00m
0
thinpool vg_kvm twi-aotz-- 500.00g 0.00
0.81 128.00k
[thinpool_tdata] vg_kvm Twi-ao---- 500.00g
0
[thinpool_tmeta] vg_kvm ewi-ao---- 128.00m
0
thinvol vg_kvm Vwi-a-tz-- 500.00g thinpool 0.00
0
root vg_system -wi-ao---- 50.00g
0
swap vg_system -wi-ao---- 3.75g
0
As you can see, as it is a empty volume, metadata is at only 0.81% Let
LV VG Attr LSize Pool Origin Data%
Meta% Move Log Cpy%Sync Convert Chunk
[lvol0_pmspare] vg_kvm ewi------- 128.00m
0
thinpool vg_kvm twi-aotz-- 500.00g 1.00
1.80 128.00k
[thinpool_tdata] vg_kvm Twi-ao---- 500.00g
0
[thinpool_tmeta] vg_kvm ewi-ao---- 128.00m
0
thinvol vg_kvm Vwi-a-tz-- 500.00g thinpool 1.00
0
root vg_system -wi-ao---- 50.00g
0
swap vg_system -wi-ao---- 3.75g
0
Metadata grown by the same 1%. Accounting for the initial 0.81
utilization, this means that a near full data volume (with *no*
overprovisionig nor snapshots) will exhaust its metadata *before* really
becoming 100% full.
While I can absolutely understand that this is expected behavior when
using snapshots and/or overprovisioning, in this extremely simple case
metadata should not be exhausted before data. In other words, the
initial metadata creation process should be *at least* consider that a
plain volume can be 100% full, and allocate according.
metadata are about 2x their minimal value (as reported by
thin_metadata_size), and this provide ample buffer for
snapshotting/overprovisioning. When using MD, the bad iteration between
RAID chunks and thin metadata chunks ends with a too small metadata volume.
This can become very bad. Give a look at what happens when creating a
LV VG Attr LSize Pool Origin Data% Meta%
Move Log Cpy%Sync Convert Chunk
[lvol0_pmspare] vg_kvm ewi------- 128.00m
0
thinpool vg_kvm twi-a-tz-- 500.00g 0.00 1.58
64.00k
[thinpool_tdata] vg_kvm Twi-ao---- 500.00g
0
[thinpool_tmeta] vg_kvm ewi-ao---- 128.00m
0
root vg_system -wi-ao---- 50.00g
0
swap vg_system -wi-ao---- 3.75g
0
Thin metadata chunks are now at 64 KB - with the *same* 128 MB metadata
volume size. Now metadata can only address ~50% of thin volume space.
Post by Zdenek Kabelac
But as said - there is no guarantee of the size to fit for any possible
use case - user is supposed to understand what kind of technology he is
using,
and when he 'opt-out' from automatic resize - he needs to deploy his own
monitoring.
True, but this trivial case should really works without
tuning/monitoring. In short, let fail gracefully on a simple case...
Post by Zdenek Kabelac
Otherwise you would have to simply always create 16G metadata LV if you
do not want to run out of metadata space.
Absolutely true. I've written this email to report a bug, indeed ;)
Thank you all for this outstanding work.
Gionatan Danti
2017-03-20 10:45:11 UTC
Post by Zdenek Kabelac
Please check upstream behavior (git HEAD)
It will still take a while before final release so do not use it
regularly yet (as few things still may change).
I will surely try with git HEAD and report back here.
Post by Zdenek Kabelac
Not sure for which other comment you look for.
Zdenek
1. You suggested that a 128 MB metadata volume is "quite good" for a
512GB volume and 128KB chunks. However, my tests show that a nearly full
data volume (with *no* overprovisioning nor snapshots) will exhaust its
metadata *before* really becoming 100% full.

2. On an MD RAID with a 64KB chunk size, things become much worse:
[***@gdanti-laptop test]# lvs -a -o +chunk_size
LV               VG        Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
[lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                               0
thinpool         vg_kvm    twi-a-tz-- 500.00g                  0.00   1.58                             64.00k
[thinpool_tdata] vg_kvm    Twi-ao---- 500.00g                                                               0
[thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                               0
root             vg_system -wi-ao----  50.00g                                                               0
swap             vg_system -wi-ao----   3.75g                                                               0

The thin pool chunks are now at 64 KB - with the *same* 128 MB metadata
volume size. Now the metadata can only address ~50% of the thin volume space.

So, am I missing something, or does the RHEL 7.3-provided LVM have some
serious problems identifying the correct metadata volume size when running
on top of an MD RAID device?

Thanks.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: ***@assyoma.it - ***@assyoma.it
GPG public key ID: FF5F32A8
Zdenek Kabelac
2017-03-20 11:01:43 UTC
Post by Gionatan Danti
Post by Zdenek Kabelac
Please check upstream behavior (git HEAD)
It will still take a while before final release so do not use it
regularly yet (as few things still may change).
I will surely try with git head and report back here.
Post by Zdenek Kabelac
Not sure for which other comment you look for.
Zdenek
1. you suggested that a 128 MB metadata volume is "quite good" for a 512GB
volume and 128KB chunkgs. However, my tests show that a near full data volume
(with *no* overprovisionig nor snapshots) will exhaust its metadata *before*
really becoming 100% full.
LV VG Attr LSize Pool Origin Data% Meta%
Move Log Cpy%Sync Convert Chunk
[lvol0_pmspare] vg_kvm ewi------- 128.00m
0
thinpool vg_kvm twi-a-tz-- 500.00g 0.00 1.58
64.00k
[thinpool_tdata] vg_kvm Twi-ao---- 500.00g
0
[thinpool_tmeta] vg_kvm ewi-ao---- 128.00m
0
root vg_system -wi-ao---- 50.00g
0
swap vg_system -wi-ao---- 3.75g
0
Thin metadata chunks are now at 64 KB - with the *same* 128 MB metadata
volume size. Now metadata can only address ~50% of thin volume space.
So, I am missing something or the RHEL 7.3-provided LVM has some serious
problems identifing correct metadata volume size when running on top of a MD
RAID device?
As said - please try with HEAD - and report back if you still see a problem.
There were a couple of issues fixed along this path.

In my test it seems 500G needs at least 258M with 64K chunksize.

On the other hand - it's never been documented that a thin-pool without
monitoring is supposed to fit a single LV, AFAIK - basically the user needs
to know what he is using when he uses thin provisioning - but of course we
continuously try to improve things to be more usable.

Zdenek
Gionatan Danti
2017-03-20 11:52:07 UTC
Post by Zdenek Kabelac
As said - please try with HEAD - and report back if you still see a problem.
There were couple issue fixed along this path.
OK, I have now tried with the tools and library from git:

LVM version: 2.02.169(2)-git (2016-11-30)
Library version: 1.02.138-git (2016-11-30)
Driver version: 4.34.0

I can confirm that the thin chunk size is no longer bound (by default) to
the MD RAID chunk size. For example, having created a ~500 GB MD RAID 10
array with 64 KB chunks, creating a thinpool shows this:

[***@blackhole ~]# lvcreate --thinpool vg_kvm/thinpool -L 500G
[***@blackhole ~]# lvs -a -o +chunk_size
WARNING: Failed to connect to lvmetad. Falling back to device scanning.
LV               VG        Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
[lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                               0
thinpool         vg_kvm    twi-a-tz-- 500.00g                  0.00   0.42                            256.00k
[thinpool_tdata] vg_kvm    Twi-ao---- 500.00g                                                               0
[thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                               0
root             vg_system -wi-ao----  50.00g                                                               0
swap             vg_system -wi-a-----   7.62g
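
In the meantime, an existing pool created by the affected packages can have
its metadata grown after the fact, for example (size just as an example):

  lvextend --poolmetadatasize +128M vg_kvm/thinpool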

Should I open a bug against the RHEL-provided packages?
Thanks.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: ***@assyoma.it - ***@assyoma.it
GPG public key ID: FF5F32A8
Zdenek Kabelac
2017-03-20 13:57:08 UTC
Post by Gionatan Danti
Post by Zdenek Kabelac
As said - please try with HEAD - and report back if you still see a problem.
There were couple issue fixed along this path.
LVM version: 2.02.169(2)-git (2016-11-30)
Library version: 1.02.138-git (2016-11-30)
Driver version: 4.34.0
I can confirm that now thin chunk size is no more bound (by default) by MD
RAID chunk. For example, having created a ~500 GB MD RAID 10 array with 64 KB
WARNING: Failed to connect to lvmetad. Falling back to device scanning.
LV VG Attr LSize Pool Origin Data% Meta% Move
Log Cpy%Sync Convert Chunk
[lvol0_pmspare] vg_kvm ewi------- 128.00m
0
thinpool vg_kvm twi-a-tz-- 500.00g 0.00 0.42
256.00k
[thinpool_tdata] vg_kvm Twi-ao---- 500.00g
0
[thinpool_tmeta] vg_kvm ewi-ao---- 128.00m
0
root vg_system -wi-ao---- 50.00g
0
swap vg_system -wi-a----- 7.62g
Should I open a bug against the RHEL-provided packages?
Well, if you want to get support for your existing packages - you would
need to go via the 'GSS' channel.

You may open a BZ - which will get closed with the next release, RHEL 7.4
(as you already confirmed upstream has resolved the issue).

Zdenek
Gionatan Danti
2017-03-20 14:25:34 UTC
Post by Zdenek Kabelac
Well if you want to get support for your existing packages - you would
need to go via 'GSS' channel.
Sorry, but what do you mean by the "GSS channel"?
Post by Zdenek Kabelac
You may open BZ - which will get closed with next release of RHEL7.4
(as you already confirmed upstream has resolved the issue).
Zdenek
I'll surely do that.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: ***@assyoma.it - ***@assyoma.it
GPG public key ID: FF5F32A8