[linux-lvm] dm-cache size is limited by 946GB
CoolCold
2017-07-23 13:48:37 UTC
Hello!
We started adopting new servers for image storage and hit a
strange problem - no caching happens for a cache LV larger than 946GB
(947GB and above do not work).

Storage box looks like:
2x240GB SSD for system (sw raid 1, lvm on top)
20x1.8TB SATA HDD for data (sw raid 10, md124 + lvm on top)
4x960GB SSD for dm-caching purpose (sw raid5, md125).
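
The usable capacity of the two data arrays listed above can be estimated with simple RAID arithmetic. This is a rough sketch, assuming a two-way mirrored RAID 10, vendor (decimal) gigabytes on the drive labels, binary units in the tool output, and no allowance for md or LVM metadata overhead. The helper names are made up for illustration.

```python
# Rough usable-capacity estimate for the arrays described above.
# Assumptions (not stated in the post): drive sizes are decimal GB,
# RAID 10 is a plain two-way mirror, metadata overhead is ignored.

def raid5_usable_gb(drives: int, size_gb: int) -> int:
    """RAID 5 gives up one drive's worth of space to parity."""
    return (drives - 1) * size_gb

def raid10_usable_gb(drives: int, size_gb: int) -> int:
    """Two-way mirrored RAID 10 keeps half of the raw space."""
    return drives * size_gb // 2

def gb_to_tib(gb: float) -> float:
    """Convert decimal gigabytes to binary tebibytes."""
    return gb * 10**9 / 2**40

cache_gb = raid5_usable_gb(4, 960)     # SSD array, md125
data_gb = raid10_usable_gb(20, 1800)   # HDD array, md124

print(f"cache: {cache_gb} GB = {gb_to_tib(cache_gb):.2f} TiB")
print(f"data:  {data_gb} GB = {gb_to_tib(data_gb):.2f} TiB")
```

Under these assumptions the SSD array comes out a little above 2.6 TiB and the HDD array a little above 16 TiB.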

Our naive approach was to create a PV from md125 and use all of it as
cache - around 2.6TB of cache for 16TB of "raw" data should be quite good.
The cache was created successfully and shows the whole 2.6TB, but after
copying ~3TB of data from the old box, the statistics still show only
read and write misses, and iostat shows almost no activity on md125.
By "almost no activity" I mean there were some write operations,
but zero KB written -
https://gist.github.com/CoolCold/f79bb706d4dd1c083a4f4ed0ebd850d5 -
where dm-2 and dm-3 are the cache data and cache metadata volumes,
respectively.
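
The hit and miss counters mentioned above can be read from the cache target's status line (for example via `dmsetup status`). The following is a minimal sketch of parsing that line, assuming the field order described in the kernel's device-mapper cache documentation; older or vendor kernels may differ. The sample line is invented for illustration and is not output from the server in question.

```python
# Sketch: extract hit/miss counters from a dm-cache status line.
# Assumed field order after the target name "cache":
#   <metadata block size> <used>/<total metadata blocks>
#   <cache block size> <used>/<total cache blocks>
#   <read hits> <read misses> <write hits> <write misses>
#   <demotions> <promotions> <dirty> ...

# Invented sample line, for illustration only.
SAMPLE = ("0 2097152 cache 8 1024/262144 128 0/16384 "
          "0 5000 0 7000 0 0 0 1 writethrough "
          "2 migration_threshold 2048 smq 0 rw -")

def parse_cache_status(line: str) -> dict:
    tokens = line.split()
    fields = tokens[tokens.index("cache") + 1:]
    used, total = (int(x) for x in fields[3].split("/"))
    return {
        "cache_blocks_used": used,
        "cache_blocks_total": total,
        "read_hits": int(fields[4]),
        "read_misses": int(fields[5]),
        "write_hits": int(fields[6]),
        "write_misses": int(fields[7]),
        "demotions": int(fields[8]),
        "promotions": int(fields[9]),
        "dirty": int(fields[10]),
    }

def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of accesses served from the cache (0.0 if no accesses)."""
    total = hits + misses
    return hits / total if total else 0.0

stats = parse_cache_status(SAMPLE)
print(stats)
print("read hit ratio:", hit_ratio(stats["read_hits"], stats["read_misses"]))
```

A cache showing the symptom described here would have zero used cache blocks, zero promotions, and only misses, as in the invented sample.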

We have "old" servers running a slightly different setup in terms of
the number of drives; they have 350-750GB of space for caching and it
works well. We tried reducing the cache size on the new box: it worked
at 80GB, and from there we bisected the limit to 946GB.
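
The bisection described above can be sketched as a binary search between a size known to work and a size known to fail. This is only an illustration: `works` is a hypothetical callback standing in for "recreate the cache at this size and check whether promotions happen", the 2600GB upper bound is a stand-in for the full cache size, and the search assumes the behaviour is monotonic (every size above the limit fails).

```python
# Sketch: bisect the largest cache size (in GB) that still works.
# `works` is a hypothetical predicate; in practice each call would mean
# rebuilding the cache LV at that size and watching the statistics.

from typing import Callable

def largest_working_size(works: Callable[[int], bool], lo: int, hi: int) -> int:
    """lo is known to work, hi is known to fail; return the last working size."""
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(mid):
            lo = mid   # limit is at mid or above
        else:
            hi = mid   # limit is below mid
    return lo

# Stand-in predicate reproducing the reported behaviour.
reported = lambda size_gb: size_gb <= 946

limit = largest_working_size(reported, lo=80, hi=2600)
print(limit)
```

With 1GB granularity this needs about a dozen rebuilds to go from the 80GB to 2600GB range down to a single boundary.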

It doesn't look like any "magic" number (I thought there might be some
problem around 2TB with signed/unsigned values or so), and right now
I'm out of ideas about what the problem may be, so I need your advice.

Kernel version we are using:
3.10.0-514.26.2.el7.x86_64

LVM:
[***@xxx rovchinnikov]# lvs --version
LVM version: 2.02.166(2)-RHEL7 (2016-11-16)
Library version: 1.02.135-RHEL7 (2016-11-16)
Driver version: 4.34.0
--
Best regards,
[COOLCOLD-RIPN]
Mike Snitzer
2017-07-24 16:07:20 UTC
On Sun, Jul 23 2017 at 9:48am -0400,
Post by CoolCold
3.10.0-514.26.2.el7.x86_64
The RHEL 7.4 dm-cache will be much more performant than the RHEL 7.3
dm-cache you appear to be using.

As for your "no caching happens for cache lv > 946GB": the cache
shouldn't have any concerns about its size. It could be that your
workload isn't accessing the data often enough to warrant promotion to
the cache. dm-cache is a "hotspot" cache. If you aren't accessing the
data repeatedly then you won't see much benefit (particularly with the
7.3 and earlier releases).
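
To illustrate the "hotspot" idea in general terms, here is a toy model in which a block is only promoted to the cache after it has been read a few times. This is not the actual dm-cache policy code; the class name, the `promote_after` threshold, and the counting scheme are all assumptions made for the sketch.

```python
# Toy model of a hotspot cache: blocks are promoted only after repeated
# access. NOT the real dm-cache policy; threshold and names are invented.

from collections import Counter

class ToyHotspotCache:
    def __init__(self, promote_after: int = 2):
        self.promote_after = promote_after
        self.access_counts = Counter()  # misses seen per uncached block
        self.cached = set()             # blocks currently in the cache
        self.read_hits = 0
        self.read_misses = 0

    def read(self, block: int) -> None:
        if block in self.cached:
            self.read_hits += 1
            return
        self.read_misses += 1
        self.access_counts[block] += 1
        if self.access_counts[block] >= self.promote_after:
            self.cached.add(block)      # block is now considered hot

# A one-pass copy touches every block exactly once: nothing is promoted.
copy_pass = ToyHotspotCache()
for block in range(1000):
    copy_pass.read(block)

# A small working set read repeatedly is promoted on the second pass.
hot_set = ToyHotspotCache()
for _ in range(5):
    for block in range(10):
        hot_set.read(block)

print(copy_pass.read_hits, copy_pass.read_misses, len(copy_pass.cached))
print(hot_set.read_hits, hot_set.read_misses, len(hot_set.cached))
```

In this model a bulk copy produces only misses regardless of cache size, which is the workload effect described above; it does not by itself explain a sharp size threshold.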

Just to get a feel, you could try the latest upstream 4.12 kernel to see
how effective the 7.4 dm-cache will be for your setup.

Mike
CoolCold
2017-07-24 16:12:38 UTC
Hello!
I will try the 4.12 kernel, but doesn't it look suspicious to you
that 946GB works almost instantly, while 947GB and more do not work
at all? (I waited ~1.6 days with the 2.6TB cache and 3TB of data.)
--
Best regards,
[COOLCOLD-RIPN]
Joe Thornber
2017-07-25 11:46:18 UTC
Post by CoolCold
Hello!
I will try the 4.12 kernel, but doesn't it look suspicious to you
that 946GB works almost instantly, while 947GB and more do not work
at all? (I waited ~1.6 days with the 2.6TB cache and 3TB of data.)
Yes, it looks very suspicious. I'll get round to checking it out this
week.

- Joe
