Discussion:
[linux-lvm] Alignment: XFS + LVM2
Marc Caubet
2014-05-06 15:54:21 UTC
Hi all,

I am trying to set up a storage pool with correct disk alignment, and I hope
somebody can help me understand some parts that are unclear to me when
configuring XFS over LVM2.

We currently have a few storage pools, each with the following settings:

- LSI Controller with 3xRAID6
- Each RAID6 is configured with 10 data disks + 2 for double-parity.
- Each disk has a capacity of 4TB, 512e and physical sector size of 4K.
- The 3x(10+2) configuration was chosen for the best balance of performance
and data safety (fewer disks per RAID means a lower probability of data corruption).
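For reference, the 512e geometry listed above can be sanity-checked from sysfs,
assuming the member disks are visible to the kernel rather than hidden behind
the RAID volume (sdX is a placeholder):

cat /sys/block/sdX/queue/logical_block_size    # expect 512 for a 512e drive
cat /sys/block/sdX/queue/physical_block_size   # expect 4096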
Mike Snitzer
2014-05-07 15:27:57 UTC
On Tue, May 06 2014 at 11:54am -0400,
Post by Marc Caubet
Hi all,
I am trying to set up a storage pool with correct disk alignment, and I hope
somebody can help me understand some parts that are unclear to me when
configuring XFS over LVM2.
- LSI Controller with 3xRAID6
- Each RAID6 is configured with 10 data disks + 2 for double-parity.
- Each disk has a capacity of 4TB, 512e and physical sector size of 4K.
- The 3x(10+2) configuration was chosen for the best balance of performance
and data safety (fewer disks per RAID means a lower probability of data corruption).
What is the chunk size used for these RAID6 devices?
Say it is 256K, you have 10 data devices, so the full stripe would be
2560K.
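If the controller and driver export these hints, the chunk and full-stripe
sizes can also be read back from sysfs on each RAID6 block device (sdX is a
placeholder; the values shown assume a 256K chunk with 10 data disks):

cat /sys/block/sdX/queue/minimum_io_size   # chunk size, e.g. 262144 (256K)
cat /sys/block/sdX/queue/optimal_io_size   # full stripe, e.g. 2621440 (256K x 10)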

Which version of lvm2 and kernel are you using? Newer versions support
a striped LV stripesize that is not a power-of-2.
Marc Caubet
2014-05-08 09:29:54 UTC
Hi Mike,

Thanks a lot for your answer.
Post by Mike Snitzer
Post by Marc Caubet
Hi all,
I am trying to set up a storage pool with correct disk alignment, and I hope
somebody can help me understand some parts that are unclear to me when
configuring XFS over LVM2.
- LSI Controller with 3xRAID6
- Each RAID6 is configured with 10 data disks + 2 for double-parity.
- Each disk has a capacity of 4TB, 512e and physical sector size of 4K.
- The 3x(10+2) configuration was chosen for the best balance of performance
and data safety (fewer disks per RAID means a lower probability of data corruption).
What is the chunk size used for these RAID6 devices?
Say it is 256K, you have 10 data devices, so the full stripe would be
2560K.
The chunk size is currently 256KB (in the near future we will try 1MB, since
we manage large files, but for now we want to keep the current 256KB
configuration).

Post by Mike Snitzer
Which version of lvm2 and kernel are you using? Newer versions support
a striped LV stripesize that is not a power-of-2.
Current LVM2 version is lvm2-2.02.100-8.el6.x86_64
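As a sketch of the non-power-of-2 stripesize Mike mentions, one way to line the
LV stripes up with the RAID6 full stripe would be something like the following
(the VG/LV names dcvg_a/dcpool are taken from the mkfs.xfs line quoted later in
the thread, the -L size is a placeholder, and whether 2560k is accepted depends
on having a new enough lvm2/kernel):

# stripe the LV across the three RAID6 PVs, using one full RAID6 stripe
# (256K chunk * 10 data disks = 2560K) as the LV stripe size
lvcreate -i 3 -I 2560k -L 100T -n dcpool dcvg_a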
Linda A. Walsh
2014-06-04 05:39:20 UTC
Post by Mike Snitzer
On Tue, May 06 2014 at 11:54am -0400,
Post by Marc Caubet
Hi all,
I am trying to set up a storage pool with correct disk alignment, and I hope
somebody can help me understand some parts that are unclear to me when
configuring XFS over LVM2.
- LSI Controller with 3xRAID6
- Each RAID6 is configured with 10 data disks + 2 for double-parity.
- Each disk has a capacity of 4TB, 512e and physical sector size of 4K.
- The 3x(10+2) configuration was chosen for the best balance of performance
and data safety (fewer disks per RAID means a lower probability of data corruption).
----
I have a similar setup and am almost certain I have 2 of them wrong as
shown below:


Model: LSI MR9280DE-8e (scsi)
Disk /dev/sda: 24.0TB
Sector size (logical/physical): 512B/512B
Partition Table: gpt_sync_mbr

Number  Start   End     Size    File system  Name       Flags
 1      17.4kB  24.0TB  24.0TB               home+shar  lvm

Model: LSI MR9280DE-8e (scsi)
Disk /dev/sdb: 12.0TB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name     Flags
 1      1049kB  12.0TB  12.0TB               Backups  lvm


Model: DELL PERC 6/i (scsi)
Disk /dev/sdd: 7999GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt_sync_mbr

Number  Start   End     Size    File system  Name   Flags
 1      17.4kB  7999GB  7999GB               Media  lvm

pvs says:
# pvs
  PV         VG       Fmt   Attr  PSize   PFree
  /dev/sda1  HnS      lvm2  a--   21.83t  2.73t
  /dev/sdb1  Backups  lvm2  a--   10.91t  3.15g
  /dev/sdd1  Media    lvm2  a--    7.28t      0
-----

Notice how each of them starts at some weird offset.

I thought I started /dev/sdb @ 1MB, which comes out to 1048576 bytes, so sdb
might be aligned on a sector boundary... but it has 6 data disks x 64K stripe
= 384K full stripe, which doesn't divide into 1MB evenly.
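To check that arithmetic (the partition start comes from the parted output
above):

parted /dev/sdb unit B print    # shows Start = 1048576 bytes (1MiB)
# 1048576 / 65536  = 16     -> aligned to the 64K chunk
# 1048576 / 393216 = 2.67   -> NOT aligned to the 384K full stripe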

/dev/sda has a full-stripe width of 768K, BUT since it is a RAID50 (3 RAID5s in
a RAID0 config), I can use 256K as a stripe size for writes, as a write of any
aligned 256K chunk will only affect 4 data disks (+ 1 parity).
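Spelled out (the 64K per-disk chunk is inferred from 256K spread over 4 data
disks; the actual controller setting isn't shown here):

# per-RAID5 leg:    4 data disks * 64K chunk = 256K
# RAID0 of 3 legs:  3 legs * 256K            = 768K full stripe
# so an aligned 256K write lands entirely on one leg (4 data + 1 parity disks)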
Post by Mike Snitzer
Post by Marc Caubet
And here is my first question: How can I check if the storage and the LV
are correctly aligned?
mkfs.xfs -d su=256k,sw=10 -l size=128m,lazy-count=1 /dev/dcvg_a/dcpool
So my second question is: are the above 'su' and 'sw' parameters correct for
the current LV configuration? If not, which values should I use and why?
AFAIK su is the stripe size configured on the controller side, but in this
case we have an LV. Also, sw is the number of data disks in a RAID, but
again, we have an LV with 3 stripes, and I am not sure if the number of data
disks should be 30 instead.
Newer versions of mkfs.xfs _should_ pick up the hints exposed (as
minimum_io_size and optimal_io_size) by the striped LV.
----
But mkfs.xfs won't pick up the optimal io_size from inside the LSI
controller, and that's what underlies all of this. LVM didn't try to align its
space to an even multiple when starting at 17.4k (i.e. it would have to round
up to the nearest 256K, 384K or 768K, depending on the subsystem).
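One way to see (and, when rebuilding, control) where LVM actually starts laying
out data -- this is only a sketch, --dataalignment takes effect when the PV is
(re)created, and /dev/sdX1 is a placeholder:

pvs -o +pe_start --units b                  # data-area start of each PV, in bytes
pvcreate --dataalignment 768k /dev/sdX1     # force a full-stripe-aligned start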
Post by Mike Snitzer
But if not you definitely don't want to be trying to pierce through the
striped LV config to establish settings of the underlying RAID6.
----
You have to.
Post by Mike Snitzer
Each
layer in the stack should respect the layer beneath it.
They don't. LVM doesn't determine an optimal start based on the partition
start, so all of its alignments are off.

My writes are noticeably slower than my reads, sometimes by close to 10x
(5x in the more general case).
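For what it's worth, a rough way to measure that read/write gap with the page
cache out of the picture (the mount point and sizes are placeholders; the file
should be large enough to defeat the controller's write-back cache):

dd if=/dev/zero of=/mnt/test/ddtest bs=768k count=8192 oflag=direct   # write ~6GiB
dd if=/mnt/test/ddtest of=/dev/null bs=768k iflag=direct              # read it back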


I hope to get another disk subsystem so I can dump those partitions and
align them, and also follow Stan Hoeppner's advice from the xfs list -- go with
a RAID 1+0... Then each RAID1 pair is independent of every other. The worst
offender has to be that 768K: it triggers a bug in the GNU database format,
which assumes the optimal I/O size will be a power of 2 (which it is not, in
my case).
