Hector Martin 'marcan'
2018-06-29 16:02:22 UTC
Some (many?)* USB UAS drives report an optimal IO size of 65535 sectors
(64K - 1), which is also the maximum transfer length they report:
# sg_inq -p 0xb0 /dev/sda
VPD INQUIRY: Block limits page (SBC)
Maximum compare and write length: 0 blocks
Optimal transfer length granularity: 8 blocks
Maximum transfer length: 65535 blocks
Optimal transfer length: 65535 blocks
Maximum prefetch transfer length: 65535 blocks
[...]
# cat /sys/block/sdb/queue/optimal_io_size
33553920
LVM decides that is a good 1st PE alignment value:
# pvcreate -vv /dev/sda4 2>&1 | grep -C1 optim
Device /dev/sda4: queue/minimum_io_size is 4096 bytes.
Device /dev/sda4: queue/optimal_io_size is 33553920 bytes.
/dev/sda4: Setting PE alignment to 65535 sectors.
# pvs -o +pe_start --units s /dev/sda4
PV VG Fmt Attr PSize PFree 1st PE
/dev/sda4 eib lvm2 a-- 1421557760S 412827648S 65535S
Unfortunately, this throws off the logical/physical sector alignment on
4K/512e sector drives, which completely kills performance:
# cat /sys/block/sdb/queue/physical_block_size
4096
(also "Optimal transfer length granularity" above)
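To make the misalignment concrete, here is a small sketch of the arithmetic using the numbers reported above. It assumes 512-byte logical sectors (as on a 512e drive); the helper name is made up for illustration and is not part of LVM.

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors (512e drive)


def misalignment(offset_sectors, physical_block_size, sector_size=SECTOR_SIZE):
    """Bytes by which an offset overshoots the previous physical block boundary.

    Hypothetical helper for illustration only. A result of 0 means aligned.
    """
    return (offset_sectors * sector_size) % physical_block_size


pe_start_sectors = 65535      # "1st PE" reported by pvs above
physical_block_size = 4096    # queue/physical_block_size above

optimal_io_size = pe_start_sectors * SECTOR_SIZE
remainder = misalignment(pe_start_sectors, physical_block_size)

print(optimal_io_size)           # bytes, matches queue/optimal_io_size
print(remainder)                 # bytes past the last 4096-byte boundary
print(remainder // SECTOR_SIZE)  # the same remainder in 512-byte sectors
```

So every 4K-aligned filesystem block inside the PV ends up straddling two physical sectors, whereas a 1st PE at 65536 sectors would have been aligned.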
I see this was already reported in November 2016, but apparently has not
been fixed:
https://www.redhat.com/archives/linux-lvm/2016-November/msg00035.html
# lvm version
LVM version: 2.02.166(2) (2016-09-26)
Library version: 1.02.135 (2016-09-26)
Driver version: 4.37.0
I think LVM should always align the 1st PE offset up to the physical
sector size, or outright ignore the optimal_io_size if it isn't aligned.
As far as I can tell the goal here is to align to RAID stripe sizes
(which are reported as optimal_io_size), but if that and the physical
sector size are mismatched, it's probably not a RAID (at least not a
properly configured RAID).
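A rough sketch of what I mean, covering both options. This is not LVM's actual code; the function name, the parameters, and the fallback value are all assumptions made for illustration.

```python
def choose_pe_alignment(optimal_io_size, physical_block_size,
                        fallback, round_up=True):
    """Pick a 1st PE alignment in bytes (hypothetical policy, not LVM code).

    - If optimal_io_size is unset (0), use the fallback.
    - If it is a multiple of the physical block size, trust it.
    - Otherwise either round it up to the next physical block boundary
      or ignore it and use the fallback, depending on round_up.
    """
    if optimal_io_size == 0:
        return fallback
    if optimal_io_size % physical_block_size == 0:
        return optimal_io_size
    if round_up:
        # Ceiling division, then back to bytes.
        blocks = -(-optimal_io_size // physical_block_size)
        return blocks * physical_block_size
    return fallback


# Values from the drive above; the 1 MiB fallback is an assumed default.
ONE_MIB = 1024 * 1024
rounded = choose_pe_alignment(33553920, 4096, fallback=ONE_MIB)
ignored = choose_pe_alignment(33553920, 4096, fallback=ONE_MIB, round_up=False)
print(rounded // 512)  # alignment in 512-byte sectors when rounding up
print(ignored // 512)  # alignment in 512-byte sectors when ignoring the hint
```

Either result is a multiple of 4096 bytes, so the physical sector alignment survives. A correctly reported RAID stripe size already is a multiple of the physical sector size and would pass through unchanged.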
The performance hit when this goes wrong is absolutely massive with
certain workloads. An rsync that takes 30 seconds with an aligned
filesystem takes 10 minutes without it (on one of my 2.5" SATA drives
behind a UAS bridge enclosure).
[*] At least two different enclosures I own, with different vendors of
USB-SATA UAS bridges in them.
--
Hector Martin "marcan" (***@marcan.st)
Public Key: https://mrcn.st/pub