Discussion: [linux-lvm] devices.filter changed behaviour in 80ac8f37d6
Chris Webb
2015-09-05 11:56:41 UTC
Our hosts use local md arrays as LVM2 PVs in a very straightforward way.
They also have lots of slower iscsi devices which LVM shouldn't scan or
touch, so we use a simple filter in lvm.conf:

devices {
    filter = [ "a|^/dev/md.*|", "r|.*|" ]
}

Upon upgrading from lvm-2.02.106 to 2.02.129, commands like lvdisplay and
lvs dramatically slowed down. Investigating, we found the filter wasn't
excluding the unwanted devices anymore: they were being scanned despite
being explicitly excluded.

A simple way to reproduce this in a VM is:

# cat /etc/lvm/lvm.conf
devices {
    filter = [ "r|.*|" ]
}
# lvm-2.02.129 pvscan -vv 2>&1 | grep /dev/
/dev/md0: size is 0 sectors
/dev/vda: size is 2147483648 sectors
/dev/vda: size is 2147483648 sectors
# lvm-2.02.106 pvscan -vv 2>&1 | grep /dev/
#

If I replace filter with global_filter, both versions behave identically:

# cat /etc/lvm/lvm.conf
devices {
    global_filter = [ "r|.*|" ]
}
# lvm-2.02.129 pvscan -vv 2>&1 | grep /dev/
# lvm-2.02.106 pvscan -vv 2>&1 | grep /dev/
#

Deleting /etc/lvm/cache before running each pvscan doesn't affect the result.

A quick git bisect finds that the behaviour changed with the following commit:

commit 80ac8f37d6ac5f8c5228678d4ee07187b5d4db7b
Author: Peter Rajnoha <***@redhat.com>
Date: Thu Sep 11 09:30:03 2014 +0200

filters: fix incorrect filter indexing in composite filter array

Caused by recent changes - a7be3b12dfe7388d1648595e6cc4c7a1379bb8a7.
If global filter was not defined, then part of the code
creating composite filter (the cmd->lvmetad_filter) incorrectly
increased index value even if this global filter was not created
as part of the composite filter. This caused a gap with "NULL"
value in the composite filter array which ended up with the rest
of the filters after the gap to be ignored and also it caused a mem
leak when destroying the composite filter.


Presumably this change in the behaviour of devices.filter is an
unintentional consequence rather than deliberate? Our use is copied
more-or-less directly from the example in conf/example.conf.

Best wishes,

Chris.
Peter Rajnoha
2015-09-07 07:30:24 UTC
Post by Chris Webb
Our hosts use local md arrays as LVM2 PVs in a very straightforward way.
They also have lots of slower iscsi devices which LVM shouldn't scan or
touch, so we use a simple filter in lvm.conf:
devices {
    filter = [ "a|^/dev/md.*|", "r|.*|" ]
}
Upon upgrading from lvm-2.02.106 to 2.02.129, commands like lvdisplay
and lvs dramatically slowed down. Investigating, we found the filter
wasn't excluding the unwanted devices anymore: they were being scanned
despite being explicitly excluded.
I'll check it, thanks for the report...
--
Peter
Peter Rajnoha
2015-09-07 11:40:26 UTC
Post by Peter Rajnoha
Post by Chris Webb
Our hosts use local md arrays as LVM2 PVs in a very straightforward way.
They also have lots of slower iscsi devices which LVM shouldn't scan or
touch, so we use a simple filter in lvm.conf:
devices {
    filter = [ "a|^/dev/md.*|", "r|.*|" ]
}
Upon upgrading from lvm-2.02.106 to 2.02.129, commands like lvdisplay
and lvs dramatically slowed down. Investigating, we found the filter
wasn't excluding the unwanted devices anymore: they were being scanned
despite being explicitly excluded.
I'll check it, thanks for the report...
OK, I've refreshed my memory now on this topic...

I'd advise you to use global_filter instead, which should do the job
you need, as global_filter is always evaluated at the beginning of the
filter chain. It's also more appropriate to use global_filter in your
case if you want to be sure that devices are filtered out completely
all the time and never scanned, even if you switch to using lvmetad.
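
For example, the filter from your original config could be moved over
verbatim (a minimal sketch):

devices {
    global_filter = [ "a|^/dev/md.*|", "r|.*|" ]
}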


Now, more about this delicate part of the code and how it actually
works. Feel free to skip this if you're not interested in the details;
I'm putting it here just for the record.

There were lots of changes around filters, mainly due to the
introduction of the lvmetad daemon, which is used to cache LVM
metadata. For both the lvmetad and non-lvmetad cases to keep working
properly and efficiently, we needed to make some changes to the LVM
filter chain. This is the history of the LVM filter chain across
versions:


Original filter chain before lvmetad and its global_filter (before v2.02.98):
=============================================================================
persistent_filter -> sysfs_filter -> regex_filter -> type_filter ->
md_component_filter -> mpath_component_filter

Filter chain right after introduction of global_filter (v2.02.98):
==================================================================
global_regex_filter -> persistent_filter -> sysfs_filter -> regex_filter ->
type_filter -> md_component_filter -> mpath_component_filter


The filter chain you mentioned worked well for you (v2.02.106):
===========================================================

Without lvmetad v106:
---------------------
persistent_filter -> regex_filter -> type_filter ->
mpath_component_filter -> partition_filter -> md_component_filter


With lvmetad v106:
------------------
- to update lvmetad:
global_regex_filter -> persistent_filter -> regex_filter -> type_filter ->
mpath_filter -> partition_filter -> md_filter

- to retrieve info from lvmetad:
persistent_filter -> regex_filter -> type_filter -> mpath_filter ->
partition_filter -> md_filter


Then later on, the filter chain changed so that some parts were not
uselessly re-evaluated when using lvmetad; this also fixed a few other
filter-related problems where some filters were mistakenly not
evaluated at all (v2.02.112):
====================================================================

Without lvmetad v112:
---------------------
persistent_filter -> sysfs_filter -> global_regex_filter -> type_filter ->
usable_filter -> mpath_component_filter -> partition_filter ->
md_component_filter -> regex_filter

With lvmetad v112:
------------------
- to update lvmetad:
sysfs_filter -> global_regex_filter -> type_filter -> usable_filter ->
mpath_component_filter -> partition_filter -> md_component_filter

- to retrieve info from lvmetad:
persistent_filter -> usable_filter -> regex_filter

The regex_filter moved at the end of the filter chain as part of this patch:
https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=a7be3b12dfe7388d1648595e6cc4c7a1379bb8a7


====

The key here is that the position of the regex_filter (controlled by
the devices/filter setting) changed in v2.02.112. It should be moved
back to the front of the particular filter chain if possible.

The chains should probably be fixed so they look like this:

Without lvmetad:
----------------
persistent_filter -> sysfs_filter -> global_regex_filter -> regex_filter ->
type_filter -> usable_filter -> mpath_component_filter -> partition_filter ->
md_component_filter

With lvmetad:
-------------
- to update lvmetad:
sysfs_filter -> global_regex_filter -> type_filter -> usable_filter ->
mpath_component_filter -> partition_filter -> md_component_filter

- to retrieve info from lvmetad:
persistent_filter -> regex_filter -> usable_filter

But first I need to make sure this won't cause any regressions, mainly
after taking into account
https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=a7be3b12dfe7388d1648595e6cc4c7a1379bb8a7
and the reasons why the move was made in the first place...
--
Peter
Peter Rajnoha
2015-09-07 11:48:49 UTC
Post by Peter Rajnoha
Post by Peter Rajnoha
Post by Chris Webb
Our hosts use local md arrays as LVM2 PVs in a very straightforward way.
They also have lots of slower iscsi devices which LVM shouldn't scan or
touch, so we use a simple filter in lvm.conf:
devices {
    filter = [ "a|^/dev/md.*|", "r|.*|" ]
}
Upon upgrading from lvm-2.02.106 to 2.02.129, commands like lvdisplay
and lvs dramatically slowed down. Investigating, we found the filter
wasn't excluding the unwanted devices anymore: they were being scanned
despite being explicitly excluded.
I'll check it, thanks for the report...
OK, I've refreshed my memory now on this topic...
I'd advise you to use global_filter instead, which should do the job
you need, as global_filter is always evaluated at the beginning of the
filter chain. It's also more appropriate to use global_filter in your
case if you want to be sure that devices are filtered out completely
all the time and never scanned, even if you switch to using lvmetad.
Note:
Since lvm2 v2.02.116, we have a new feature in LVM which lets it
reuse existing scanned information from the udev database. See also
the "devices/external_device_info_source" setting in lvm.conf. If you
switch that from "none" (the default) to "udev", much of the
information used for filtering is gathered from the udev database,
avoiding scans that udev has already done (mostly the blkid call
while processing udev rules).
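
For illustration, switching the source in lvm.conf would look like
this (a minimal sketch):

devices {
    external_device_info_source = "udev"
}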

This can avoid the scanning for partition tables and md components,
which probably takes the most time (the other checks do not read from
the disk directly; they read information from sysfs and/or perform
operations on the devices via ioctl).

So besides filtering out some devices completely just because of the
time it takes to scan them, you can also tell LVM to use the "udev"
external device info source so that it doesn't gather the same
information again when it's already in the udev database.
--
Peter
Peter Rajnoha
2015-09-07 13:14:40 UTC
Post by Peter Rajnoha
I'd advise you to use global_filter instead, which should do the job
you need, as global_filter is always evaluated at the beginning of the
filter chain. It's also more appropriate to use global_filter in your
case if you want to be sure that devices are filtered out completely
all the time and never scanned, even if you switch to using lvmetad.
Hi Peter. Yes, we switched our config to use global_filter when we hit
the problem. Great to hear that's the more efficient way to leave it
long-term anyway. (global_filter didn't exist as an option when I last
touched that lvm.conf, but it definitely makes sense to be able to
restrict the device tree before any other, potentially expensive,
processing.)
We should probably look at running the new lvmetad ourselves too: thanks
for the pointer. We do quite a few read-only queries from our management
layer using lvs, vgs and so on. Caching the metadata between
invocations could make a lot of sense if the daemon's reasonably
lightweight and doesn't pull in extra dependencies. (LVM can't
interrogate our udev database, though, as we don't use udev to handle
our kernel uevents.)
Yes, without the lvmetad daemon, each LVM command needs to run a full
device scan at startup to look for PVs, and for each PV found it needs
to read all the LVM metadata there (and check it for consistency etc.)
- not including, of course, those devices which are filtered out.

However, with lvmetad, you can save some disk access here since all the
metadata are taken from the lvmetad cache...
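
For reference, a minimal sketch of turning this on in lvm.conf of that
era (assuming the lvmetad daemon itself is started by your init
system):

global {
    use_lvmetad = 1
}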

If you're not using udev to handle kernel uevents (hmm, what are you
using then?), then you need to make sure that whatever is processing
the uevents does the same or a similar job as the usual udev rules
would. See also
https://git.fedorahosted.org/cgit/lvm2.git/tree/udev/69-dm-lvm-metad.rules.in.

That means you need to make sure that lvmetad knows about each new PV
that appears - normally we run "pvscan --cache major:minor" to do
this, triggered on uevents.
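
As a rough sketch, whatever handles your uevents would need to run
something like this for each block device add/change event (the 8:16
major:minor pair below is just a placeholder):

pvscan --cache 8:16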

For now, we officially support only udev with lvmetad. Lvmetad doesn't
have any extra dependencies beyond what the lvm binary already has.
--
Peter
Peter Rajnoha
2015-09-07 13:54:01 UTC
(I note from the pvscan(8) man page that global_filter applies to pvscan
--cache but the standard filter does not, which sounds like another good
reason to switch to global_filter!)
Yes, exactly. The idea behind the "filter" and "global_filter" split is
that global_filter applies globally, at system scope (in our lvmetad
context, it means lvmetad will see everything that "global_filter"
allows, taking no notice of "filter").

Then each LVM client (all LVM commands except pvscan --cache) can use
a different "filter" for the information that lvmetad returns.

So you have two levels of filtering here: a global one and a
client-side one (that's exactly the split between the "to update
lvmetad" and "to retrieve info from lvmetad" filter chains I described
in my previous post).
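
Putting the two levels together, a hypothetical lvm.conf sketch for
this kind of setup might look like the following (the regexes are
illustrative only):

devices {
    # system scope: what pvscan --cache lets lvmetad see
    global_filter = [ "a|^/dev/md.*|", "r|.*|" ]
    # client scope: further narrows what each LVM command uses
    filter = [ "a|^/dev/md0$|", "r|.*|" ]
}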
--
Peter
Peter Rajnoha
2015-09-07 13:23:25 UTC
Just one final thought. A second reason we deliberately exclude those
iSCSI devices is that they're actually the drives backing customer VMs,
so any LVM metadata on them should be interpreted by an untrusted guest
kernel and not by the host. Untrusted third parties have complete
control over the contents of the block devices.
Is LVM well-secured against attacks from block devices containing
malicious LVM metadata? If not, an unexpected change in filtering
behaviour might potentially be a security issue in some environments.
Before, we advised using the filters to hide from the host all the LVM
layout on guests' disks that is not supposed to be visible on the host
side, since it may interfere heavily with the LVM layout on the host
(e.g. the same VG/LV names used inside a guest as on the host).

There's a new feature called "systemid" in LVM which was included
in lvm2 v2.02.117. It can also be used to solve this issue
(without the need to define filters). See also
https://git.fedorahosted.org/cgit/lvm2.git/tree/man/lvmsystemid.7.in.

The trick here is that the metadata are marked with an ID saying where
they were created, and you can effectively use this to filter out
guest LVs automatically (and vice versa) so the two environments do
not interfere with each other...
--
Peter
Peter Rajnoha
2015-09-07 13:35:32 UTC
Post by Peter Rajnoha
Just one final thought. A second reason we deliberately exclude those
iSCSI devices is that they're actually the drives backing customer VMs,
so any LVM metadata on them should be interpreted by an untrusted guest
kernel and not by the host. Untrusted third parties have complete
control over the contents of the block devices.
Is LVM well-secured against attacks from block devices containing
malicious LVM metadata? If not, an unexpected change in filtering
behaviour might potentially be a security issue in some environments.
Before, we advised using the filters to hide from the host all the LVM
layout on guests' disks that is not supposed to be visible on the host
side, since it may interfere heavily with the LVM layout on the host
(e.g. the same VG/LV names used inside a guest as on the host).
There's a new feature called "systemid" in LVM which was included
in lvm2 v2.02.117. It can also be used to solve this issue
(without the need to define filters). See also
https://git.fedorahosted.org/cgit/lvm2.git/tree/man/lvmsystemid.7.in.
Of course, from a security point of view, you need to take care that
your systemid is not stolen, so that someone can't fake metadata inside
a guest with that systemid.

There are several sources for the systemid and you can choose which one
you want to use, so it's quite configurable. One of the sources is
completely in your own hands - the "lvmlocal.conf" settings, where you
can define a systemid of your own (which can be a long, random and
very hard to guess string).
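
A minimal sketch of that setup, going by lvmsystemid(7) (the actual
system_id string below is just a placeholder):

# /etc/lvm/lvm.conf
global {
    system_id_source = "lvmlocal"
}

# /etc/lvm/lvmlocal.conf
local {
    system_id = "long-random-hard-to-guess-string"
}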
--
Peter
Peter Rajnoha
2015-09-07 14:12:28 UTC
Post by Peter Rajnoha
Post by Peter Rajnoha
Just one final thought. A second reason we deliberately exclude those
iSCSI devices is that they're actually the drives backing customer VMs,
so any LVM metadata on them should be interpreted by an untrusted guest
kernel and not by the host. Untrusted third parties have complete
control over the contents of the block devices.
Is LVM well-secured against attacks from block devices containing
malicious LVM metadata? If not, an unexpected change in filtering
behaviour might potentially be a security issue in some environments.
Before, we advised using the filters to hide from the host all the LVM
layout on guests' disks that is not supposed to be visible on the host
side, since it may interfere heavily with the LVM layout on the host
(e.g. the same VG/LV names used inside a guest as on the host).
There's a new feature called "systemid" in LVM which was included
in lvm2 v2.02.117. It can also be used to solve this issue
(without the need to define filters). See also
https://git.fedorahosted.org/cgit/lvm2.git/tree/man/lvmsystemid.7.in.
Of course, from a security point of view, you need to take care that
your systemid is not stolen, so that someone can't fake metadata inside
a guest with that systemid.
There are several sources for the systemid and you can choose which one
you want to use, so it's quite configurable. One of the sources is
completely in your own hands - the "lvmlocal.conf" settings, where you
can define a systemid of your own (which can be a long, random and
very hard to guess string).
Yes, you caught me halfway through replying that guests might be
malicious rather than accidental! Given there's space for enough entropy
in there to secure it, it sounds like an excellent idea. I assume it's
possible to tell LVM to ignore PVs with no systemid as well as a
systemid that doesn't match the secret one?
Out of curiosity, at what level do you filter PVs based on this
systemid? Is it a fixed offset byte string in the PV header, or do you
have to do quite a bit of metadata parsing before you can ignore the PV?
(I'm just wondering what the security exposure from malicious
foreign/non-systemid PVs is like.)
Well, the systemid is written in the VG metadata. So yes, we need to
read the VG metadata first, and only then decide, based on the systemid
value, whether it's going to be processed further or not.

In this case you need to weigh the pros and cons. The systemid is more
automatic, but it doesn't prevent the VG metadata from being read
(though lvmetad can help with its caching here). If you go the "set
proper filters" way, you need to specify very clearly which devices to
allow and which not - best is to allow only specific devices and
disallow everything else on the host, as in the sketch below. But
then, sooner or later, when you want to add more disks to your
system/host, you have to update the filters accordingly every time.
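
For example, a strict allow-list along those lines might look like this
(the device names are hypothetical):

devices {
    global_filter = [ "a|^/dev/md0$|", "a|^/dev/md1$|", "r|.*|" ]
}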
--
Peter