Discussion:
[linux-lvm] auto_activation_volume_list in lvm.conf not honored
Stefan Bauer
2016-11-24 12:21:15 UTC
Permalink
hi folks,

How do I prevent pvscan from activating LVM volume groups on startup (it's for a cluster setup)? auto_activation_volume_list was filled with only those VGs we do want auto-activated. Run manually, it does what it should:

/sbin/lvm pvscan --config 'activation { auto_activation_volume_list = "vg2" }' only activates vg2 but keeps vg1 "untouched".
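
For reference, the equivalent persistent setting in lvm.conf looks roughly like this (a minimal sketch; the VG name is the one from the example above):

# /etc/lvm/lvm.conf - only VGs/LVs listed here are auto-activated
activation {
    # honoured by "vgchange -aay" (and pvscan autoactivation),
    # ignored by a plain "vgchange -ay"
    auto_activation_volume_list = [ "vg2" ]
}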

But on system startup, all VGs are activated. (Ubuntu 14.04.5 LTS)

We also updated the ramdisk and verified that the changes are present in the ramdisk's lvm.conf.

However, vg1 still gets activated.

Any help is greatly appreciated.

Stefan
Peter Rajnoha
2016-11-24 13:38:23 UTC
Permalink
Post by Stefan Bauer
hi folks,
How do I prevent pvscan from activating LVM volume groups on startup (it's for a cluster setup)? auto_activation_volume_list was filled with only those VGs we do want auto-activated. Run manually, it does what it should:
/sbin/lvm pvscan --config 'activation { auto_activation_volume_list = "vg2" }' only activates vg2 but keeps vg1 "untouched".
But on system startup, all VGs are activated. (Ubuntu 14.04.5 LTS)
We also updated the ramdisk and verified that the changes are present in the ramdisk's lvm.conf.
However, vg1 still gets activated.
Any help is greatly appreciated.
It's important that all scripts which handle LVM activation at boot call
vgchange -aay, which honours auto_activation_volume_list
(so not "vgchange -ay").

All init scripts and systemd units which upstream LVM2 provides already
use "-aay".

You mentioned a cluster setup - so are your VGs clustered and are you
using clvmd? If that's the case, the clustered VGs are activated either
by the clvmd init script/systemd unit or by an external cluster resource
agent (e.g. pacemaker with the clvm OCF file) which calls vgchange to
activate the clustered VGs - that one needs to use "-aay" too.
--
Peter
Peter Rajnoha
2016-11-24 13:55:50 UTC
Permalink
Post by Peter Rajnoha
Post by Stefan Bauer
hi folks,
How do I prevent pvscan from activating LVM volume groups on startup (it's for a cluster setup)? auto_activation_volume_list was filled with only those VGs we do want auto-activated. Run manually, it does what it should:
/sbin/lvm pvscan --config 'activation { auto_activation_volume_list = "vg2" }' only activates vg2 but keeps vg1 "untouched".
But on system startup, all VGs are activated. (Ubuntu 14.04.5 LTS)
We also updated the ramdisk and verified that the changes are present in the ramdisk's lvm.conf.
However, vg1 still gets activated.
Any help is greatly appreciated.
It's important that all scripts which handle LVM activation at boot call
vgchange -aay, which honours auto_activation_volume_list
(so not "vgchange -ay").
All init scripts and systemd units which upstream LVM2 provides already
use "-aay".
You mentioned a cluster setup - so are your VGs clustered and are you
using clvmd? If that's the case, the clustered VGs are activated either
by the clvmd init script/systemd unit or by an external cluster resource
agent (e.g. pacemaker with the clvm OCF file) which calls vgchange to
activate the clustered VGs - that one needs to use "-aay" too.
I looked at the Ubuntu-specific environment and I can see there's
/lib/udev/rules.d/85-lvm2.rules with:

SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}="lvm*|LVM*",
RUN+="watershed sh -c '/sbin/lvm vgscan; /sbin/lvm vgchange -a y'"

So that "watershed" should use vgchange -aay. Please report this for
Ubuntu directly for them to fix this (as the "watershed" helper binary
is specific to Debian/Ubuntu only).
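
Until that is fixed, a local override with -aay should work, along these lines (a sketch only; a rules file of the same name under /etc/udev/rules.d takes precedence over the one in /lib/udev/rules.d):

# /etc/udev/rules.d/85-lvm2.rules - local override (sketch)
SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="lvm*|LVM*", \
    RUN+="watershed sh -c '/sbin/lvm vgscan; /sbin/lvm vgchange -aay'"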
--
Peter
Stefan Bauer
2016-11-24 14:02:32 UTC
Permalink
Hi Peter,

Now it all makes sense. On this Ubuntu machine, upstart with udev is taking care of vgchange.

After some digging, /lib/udev/rules.d/85-lvm2.rules shows that vgchange is only called with "-a y".

We will test this over the weekend, but I'm certain now that this was the problem.

We wanted to keep things simple and not use clvmd. We have a master/slave setup without concurrent write/read.

So according to your documentation ;) it should work without clvmd.

Thank you!

Stefan


-----Original Message-----
Sent: Thu 24 November 2016 14:39
Subject: Re: [linux-lvm] auto_activation_volume_list in lvm.conf not honored
Post by Stefan Bauer
hi folks,
How do I prevent pvscan from activating LVM volume groups on startup (it's for a cluster setup)? auto_activation_volume_list was filled with only those VGs we do want auto-activated. Run manually, it does what it should:
/sbin/lvm pvscan --config 'activation { auto_activation_volume_list = "vg2" }' only activates vg2 but keeps vg1 "untouched".
But on system startup, all VGs are activated. (Ubuntu 14.04.5 LTS)
We also updated the ramdisk and verified that the changes are present in the ramdisk's lvm.conf.
However, vg1 still gets activated.
Any help is greatly appreciated.
It's important that all scripts which handle LVM activation at boot call
vgchange -aay, which honours auto_activation_volume_list
(so not "vgchange -ay").
All init scripts and systemd units which upstream LVM2 provides already
use "-aay".
You mentioned a cluster setup - so are your VGs clustered and are you
using clvmd? If that's the case, the clustered VGs are activated either
by the clvmd init script/systemd unit or by an external cluster resource
agent (e.g. pacemaker with the clvm OCF file) which calls vgchange to
activate the clustered VGs - that one needs to use "-aay" too.
--
Peter
Zdenek Kabelac
2016-11-25 08:48:38 UTC
Permalink
Post by Stefan Bauer
Hi Peter,
Now it all makes sense. On this Ubuntu machine, upstart with udev is taking care of vgchange.
After some digging, /lib/udev/rules.d/85-lvm2.rules shows that vgchange is only called with "-a y".
We will test this over the weekend, but I'm certain now that this was the problem.
We wanted to keep things simple and not use clvmd. We have a master/slave setup without concurrent write/read.
Hi

I'm wondering what you are trying to do when you say 'not use clvmd'.

If you are working with 'shared' storage and manipulating the same VG from
multiple nodes (i.e. activation), it's not so easy to go without a really
good locking manager.

If you don't like clvmd - maybe you could take a look at lvmlockd & sanlock.

But you should be aware there are NOT many people who are able to ensure
correct locking of lvm2 commands - it's really not just 'master/slave'.
There is a big number of error cases that cannot be handled correctly
without proper locking across the whole cluster.
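
For the record, the lvmlockd/sanlock route looks roughly like this on a sufficiently new lvm2 (a sketch only; the VG and device names are placeholders and lvmlockd(8) has the authoritative steps):

# 1. lvm.conf on every node: global { use_lvmlockd = 1 }
#    (sanlock additionally needs a unique local/host_id per node, e.g. in lvmlocal.conf)
# 2. start the sanlock and lvmlockd daemons on every node
# 3. create the VG as a shared VG and start its lockspace where it is used:
vgcreate --shared vg_shared /dev/mapper/shared_lun
vgchange --lockstart vg_shared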


Regards

Zdenek
Stefan Bauer
2016-11-24 14:20:20 UTC
Permalink
It's already gone in newer releases of Ubuntu. They replaced upstart with systemd. The systemd units look sane and use -aay.
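
To double-check on such a release, the autoactivation unit shipped by lvm2 can be inspected directly (the unit name below is the upstream lvm2 one; packaging may differ):

# show how the generated pvscan unit activates VGs
systemctl cat lvm2-pvscan@.service | grep -i -e pvscan -e activate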

Stefan

-----Original Message-----
Post by Peter Rajnoha
SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}="lvm*|LVM*",
RUN+="watershed sh -c '/sbin/lvm vgscan; /sbin/lvm vgchange -a y'"
So that "watershed" should use vgchange -aay. Please report this for
Ubuntu directly for them to fix this (as the "watershed" helper binary
is specific to Debian/Ubuntu only).
Stefan Bauer
2016-11-25 09:17:02 UTC
Permalink
Hi Peter,

As I said, we have a master/slave setup _without_ concurrent write/read. So I do not see a reason why I should take care of locking, as only one node activates the volume group at any given time.

That should be fine - right?

Stefan
-----Original Message-----
If you are working with 'shared' storage and manipulating the same VG from
multiple nodes (i.e. activation), it's not so easy to go without a really
good locking manager.
If you don't like clvmd - maybe you could take a look at lvmlockd & sanlock.
But you should be aware there are NOT many people who are able to ensure
correct locking of lvm2 commands - it's really not just 'master/slave'.
There is a big number of error cases that cannot be handled correctly
without proper locking across the whole cluster.
Zdenek Kabelac
2016-11-25 09:30:39 UTC
Permalink
Post by Stefan Bauer
Hi Peter,
As I said, we have a master/slave setup _without_ concurrent write/read. So I do not see a reason why I should take care of locking, as only one node activates the volume group at any given time.
That should be fine - right?
Nope it's not.

Every operation, e.g. activation, DOES validate all resources and take
ACTION when something is wrong.

Sorry, but there is NO way to do this properly without a locking manager.

Many lvm2 users do try to be 'innovative' and use lvm2 in a lock-less
way - this seems to work most of the time, till the moment some disaster
happens - then lvm2 gets blamed for the data loss..

Interestingly, they never stop to think about why we invested so much time
into a locking manager when, in their eyes, there is such an 'easy fix'...

IMHO lvmlockd is a relatively low-resource/overhead solution worth
exploring if you don't like clvmd...

Regards

Zdenek
David Teigland
2016-11-28 22:24:03 UTC
Permalink
Post by Zdenek Kabelac
Post by Stefan Bauer
Hi Peter,
As I said, we have a master/slave setup _without_ concurrent write/read. So I do not see a reason why I should take care of locking, as only one node activates the volume group at any given time.
That should be fine - right?
Nope it's not.
Every operation, e.g. activation, DOES validate all resources and take
ACTION when something is wrong.
Sorry, but there is NO way to do this properly without a locking manager.
Many lvm2 users do try to be 'innovative' and use lvm2 in a lock-less
way - this seems to work most of the time, till the moment some disaster
happens - then lvm2 gets blamed for the data loss..
Interestingly, they never stop to think about why we invested so much time
into a locking manager when, in their eyes, there is such an 'easy fix'...
IMHO lvmlockd is a relatively low-resource/overhead solution worth
exploring if you don't like clvmd...
Stefan, as Zdenek points out, even reading VGs on shared storage is not
entirely safe, because lvm may attempt to fix/repair things on disk while
it is reading (this becomes more likely if one machine reads while another
is making changes). Using some kind of locking or clustering (lvmlockd or
clvm) is a solution.

Another fairly new option is to use "system ID", which assigns one host as
the owner of the VG. This avoids the problems mentioned above with
reading->fixing. But, system ID on its own cannot be used dynamically.
If you want to fail-over the VG between hosts, the system ID needs to be
changed, and this needs to be done carefully (e.g. by a resource manager
or something that takes fencing into account,
https://bugzilla.redhat.com/show_bug.cgi?id=1336346#c2)

Also https://www.redhat.com/archives/linux-lvm/2016-November/msg00022.html
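
A minimal sketch of the system ID approach (assuming an lvm2 version with systemid support; the VG name is a placeholder, and the hand-over between hosts still has to be coordinated/fenced as described above):

# lvm.conf: global { system_id_source = "uname" }  - each host's ID is then its node name
lvm systemid                                  # show this host's system ID
vgchange --systemid "$(uname -n)" vg_shared   # hand the VG over to this host
vgs -o+systemid vg_shared                     # verify which host currently owns it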

Dave
Stefan Bauer
2016-12-02 07:07:10 UTC
Permalink
I now tried to set up clvmd, but it fails with:

Dec 2 07:09:46 vm1 LVM(vg)[1278]: ERROR: connect() failed on local socket: No such file or directory Internal cluster locking initialisation failed. WARNING: Falling back to local file-based locking. Volume Groups with the clustered attribute will be inaccessible. Reading all physical volumes. This may take a while... Found volume group "vm1-vg" using metadata type lvm2 Skipping clustered volume group vg
Dec 2 07:09:46 vm1 LVM(vg)[1278]: ERROR: connect() failed on local socket: No such file or directory Internal cluster locking initialisation failed. WARNING: Falling back to local file-based locking. Volume Groups with the clustered attribute will be inaccessible. Skipping clustered volume group vg
Dec 2 07:09:46 vm1 crmd[1075]: notice: process_lrm_event: LRM operation vg_start_0 (call=22, rc=1, cib-update=26, confirmed=true) unknown error
Dec 2 07:09:46 vm1 crmd[1075]: warning: status_from_rc: Action 23 (vg_start_0) on vm1 failed (target: 0 vs. rc: 1): Error
Dec 2 07:09:46 vm1 crmd[1075]: warning: update_failcount: Updating failcount for vg on vm1 after failed start: rc=1 (update=INFINITY, time=1480658986)
Dec 2 07:09:46 vm1 attrd[1073]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-vg (INFINITY)

If I do a cleanup of the resource, it is started.

syslog attached.
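
That "connect() failed on local socket" error typically means clvmd itself was not reachable when the resource agent ran. A few checks along these lines may help (resource names below are placeholders):

# is cluster locking enabled, and is clvmd really up before the VG resource starts?
grep locking_type /etc/lvm/lvm.conf      # 3 = clvmd cluster locking
pgrep -a clvmd                           # clvmd (on top of corosync/dlm) must be running
# if pacemaker can start the VG resource before clvmd, add an ordering constraint, e.g.:
#   crm configure order clvmd-before-vg inf: clvmd-clone vg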


Any help is greatly appreciated.

Thank you.

Stefan
-----Original Message-----
Sent: Mon 28 November 2016 23:14
Subject: Re: [linux-lvm] auto_activation_volume_list in lvm.conf not honored
Post by Zdenek Kabelac
Post by Stefan Bauer
Hi Peter,
As I said, we have a master/slave setup _without_ concurrent write/read. So I do not see a reason why I should take care of locking, as only one node activates the volume group at any given time.
That should be fine - right?
Nope it's not.
Every operation, e.g. activation, DOES validate all resources and take
ACTION when something is wrong.
Sorry, but there is NO way to do this properly without a locking manager.
Many lvm2 users do try to be 'innovative' and use lvm2 in a lock-less
way - this seems to work most of the time, till the moment some disaster
happens - then lvm2 gets blamed for the data loss..
Interestingly, they never stop to think about why we invested so much time
into a locking manager when, in their eyes, there is such an 'easy fix'...
IMHO lvmlockd is a relatively low-resource/overhead solution worth
exploring if you don't like clvmd...
Stefan, as Zdenek points out, even reading VGs on shared storage is not
entirely safe, because lvm may attempt to fix/repair things on disk while
it is reading (this becomes more likely if one machine reads while another
is making changes). Using some kind of locking or clustering (lvmlockd or
clvm) is a solution.
Another fairly new option is to use "system ID", which assigns one host as
the owner of the VG. This avoids the problems mentioned above with
reading->fixing. But, system ID on its own cannot be used dynamically.
If you want to fail-over the VG between hosts, the system ID needs to be
changed, and this needs to be done carefully (e.g. by a resource manager
or something that takes fencing into account,
https://bugzilla.redhat.com/show_bug.cgi?id=1336346#c2)
Also https://www.redhat.com/archives/linux-lvm/2016-November/msg00022.html
Dave