Discussion:
[linux-lvm] LVM VG is not activated during system boot
MegaBrutal
2014-11-24 22:27:02 UTC
Permalink
Hi all,

I use Ubuntu, and several weeks ago I asked a question about LVM here:
http://askubuntu.com/questions/542656/lvm-vg-is-not-activated-during-system-boot

Unfortunately, I didn't receive any answers.

I don't know whether my problem is distro-specific, but I'm trying to ask
here too.

I have 2 VGs on my system, and for some reason only one of them gets
activated during the initrd boot sequence; the activated one doesn't contain
my root LV, so the boot sequence halts at an initrd prompt. I already specified
in lvm.conf, via auto_activation_volume_list, that I want both VGs to be
auto-activated, but it has no effect.
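For reference, the relevant part of my /etc/lvm/lvm.conf looks roughly like
this (quoted from memory, surrounding defaults omitted):

  activation {
      # both of my VGs should be auto-activated
      auto_activation_volume_list = [ "vmhost-vg", "vmdata-vg" ]
  }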

Could anyone please advise what might be the problem?


MegaBrutal
Peter Rajnoha
2014-11-25 08:01:42 UTC
Permalink
Post by MegaBrutal
Hi all,
http://askubuntu.com/questions/542656/lvm-vg-is-not-activated-during-system-boot
Unfortunately, I didn't receive any answers.
I don't know if my problem is distro-specific or not, but I try to ask
here too.
I have 2 VGs on my system, and for some reason, only one of them gets
activated during the initrd boot sequence, which doesn't have my root
LV, so my boot sequence halts with an initrd prompt. I already specified
in lvm.conf that I wish both my VGs to be auto-activated with
auto_activation_volume_list, but it has no effect.
Could anyone please advise what might be the problem?
Hi!

What's the exact lvm2 version used (lvm --version)?

Is lvmetad enabled in your setup? (global/use_lvmetad=1 setting
in lvm.conf and lvmetad daemon running?)

Does it activate when you run vgchange -aay vmdata-vg vmhost-vg
directly on the busybox cmd line?
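For completeness, from the busybox shell that would be something along these
lines (illustrative only):

  lvm version
  grep use_lvmetad /etc/lvm/lvm.conf
  lvm pvscan
  lvm vgchange -aay vmdata-vg vmhost-vg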
--
Peter
Peter Rajnoha
2014-11-25 14:33:34 UTC
Permalink
Post by Peter Rajnoha
What's the exact lvm2 version used (lvm --version)?
LVM version: 2.02.98(2) (2012-10-15)
Library version: 1.02.77 (2012-10-15)
Driver version: 4.27.0
Is lvmetad enabled in your setup? (global/use_lvmetad=1 setting
in lvm.conf and lvmetad daemon running?)
use_lvmetad = 0
No such daemon is running.
This means that LV autoactivation is not enabled in that case either
(as it depends on lvmetad being active), and there must be a direct
call for the activation (vgchange/lvchange -ay/-aay).

However, most distributions do not use lvmetad in the initrd anyway
(the only one I know of at the moment is Arch Linux). As such, I think
this is a problem with the distribution's initrd, which is not waiting
properly for all PVs to show up and calls LV activation prematurely.
I'd report your issue against your distribution's initrd component, as each
distribution uses its own initrd scheme (I could help you with Fedora's
dracut initrd, but I don't have insight into the Debian/Ubuntu initrd scheme).
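To illustrate what I mean by "waiting properly" - this is only a sketch, not
the actual Debian/Ubuntu code - the initrd should settle udev and retry the
activation rather than calling it once:

  # sketch only: wait for device nodes, then retry autoactivation
  udevadm settle --timeout=30
  for i in 1 2 3 4 5; do
      lvm vgchange -aay --sysinit && break
      sleep 2
  done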
Post by Peter Rajnoha
Does it activate when you run vgchange -aay vmdata-vg vmhost-vg
directly on the busybox cmd line?
The exact command I've been using at the BusyBox prompt is
lvm vgchange -ay vmhost-vg
Or, if I remember correctly, it activates simply by
lvm vgchange -ay
as well.
Then I exit the BusyBox prompt, and the boot process continues correctly.
My root FS is in vmhost-vg, and I have no idea why it doesn't come up
automatically.
Yeah, it all points to a premature vgchange call in the initrd script.
Please report this in your distribution's bug tracking system if
possible.
--
Peter
MegaBrutal
2014-11-25 15:54:38 UTC
Permalink
Post by Peter Rajnoha
Post by Peter Rajnoha
What's the exact lvm2 version used (lvm --version)?
LVM version: 2.02.98(2) (2012-10-15)
Library version: 1.02.77 (2012-10-15)
Driver version: 4.27.0
Is lvmetad enabled in your setup? (global/use_lvmetad=1 setting
in lvm.conf and lvmetad daemon running?)
use_lvmetad = 0
No such daemon is running.
This means that LV autoactivation is not enabled in that case too
(as it depends on lvmetad to be active) and there must a direct
call for the activation (vgchange/lvchange -ay/-aay)
However, most distributions do not use lvmetad in initrd anyway
(the only I know of at the moment is Arch Linux). As such, I think
this is a problem with distribution's initrd that is not waiting
properly for all PVs to show up and it calls LV activation prematurely.
I'd report your issue to your distribution's initrd component as each
distribution uses its own initrd scheme (I could help you with Fedora's
dracut initrd, but I don't see into Debian's/Ubuntu initrd scheme).
Post by Peter Rajnoha
Does it activate when you run vgchange -aay vmdata-vg vmhost-vg
directly on the busybox cmd line?
The exact command I used to use in the BusyBox prompt is
lvm vgchange -ay vmhost-vg
Or, if I remember correctly, it activates simply by
lvm vgchange -ay
as well.
Then I exit the BusyBox prompt, and the boot process continues correctly.
My root FS is in vmhost-vg, and I have no idea why it doesn't come up
automatically.
Yeah, it all points to premature vgchange call in initrd's script.
Please, report this in your distribution's bug tracking system if
possible.
Thanks for the advice!
I opened a Launchpad report here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1396213
Daniel Savard
2014-11-25 16:15:49 UTC
Permalink
What are your kernel boot options? Do you specify there the VGs you want
activated at boot time?

I have one entry like this for each VG: rd.lvm.vg=vgname
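On a dracut-based initrd that ends up on the kernel command line via the
bootloader config, roughly like this (the VG names are just examples):

  GRUB_CMDLINE_LINUX="rd.lvm.vg=vmhost-vg rd.lvm.vg=vmdata-vg"

followed by regenerating the grub config (update-grub or grub2-mkconfig,
depending on the distro).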
-----------------
Daniel Savard
Post by MegaBrutal
Post by Peter Rajnoha
Post by Peter Rajnoha
What's the exact lvm2 version used (lvm --version)?
LVM version: 2.02.98(2) (2012-10-15)
Library version: 1.02.77 (2012-10-15)
Driver version: 4.27.0
Is lvmetad enabled in your setup? (global/use_lvmetad=1 setting
in lvm.conf and lvmetad daemon running?)
use_lvmetad = 0
No such daemon is running.
This means that LV autoactivation is not enabled in that case too
(as it depends on lvmetad to be active) and there must a direct
call for the activation (vgchange/lvchange -ay/-aay)
However, most distributions do not use lvmetad in initrd anyway
(the only I know of at the moment is Arch Linux). As such, I think
this is a problem with distribution's initrd that is not waiting
properly for all PVs to show up and it calls LV activation prematurely.
I'd report your issue to your distribution's initrd component as each
distribution uses its own initrd scheme (I could help you with Fedora's
dracut initrd, but I don't see into Debian's/Ubuntu initrd scheme).
Post by Peter Rajnoha
Does it activate when you run vgchange -aay vmdata-vg vmhost-vg
directly on the busybox cmd line?
The exact command I used to use in the BusyBox prompt is
lvm vgchange -ay vmhost-vg
Or, if I remember correctly, it activates simply by
lvm vgchange -ay
as well.
Then I exit the BusyBox prompt, and the boot process continues correctly.
My root FS is in vmhost-vg, and I have no idea why it doesn't come up
automatically.
Yeah, it all points to premature vgchange call in initrd's script.
Please, report this in your distribution's bug tracking system if
possible.
Thanks for the advice!
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1396213
MegaBrutal
2014-11-25 17:00:06 UTC
Permalink
No, I don't have such a kernel option.

But it previously worked without that. Aren't all volume groups
supposed to auto-activate unless I set otherwise in lvm.conf?

I'll try this kernel option, however.

I have "rootdelay" set to make the initrd wait longer for the boot
device. With previous kernels it worked, but now, no matter how high I
set this value, the VG never activates. It only comes up when I
activate it manually from the initrd prompt.
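For reference, that is just a kernel parameter in the bootloader config; the
value below is only an example of the form I use:

  GRUB_CMDLINE_LINUX_DEFAULT="rootdelay=180"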
Post by Daniel Savard
What are your kernel boot options? Do you specify the VGs you wish to
be activated at boot time there?
I have one entry like this one for each VG: rd.lvm.vg=vgname
-----------------
Daniel Savard
Post by MegaBrutal
Post by Peter Rajnoha
Post by Peter Rajnoha
What's the exact lvm2 version used (lvm --version)?
LVM version: 2.02.98(2) (2012-10-15)
Library version: 1.02.77 (2012-10-15)
Driver version: 4.27.0
Is lvmetad enabled in your setup? (global/use_lvmetad=1 setting
in lvm.conf and lvmetad daemon running?)
use_lvmetad = 0
No such daemon is running.
This means that LV autoactivation is not enabled in that case too
(as it depends on lvmetad to be active) and there must a direct
call for the activation (vgchange/lvchange -ay/-aay)
However, most distributions do not use lvmetad in initrd anyway
(the only I know of at the moment is Arch Linux). As such, I think
this is a problem with distribution's initrd that is not waiting
properly for all PVs to show up and it calls LV activation prematurely.
I'd report your issue to your distribution's initrd component as each
distribution uses its own initrd scheme (I could help you with Fedora's
dracut initrd, but I don't see into Debian's/Ubuntu initrd scheme).
Post by Peter Rajnoha
Does it activate when you run vgchange -aay vmdata-vg vmhost-vg
directly on the busybox cmd line?
The exact command I used to use in the BusyBox prompt is
lvm vgchange -ay vmhost-vg
Or, if I remember correctly, it activates simply by
lvm vgchange -ay
as well.
Then I exit the BusyBox prompt, and the boot process continues correctly.
My root FS is in vmhost-vg, and I have no idea why it doesn't come up
automatically.
Yeah, it all points to premature vgchange call in initrd's script.
Please, report this in your distribution's bug tracking system if
possible.
Thanks for the advice!
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1396213
MegaBrutal
2015-03-19 18:34:31 UTC
Permalink
Now I think I've figured out what's going on. It seems to be
Debian/Ubuntu specific, but I'm posting it here anyway in case
Debian/Ubuntu devs read this list.

Bug report:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1396213

What actually makes the VG activation take so long is that I have a
snapshot. Activating the snapshot takes a very long time, and bringing up the
entire VG takes about 5 minutes. This wouldn't be such a big problem,
as I could just patiently wait for the activation (with rootdelay).
But it seems something (maybe some kind of watchdog) kills vgchange
before it can finish bringing up all VGs. I happened to boot a
development build of Vivid, and I saw some 'watershed' messages stating
that 'vgchange' was killed because it was taking "too long". If
'vgchange' were allowed to finish properly, the 2nd VG, which contains
my root FS, would be activated properly.
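For the record, something like this shows the snapshot, its origin and its
size (the VG name is from my setup; adjust as needed):

  lvs -a -o lv_name,lv_size,origin vmdata-vg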

It's a server. If it has a long boot time, so be it; it doesn't get
rebooted often under normal circumstances anyway. But it is required
to boot up without user interaction, e.g. when I issue a reboot
remotely. The main problem is that currently user interaction is
necessary to get past the initrd (as the root VG needs to be manually
activated), which means I can only reboot the server when I'm
physically nearby.
Post by MegaBrutal
No, I don't have such kernel option.
But previously it was working without that. Aren't all volume groups
supposed to auto-activate, unless I set otherwise in lvm.conf?
I'll try this kernel option, however.
I have a "rootdelay" set to make initrd wait longer for the boot
device. With previous kernels, it worked, but now no matter how long I
set this value, the VG never activates. It only activates when I
manually activate it from the initrd prompt.
Post by Daniel Savard
What are your kernel boot options? Do you specify the VGs you wish to
be activated at boot time there?
I have one entry like this one for each VG: rd.lvm.vg=vgname
-----------------
Daniel Savard
Post by MegaBrutal
Post by Peter Rajnoha
Post by Peter Rajnoha
What's the exact lvm2 version used (lvm --version)?
LVM version: 2.02.98(2) (2012-10-15)
Library version: 1.02.77 (2012-10-15)
Driver version: 4.27.0
Is lvmetad enabled in your setup? (global/use_lvmetad=1 setting
in lvm.conf and lvmetad daemon running?)
use_lvmetad = 0
No such daemon is running.
This means that LV autoactivation is not enabled in that case too
(as it depends on lvmetad to be active) and there must a direct
call for the activation (vgchange/lvchange -ay/-aay)
However, most distributions do not use lvmetad in initrd anyway
(the only I know of at the moment is Arch Linux). As such, I think
this is a problem with distribution's initrd that is not waiting
properly for all PVs to show up and it calls LV activation prematurely.
I'd report your issue to your distribution's initrd component as each
distribution uses its own initrd scheme (I could help you with Fedora's
dracut initrd, but I don't see into Debian's/Ubuntu initrd scheme).
Post by Peter Rajnoha
Does it activate when you run vgchange -aay vmdata-vg vmhost-vg
directly on the busybox cmd line?
The exact command I used to use in the BusyBox prompt is
lvm vgchange -ay vmhost-vg
Or, if I remember correctly, it activates simply by
lvm vgchange -ay
as well.
Then I exit the BusyBox prompt, and the boot process continues correctly.
My root FS is in vmhost-vg, and I have no idea why it doesn't come up
automatically.
Yeah, it all points to premature vgchange call in initrd's script.
Please, report this in your distribution's bug tracking system if
possible.
Thanks for the advice!
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1396213
Stuart Gathman
2015-03-19 19:06:21 UTC
Permalink
Post by MegaBrutal
Now I think I figured out what's going on. It seems to be
Debian/Ubuntu specific, but I post here, maybe Debian/Ubuntu devs are
here and see this.
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1396213
What actually makes the VG activation so long is that I have a
snapshot. Activating the snapshot takes very long, and bringing up the
entire VG takes about 5 minutes. This wouldn't be such a big problem,
as I could just patiently wait for the activation (with rootdelay).
But it seems something (maybe some kind of watchdog) kills vgchange
before it could finish bringing up all VGs. I had the fortune to boot
a developmental Vivid, and I've seen some 'watershed' messages stating
that 'vgchange' was killed because it was taking "too long". If we'd
let 'vgchange' to finish properly, I had the 2nd VG activated
properly, which contains my root FS.
It's a server. If it has a long boot time, so be it, it doesn't get
rebooted often anyway during normal circumstances. But it is required
to boot up without user interaction, e.g., when I issue a reboot
remotely. The main problem is that currently, user interaction is
necessary to pass initrd (as the root VG needs to be manually
activated), which means, I can only reboot the server when I'm
physically near.
I've had the same problem on Fedora - so it is not Ubuntu specific. On
Fedora, systemd has a timeout for VG activation. That can be
increased. However, you can flag the snapshot to *not* be activated
automatically (skip activation: -k) at volume group activation. That
allows the system to boot (remote reboot), and then you can activate
the big snapshot manually (or automatically from a later script), waiting
the requisite 5 or 10 minutes.
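Roughly like this (flag names as in recent lvm2; vmdata-vg/big-snap is only a
placeholder for the real snapshot):

  # set the skip-activation flag so vgchange -ay / -aay leaves it alone
  lvchange --setactivationskip y vmdata-vg/big-snap
  # later, when the system is up, activate it despite the skip flag
  lvchange -ay -K vmdata-vg/big-snap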
MegaBrutal
2015-03-20 03:56:54 UTC
Permalink
Post by Stuart Gathman
I've had the same problem on Fedora - so it is not ubuntu specific. On
Fedora, systemd has a timeout for VG activation. That can be increased.
However, you can flag the snapshot to *not* be activated automatically (Skip
activation: -k) at volume group activation. That allows the system to boot
(remote reboot), and then you can manually activate the big snapshot (or
automatically in a later script) - waiting the requisite 5 or 10 minutes.
Thank you very much! That seems like a nice workaround. However, I
can't use it, as Utopic has an old LVM version which doesn't have
this feature. (Though Vivid will have it.)

Anyway, is it a wild idea to prioritize the activation of the root
device in the initrd?
Zdenek Kabelac
2015-03-20 08:30:54 UTC
Permalink
Post by MegaBrutal
Now I think I figured out what's going on. It seems to be
Debian/Ubuntu specific, but I post here, maybe Debian/Ubuntu devs are
here and see this.
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1396213
What actually makes the VG activation so long is that I have a
snapshot. Activating the snapshot takes very long, and bringing up the
entire VG takes about 5 minutes. This wouldn't be such a big problem,
as I could just patiently wait for the activation (with rootdelay).
But it seems something (maybe some kind of watchdog) kills vgchange
before it could finish bringing up all VGs. I had the fortune to boot
a developmental Vivid, and I've seen some 'watershed' messages stating
that 'vgchange' was killed because it was taking "too long". If we'd
let 'vgchange' to finish properly, I had the 2nd VG activated
properly, which contains my root FS.
It's a server. If it has a long boot time, so be it, it doesn't get
rebooted often anyway during normal circumstances. But it is required
to boot up without user interaction, e.g., when I issue a reboot
remotely. The main problem is that currently, user interaction is
necessary to pass initrd (as the root VG needs to be manually
activated), which means, I can only reboot the server when I'm
physically near.
I've had the same problem on Fedora - so it is not ubuntu specific. On Fedora,
systemd has a timeout for VG activation. That can be increased. However, you
-k) at volume group activation. That allows the system to boot (remote
reboot), and then you can manually activate the big snapshot (or automatically
in a later script) - waiting the requisite 5 or 10 minutes.
Hi,

To clear up some 'myths' here:

With old snapshots (not thin-pool based) you cannot skip activation of a
snapshot - the origin and all its snapshots always have to be active.

If the old snapshot is large (in the range of GB) it takes quite some time to
process the whole COW volume and read and parse all the metadata stored there.
The metadata format of an old snapshot was designed to be small and to occupy
minimal extra space - so if someone keeps a snapshot permanently and stores
a lot of data in it, it's very, very inefficient.

This problem with old snapshots basically cannot be fixed unless there were a
completely different new snapshot target ;)

So the advised fix for a long-term snapshot is to switch to a thin pool.

There you get all the goodies - very fast and efficient snapshots, you can
easily take a snapshot of a snapshot, and you can select whether you want
a snapshot activated or not (and by default it's skipped from activation).
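Just to illustrate the commands (names and sizes below are arbitrary examples):

  # create a thin pool and a thin volume in it
  lvcreate -L 100G -T vg/pool0
  lvcreate -V 50G -T vg/pool0 -n data
  # thin snapshot of a thin volume - no size needed, and by default
  # it gets the activation-skip flag set
  lvcreate -s vg/data -n data-snap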

Regards

Zdenek
MegaBrutal
2015-03-20 16:52:24 UTC
Permalink
I'm resending this message in plain text, as I think the previous
HTML-formatted one didn't reach the list - I don't see it in the
archives.
Post by Zdenek Kabelac
This problem with old snaps basically cannot be fixed unless there would be
a completely different new snapshot target ;)
I understand this is a by-design feature of LVM snapshots, and I
accept it as-is. The problem for me is not the long activation time,
but the fact that initrd stops waiting for the full activation to
complete. It gives up and kills the vgchange process.

I understand initrd behaviour may be distro-specific.
Post by Zdenek Kabelac
So the advised fix for long term snapshot is to switch to use thin-pool.
Here you will have all the goodies - very fast and efficient snapshots, you
could easily take snapshot of snapshot and you could also select if you want
to have snapshot activated or not (and by default it's skipped from activation).
Well, thin pools seem nice, but I don't see them as mature enough
for production use. For example, basic VG management like reducing or
splitting a VG failed for me when a thin pool was present in the VG,
even if the split would not have affected the thin pool itself. I also
failed to get rid of missing PVs when a thin pool would have been
affected – while „vgreduce --removemissing --force” promises to remove
the affected LVs, it couldn't get rid of the thin pool. When I posted a
thread about the issue here, no one answered.
(It is here: https://www.redhat.com/archives/linux-lvm/2015-March/msg00001.html)

Also, as far as I know, thin snapshots can only be created of thin LVs,
and they reside in the same thin pool. That means you can't make a thin
snapshot of an arbitrary LV, and one shouldn't keep critical LVs (e.g. the
root FS) in thin pools.
Zdenek Kabelac
2015-03-20 19:24:56 UTC
Permalink
Post by MegaBrutal
I resend this message in plaintext, as I think the previous
HTML-formatted one didn't reach the list, as I don't see it in the
archives.
Post by Zdenek Kabelac
This problem with old snaps basically cannot be fixed unless there would be
a completely different new snapshot target ;)
I understand this is a by-design feature of LVM snapshots, and I
accept it as-is. The problem for me is not the long activation time,
but the fact that initrd stops waiting for the full activation to
complete. It gives up and kills the vgchange process.
I understand initrd behaviour may be distro-specific.
Yes, we try to fight this 'timeout' battle all the time - but we
are likely on the losing side of the table.

A lot of people seem to believe timeouts will solve their problems :)

Anyway - you could either configure a bigger timeout in your distro, or you could
pvmove the COW devices onto fast disks (SSDs) in your VG (if you have any), or simply
not use snapshots of large sizes (which is IMHO always a good idea).
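For the pvmove variant, the rough shape is (device and LV names below are
purely illustrative; -n limits the move to the extents of the given LV):

  pvmove -n big-snap /dev/slow-disk /dev/fast-ssd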
Post by MegaBrutal
Post by Zdenek Kabelac
So the advised fix for long term snapshot is to switch to use thin-pool.
Here you will have all the goodies - very fast and efficient snapshots, you
could easily take snapshot of snapshot and you could also select if you want
to have snapshot activated or not (and by default it's skipped from activation).
Well, thin pools seem nice, but I don't see them being matured enough
for production use. For example, basic VG management like reducing or
splitting a VG failed for me when a thin pool was present in the VG,
even if the split would have not affected the thin pool itself. I also
failed to get rid of missing PVs when a thin pool would have been
affected – while „vgreduce --removemissing --force” promises to remove
affected LVs, it couldn't get rid of the thin pool. When I posted a
thread about the issue here, no one bothered to answer.
(It is here: https://www.redhat.com/archives/linux-lvm/2015-March/msg00001.html)
File a bug please...
Post by MegaBrutal
Also, as far as I know, thin snapshots can only be created of thin LVs
residing in the same thin pool. That means, you can't make a thin
snapshot of any LV, and one shouldn't keep critical LVs (e.g. root FS)
in thin pools.
Yes, there are going to be some more improvements on the thin-pool side, to better
support the logic where you always have a fully provisioned origin and only the
snapshots may fail.

Regards

Zdenek