Discussion:
[linux-lvm] autoactivation fails
Christian Hesse
2015-05-21 12:19:55 UTC
Hello everybody,

with recent lvm2 I had problems booting an Arch Linux system with non-systemd
initramfs. lvmetad is launched, but volumes are not activated. A git bisect
reported this bad commit:

commit fe30658a4d5fe4e4e6bb346c9c9ee7142a98f49d
Author: Ondrej Kozina <***@redhat.com>
Date: Mon Apr 13 16:29:15 2015 +0200

toollib: close connection to lvmetad after fork

sharing connection between parent command and background
processes spawned from parent could lead to occasional failures
due to unexpected corruption in daemon responses sent to either child
or a parent.

lvmetad issued warning about duplicate config values in request.
LVM commands occasionaly failed w/ internal error after receving
corrupted response.

lvmetad connection is renewed when needed after explicit disconnect
in child

Reverting this single commit makes everything work with git master
(v2_02_120-36-gba68aed) again.
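
For readers following along, the pattern described in the commit message (a forked child dropping its inherited copy of the daemon connection) can be sketched roughly as below. This is only an illustration under stated assumptions: `struct daemon_conn`, `conn_disconnect()` and `spawn_worker()` are hypothetical names, not lvm2's actual code, and a socketpair stands in for the lvmetad socket.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a client's daemon connection (not lvm2's type). */
struct daemon_conn {
	int fd; /* -1 when disconnected */
};

/* Close this process's copy of the descriptor and mark it disconnected. */
static void conn_disconnect(struct daemon_conn *c)
{
	assert(c != NULL);
	if (c->fd >= 0) {
		close(c->fd);
		c->fd = -1;
	}
}

/*
 * Fork a background worker. The child closes its inherited copy of the
 * connection right away, so parent and child never read from or write to
 * the same byte stream. Returns the child's exit status (0 if the child
 * ended up disconnected), or -1 on error.
 */
static int spawn_worker(struct daemon_conn *c)
{
	pid_t pid = fork();
	int status;

	if (pid < 0)
		return -1;
	if (pid == 0) {
		conn_disconnect(c);
		/* A real worker would reconnect here if it needed the daemon. */
		_exit(c->fd == -1 ? 0 : 1);
	}
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}

/* Demo: a socketpair stands in for the daemon socket. */
static int demo_fork_disconnect(void)
{
	int sv[2];
	struct daemon_conn c;
	int rc;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	c.fd = sv[0];
	rc = spawn_worker(&c);
	/* The child's close() does not affect the parent's descriptor. */
	if (rc == 0 && c.fd != sv[0])
		rc = -1;
	close(sv[0]);
	close(sv[1]);
	return rc;
}
```

Calling `demo_fork_disconnect()` should return 0 when the child disconnected cleanly and the parent's descriptor was left untouched.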
--
main(a){char*c=/* Schoene Gruesse */"B?IJj;MEH"
"CX:;",b;for(a/* Chris get my mail address: */=0;b=c[a++];)
putchar(b-1/(/* gcc -o sig sig.c && ./sig */b/42*2-3)*42);}
Ondrej Kozina
2015-05-21 15:09:38 UTC
Hi Christian,
Post by Christian Hesse
Hello everybody,
with recent lvm2 I had problems booting an Arch Linux system with non-systemd
initramfs. lvmetad is launched, but volumes are not activated. A git bisect
commit fe30658a4d5fe4e4e6bb346c9c9ee7142a98f49d
I would like to ask you for more information about the issue. First,
have you managed to collect any error messages during the init phase
where it's supposed to fail?

The thing about that particular commit is it could possibly break
background polling, but I don't see (yet) how it could have damaged the
activation code. Have you tried to reboot a system where any operation
like: pvmove, lvconvert --merge or lvconvert mirror conversion was in
progress in background before you rebooted the system by any chance?

The second step would be to unmute any background processes spawned
during activation. To do so, just uncomment

// #define DEBUG_CHILD

at tools/toollib.c:76 in the lvm2 sources and rebuild.

This should expose all error messages from within the forked off
processes. Also, if you could perhaps give me some hints how to get the
Arch Linux in the same state as when you experienced the issue it would
help us as well.
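
As a rough illustration of why a macro like DEBUG_CHILD would "unmute" a forked process: a common pattern is for the child to redirect its standard streams to /dev/null unless a debug macro is defined. The sketch below assumes that pattern; `silence_child_stdio()` and `run_background_child()` are hypothetical helpers and are not copied from tools/toollib.c.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* #define DEBUG_CHILD */ /* uncomment to keep the child's output visible */

/*
 * Hypothetical helper: detach a forked child from the caller's terminal.
 * Returns 1 if stdio was redirected to /dev/null, 0 if it was left alone
 * (DEBUG_CHILD defined), -1 on error.
 */
static int silence_child_stdio(void)
{
#ifndef DEBUG_CHILD
	int null_fd = open("/dev/null", O_RDWR);

	if (null_fd < 0)
		return -1;
	if (dup2(null_fd, STDIN_FILENO) < 0 ||
	    dup2(null_fd, STDOUT_FILENO) < 0 ||
	    dup2(null_fd, STDERR_FILENO) < 0) {
		close(null_fd);
		return -1;
	}
	if (null_fd > STDERR_FILENO)
		close(null_fd);
	return 1;
#else
	return 0;
#endif
}

/* Fork a child that silences itself and then tries to report an error. */
static int run_background_child(void)
{
	pid_t pid = fork();
	int status;

	if (pid < 0)
		return -1;
	if (pid == 0) {
		int rc = silence_child_stdio();

		/* Only visible on the terminal when DEBUG_CHILD is defined. */
		fprintf(stderr, "child: something went wrong\n");
		_exit(rc < 0 ? 2 : rc);
	}
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

With the macro left commented out, `run_background_child()` returns 1 and the child's message is discarded; with it defined, the function returns 0 and the message reaches stderr.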

Regards
Ondrej
Christian Hesse
2015-05-21 18:57:34 UTC
Post by Ondrej Kozina
Hi Christian,
Post by Christian Hesse
Hello everybody,
with recent lvm2 I had problems booting an Arch Linux system with
non-systemd initramfs. lvmetad is launched, but volumes are not
commit fe30658a4d5fe4e4e6bb346c9c9ee7142a98f49d
I would like to ask you for more information about the issue. First,
have you managed to collect any error messages during the init phase
where it's supposed to fail?
The thing about that particular commit is it could possibly break
background polling, but I don't see (yet) how it could have damaged the
activation code. Have you tried to reboot a system where any operation
like: pvmove, lvconvert --merge or lvconvert mirror conversion was in
progress in background before you rebooted the system by any chance?
No. And I see this on different systems. It's perfectly reproducible.
Post by Ondrej Kozina
Second step would be to unmute the eventual background processes spawned
during activation. To do it just uncomment
// #define DEBUG_CHILD
in tools/toollib.c:76 lvm2 sources and rebuild.
This should expose all error messages from within the forked off
processes.
Defining DEBUG_CHILD does not change anything... Not sure where the output
goes, if there is any.

However it looks like the issue is not in lvmetad itself but triggers one
in lvm. lvmetad is running with pid 69, but I see a log entry about
segmentation fault of pid 121:

lvm[121]: segfault at 58 ip 00007f2f24197d20 sp 00007ffe3b499b18 error 4 in
libc.so.6[7f2f24161000+1990000]

Not sure what command lvm is running, though. Probably pvscan, no?
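
For reference, the kernel log line above can be decoded a little further. On x86, the "error" value is the page-fault error code: bit 0 distinguishes a protection violation from a not-present page, bit 1 marks a write, bit 2 marks user mode, and bit 4 marks an instruction fetch. The sketch below (helper names are made up for illustration) decodes error 4 and computes the faulting offset inside libc from the ip and the mapping base in the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decoded x86 page-fault error code bits. */
struct pf_info {
	int protection; /* bit 0: 1 = protection violation, 0 = page not present */
	int write;      /* bit 1: 1 = write access, 0 = read access */
	int user;       /* bit 2: 1 = fault happened in user mode */
	int ifetch;     /* bit 4: 1 = instruction fetch */
};

static struct pf_info decode_pf_error(unsigned err)
{
	struct pf_info f;

	f.protection = (err >> 0) & 1;
	f.write = (err >> 1) & 1;
	f.user = (err >> 2) & 1;
	f.ifetch = (err >> 4) & 1;
	return f;
}

/* Offset of the faulting instruction inside the mapped object. */
static uint64_t fault_offset(uint64_t ip, uint64_t base)
{
	assert(ip >= base);
	return ip - base;
}

/* Write a one-line human-readable summary into out. */
static void describe_segfault(unsigned err, uint64_t ip, uint64_t base,
			      char *out, size_t n)
{
	struct pf_info f = decode_pf_error(err);

	snprintf(out, n, "%s-mode %s, %s, offset 0x%llx",
		 f.user ? "user" : "kernel",
		 f.ifetch ? "instruction fetch" : (f.write ? "write" : "read"),
		 f.protection ? "protection violation" : "page not present",
		 (unsigned long long)fault_offset(ip, base));
}
```

For the values in the log (error 4, ip 0x7f2f24197d20, base 0x7f2f24161000), `describe_segfault()` yields "user-mode read, page not present, offset 0x36d20". A read at the very low address 0x58 is consistent with accessing a struct member through a NULL pointer, although the log alone does not prove that.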

BTW, if I run 'vgchange -ay' (well, actually 'lvm vgchange -ay' as it is a
multicall binary) in rescue shell the volumes are activated and boot
continues.
Post by Ondrej Kozina
Also, if you could perhaps give me some hints how to get the
Arch Linux in the same state as when you experienced the issue it would
help us as well.
I could upload a disk image that demonstrates the issue... Any preferences
about the format?
Christian Hesse
2015-05-21 19:50:03 UTC
Post by Christian Hesse
Post by Ondrej Kozina
Hi Christian,
Post by Christian Hesse
Hello everybody,
with recent lvm2 I had problems booting an Arch Linux system with
non-systemd initramfs. lvmetad is launched, but volumes are not
commit fe30658a4d5fe4e4e6bb346c9c9ee7142a98f49d
I would like to ask you for more information about the issue. First,
have you managed to collect any error messages during the init phase
where it's supposed to fail?
The thing about that particular commit is it could possibly break
background polling, but I don't see (yet) how it could have damaged the
activation code. Have you tried to reboot a system where any operation
like: pvmove, lvconvert --merge or lvconvert mirror conversion was in
progress in background before you rebooted the system by any chance?
No. And I see this on different systems. It's perfectly reproducible.
Post by Ondrej Kozina
Second step would be to unmute the eventual background processes spawned
during activation. To do it just uncomment
// #define DEBUG_CHILD
in tools/toollib.c:76 lvm2 sources and rebuild.
This should expose all error messages from within the forked off
processes.
Defining DEBUG_CHILD does not bring any changes... Not sure where output
goes if there is any.
However it looks like the issue is not in lvmetad itself but triggers one
in lvm. lvmetad is running with pid 69, but I see a log entry about
lvm[121]: segfault at 58 ip 00007f2f24197d20 sp 00007ffe3b499b18 error 4 in
libc.so.6[7f2f24161000+1990000]
Not sure what command lvm is running, though. Probably pvscan, no?
BTW, if I run 'vgchange -ay' (well, actually 'lvm vgchange -ay' as it is a
multicall binary) in rescue shell the volumes are activated and boot
continues.
Post by Ondrej Kozina
Also, if you could perhaps give me some hints how to get the
Arch Linux in the same state as when you experienced the issue it would
help us as well.
I could upload a disk image that demonstrates the issue... Any preferences
about the format?
Ok, here we go...

http://www.eworm.de/tmp/lvm2.vdi.gz

This is a VirtualBox disk image, compressed with bzip2.

It just contains boot loader, kernel and initramfs, but it demonstrates the
issue. Have fun! ;)
Ondrej Kozina
2015-05-22 06:44:23 UTC
Post by Christian Hesse
However it looks like the issue is not in lvmetad itself but triggers one
in lvm. lvmetad is running with pid 69, but I see a log entry about
lvm[121]: segfault at 58 ip 00007f2f24197d20 sp 00007ffe3b499b18 error 4 in
libc.so.6[7f2f24161000+1990000]
Not sure what command lvm is running, though. Probably pvscan, no?
Yes, it's pvscan --background in a non-systemd environment that segfaults.
Thanks for the report! I'll notify you about patch when ready.

Regards
Ondrej
Ondrej Kozina
2015-05-25 07:16:15 UTC
Post by Christian Hesse
Hello everybody,
with recent lvm2 I had problems booting an Arch Linux system with non-systemd
initramfs. lvmetad is launched, but volumes are not activated. A git bisect
commit fe30658a4d5fe4e4e6bb346c9c9ee7142a98f49d
Date: Mon Apr 13 16:29:15 2015 +0200
Fixed by commit:

commit f8bf6410954fcf82bf28852e0dba015c6b7f19dc
Author: Ondrej Kozina <***@redhat.com>
Date: Fri May 22 14:48:28 2015 +0200

lvmetad.c: ignore lvmetad global handle on disconnect

do not unset lvmetad global handle on disconnect. This is
hotfix for issue described in:
https://www.redhat.com/archives/linux-lvm/2015-May/msg00008.html

Reported-by: Christian Hesse <***@eworm.de>

pvscan --cache --background segfaulted due to NULL ptr dereference.
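
To illustrate the class of bug named in the commit message, here is a minimal sketch of how unsetting a global handle on disconnect can set up a later NULL pointer dereference, and how keeping the handle avoids it. Everything here (`struct handle`, `g_handle`, both disconnect variants) is hypothetical and only loosely modelled on the description above; it is not the code from lvmetad.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical connection handle; the real lvm2 structure differs. */
struct handle {
	int socket_fd; /* -1 when disconnected */
};

static struct handle g_storage = { -1 };
static struct handle *g_handle = &g_storage;

/* Test helper: (re)establish a pretend connection on the given fd. */
static void reset_handle(int fd)
{
	g_storage.socket_fd = fd;
	g_handle = &g_storage;
}

static struct handle *current_handle(void)
{
	return g_handle;
}

/* Problematic variant: disconnecting also forgets the global handle. */
static void disconnect_and_unset(void)
{
	assert(g_handle != NULL);
	g_handle->socket_fd = -1;
	g_handle = NULL; /* later g_handle->... accesses would now crash */
}

/* Safer variant: drop the connection but keep the handle reachable. */
static void disconnect_keep_handle(void)
{
	if (g_handle != NULL)
		g_handle->socket_fd = -1;
}

/* A later caller; the NULL check is what unguarded code would be missing. */
static int handle_is_connected(void)
{
	return g_handle != NULL && g_handle->socket_fd >= 0;
}
```

After `disconnect_keep_handle()` the handle is still non-NULL and simply reports "not connected", whereas after `disconnect_and_unset()` any code that reads a field through the global pointer without checking it first would fault.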

Thanks for the report!
Ondrej
Christian Hesse
2015-05-25 16:36:50 UTC
Post by Ondrej Kozina
Post by Christian Hesse
Hello everybody,
with recent lvm2 I had problems booting an Arch Linux system with
non-systemd initramfs. lvmetad is launched, but volumes are not
commit fe30658a4d5fe4e4e6bb346c9c9ee7142a98f49d
Date: Mon Apr 13 16:29:15 2015 +0200
commit f8bf6410954fcf82bf28852e0dba015c6b7f19dc
Date: Fri May 22 14:48:28 2015 +0200
lvmetad.c: ignore lvmetad global handle on disconnect
do not unset lvmetad global handle on disconnect. This is
https://www.redhat.com/archives/linux-lvm/2015-May/msg00008.html
pvscan --cache --background segfaulted due to NULL ptr dereference.
Thanks for the report!
Looks good. Thanks a lot!
Christian Hesse
2015-05-26 08:06:20 UTC
Post by Christian Hesse
Post by Ondrej Kozina
Post by Christian Hesse
Hello everybody,
with recent lvm2 I had problems booting an Arch Linux system with
non-systemd initramfs. lvmetad is launched, but volumes are not
commit fe30658a4d5fe4e4e6bb346c9c9ee7142a98f49d
Date: Mon Apr 13 16:29:15 2015 +0200
commit f8bf6410954fcf82bf28852e0dba015c6b7f19dc
Date: Fri May 22 14:48:28 2015 +0200
lvmetad.c: ignore lvmetad global handle on disconnect
do not unset lvmetad global handle on disconnect. This is
https://www.redhat.com/archives/linux-lvm/2015-May/msg00008.html
pvscan --cache --background segfaulted due to NULL ptr dereference.
Thanks for the report!
Looks good. Thanks a lot!
Currently running v2.02.120 with commit f8bf6410 on top. Volumes are
activated, but udev reports:

timeout, giving up waiting for workers to finish

This adds a boot delay of 30 seconds. I'm not entirely sure this is an lvm2
issue, though; chances are it is caused by the systemd v220 update. Any
ideas?
--
Best regards,
Chris
Ondrej Kozina
2015-05-26 08:49:27 UTC
Post by Christian Hesse
Currently running v2.02.120 with commit f8bf6410 on top. Volumes are
timeout, giving up waiting for workers to finish
This adds a boot delay of 30 seconds. Not perfectly sure if this is an lvm2
issue though, chances are that this is caused by systemd v220 update. Any
ideas?
Could you try booting with udev in debug mode? I'm not sure whether it works
the same way as on Fedora. It should be enough to add the 'debug' keyword
to the kernel parameters, though I'm not sure about Arch Linux.

Please paste the udev debug messages here. Look for lines containing pvscan,
though the reason for such a timeout may be completely unrelated to lvm2.

Also, I'm slightly confused here. Are we talking about a system without
systemd involved at all? Or does Arch Linux boot into an initramfs without
systemd and switch to systemd after root is actually mounted?

And one last question: do you ship the upstream udev rules, or do you
install Arch Linux specific rules?
Christian Hesse
2015-05-26 12:05:34 UTC
Post by Ondrej Kozina
Post by Christian Hesse
Currently running v2.02.120 with commit f8bf6410 on top. Volumes are
timeout, giving up waiting for workers to finish
This adds a boot delay of 30 seconds. Not perfectly sure if this is an
lvm2 issue though, chances are that this is caused by systemd v220
update. Any ideas?
Could you try booting with udev in debug mode? I'm not sure whether it works
the same way as on Fedora. It should be enough to add the 'debug' keyword
to the kernel parameters, though I'm not sure about Arch Linux.
Please paste the udev debug messages here. Look for lines containing pvscan,
though the reason for such a timeout may be completely unrelated to lvm2.
Also, I'm slightly confused here. Are we talking about a system without
systemd involved at all? Or does Arch Linux boot into an initramfs without
systemd and switch to systemd after root is actually mounted?
And one last question: do you ship the upstream udev rules, or do you
install Arch Linux specific rules?
First of all... It's not related to lvm2.
Even a system without the lvm2 hook in its initramfs is affected.

Looks like this is an upstream udev issue:
https://bugs.freedesktop.org/show_bug.cgi?id=90051

I will not bother you with any details of my system. ;)

Thanks a lot anyway!