[linux-lvm] vgmknodes --refresh blocks forever waiting on a semaphore
Joel Friedly
2015-02-19 22:19:33 UTC
I'm trying to do some disk failure testing, and we're using LVM on top of raw
disks. After replacing the disk, the LV on the disk is unreadable until I
run vgmknodes --refresh. That command hangs forever, but I can kill -9
it. After running the command, everything works again and I can read the
LV.

I've seen this twice, so I ran strace the second time. You can find the
output here: https://gist.github.com/jfriedly/50fe9134c4bc616f9f90
(Ctrl-F for "425989").

On line 4250 of the strace output, LVM sets the semaphore's value to 1, then
it immediately checks the semaphore's value and confirms that it's 1.

On line 4253, LVM increments the semaphore's value to 2, then it
immediately checks the semaphore's value and confirms that it's 2.

On line 4295, LVM gets the semaphore's value and sees that it's 2, then it
immediately decrements the value to 1 and then waits indefinitely for the
value to hit 0.
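
The three steps above can be replayed with a toy model to show why the wait never returns by itself. This is only a sketch in Python, not LVM's code: ZeroWaitCounter and replay_trace are hypothetical names, and a threading.Condition stands in for the real semaphore seen in the strace.

```python
import threading


class ZeroWaitCounter:
    """Toy stand-in for a semaphore that supports wait-for-zero."""

    def __init__(self):
        self._value = 0
        self._cond = threading.Condition()

    def set(self, value):
        with self._cond:
            self._value = value
            self._cond.notify_all()

    def add(self, delta):
        with self._cond:
            self._value += delta
            self._cond.notify_all()
            return self._value

    def wait_for_zero(self, timeout):
        """Block until the value is 0; return False if the timeout expires first."""
        with self._cond:
            return self._cond.wait_for(lambda: self._value == 0, timeout=timeout)


def replay_trace(external_decrement, timeout=0.2):
    """Replay the three steps from the strace; return True if the wait finished."""
    sem = ZeroWaitCounter()
    sem.set(1)    # line 4250: value set to 1
    sem.add(1)    # line 4253: value incremented to 2
    sem.add(-1)   # line 4295: value decremented to 1
    if external_decrement:
        # Hypothetical second party that performs the final decrement.
        threading.Timer(0.05, sem.add, args=(-1,)).start()
    return sem.wait_for_zero(timeout)
```

With external_decrement=False the wait times out, which corresponds to the hang; it only completes when something outside the waiting code performs the last decrement.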


Is LVM expecting some other process to decrement the semaphore? Is this a
bug in vgmknodes --refresh? Running without the refresh flag doesn't block
forever, but it also doesn't make the LV readable.



System Info:

Ubuntu 12.04
Kernel 3.13.0-39-generic
LVM 2.02.95-4ubuntu1
The disk is part of a VG named "vg.nebula.alexandria" and it is dedicated
to an LV called "alexandria.tlog".


Thanks for your help guys, and let me know if you need any more debugging
info,
Joel
Peter Rajnoha
2015-02-20 08:32:12 UTC
Post by Joel Friedly
Is LVM expecting some other process to decrement the semaphore?
Yes, it's expecting a notification from a udev rule, 95-dm-notify.rules
(usually found in the /lib/udev/rules.d directory). That rule contains a
"dmsetup udevcomplete" call, which decrements the semaphore. The --refresh
option causes CHANGE udev events to be generated for the LV, and LVM waits
for any udev processing to finish before it continues.
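
Since the wait depends on that rule being installed, one quick check is to look for a rules file that actually calls udevcomplete. The following is a minimal sketch, not an official tool: find_udevcomplete_rules is a hypothetical helper, and only /lib/udev/rules.d comes from this thread; the other two directories are assumed locations that may not exist on a given system.

```python
from pathlib import Path

# /lib/udev/rules.d is the location mentioned in this thread; the other two
# are assumed alternatives and are skipped if they do not exist.
DEFAULT_RULES_DIRS = ("/lib/udev/rules.d", "/usr/lib/udev/rules.d", "/etc/udev/rules.d")


def find_udevcomplete_rules(rules_dirs=DEFAULT_RULES_DIRS):
    """Return rules files that contain a non-comment line mentioning udevcomplete."""
    matches = []
    for directory in rules_dirs:
        root = Path(directory)
        if not root.is_dir():
            continue
        for rules_file in sorted(root.glob("*.rules")):
            try:
                text = rules_file.read_text(errors="replace")
            except OSError:
                continue  # unreadable file: skip it rather than fail
            for line in text.splitlines():
                stripped = line.strip()
                if stripped and not stripped.startswith("#") and "udevcomplete" in stripped:
                    matches.append(rules_file)
                    break  # one hit per file is enough
    return matches
```

An empty result would suggest that nothing on the system is set up to decrement the semaphore; a non-empty result only shows that the rule exists, not that udev is running it.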

What Linux distribution is this?

Can you attach (or just send me directly) the output of "vgmknodes --refresh -vvvv"
for more debug info? Also, try running "udevadm monitor --udev --env" just
before executing the vgmknodes command and save the output as well.
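
One way to collect both outputs in a single run is sketched below. It is an illustration only: capture_debug_output, the log file arguments, and the timeout and settle values are hypothetical choices, while the two command lines are the ones quoted above. The timeout is there because the refresh is expected to hang.

```python
import subprocess
import time

# The two command lines requested in this message.
MONITOR_CMD = ["udevadm", "monitor", "--udev", "--env"]
REFRESH_CMD = ["vgmknodes", "--refresh", "-vvvv"]


def capture_debug_output(monitor_cmd, refresh_cmd, monitor_log, refresh_log,
                         timeout=120, settle=1.0):
    """Run refresh_cmd while monitor_cmd records; return True if refresh_cmd timed out."""
    timed_out = False
    with open(monitor_log, "wb") as mon_out, open(refresh_log, "wb") as ref_out:
        monitor = subprocess.Popen(monitor_cmd, stdout=mon_out,
                                   stderr=subprocess.STDOUT)
        try:
            time.sleep(settle)  # give the monitor a moment to start listening
            try:
                subprocess.run(refresh_cmd, stdout=ref_out,
                               stderr=subprocess.STDOUT, timeout=timeout)
            except subprocess.TimeoutExpired:
                timed_out = True  # the hang reproduced; run() has killed the child
            time.sleep(settle)  # let trailing udev events reach the log
        finally:
            monitor.terminate()
            monitor.wait()
    return timed_out
```

Calling capture_debug_output(MONITOR_CMD, REFRESH_CMD, "udev-monitor.log", "vgmknodes.log") as root would leave two files to attach; a return value of True means the refresh had to be killed after the timeout.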

Also, please consider filing a bug report with the distribution this
problem is seen in. It's better to track the problem this way, as you
can share the debug output and communicate the problem directly with
the maintainers of LVM in that distribution - the environment may
differ between distributions.
--
Peter
Peter Rajnoha
2015-02-20 08:35:58 UTC
Post by Peter Rajnoha
What Linux distribution is this?
(Sorry, I've just noticed you already included that info - it's Ubuntu.)
--
Peter