Joel Friedly
2015-02-19 22:19:33 UTC
I'm trying to do some disk failure testing, and we're using LVM on top of raw
disks. After replacing the disk, the LV on the disk is unreadable until I
run vgmknodes --refresh. That command hangs forever, but I can kill -9
it. Once I've run the command and killed it, everything works again and I
can read the LV.
I've seen this twice, so I ran strace the second time. The output is here:
https://gist.github.com/jfriedly/50fe9134c4bc616f9f90 (Ctrl-F for "425989").
On line 4250, LVM sets the semaphore's value to 1, then it immediately
checks the semaphore's value and confirms that it's 1.
On line 4253, LVM increments the semaphore's value to 2, then it
immediately checks the semaphore's value and confirms that it's 2.
On line 4295, LVM gets the semaphore's value and sees that it's 2, then it
immediately decrements the value to 1 and then waits indefinitely for the
value to hit 0.
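
To make that sequence concrete, here is a toy in-memory model of it in
Python. It is only a sketch of the arithmetic: it does not touch real
System V semaphores or LVM, the class and function names are made up for
illustration, and the "second decrement" at the end is a hypothetical step
from some other process that never shows up in my trace.

```python
class CookieSemaphore:
    """Toy stand-in for a single semaphore value (not real IPC)."""

    def __init__(self):
        self.value = 0

    def setval(self, value):
        # Models setting the semaphore's value directly.
        self.value = value

    def getval(self):
        # Models reading the semaphore's current value.
        return self.value

    def op(self, delta):
        # Models an increment (+1) or decrement (-1) of the value.
        if self.value + delta < 0:
            raise BlockingIOError("decrement below zero would block")
        self.value += delta

    def wait_for_zero_would_block(self):
        # A wait-for-zero only returns once the value is exactly 0.
        return self.value != 0


def replay_trace():
    sem = CookieSemaphore()
    sem.setval(1)   # strace line 4250: value set to 1
    sem.op(+1)      # strace line 4253: incremented to 2
    sem.op(-1)      # strace line 4295: decremented to 1
    blocked_before = sem.wait_for_zero_would_block()

    # Hypothetical: a second decrement by someone else would release the wait.
    sem.op(-1)
    blocked_after = sem.wait_for_zero_would_block()
    return blocked_before, blocked_after
```

In this model replay_trace() reports that the wait blocks after the three
traced operations and stops blocking only after the extra decrement, which
is why I suspect something else is supposed to perform it.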
Is LVM expecting some other process to decrement the semaphore? Is this a
bug in vgmknodes --refresh? Running without the refresh flag doesn't block
forever, but it also doesn't make the LV readable.
System Info:
Ubuntu 12.04
Kernel 3.13.0-39-generic
LVM 2.02.95-4ubuntu1
The disk is part of a VG named "vg.nebula.alexandria" and it is dedicated
to an LV called "alexandria.tlog".
Thanks for your help guys, and let me know if you need any more debugging
info,
Joel