Discussion:
[linux-lvm] lvmlockd: about the limitation on lvresizing the LV active on multiple nodes
Eric Ren
2017-12-28 10:42:07 UTC
Hi David,

I see there is a limitation on lvresizing an LV that is active on multiple nodes.
"""
limitations of lockd VGs
...
* resizing an LV that is active in the shared mode on multiple hosts
"""

It seems like a big limitation when using lvmlockd in a cluster:

"""
c1-n1:~ # lvresize -L-1G vg1/lv1
  WARNING: Reducing active logical volume to 1.00 GiB.
  THIS MAY DESTROY YOUR DATA (filesystem etc.)
Do you really want to reduce vg1/lv1? [y/n]: y
  LV is already locked with incompatible mode: vg1/lv1
"""

Node "c1-n1" is the last node having vg1/lv1 active on it.
Can we change the lock mode from "shared" to "exclusive" to
lvresize without having to deactivate the LV on the last node?

It will reduce the availability if we have to deactivate LV on all
nodes to resize. Is there plan to eliminate this limitation in the
near future?
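
As far as I can tell, the only workaround today is something like the
sequence below, which takes the LV away from all nodes for a while (just a
sketch of what I mean, assuming I have the activation flags right):

"""
# on every node except c1-n1: give up the shared lock
lvchange -an vg1/lv1

# on c1-n1: re-activate exclusively, resize, then return to shared mode
lvchange -an vg1/lv1
lvchange -aey vg1/lv1
lvresize -L-1G vg1/lv1
lvchange -an vg1/lv1
lvchange -asy vg1/lv1

# on the other nodes: activate shared again
lvchange -asy vg1/lv1
"""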

Regards,
Eric
Eric Ren
2018-01-02 08:09:08 UTC
Hi David,

I see this comment on res_process():

"""
/*
 * Go through queued actions, and make lock/unlock calls on the resource
 * based on the actions and the existing lock state.
 *
 * All lock operations sent to the lock manager are non-blocking.
 * This is because sanlock does not support lock queueing.
 * Eventually we could enhance this to take advantage of lock
 * queueing when available (i.e. for the dlm).
"""

Is this the reason lvmlockd has the limitation on lvresize with a "sh" lock,
i.e. that lvmlockd cannot up-convert "sh" to "ex" to perform the lvresize command?

Regards,
Eric
David Teigland
2018-01-02 17:10:34 UTC
Post by Eric Ren
* resizing an LV that is active in the shared mode on multiple hosts
Only in the case where the LV is active on multiple hosts at once,
i.e. a cluster fs, which is less common than a local fs.

In the general case, it's not safe to assume that an LV can be modified by
one node while it's being used by others, even when all of them hold
shared locks on the LV. You'd want to prevent that in general.
Exceptions exist, but whether an exception is ok will likely depend on
what the specific change is, what application is using the LV, and whether
that application can tolerate such a change.

One (perhaps the only?) valid exception I know about is extending an LV
while it's being used under a cluster fs (any cluster fs?)

(In reference to your later email, this is not related to lock queueing,
but rather to basic ex/sh lock incompatibility, and when/how to allow
exceptions to that.)

The simplest approach I can think of to allow lvextend under a cluster fs
would be a procedure like:

1. on one node: lvextend --lockopt skip -L+1G VG/LV

That option doesn't exist, but illustrates the point that some new
option could be used to skip the incompatible LV locking in lvmlockd.

2. on each node: lvchange --refresh VG/LV

This updates dm on each node with the new device size.

3. gfs2_grow VG/LV or equivalent

At this point the fs on any node can begin accessing the new space.
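
Putting the three steps together, the whole sequence might look something
like this (again, the --lockopt skip option is only hypothetical, and the
grow tool depends on the cluster fs):

"""
# step 1, on one node only (hypothetical option to skip the LV lock):
lvextend --lockopt skip -L+1G VG/LV

# step 2, on every node where the LV is active, to update dm with the new size:
lvchange --refresh VG/LV

# step 3, on one node, grow the fs into the new space:
gfs2_grow VG/LV    # or the equivalent for your cluster fs
"""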
Eric Ren
2018-01-03 03:52:34 UTC
Hello David,

Happy new year!
Post by David Teigland
Post by Eric Ren
* resizing an LV that is active in the shared mode on multiple hosts
Only in the case where the LV is active on multiple hosts at once,
i.e. a cluster fs, which is less common than a local fs.
In the general case, it's not safe to assume that an LV can be modified by
one node while it's being used by others, even when all of them hold
shared locks on the LV. You'd want to prevent that in general.
Exceptions exist, but whether an exception is ok will likely depend on
what the specific change is, what application is using the LV, and whether
that application can tolerate such a change.
One (perhaps the only?) valid exception I know about is extending an LV
while it's being used under a cluster fs (any cluster fs?)
The only concrete scenario I can think of is also a cluster fs, like OCFS2;
tunefs.ocfs2 can enlarge the FS online to use all of the device space.
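
For OCFS2 that would presumably be just the final step of the procedure you
outline, after the LV has been extended and refreshed on each node; roughly
(assuming I remember the tunefs.ocfs2 option correctly):

"""
# on one node, with the fs mounted: grow the ocfs2 volume to use all of
# the (now larger) device
tunefs.ocfs2 -S /dev/vg1/lv1
"""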
Post by David Teigland
(In reference to your later email, this is not related to lock queueing,
but rather to basic ex/sh lock incompatibility, and when/how to allow
exceptions to that.)
I thought the procedure to allow lvresize would be something like below if
the LV is used by a cluster FS.

Assume the LV is active with a "sh" lock on multiple nodes (node1 and node2),
and we run lvextend on node1:

- node1: the "sh" lock on r1 (the LV resource) needs to up-convert:
  "sh" -> "ex";
- node2: on receiving the BAST, the "sh" lock on r1 needs to down-convert:
  "sh" -> "nl", which means the LV should be suspended;
- node1: on receiving the AST (the "ex" lock is granted), lvresize is allowed.

After the completion of lvresize, the original lock state should be restored
on every node, and meanwhile the latest metadata can be refreshed, maybe like
below (also sketched as a timeline after this list):

- node1: restore the original lock mode, "ex" -> "sh"; the metadata version
  will be increased, so that a request to update the metadata can be sent to
  the other nodes;
- node2: on receiving the request, "nl" -> "sh", then refresh the metadata
  from disk.
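
To make the intended ordering clearer, here is the per-node timeline I have
in mind (only an illustration of the steps above):

"""
        node1 (runs lvextend)               node2
t0      holds "sh" lock on r1               holds "sh" lock on r1
t1      requests convert "sh" -> "ex"       BAST delivered
t2      waits                               suspends LV, converts "sh" -> "nl"
t3      AST: "ex" granted; runs lvextend    idle
t4      converts "ex" -> "sh"               converts "nl" -> "sh", refreshes
                                            metadata, resumes LV
"""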
Post by David Teigland
The simplest approach I can think of to allow lvextend under a cluster fs
If there is a simple approach, I think it may be worth a try.
Post by David Teigland
1. on one node: lvextend --lockopt skip -L+1G VG/LV
That option doesn't exist, but illustrates the point that some new
option could be used to skip the incompatible LV locking in lvmlockd.
Hmm, is it safe to just skip the locking while the LV is active on other
nodes?
Is there somewhere in the code that prevents concurrent lvm commands from
executing at the same time?
Post by David Teigland
2. on each node: lvchange --refresh VG/LV
This updates dm on each node with the new device size.
3. gfs2_grow VG/LV or equivalent
At this point the fs on any node can begin accessing the new space.
It would be great.

Regards,
Eric
David Teigland
2018-01-03 15:07:13 UTC
Post by Eric Ren
Post by David Teigland
1. on one node: lvextend --lockopt skip -L+1G VG/LV
That option doesn't exist, but illustrates the point that some new
option could be used to skip the incompatible LV locking in lvmlockd.
Hmm, is it safe to just skip the locking while the LV is active on other
nodes?
Is there somewhere in the code that prevents concurrent lvm commands from
executing at the same time?
The VG lock is still used to protect the VG metadata change. The LV lock
doesn't protect anything per se, it just represents that lvchange has
activated the LV on this host. (The LV lock does not represent the
suspended/resumed state of the dm device either, as you suggested above.)

I'll send a simple patch to skip the lv lock to try this.
Dave
Eric Ren
2018-01-04 09:06:10 UTC
David,
Post by David Teigland
Post by Eric Ren
Post by David Teigland
1. on one node: lvextend --lockopt skip -L+1G VG/LV
That option doesn't exist, but illustrates the point that some new
option could be used to skip the incompatible LV locking in lvmlockd.
Hmm, is it safe to just skip the locking while the LV is active on other
nodes?
Is there somewhere in the code that prevents concurrent lvm commands from
executing at the same time?
The VG lock is still used to protect the VG metadata change. The LV lock
doesn't protect anything per se, it just represents that lvchange has
activated the LV on this host. (The LV lock does not represent the
suspended/resumed state of the dm device either, as you suggested above.)
I see, thanks for your explanation!
Post by David Teigland
I'll send a simple patch to skip the lv lock to try this.
I've tested your patch and it works very well.  Thanks very much.

Regards,
Eric
Eric Ren
2018-01-09 02:42:27 UTC
Hi David,
Post by Eric Ren
David,
Post by Eric Ren
Post by David Teigland
1. on one node: lvextend --lockopt skip -L+1G VG/LV
     That option doesn't exist, but illustrates the point that some new
     option could be used to skip the incompatible LV locking in lvmlockd.
Hmm, is it safe to just skip the locking while the LV is active on other
nodes?
Is there somewhere in the code that prevents concurrent lvm commands from
executing at the same time?
The VG lock is still used to protect the VG metadata change. The LV lock
doesn't protect anything per se, it just represents that lvchange has
activated the LV on this host.  (The LV lock does not represent the
suspended/resumed state of the dm device either, as you suggested above.)
I see, thanks for your explanation!
I'll send a simple patch to skip the lv lock to try this.
I've tested your patch and it works very well.  Thanks very much.
Could you please consider pushing this patch upstream? Also, is this the
same case for pvmove as it is for lvresize? If so, can we also work out a
similar patch for pvmove?

Regards,
Eric
David Teigland
2018-01-09 15:42:39 UTC
Post by Eric Ren
Post by Eric Ren
I've tested your patch and it works very well.  Thanks very much.
Could you please consider pushing this patch upstream?
OK
Post by Eric Ren
Also, is this the same case for pvmove as it is for lvresize? If so, can we
also work out a similar patch for pvmove?
Running pvmove on an LV active on multiple hosts could be allowed with the
same kind of patch. However, it would need to use cmirror which we are
trying to phase out; the recent cluster raid1 has a more promising future.
So I think cmirror should be left in the clvm era and not brought forward.

Dave
Eric Ren
2018-01-10 06:55:42 UTC
Hi David,
Post by David Teigland
Post by Eric Ren
Post by Eric Ren
I've tested your patch and it works very well.  Thanks very much.
Could you please consider pushing this patch upstream?
OK
Thanks very much! So, can we update the `man 8 lvmlockd` page to remove the
limitation below on lvresize?

"""
limitations of lockd VGs
...
  · resizing an LV that is active in the shared mode on multiple hosts
"""
Post by David Teigland
Post by Eric Ren
Also, is this the same case for pvmove as it is for lvresize? If so, can we
also work out a similar patch for pvmove?
Running pvmove on an LV active on multiple hosts could be allowed with the
same kind of patch. However, it would need to use cmirror which we are
OK, I see.
Post by David Teigland
trying to phase out; the recent cluster raid1 has a more promising future.
My understanding is:

if cluster raid1 is used as the PV, data is replicated and data migration is
nearly equivalent to replacing a disk. However, in the scenario where the PV
is on a raw disk, pvmove is very handy for data migration.

IIRC, you mean we could consider using cluster raid1 as the underlying DM
target to support pvmove in a cluster, since the current pvmove uses the
mirror target?
Post by David Teigland
So I think cmirror should be left in the clvm era and not brought forward.
By the way, another thing I'd like to ask about: do we really want to drop
the concept of clvm?

From my understanding, lvmlockd is going to replace only the "clvmd" daemon,
not clvm exactly. clvm is apparently short for cluster/cluster-aware LVM,
which is intuitive naming. I see clvm as an abstract concept that consists
of two pieces: clvmd and cmirrord. IMHO, I'd like to see the clvm concept
remain, no matter what we do with clvmd and cmirrord. It might make it
easier for users and documentation to digest the change :)

Regards,
Eric
David Teigland
2018-01-10 15:56:59 UTC
Post by Eric Ren
if cluster raid1 is used as the PV, data is replicated and data migration is
nearly equivalent to replacing a disk. However, in the scenario where the PV
is on a raw disk, pvmove is very handy for data migration.
IIRC, you mean we could consider using cluster raid1 as the underlying DM
target to support pvmove in a cluster, since the current pvmove uses the
mirror target?
That's what I imagined could be done, but I've not thought about it in
detail. IMO pvmove under a shared LV is too complicated and not worth
doing.
Post by Eric Ren
By the way, another thing I'd like to ask about: do we really want to drop
the concept of clvm?
From my understanding, lvmlockd is going to replace only the "clvmd" daemon,
not clvm exactly. clvm is apparently short for cluster/cluster-aware LVM,
which is intuitive naming. I see clvm as an abstract concept that consists
of two pieces: clvmd and cmirrord. IMHO, I'd like to see the clvm concept
remain, no matter what we do with clvmd and cmirrord. It might make it
easier for users and documentation to digest the change :)
Thank you for pointing out the artifice in naming here; it has long
irritated me too. There is indeed no such thing as "clvm" or "HA LVM",
and I think we'd be better off banning these terms completely, at least at
the technical level. (Historically, I suspect sales/marketing had a role
in this mess by wanting to attach a name to something to sell.)

If the term "clvm" survives, it will become even worse IMO if we expand it
to cover cases not using "clvmd". To me it's all just "lvm", and I don't
see why we need any other names.
Eric Ren
2018-01-11 09:32:23 UTC
Hi David,
Post by David Teigland
Post by Eric Ren
IIRC, you mean we could consider using cluster raid1 as the underlying DM
target to support pvmove in a cluster, since the current pvmove uses the
mirror target?
That's what I imagined could be done, but I've not thought about it in
detail. IMO pvmove under a shared LV is too complicated and not worth
doing.
Very true.
Post by David Teigland
Post by Eric Ren
By the way, another thing I'd like to ask about: do we really want to drop
the concept of clvm?
From my understanding, lvmlockd is going to replace only the "clvmd" daemon,
not clvm exactly. clvm is apparently short for cluster/cluster-aware LVM,
which is intuitive naming. I see clvm as an abstract concept that consists
of two pieces: clvmd and cmirrord. IMHO, I'd like to see the clvm concept
remain, no matter what we do with clvmd and cmirrord. It might make it
easier for users and documentation to digest the change :)
Thank you for pointing out the artifice in naming here; it has long
irritated me too. There is indeed no such thing as "clvm" or "HA LVM",
and I think we'd be better off banning these terms completely, at least at
the technical level. (Historically, I suspect sales/marketing had a role
in this mess by wanting to attach a name to something to sell.)
Haha, like cluster MD raid.
Post by David Teigland
If the term "clvm" survives, it will become even worse IMO if we expand it
to cover cases not using "clvmd". To me it's all just "lvm", and I don't
see why we need any other names.
It looks like people need a simple name to distinguish the usage scenarios:
local and cluster.

Thanks,
Eric
