Discussion:
[linux-lvm] The benefits of lvmlockd over clvmd?
Eric Ren
2018-01-09 03:15:24 UTC
Hi David,

Regarding the question in the subject, I can think of three main
benefits of lvmlockd over clvmd:

- lvmlockd supports two cluster locking plugins: dlm and sanlock. The
sanlock plugin can support up to ~2000 nodes, which benefits LVM usage
in big virtualization/storage clusters, while the dlm plugin fits HA
clusters (a setup sketch follows this list).

- lvmlockd has a better design than clvmd. clvmd is a command-level
locking system, which means the whole LVM software will hang if any
LVM command hits a deadlock. However, lvmlockd provides
*resource*-based cluster locking. The resources to protect are VGs
and LVs, so a deadlock is isolated inside one resource and operations
on other VGs/LVs can still proceed.

- lvmlockd can work with lvmetad.
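
As a rough setup sketch: a minimal lvmlockd+sanlock configuration might
look like the following, where the VG name "vg01" and the device path
are made up, and the systemd unit names can vary by distribution:

  # /etc/lvm/lvm.conf, on every host
  use_lvmlockd = 1

  # start the daemons (sanlock also needs wdmd; dlm would need corosync)
  systemctl start wdmd sanlock lvmlockd

  # on one host: create a shared VG
  vgcreate --shared vg01 /dev/sda

  # on every host: start the VG lockspace before using the VG
  vgchange --lock-start vg01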

But I may be wrong on some points. Could you please help correct me and
complete the list of benefits?

Thanks!
Eric
David Teigland
2018-01-09 16:06:22 UTC
Post by Eric Ren
Hi David,
Regarding the question in the subject, I can think of three main benefits of
lvmlockd over clvmd:
- lvmlockd supports two cluster locking plugins: dlm and sanlock. The
sanlock plugin can support up to ~2000 nodes, which benefits LVM usage
in big virtualization/storage clusters,
True, although it's never been tried anywhere near that many. The main
point hiding behind the big number is that hosts are pretty much unaware
of each other, so adding more doesn't have any effect, and when something
happens to one, others are unaffected because they are unaware.
Post by Eric Ren
while the dlm plugin fits HA clusters.
- lvmlockd has a better design than clvmd. clvmd is a command-level
locking system, which means the whole LVM software will hang if any
LVM command hits a deadlock. However, lvmlockd provides
*resource*-based cluster locking. The resources to protect are VGs
and LVs, so a deadlock is isolated inside one resource and operations
on other VGs/LVs can still proceed.
- lvmlockd can work with lvmetad.
But I may be wrong on some points. Could you please help correct me and
complete the list of benefits?
To me the biggest benefit is the design and internal implementation, which
I admit don't make for great marketing. The design in general follows the
idea described above, in which hosts fundamentally operate unaware of
others and one host never has any effect on another. That's diametrically
opposite to the original clvm "single system image" design in which
everything that happens is in theory meant to be happening everywhere.
Eric Ren
2018-01-10 07:11:24 UTC
Hi David,

Thanks for your explanations!
Post by David Teigland
Post by Eric Ren
Hi David,
Regarding the question in the subject, I can think of three main benefits of
lvmlockd over clvmd:
- lvmlockd supports two cluster locking plugins: dlm and sanlock. The
sanlock plugin can support up to ~2000 nodes, which benefits LVM usage
in big virtualization/storage clusters,
True, although it's never been tried anywhere near that many. The main
point hiding behind the big number is that hosts are pretty much unaware
of each other, so adding more doesn't have any effect, and when something
happens to one, others are unaffected because they are unaware.
The comments above are only about lvmlockd with sanlock, and the difference
comes from the protocols/algorithms they use: sanlock with Paxos, dlm with
corosync, right?
Post by David Teigland
Post by Eric Ren
while the dlm plugin fits HA clusters.
- lvmlockd has a better design than clvmd. clvmd is a command-level
locking system, which means the whole LVM software will hang if any
LVM command hits a deadlock. However, lvmlockd provides
*resource*-based cluster locking. The resources to protect are VGs
and LVs, so a deadlock is isolated inside one resource and operations
on other VGs/LVs can still proceed.
Is this point roughly true?
Post by David Teigland
Post by Eric Ren
- lvmlockd can work with lvmetad.
But I may be wrong on some points. Could you please help correct me and
complete the list of benefits?
To me the biggest benefit is the design and internal implementation, which
I admit don't make for great marketing. The design in general follows the
idea described above, in which hosts fundamentally operate unaware of
Sorry, "the idea described above" by me?
Post by David Teigland
others and one host never has any effect on another. That's diametrically
For example, with clvmd the command "lvchange -ay VG/LV" will try to
activate the LV on every host, but with lvmlockd, we need to perform
"lvchange -asy" on each host :)
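
To make the contrast concrete, here is a rough sketch (the VG/LV names
are made up):

  # clvmd: one command has a cluster-wide effect
  lvchange -ay vg01/lv01     # tries to activate the LV on every node

  # lvmlockd: each host activates for itself, in shared mode
  lvchange -asy vg01/lv01    # run on each host that needs the LV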
Post by David Teigland
opposite to the original clvm "single system image" design in which
everything that happens is in theory meant to be happening everywhere.
Got it. Thanks again.

Regards,
Eric
Zdenek Kabelac
2018-01-10 09:36:53 UTC
Post by Eric Ren
Hi David,
Thanks for your explanations!
Post by David Teigland
Post by Eric Ren
Hi David,
Regarding the question in the subject, I can think of three main benefits of
lvmlockd over clvmd:
- lvmlockd supports two cluster locking plugins: dlm and sanlock. The
sanlock plugin can support up to ~2000 nodes, which benefits LVM usage
in big virtualization/storage clusters,
True, although it's never been tried anywhere near that many.  The main
point hiding behind the big number is that hosts are pretty much unaware
of each other, so adding more doesn't have any effect, and when something
happens to one, others are unaffected because they are unaware.
The comments above are only about lvmlockd with sanlock, and the difference
comes from the protocols/algorithms they use: sanlock with Paxos, dlm with
corosync, right?
Post by David Teigland
Post by Eric Ren
while the dlm plugin fits HA clusters.
- lvmlockd has a better design than clvmd. clvmd is a command-level
locking system, which means the whole LVM software will hang if any
LVM command hits a deadlock. However, lvmlockd provides
*resource*-based cluster locking. The resources to protect are VGs
and LVs, so a deadlock is isolated inside one resource and operations
on other VGs/LVs can still proceed.
Is this point roughly true?
Post by David Teigland
Post by Eric Ren
- lvmlockd can work with lvmetad.
But I may be wrong on some points. Could you please help correct me and
complete the list of benefits?
To me the biggest benefit is the design and internal implementation, which
I admit don't make for great marketing.  The design in general follows the
idea described above, in which hosts fundamentally operate unaware of
Sorry, is "the idea described above" the one described by me?
Post by David Teigland
others and one host never has any effect on another.  That's diametrically
For example, with clvmd the command "lvchange -ay VG/LV" will try to activate
the LV on every host, but with lvmlockd, we need to perform "lvchange -asy" on
each host :)
There are a couple of fuzzy sentences - so let's try to make them clearer.

Default mode for 'clvmd' is to 'share' resources everywhere - which clearly
comes from the original 'gfs' requirement and from 'linear/striped' volumes
that can be easily activated on many nodes.

However, over time different use-cases got more priority, so basically
every new dm target (except mirror) does NOT support shared storage (maybe
raid will one day...). So targets like snapshot, thin, cache, and raid do
require so-called exclusive activation.

So here comes the difference - lvmlockd by default goes with
'exclusive/local' activation, and shared activation (the old clvmd default)
needs to be requested explicitly.

Another difference is - the 'clvmd' world is 'automating' activation around
the whole cluster (so from node A it's possible to activate a volume on node B
without ANY other command than 'lvchange').

With the 'lvmlockd' mechanism this was 'dropped', and it's the user's
responsibility to initiate e.g. an ssh command to activate on other node(s)
and resolve error handling.
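
As an illustration, such scripting might look like this rough sketch
(the node names, the VG/LV name, and the error policy are all made up):

  #!/bin/sh
  # activate vg01/lv01 in shared mode on a list of remote nodes
  for node in node1 node2 node3; do
      if ! ssh "$node" lvchange -asy vg01/lv01; then
          echo "activation failed on $node" >&2
          exit 1
      fi
  done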

There are various pros&cons over each solution - both needs setups and while
'clvmd' world is 'set & done' lvmlockd world scripting needs to be born in
some way.


Also, ATM 'lvmetad' can't be used even with lvmlockd - simply because we are
not (yet) capable of handling 'udev' around the cluster (and it's not clear we
ever will be).

On the positive side - we are working hard to enhance 'scanning' speed - so in
the majority of use-cases there is no real performance gain from lvmetad anyway.

Regards

Zdenek
Eric Ren
2018-01-10 14:42:08 UTC
Zdenek,

Thanks for helping make this clearer to me :)
Post by Zdenek Kabelac
There are a couple of fuzzy sentences - so let's try to make them clearer.
Default mode for 'clvmd' is to 'share' resources everywhere - which
clearly comes from the original 'gfs' requirement and from 'linear/striped'
volumes that can be easily activated on many nodes.
However, over time different use-cases got more priority, so
basically every new dm target (except mirror) does NOT support shared
storage (maybe raid will one day...). So targets like snapshot,
thin, cache, and raid do require so-called exclusive activation.
Good to know the history of clvmd :)
Post by Zdenek Kabelac
So here comes the difference - lvmlockd by default goes with
'exclusive/local' activation, and shared activation (the old clvmd default)
needs to be requested explicitly.
Another difference is - the 'clvmd' world is 'automating' activation
around the whole cluster (so from node A it's possible to activate a
volume on node B without ANY other command than 'lvchange').
With the 'lvmlockd' mechanism this was 'dropped', and it's the user's
responsibility to initiate e.g. an ssh command to activate on other
node(s) and resolve error handling.
There are various pros & cons to each solution - both need setup, and
while the 'clvmd' world is 'set & done', in the lvmlockd world the
scripting needs to be born in some way.
True.
Post by Zdenek Kabelac
Also, ATM 'lvmetad' can't be used even with lvmlockd - simply because
we are not (yet) capable of handling 'udev' around the cluster (and it's
not clear we ever will be).
This sentence surprises me a lot. According to the lvmlockd manpage, it
seems clear that lvmlockd can work with lvmetad now.
IIRC, it's not the first time you have mentioned "cluster udev". It
gives me the impression that the current udev system is not
100% reliable for shared disks in a cluster, no matter whether we use
lvmetad or not, right? If so, could you please give an example
scenario where lvmetad may not work well with lvmlockd?
Post by Zdenek Kabelac
On the positive side - we are working hard to enhance 'scanning' speed
- so in the majority of use-cases there is no real performance gain from
lvmetad anyway.
Great! Thanks.

Regards,
Eric
Zdenek Kabelac
2018-01-10 15:35:56 UTC
Post by Eric Ren
Zdenek,
Thanks for helping make this clearer to me :)
Post by Zdenek Kabelac
There are a couple of fuzzy sentences - so let's try to make them clearer.
Default mode for 'clvmd' is to 'share' resources everywhere - which clearly
comes from the original 'gfs' requirement and from 'linear/striped' volumes
that can be easily activated on many nodes.
However, over time different use-cases got more priority, so basically
every new dm target (except mirror) does NOT support shared storage (maybe
raid will one day...). So targets like snapshot, thin, cache, and raid do
require so-called exclusive activation.
Good to know the history of clvmd :)
Post by Zdenek Kabelac
So here comes the difference - lvmlockd by default goes with
'exclusive/local' activation, and shared activation (the old clvmd default)
needs to be requested explicitly.
Another difference is - the 'clvmd' world is 'automating' activation around
the whole cluster (so from node A it's possible to activate a volume on node B
without ANY other command than 'lvchange').
With the 'lvmlockd' mechanism this was 'dropped', and it's the user's
responsibility to initiate e.g. an ssh command to activate on other node(s)
and resolve error handling.
There are various pros & cons to each solution - both need setup, and while
the 'clvmd' world is 'set & done', in the lvmlockd world the scripting needs
to be born in some way.
True.
Post by Zdenek Kabelac
Also, ATM 'lvmetad' can't be used even with lvmlockd - simply because we are
not (yet) capable of handling 'udev' around the cluster (and it's not clear we
ever will be).
This sentence surprises me a lot. According to the lvmlockd manpage, it seems
clear that lvmlockd can work with lvmetad now.
IIRC, it's not the first time you have mentioned "cluster udev". It gives me
the impression that the current udev system is not 100% reliable for shared
disks in a cluster, no matter whether we use lvmetad or not, right? If so,
could you please give an example scenario where lvmetad may not work well
with lvmlockd?
Hi

The world of udevd/systemd is a complicated monster - which has no notion of
handling bad/duplicate/... devices and so on.

The current design of lvmetad is not sufficient to live in the ocean of bugs
in this category - so, as said, ATM it's highly recommended to keep lvmetad
off in clusters.
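
In practice, keeping lvmetad off might look like this sketch (the exact
systemd unit names can vary by distribution):

  # /etc/lvm/lvm.conf
  global {
      use_lvmetad = 0
  }

  # stop the socket-activated daemon as well
  systemctl disable --now lvm2-lvmetad.socket lvm2-lvmetad.service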


Regards

Zdenek
David Teigland
2018-01-10 17:25:29 UTC
Post by Zdenek Kabelac
Post by Eric Ren
Post by Zdenek Kabelac
Also, ATM 'lvmetad' can't be used even with lvmlockd - simply
because we are not (yet) capable of handling 'udev' around the cluster
(and it's not clear we ever will be).
This sentence surprises me a lot. According to the lvmlockd manpage, it
It surprises me too, since I developed lvmlockd and lvmetad features
precisely to make it work.
Post by Zdenek Kabelac
Post by Eric Ren
seems clear that lvmlockd can work with lvmetad now.
IIRC, it's not the first time you have mentioned "cluster udev". It
gives me the impression that the current udev system is not
100% reliable for shared disks in a cluster, no matter whether we use
lvmetad or not, right? If so, could you please give an example
scenario where lvmetad may not work well with lvmlockd?
The world of udevd/systemd is a complicated monster - which has no notion of
handling bad/duplicate/... devices and so on.
The current design of lvmetad is not sufficient to live in the ocean of bugs
in this category - so, as said, ATM it's highly recommended to keep lvmetad
off in clusters.
There are indeed plenty of problems with lvmetad, which is why I've been
trying to get us to get rid of lvmetad, and why I've improved disk scanning
to be so much more efficient:
https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-new-scan-32

However, you have not pointed out any specific problems unique to lvmlockd
with lvmetad.

David Teigland
2018-01-10 16:45:30 UTC
Post by Eric Ren
Post by David Teigland
True, although it's never been tried anywhere near that many. The main
point hiding behind the big number is that hosts are pretty much unaware
of each other, so adding more doesn't have any effect, and when something
happens to one, others are unaffected because they are unaware.
The comments above are only about lvmlockd with sanlock, and the difference
comes from the protocols/algorithms they use: sanlock with Paxos, dlm with
corosync, right?
right
Post by Eric Ren
Post by David Teigland
Post by Eric Ren
- lvmlockd has a better design than clvmd. clvmd is a command-level
locking system, which means the whole LVM software will hang if any
LVM command hits a deadlock. However, lvmlockd provides
*resource*-based cluster locking. The resources to protect are VGs
and LVs, so a deadlock is isolated inside one resource and operations
on other VGs/LVs can still proceed.
Is this point roughly true?
It's vague enough to pass I suppose, but I don't think comparing them in
terms of locking is very helpful, because clvmd is not really a locking
system, it's a parallel execution system. Keep in mind we're generalizing
and largely discussing opinion here, so there are bound to be some valid
disagreements about this.

Using lvmlockd, lvm just acquires and releases simple locks around making
changes. There's no notion of "nodes", no concept of remote/local.
Either a command gets a lock or it doesn't, not much else can happen or go
wrong, it's pretty trivial.

clvmd is vastly different: "all nodes" and remote/local nodes are the
central controlling idea. Locks are not the main thing; the main thing is
running all commands in parallel everywhere. There are endless ways
things can go wrong when trying to run commands in parallel on all nodes.
What's more, there's no point to it, since nothing actually needs to run
in parallel everywhere.
Post by Eric Ren
Post by David Teigland
Post by Eric Ren
But I may be wrong on some points. Could you please help correct me and
complete the list of benefits?
To me the biggest benefit is the design and internal implementation, which
I admit don't make for great marketing. The design in general follows the
idea described above, in which hosts fundamentally operate unaware of
Sorry, "the idea described above" by me?
No, the idea I described above, about sanlock hosts not being affected by
other nodes.

Dave