[linux-lvm] LVM2 snapshot weirdness
Nick Morrison
2013-11-15 11:13:23 UTC
Hello list,

I subscribed a few minutes ago, and I've got a question already :-) I've done
some searching and research on my problem, but haven't had any success yet. So
I thought I'd pose it to this list, in the hope of finding some tips for
solving it.

I have an HP server running RHEL 5.7. It contains four hard disks, in two
hardware RAID-1 groups. The first RAID-1 contains the operating system and
utilities; the second is slated for compressed snapshot dumps of the first.
There's nothing particularly freaky about the configuration of the machine, as
far as I can tell.

I wrote a script which does the following:

# lvcreate -pr -L 2G -s -n LogVol00-snapshot /dev/VolGroup00/LogVol00
# dd bs=8k if=/dev/VolGroup00/LogVol00-snapshot | gzip -3 -c > /backup/LogVol00.img.gz

After this completes, I verify the dump by comparing the output of:

# md5sum /dev/VolGroup00/LogVol00-snapshot

with

# gunzip -c /backup/LogVol00.img.gz | md5sum
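The comparison can be wrapped in a small function so the script itself reports a mismatch. This is only a minimal sketch: `verify_dump` is a hypothetical helper name, not part of the script described here, and it assumes `md5sum`, `gunzip` and `awk` are available.

```shell
#!/bin/bash
# verify_dump SRC DUMP_GZ  (hypothetical helper, for illustration only)
# Compares the MD5 of SRC (a block device or a plain file) with the MD5
# of the decompressed contents of DUMP_GZ.
# Prints "OK <sum>" and returns 0 on a match; prints both sums and
# returns 1 on a mismatch.
verify_dump() {
    local src=$1 dump=$2
    local src_sum dump_sum

    # Read via stdin so md5sum prints "-" instead of the path;
    # awk keeps only the hash field.
    src_sum=$(md5sum < "$src" | awk '{print $1}')
    dump_sum=$(gunzip -c "$dump" | md5sum | awk '{print $1}')

    if [ "$src_sum" = "$dump_sum" ]; then
        echo "OK $src_sum"
        return 0
    fi
    echo "MISMATCH source=$src_sum dump=$dump_sum"
    return 1
}
```

Called as `verify_dump /dev/VolGroup00/LogVol00-snapshot /backup/LogVol00.img.gz`, it reads the snapshot a second time, exactly as the manual check does, so it only helps if two reads of the snapshot return the same data.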

This method seems to have been repeatedly successful on four other servers of
similar hardware configuration. I am running the same script on all of them.
The other machines have a larger logical volume (500 GB, versus 260 GB on this machine)
but the PVs are all the same (whole disk except the /boot partition). However,
on this machine, I got an md5sum mismatch.

Seeking the source of the weirdness, I ran a second dd from the same snapshot:

# dd bs=8k if=/dev/VolGroup00/LogVol00-snapshot | gzip -3 -c > /backup/LogVol00-again.img.gz

.. and md5sum gave me a third, different, result.

It seems as if the data read from the snapshot is changing.

Do I have a conceptual misunderstanding of what I'm doing? Can anyone suggest
things for me to look at, verify, or test? Am I missing something blindingly
obvious here? :-)

Any and all suggestions and comments welcomed!

Cheers and beers,
Nick
Nick Morrison
2013-12-02 12:27:20 UTC
Post by Nick Morrison
.. and md5sum gave me a third, different, result.
It seems as if the data read from the snapshot is changing.

More tests. I created a snapshot using:

# lvcreate -pr -L 2G -s -n test-snapshot /dev/VolGroup00/LogVol00


Then I ran:

# md5sum /dev/VolGroup00/test-snapshot

... three times. test-snapshot is a read-only snapshot of the / volume, which is about 265G.
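
Repeating the checksum can be scripted as a loop that also reports whether the passes agree. The following is a sketch under assumptions, not the script that was actually run: `check_stable` is a hypothetical name, the default of three passes simply mirrors the test described here, and timing output is left out.

```shell
#!/bin/bash
# check_stable DEV [PASSES]  (hypothetical helper, for illustration only)
# Hashes DEV several times in a row and prints one line per pass.
# Returns 0 if every pass produced the same MD5, 1 otherwise.
check_stable() {
    local dev=$1 passes=${2:-3}
    local i=1 sum first="" rc=0

    echo "Checking md5sum of $dev multiple times."
    while [ "$i" -le "$passes" ]; do
        echo "Pass $i started $(date)"
        sum=$(md5sum < "$dev" | awk '{print $1}')
        echo "$sum  $dev"
        if [ -z "$first" ]; then
            first=$sum              # remember the first pass as reference
        elif [ "$sum" != "$first" ]; then
            rc=1                    # a later pass disagrees with the first
        fi
        i=$((i + 1))
    done
    echo "Complete at $(date)"
    return $rc
}
```

For a long run it can be started in the background, for example by putting `check_stable /dev/VolGroup00/test-snapshot` at the end of a script file and launching that file with `nohup`.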

On one server, I got a different md5sum each time, which is clearly a bad result. However, on the other server, I got the same md5sum on the first and third runs, but a different one on the second. Curious.

I'm getting no interesting messages in dmesg during the reads.

Is it likely that we have a fault at a lower level than LVM's snapshot code here?


[***@server-01:/backup/test]# cat nohup.out
Checking md5sum of /dev/VolGroup00/test-dump multiple times.

Pass 1 started Fri Nov 29 11:46:56 CET 2013
103fa0cc0351836598866423fabcfb6b /dev/VolGroup00/test-dump

real 34m31.006s
user 15m15.252s
sys 6m48.961s

Pass 2 started Fri Nov 29 12:21:28 CET 2013
2d37fce8eea493eb4600f1593b9da825 /dev/VolGroup00/test-dump

real 34m24.818s
user 15m11.098s
sys 6m28.567s

Pass 3 started Fri Nov 29 12:55:52 CET 2013
4f6940a672115b896cea5220ba8bde7c /dev/VolGroup00/test-dump

real 35m13.079s
user 15m15.016s
sys 6m19.312s

Complete at Fri Nov 29 13:31:05 CET 2013
[***@server-01:/backup/test]#



Other server:


[***@server-02:/backup/test]# cat nohup.out
Checking md5sum of /dev/VolGroup00/test-dump multiple times.

Pass 1 started Fri Nov 29 11:47:04 CET 2013
de7f4847be8dd3019ee2a690f48c6397 /dev/VolGroup00/test-dump

real 29m34.334s
user 14m49.219s
sys 6m33.685s

Pass 2 started Fri Nov 29 12:16:38 CET 2013
363ee32c1bb5da6eb9754b0d6de79180 /dev/VolGroup00/test-dump

real 29m24.910s
user 14m40.705s
sys 6m44.806s

Pass 3 started Fri Nov 29 12:46:03 CET 2013
de7f4847be8dd3019ee2a690f48c6397 /dev/VolGroup00/test-dump

real 29m20.927s
user 14m40.441s
sys 6m46.865s

Complete at Fri Nov 29 13:15:24 CET 2013
[***@server-02:/backup/test]#
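
A whole-device checksum only says that two reads differed, not where. One possible way to narrow it down is to hash the device in fixed-size chunks and compare the per-chunk lists from two passes. This is a rough sketch under assumptions: `chunk_sums` is a hypothetical helper, the 64 MB default chunk size is arbitrary, and it relies on `dd` producing no output once `skip=` moves past the end of the input.

```shell
#!/bin/bash
# chunk_sums SRC [CHUNK_MB]  (hypothetical helper, for illustration only)
# Prints "<chunk index> <md5>" for each CHUNK_MB-sized piece of SRC.
chunk_sums() {
    local src=$1 chunk_mb=${2:-64}
    local i=0 sum
    # MD5 of empty input: used to detect that we have read past the end.
    local empty=d41d8cd98f00b204e9800998ecf8427e

    while :; do
        sum=$(dd if="$src" bs=1M skip=$((i * chunk_mb)) count="$chunk_mb" \
                 2>/dev/null | md5sum | awk '{print $1}')
        [ "$sum" = "$empty" ] && break
        echo "$i $sum"
        i=$((i + 1))
    done
}
```

Saving two passes to files (`chunk_sums /dev/VolGroup00/test-snapshot > pass1.txt`, then again into `pass2.txt`) and running `diff pass1.txt pass2.txt` would show which chunk indexes changed between reads, and whether they are the same ones each time.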
