Mark Mielke
2017-08-13 12:05:52 UTC
I searched around for this a bit, and although other users may have hit
this, I didn't find a good explanation offered. I suspect the users clean
it up manually and then it disappears for another 2 years. I hope this
message will get captured by Google, and help somebody else out. Also, I
hope to have some discussion about this as it seems like an easily
preventable problem.
The archive file names are generated like:
    if (dm_snprintf(archive_name, sizeof(archive_name),
                    "%s/%s_%05u-%d.vg",
                    dir, vg->name, ix, rnum) < 0) {
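To make the naming concrete, here is a minimal sketch of what that format string produces. It uses plain snprintf rather than LVM's dm_snprintf, and the helper name format_archive_name and the example directory are made up for illustration. The point to notice is that "%05u" sets a minimum width, not a maximum, so an index of 100000 is simply written with six digits:

```c
#include <stdio.h>

/*
 * Hypothetical helper (not LVM code): builds an archive file name with the
 * same format string as above, using standard snprintf.
 */
static int format_archive_name(char *buf, size_t len, const char *dir,
                               const char *vgname, unsigned ix, int rnum)
{
    /* %05u zero-pads to at least 5 digits; larger values are not truncated. */
    return snprintf(buf, len, "%s/%s_%05u-%d.vg", dir, vgname, ix, rnum);
}
```

With dir "/tmp/archive", VG name "vg0" and random number 7, index 42 gives "/tmp/archive/vg0_00042-7.vg", while index 100000 gives "/tmp/archive/vg0_100000-7.vg".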
The directory scanning code that loads the archive file names into memory
recognizes a problem, although it isn't explicit about what the problem is:
    /* Sort fails beyond 5-digit indexes */
    if ((count = scandir(dir, &dirent, NULL, alphasort)) < 0) {
        log_error("Couldn't scan the archive directory (%s).", dir);
        return 0;
    }
The file names encode the index zero-padded to five digits, like "00000".
The sorting code uses "alphasort", which compares the names as strings and
so only works properly as long as the index stays within 5 digits. As soon
as the index exceeds 5 digits, it begins to sort "100000" toward the
beginning and "99999" to the end. Then, new archives seem to *all* get
index "100000". We had some 40,000 archive files with index "100000" before
we noticed. And, because the index is followed by a random number, it would
only expire a few of the "100000" files before it would hit one that was
younger than the 30 day retention period set by default. When I reduced the
retention period to 7 days, it expired only about 12 of the 40,000 archive
files. This behaviour is probably due to the random number distribution
ensuring that there are always some recent records near 0?
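The misordering is easy to reproduce outside of LVM. The sketch below is not LVM code: it sorts a plain array of names with qsort and strcmp, which stands in for scandir with alphasort (alphasort compares with strcoll, which matches strcmp in the C locale). The names cmp_names and sort_names are made up for illustration:

```c
#include <stdlib.h>
#include <string.h>

/* Byte-wise string comparison, standing in for alphasort. */
static int cmp_names(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort an array of file names in place, as a plain string sort would. */
static void sort_names(const char **names, size_t n)
{
    qsort(names, n, sizeof(*names), cmp_names);
}
```

Sorting "vg0_99998-1.vg", "vg0_99999-2.vg" and "vg0_100000-3.vg" this way puts "vg0_100000-3.vg" first and leaves "vg0_99999-2.vg" last, because '1' compares lower than '9'. If the next index is derived from the entry that sorts last (an assumption about the surrounding code, but one that matches what we saw), every new archive would be numbered 100000 again.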
This issue eventually affects everyone, although obviously the people that
use features like snapshots more frequently (we take them every 15 minutes,
across multiple volumes) will hit it sooner.
There are a few fixes possible... Probably, "alphasort" should not be used
at all. Instead, a context-aware sort should be used that can filter and
sort as it goes, decoding the index correctly as a number and comparing it
as a number. Then, if performance and scalability matter, it would be ideal
if it worked in a single pass, buffering only the minimum needed to expire
the correct archive files.
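As a rough sketch of the "compare it as a number" idea (this is not a patch against LVM, and archive_index and cmp_by_index are hypothetical names), the comparator below parses the index out of a name of the form "<vgname>_<index>-<rnum>.vg" and compares indexes numerically. It is written for qsort over an array of names so that it can be tried standalone; a scandir comparator would do the same thing on each entry's d_name:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Extract the numeric index from "<vgname>_<index>-<rnum>.vg".
 * Returns 1 and stores the index in *ix on success, 0 if the name does not
 * match. The VG name may itself contain '_', so search from the right.
 */
static int archive_index(const char *name, unsigned long *ix)
{
    const char *us = strrchr(name, '_');
    char *end;

    if (!us || us[1] < '0' || us[1] > '9')
        return 0;
    *ix = strtoul(us + 1, &end, 10);
    return *end == '-';
}

/* qsort comparator: order by numeric index, falling back to strcmp. */
static int cmp_by_index(const void *a, const void *b)
{
    const char *na = *(const char *const *)a;
    const char *nb = *(const char *const *)b;
    unsigned long ia = 0, ib = 0;
    int oka = archive_index(na, &ia);
    int okb = archive_index(nb, &ib);

    if (oka != okb)
        return oka ? 1 : -1;        /* names without an index sort first */
    if (oka && ia != ib)
        return ia < ib ? -1 : 1;    /* numeric comparison of the index */
    return strcmp(na, nb);          /* tie-break on the full name */
}
```

With this comparator, "vg0_00001-3.vg", "vg0_99999-9.vg" and "vg0_100000-5.vg" sort in that order. It still sorts the whole list, so it does not address the single-pass idea, only the ordering.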
We hit this on RHEL 7.2. I wasn't surprised to find it in RHEL 7.2, but I
was surprised that it still exists on "master". "git blame" says this has
been an issue since 2002:
5be981bab5 (Alasdair Kergon 2002-05-07 12:47:11 +0000 139) /* Sort fails beyond 5-digit indexes */
59d6420b9a (Joe Thornber 2002-02-08 11:58:18 +0000 140) if ((count = scandir(dir, &dirent, NULL, alphasort)) < 0) {
b8f47d5f69 (Alasdair Kergon 2009-07-15 20:02:46 +0000 141) log_error("Couldn't scan the archive directory (%s).", dir);
952d12a5f5 (Alasdair Kergon 2002-01-09 19:16:48 +0000 142) return 0;
952d12a5f5 (Alasdair Kergon 2002-01-09 19:16:48 +0000 143) }
Ouch... :-)
For anybody that does hit this... Pruning the archive files with index <
100000 is effective. The index then continues counting from 100000, and you
now have 9X more life before it will happen again... :-)
--
Mark Mielke <***@gmail.com>