[linux-lvm] Writable snapshot hang, high I/O wait, almost no disk I/O
Vitaliy Filippov
2014-01-15 22:19:29 UTC
Hi!

I've just been experimenting with the dm snapshot target to test a tool I
recently wrote for the ext4 filesystem (it resizes inode tables without
recreating the fs). The underlying device is /dev/sdc1 on a SATA hard
drive; the COW device is a loop device backed by a sparse file on an ext4
/home filesystem.
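
In case it matters, the snapshot was created along these lines (the
sparse file size and the 'P 8' persistence flag / chunk size shown here
are approximate, I don't have the exact command at hand):

truncate -s 3T /home/cow.img    # sparse COW file
losetup /dev/loop0 /home/cow.img
echo "0 $(blockdev --getsz /dev/sdc1) snapshot /dev/sdc1 /dev/loop0 P 8" \
  | dmsetup create sdc1ovl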

At first there was no problem, the read/write performance was good. But
then my tool started to move inode tables (~4.5 GB in total on a 3 TB
partition) and hung in the 'D' state. I'm watching it right now; it has
already been stuck for almost 2 hours :) A 'sync' executed in another
console also hangs. There are these messages in dmesg:

[654255.339515] INFO: task sync:28318 blocked for more than 120 seconds.
[654255.339522] Tainted: G W 3.12-trunk-amd64 #1
[654255.339525] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[654255.339528] sync D ffff8800a7de8b00 0 28318 27636 0x20020000
[654255.339534] ffff8800a7de87c0 0000000000000082 00000000000142c0 ffff88008c3e3fd8
[654255.339539] 00000000000142c0 ffff88008c3e3fd8 ffff88011fc14b00 ffff88011ffce908
[654255.339543] 0000000000000002 ffffffff8110f530 ffff88008c3e3e20 000000002baa1371
[654255.339548] Call Trace:
[654255.339558] [<ffffffff8110f530>] ? wait_on_page_read+0x60/0x60
[654255.339564] [<ffffffff81489194>] ? io_schedule+0x94/0x120
[654255.339568] [<ffffffff8110f535>] ? sleep_on_page+0x5/0x10
[654255.339574] [<ffffffff814871d4>] ? __wait_on_bit+0x54/0x80
[654255.339578] [<ffffffff8110f34f>] ? wait_on_page_bit+0x7f/0x90
[654255.339583] [<ffffffff8107a440>] ? wake_atomic_t_function+0x30/0x30
[654255.339589] [<ffffffff8111bc28>] ? pagevec_lookup_tag+0x18/0x20
[654255.339593] [<ffffffff8110f438>] ? filemap_fdatawait_range+0xd8/0x150
[654255.339598] [<ffffffff8119bf30>] ? sync_inodes_one_sb+0x20/0x20
[654255.339603] [<ffffffff811a62ce>] ? iterate_bdevs+0xbe/0xf0
[654255.339607] [<ffffffff8119c199>] ? sys_sync+0x69/0x90

Interestingly, there are no such messages about my tool, although it is
also still in the 'D' state and I can't interrupt it with any signal.
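
(If it would help, the tool's kernel stack can still be read while it
sits in 'D' state, e.g.:

cat /proc/<pid>/stack

or 'echo w > /proc/sysrq-trigger' to dump the stacks of all blocked
tasks to dmesg.)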

At the same time the system as a whole is still responsive, and it's
clearly visible that some work is being done: the space occupied by the
sparse file backing the snapshot grows very slowly, roughly 1-2
megabytes per minute. There is massive I/O wait (~60% CPU), NO user (0%)
and NO system (0%) CPU usage, and almost NO real disk I/O - 'iostat 1'
shows mostly zeroes, and only occasionally some small 'written' values
on the device holding that sparse file.

'slabtop' shows very large and roughly equal numbers of bio-0, bio-1,
dm_io and buffer_head objects. The kmalloc-32, dm_snap_pending_exception
and kcopyd_job counts are smaller, but also close to 100% usage. Most of
these numbers change slightly over time, except for kcopyd_job, which is
stuck at 77490. It looks like the following:

1311206 634877 48% 0,10K 35438 37 141752K buffer_head
621400 621304 99% 0,19K 31070 20 124280K bio-0
621184 621080 99% 0,04K 6752 92 27008K dm_io
621090 621080 99% 0,25K 41406 15 165624K bio-1
88816 86981 97% 0,03K 793 112 3172K kmalloc-32
77959 77889 99% 0,10K 2107 37 8428K dm_snap_pending_exception
77490 77490 100% 3,23K 38745 2 309960K kcopyd_job

(full /proc/slabinfo is attached)
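
(The same counters can also be polled directly from /proc/slabinfo,
e.g.:

watch -n 5 "grep -E 'kcopyd_job|dm_snap_pending_exception|dm_io|bio-' /proc/slabinfo"

in case slabtop's sorting hides anything.)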

'dmsetup status' prints the following, and the output does not change
over time:
sdc1ovl: 0 5860531087 snapshot 806488/5860531087 728
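
For reference, the status fields for the snapshot target are
<sectors allocated>/<total sectors> <metadata sectors>, so:

806488 sectors * 512 bytes ~= 394 MiB of COW space allocated

out of the full 5860531087-sector origin, and that number is not moving
either.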

I've googled for LVM snapshot performance problems and there are many
threads about them, but as far as I understand, the only case similar to
mine was in a 2005 thread here:
http://www.redhat.com/archives/dm-devel/2005-June/msg00126.html
And maybe also in another thread here:
http://www.redhat.com/archives/dm-devel/2006-September/msg00078.html

What do you think this problem could be? Thanks in advance to anyone who
answers :-)
--
With best regards,
Vitaliy Filippov