2.6.27-rc6: lockdep warning: iprune_mutex at shrink_icache_memory+0x38/0x1a8

Previous thread: [RFC v5][PATCH 0/9] Kernel based checkpoint/restart by Oren Laadan on Saturday, September 13, 2008 - 4:05 pm. (44 messages)

Next thread: Re: + utsname-completely-overwrite-prior-information.patch added to -mm tree by Serge E. Hallyn on Saturday, September 13, 2008 - 6:40 pm. (1 message)
From: Alexander Beregalov
Date: Saturday, September 13, 2008 - 4:31 pm

Hi

[ INFO: possible circular locking dependency detected ]
2.6.27-rc6-00034-gd1c6d2e #3
-------------------------------------------------------
nfsd/1766 is trying to acquire lock:
 (iprune_mutex){--..}, at: [<c01743fb>] shrink_icache_memory+0x38/0x1a8

but task is already holding lock:
 (&(&ip->i_iolock)->mr_lock){----}, at: [<c021134f>] xfs_ilock+0xa2/0xd6


I was reading files over NFS and saw a delay of a few seconds.
The system is x86_32 with NFS and XFS.
The last working kernel is 2.6.27-rc5.
I do not know yet whether it is reproducible.



the existing dependency chain (in reverse order) is:

-> #1 (&(&ip->i_iolock)->mr_lock){----}:
       [<c0137b3f>] __lock_acquire+0x970/0xae8
       [<c0137d12>] lock_acquire+0x5b/0x77
       [<c012e803>] down_write_nested+0x35/0x6c
       [<c0211328>] xfs_ilock+0x7b/0xd6
       [<c02114a1>] xfs_ireclaim+0x1d/0x59
       [<c022e056>] xfs_finish_reclaim+0x12a/0x134
       [<c022e1d8>] xfs_reclaim+0xbc/0x125
       [<c023aba9>] xfs_fs_clear_inode+0x55/0x8e
       [<c01742aa>] clear_inode+0x7a/0xc9
       [<c0174335>] dispose_list+0x3c/0xca
       [<c017453e>] shrink_icache_memory+0x17b/0x1a8
       [<c014e5be>] shrink_slab+0xd3/0x12e
       [<c014e8e4>] kswapd+0x2cb/0x3ac
       [<c012b404>] kthread+0x39/0x5e
       [<c0103933>] kernel_thread_helper+0x7/0x10
       [<ffffffff>] 0xffffffff

-> #0 (iprune_mutex){--..}:
       [<c0137a14>] __lock_acquire+0x845/0xae8
       [<c0137d12>] lock_acquire+0x5b/0x77
       [<c037a03e>] __mutex_lock_common+0xa0/0x2d0
       [<c037a2f7>] mutex_lock_nested+0x29/0x31
       [<c01743fb>] shrink_icache_memory+0x38/0x1a8
       [<c014e5be>] shrink_slab+0xd3/0x12e
       [<c014eded>] try_to_free_pages+0x1cf/0x287
       [<c014a665>] __alloc_pages_internal+0x257/0x3c6
       [<c014be50>] __do_page_cache_readahead+0xb7/0x16f
       [<c014c141>] ondemand_readahead+0x115/0x123
       [<c014c1c6>] page_cache_sync_readahead+0x16/0x1c
       [<c017e7be>] __generic_file_splice_read+0xe0/0x3f7
       ...
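The two chains above record a cycle between two lock classes: kswapd took the inode iolock class while holding iprune_mutex (#1), and nfsd now wants iprune_mutex while holding an inode iolock (#0). A minimal Python sketch of that detection, assuming a much-simplified model of lockdep (the class `LockClassGraph` and its methods are hypothetical, not kernel API; `i_iolock` abbreviates the `&(&ip->i_iolock)->mr_lock` class):

```python
from collections import defaultdict


class LockClassGraph:
    """Hypothetical toy model of lockdep-style dependency tracking.

    An edge A -> B means "class B was acquired while class A was held".
    This is a simplification for illustration, not the kernel's code.
    """

    def __init__(self):
        self.edges = defaultdict(set)

    def _reaches(self, src, dst, seen=None):
        # Depth-first search: is there already a path src -> ... -> dst?
        if src == dst:
            return True
        seen = seen if seen is not None else set()
        seen.add(src)
        return any(self._reaches(n, dst, seen)
                   for n in self.edges[src] if n not in seen)

    def acquire(self, held, new):
        """Record taking `new` while holding every class in `held`.

        Returns the held class whose new edge would close a cycle,
        or None if the acquisition is consistent with the recorded order.
        """
        for h in held:
            # Adding h -> new closes a cycle if new can already reach h.
            if self._reaches(new, h):
                return h
            self.edges[h].add(new)
        return None


g = LockClassGraph()
# Chain #1 (kswapd): shrink_icache_memory holds iprune_mutex, then
# dispose_list -> clear_inode -> ... -> xfs_ilock takes an inode iolock.
assert g.acquire(["iprune_mutex"], "i_iolock") is None
# Chain #0 (nfsd): holds an inode iolock from xfs_ilock, then a page
# allocation enters reclaim and shrink_icache_memory wants iprune_mutex.
conflict = g.acquire(["i_iolock"], "iprune_mutex")
print(conflict)  # i_iolock
```

Because the graph is keyed by lock class rather than by individual lock instance, any inode's iolock counts as "the same lock" here.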
From: Dave Chinner
Date: Monday, September 15, 2008 - 7:52 pm

<sigh>

We need a FAQ for this one. It's a false positive.  Google for an
explanation - I've explained it 4 or 5 times in the past year and
asked that the lockdep folk invent a special annotation for the
iprune_mutex (or memory reclaim) because of the way it can cause
recursion into the filesystem and hence invert lock orders without
causing deadlocks.....
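One way to picture "invert lock orders without causing deadlocks" is to compare the lock-order graph per lock class with the graph per lock instance. The Python sketch below assumes, purely for illustration, that the inode kswapd tears down in reclaim (`inode_A`) is a different object from the inode nfsd is reading (`inode_B`); whether that always holds is exactly what decides false positive versus real bug. The helpers `has_cycle` and `order_graph` and the inode names are hypothetical.

```python
from collections import defaultdict


def has_cycle(edges):
    """Return True if the directed graph `edges` (node -> set of nodes) has a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every node starts WHITE (unvisited)

    def visit(u):
        color[u] = GREY  # u is on the current DFS path
        for v in edges.get(u, ()):
            if color[v] == GREY:  # back edge: cycle found
                return True
            if color[v] == WHITE and visit(v):
                return True
        color[u] = BLACK  # fully explored, no cycle through u
        return False

    return any(color[u] == WHITE and visit(u) for u in list(edges))


def order_graph(acquisitions, key):
    """Build held -> acquired edges, grouping locks with `key`."""
    edges = defaultdict(set)
    for held, new in acquisitions:
        edges[key(held)].add(key(new))
    return edges


# Hypothetical (class, instance) pairs for the two chains in the report.
acquisitions = [
    # kswapd: iprune_mutex held, then the iolock of inode_A being reclaimed
    (("iprune_mutex", None), ("i_iolock", "inode_A")),
    # nfsd: iolock of inode_B held, then iprune_mutex from direct reclaim
    (("i_iolock", "inode_B"), ("iprune_mutex", None)),
]

by_class = order_graph(acquisitions, key=lambda lock: lock[0])
by_instance = order_graph(acquisitions, key=lambda lock: lock)
print(has_cycle(by_class))     # True: what a class-based checker sees
print(has_cycle(by_instance))  # False: no instance is waited on in a loop
```

Under that assumption the class-level cycle never turns into a real wait loop, which is why a dedicated annotation for reclaim was requested rather than a locking change.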

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
--

From: Grant Coady
Date: Monday, September 15, 2008 - 9:31 pm

Yeah, but a 30 second dreadlock?  It's a long wait wondering what's 
gone down or not ;)

Grant.
--

From: Dave Chinner
Date: Tuesday, September 16, 2008 - 12:03 am

The delay is probably due to how slow the system can become when it
runs out of memory, not to the lockdep report.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
--

From: Alexander Beregalov
Date: Tuesday, September 16, 2008 - 12:35 am

Hi Dave

Yes, you already explained a similar message to me, but that one was a
bug, not a false positive.
http://lkml.org/lkml/2008/7/3/29
http://lkml.org/lkml/2008/7/3/315

I will try to bisect.
It is not an OOM case.
--

From: Alexander Beregalov
Date: Wednesday, September 17, 2008 - 11:33 am

I cannot reproduce it.
--
