Re: [BUG] Lockless patches cause hardlock under heavy IO

To: <linux-mm@...>, LKML <linux-kernel@...>
Date: Wednesday, June 18, 2008 - 5:15 pm

I applied the following patches from 2.6.26-rc5-mm3 to 2.6.26-rc6 and
they caused a hardlock under heavy IO:

x86-implement-pte_special.patch
mm-introduce-get_user_pages_fast.patch
mm-introduce-get_user_pages_fast-fix.patch
mm-introduce-get_user_pages_fast-checkpatch-fixes.patch
x86-lockless-get_user_pages_fast.patch
x86-lockless-get_user_pages_fast-checkpatch-fixes.patch
x86-lockless-get_user_pages_fast-fix.patch
x86-lockless-get_user_pages_fast-fix-2.patch
x86-lockless-get_user_pages_fast-fix-2-fix-fix.patch
x86-lockless-get_user_pages_fast-fix-warning.patch
dio-use-get_user_pages_fast.patch
splice-use-get_user_pages_fast.patch
x86-support-1gb-hugepages-with-get_user_pages_lockless.patch
#
mm-readahead-scan-lockless.patch
radix-tree-add-gang_lookup_slot-gang_lookup_slot_tag.patch
#mm-speculative-page-references.patch: clameter saw bustage
mm-speculative-page-references.patch
mm-speculative-page-references-fix.patch
mm-speculative-page-references-fix-fix.patch
mm-speculative-page-references-hugh-fix3.patch
mm-lockless-pagecache.patch
mm-spinlock-tree_lock.patch
powerpc-implement-pte_special.patch

I am on x86_64. I don't know what other info you need...

-Ryan
--

To: Ryan Hope <rmh3093@...>
Cc: <linux-mm@...>, LKML <linux-kernel@...>, Nick Piggin <nickpiggin@...>
Date: Thursday, June 19, 2008 - 4:12 am

What kind of machine, how much memory, how many spindles, what
filesystem and what is heavy load?

Furthermore, try the NMI watchdog with serial/net-console to capture its

--

To: Peter Zijlstra <peterz@...>
Cc: Ryan Hope <rmh3093@...>, <linux-mm@...>, LKML <linux-kernel@...>
Date: Thursday, June 19, 2008 - 4:19 am

Good suggestions. A trace would be really helpful.

As Arjan suggested, debug options especially CONFIG_DEBUG_VM would be
a good idea to turn on if you haven't already.

BTW. what was the reason for applying those patches? Did you hit the

Can you isolate it to one of the two groups of patches? I suspect it
might be the latter so you might try that first -- this version of
speculative page references is very nice in theory but it is a little
more complex to implement the slowpaths so it could be an error there.
--

To: Nick Piggin <nickpiggin@...>
Cc: Peter Zijlstra <peterz@...>, <linux-mm@...>, LKML <linux-kernel@...>
Date: Sunday, June 22, 2008 - 10:37 am

Well, I couldn't stop playing with this... I am pretty sure the cause
of the hardlocks is in the second half of the patches (the speculative
page ref patches). I reversed all of those patches so that just the
GUP patches were included, and there were no more hardlocks... then I
applied the concurrent page cache patches from the -rt branch, including
1 OLD speculative page ref patch, and this caused hardlocks for people
again. However, enabling heap randomization fixed the hardlocks for one
of the users, and disabling swap fixed the issue for the other user. I
hope this helps.

-Ryan

--

To: Ryan Hope <rmh3093@...>
Cc: Nick Piggin <nickpiggin@...>, <linux-mm@...>, LKML <linux-kernel@...>
Date: Sunday, June 22, 2008 - 11:07 am

--

To: Peter Zijlstra <peterz@...>
Cc: Nick Piggin <nickpiggin@...>, <linux-mm@...>, LKML <linux-kernel@...>
Date: Sunday, June 22, 2008 - 11:18 am

Well, in the current version of the patchset we are using, one user
would start playing some game and get a hardlock (disabling "Disable
Heap Randomization" fixed the hardlocks for him)... the other user got
hardlocks when copying an ISO from a reiser4 partition to a reiserfs
partition (disabling swap fixed the issue for him).

--

To: Ryan Hope <rmh3093@...>
Cc: Peter Zijlstra <peterz@...>, <linux-mm@...>, LKML <linux-kernel@...>
Date: Sunday, June 22, 2008 - 10:29 pm

Hmm, nobody has reported such a hang with -mm yet, so maybe it
is another interaction in the patchset. OTOH, probably nobody
much uses -mm and reiser4, and reiser4 does lots of weird fiddling
with pagecache so it could be broken in -mm even.

--

To: Nick Piggin <nickpiggin@...>
Cc: Ryan Hope <rmh3093@...>, Peter Zijlstra <peterz@...>, <linux-mm@...>, LKML <linux-kernel@...>
Date: Monday, June 23, 2008 - 7:48 pm

I have been running and testing -mm with reiser4 for a couple years now.
I haven't been able to run -mm recently because I've been hitting the
copy_user bug Linus fixed for AMD64 but Andrew hasn't updated yet (no
complaints, Andrew) and I'm too lazy to manually patch.
--

To: Nick Piggin <nickpiggin@...>
Cc: Peter Zijlstra <peterz@...>, <linux-mm@...>, LKML <linux-kernel@...>
Date: Sunday, June 22, 2008 - 11:51 pm

Well, I get the hardlock on -mm without using reiser4; I am pretty
sure it is swap related.

--

To: Ryan Hope <rmh3093@...>, Paul E. McKenney <paulmck@...>
Cc: Peter Zijlstra <peterz@...>, <linux-mm@...>, LKML <linux-kernel@...>
Date: Monday, June 23, 2008 - 7:54 am

The guys seeing hangs don't use PREEMPT_RCU, do they?

In my swapping tests, I found -mm3 to be stable with classic RCU, but
on a hunch, I tried PREEMPT_RCU and it crashed a couple of times rather
quickly. First crash was in find_get_pages so I suspected lockless
pagecache doing something subtly wrong with the RCU API, but I just got
another crash in __d_lookup:

BUG: unable to handle kernel paging request at ffff81004a139f38
IP: [<ffffffff802bb82c>] __d_lookup+0x8c/0x160
PGD 8063 PUD 7fc3f163 PMD 7df50163 PTE 800000004a139160
Oops: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
CPU 0
Modules linked in: brd
Pid: 29563, comm: cc1 Not tainted 2.6.26-rc5-mm3 #467
RIP: 0010:[<ffffffff802bb82c>] [<ffffffff802bb82c>] __d_lookup+0x8c/0x160
RSP: 0018:ffff81004bf7dba8 EFLAGS: 00010282
RAX: 0000000000000007 RBX: ffff81004a139f38 RCX: 0000000000000000
RDX: ffff810028057808 RSI: 0000000000000000 RDI: ffff81004bf7a880
RBP: ffff81004bf7dbf8 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000001 R12: ffff81004a139ef8
R13: 0000000073885cf7 R14: ffff810070f53ef8 R15: ffff81004bf7dca8
FS: 00002abe0a1decf0(0000) GS:ffffffff80779dc0(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff81004a139f38 CR3: 0000000057569000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process cc1 (pid: 29563, threadinfo ffff81004bf7c000, task ffff81004bf7a880)
Stack: 0000000100000001 0000000000000007 ffff810070f53f00 00000007000041ed
ffff810001ce2013 ffff81004bf7dca8 00000000000041ed ffff81004bf7de48
ffff81004bf7dca8 ffff81004bf7dcb8 ffff81004bf7dc48 ffffffff802af2b5
Call Trace:
[<ffffffff802af2b5>] do_lookup+0x35/0x230
[<ffffffff80312d60>] ? ext3_permission+0x10/0x20
[<ffffffff802b0cbb>] __link_path_walk+0x39b/0x10a0
[<ffffffff802b...

To: Nick Piggin <nickpiggin@...>
Cc: Ryan Hope <rmh3093@...>, Peter Zijlstra <peterz@...>, <linux-mm@...>, LKML <linux-kernel@...>
Date: Monday, June 23, 2008 - 9:05 am

Could you please send me a repeat-by? (At least Alexey is no longer
alone!)

--

To: <paulmck@...>
Cc: Ryan Hope <rmh3093@...>, Peter Zijlstra <peterz@...>, <linux-mm@...>, LKML <linux-kernel@...>
Date: Monday, June 23, 2008 - 8:13 pm

OK, I had DEBUG_PAGEALLOC in the .config, which I think is probably
important to reproduce it (but the fact that I'm reproducing oopses
with << PAGE_SIZE objects like dentries and radix tree nodes indicates
that there is even more free-before-grace activity going undetected --
if you construct a test case using full pages, it might become even
easier to detect with DEBUG_PAGEALLOC).

2 socket, 8 core x86 system.

I mounted two tmpfs filesystems, one contains a single large file
which is formatted as 1K block size ext3 and mounted loopback, the
other is used directly. Linux kernel source is unpacked on each mount
and concurrent make -j128 on each. This pushes it pretty hard into
swap. Classic RCU survived another 5 hours of this last night.

But that's a fairly convoluted test for an RCU problem. I expect it
should be easier to trigger with something more targeted...
--

To: Nick Piggin <nickpiggin@...>
Cc: <paulmck@...>, Peter Zijlstra <peterz@...>, <linux-mm@...>, LKML <linux-kernel@...>
Date: Tuesday, June 24, 2008 - 11:12 am

Well, I tried to run pure -mm this weekend. It locked up as soon as I
got into GNOME, so I applied a couple of the bug fixes from LKML, and
-mm seems to be running stable now. I can't seem to get it to hard lock
now, at least not doing the simple stuff that was causing it to hard
lock on my other patchset. So either the lockless patches expose some
bug that is in -rc6, or lockless requires some other patches further up
in the -mm series file.

--

To: Ryan Hope <rmh3093@...>
Cc: Nick Piggin <nickpiggin@...>, Peter Zijlstra <peterz@...>, <linux-mm@...>, LKML <linux-kernel@...>
Date: Tuesday, June 24, 2008 - 11:32 am

Cool!!! Any guess as to which of the bug fixes did the trick?
Failing that, a list of the bug fixes that you applied?

--

To: <paulmck@...>
Cc: Nick Piggin <nickpiggin@...>, Peter Zijlstra <peterz@...>, <linux-mm@...>, LKML <linux-kernel@...>
Date: Tuesday, June 24, 2008 - 11:57 am

I can give you a list of patches that should correspond to the thread
name (for the most part):

fix-double-unlock_page-in-2626-rc5-mm3-kernel-bug-at-mm-filemapc-575.patch

fix_munlock-page-table-walk.patch

migration_entry_wait-fix.patch

PATCH collect lru meminfo statistics from correct offset

Mlocked field of /proc/meminfo display silly number.
because trivial mistake exist in meminfo_read_proc().

You can also look in our git repo to see the code that changed with
these patches if you can't track them down on LKML:
http://zen-sources.org/cgi-bin/gitweb.cgi?p=kernel-mm.git;a=shortlog;h=r...

On Tue, Jun 24, 2008 at 11:32 AM, Paul E. McKenney
--

To: Ryan Hope <rmh3093@...>
Cc: Nick Piggin <nickpiggin@...>, Peter Zijlstra <peterz@...>, <linux-mm@...>, LKML <linux-kernel@...>
Date: Tuesday, June 24, 2008 - 12:12 pm

Thank you! And is this using Classic RCU or Preemptable RCU?

--

To: <paulmck@...>
Cc: Nick Piggin <nickpiggin@...>, Peter Zijlstra <peterz@...>, <linux-mm@...>, LKML <linux-kernel@...>
Date: Tuesday, June 24, 2008 - 12:23 pm

I have been using CONFIG_PREEMPT_RCU=y

On Tue, Jun 24, 2008 at 12:12 PM, Paul E. McKenney
--

To: <paulmck@...>
Cc: Nick Piggin <nickpiggin@...>, Peter Zijlstra <peterz@...>, <linux-mm@...>, LKML <linux-kernel@...>
Date: Tuesday, June 24, 2008 - 2:01 pm

I just got a report of someone getting a hardlock while building boost;
he was using classic RCU and no swap.

--

To: Ryan Hope <rmh3093@...>
Cc: Peter Zijlstra <peterz@...>, <linux-mm@...>, LKML <linux-kernel@...>
Date: Sunday, June 22, 2008 - 11:56 pm

Oh you do, OK good, it would be nice if I were able to reproduce it here.

Any particular thing that triggers it? Preferably without running X or
any proprietary software (eg. if you run a make -j128 kernel compile or
something that forces a lot of swapping, does that lock up?).

What filesystem? Can you also attach your .config?

No luck getting a backtrace out of the NMI watchdog?

Thanks,
Nick
--

To: Nick Piggin <nickpiggin@...>
Cc: Peter Zijlstra <peterz@...>, LKML <linux-kernel@...>
Date: Friday, June 20, 2008 - 10:33 am

Well, if there are no more suggestions, we are going to have to abandon
testing lockless for now because it is causing hardlocks on the box of
everyone who uses it. I hope the next round of patches has better luck.

--

To: Nick Piggin <nickpiggin@...>
Cc: Peter Zijlstra <peterz@...>, <linux-mm@...>, LKML <linux-kernel@...>
Date: Thursday, June 19, 2008 - 4:31 pm

This seems to be hardlocking for everyone who has a 64-bit processor;
the only one it has not locked up on yet has a 32-bit processor. I hope
that helps, it's the best I can come up with so far.

-Ryan

--

To: Nick Piggin <nickpiggin@...>
Cc: Peter Zijlstra <peterz@...>, <linux-mm@...>, LKML <linux-kernel@...>
Date: Thursday, June 19, 2008 - 10:52 am

The reason for applying these patches was that users of my patchset
have been wanting me to include lockless again. It was pretty popular
among the users, but we removed it because it would cause hardlocks. I
thought I would try it out again now that it's in -mm.

I guess I could start reverting patches and see if the issue goes away.

--

To: Ryan Hope <rmh3093@...>
Cc: <linux-mm@...>, LKML <linux-kernel@...>
Date: Wednesday, June 18, 2008 - 5:28 pm

On Wed, 18 Jun 2008 17:15:08 -0400

If it's locking related, enabling LOCKDEP is a good first test to do.

CONFIG_PROVE_LOCKING=y
as well as the various spinlock/mutex lock debug questions

--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
--

To: Arjan van de Ven <arjan@...>
Cc: <linux-mm@...>, LKML <linux-kernel@...>
Date: Thursday, June 19, 2008 - 10:45 am

Well, enabling these debug options is sort of useless, because once it
hardlocks I can't see or do anything...

--

To: Ryan Hope <rmh3093@...>
Cc: <linux-mm@...>, LKML <linux-kernel@...>
Date: Thursday, June 19, 2008 - 8:05 pm

On Thu, 19 Jun 2008 10:45:43 -0400

the nice thing about lockdep is that it spots *potential* deadlocks;
often well before the actual deadlock happens...

--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
--
