Re: [2.6.27-rc5] inotify_read's ev_mutex vs do_page_fault's mmap_sem...

Previous thread: [PATCH] sh: intc_prio_data() test before subtraction on unsigned by roel kluin on Tuesday, September 9, 2008 - 2:02 pm. (2 messages)

Next thread: none
From: Daniel J Blueman
Date: Tuesday, September 9, 2008 - 2:03 pm

I observed this locking violation [1] while gnome-panel was loading;
this was previously reported at
http://uwsg.iu.edu/hypermail/linux/kernel/0806.3/2881.html .

Let me know for more information/config/testing. Thanks!
  Daniel

--- [1]

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.27-rc5-233c-debug #2
-------------------------------------------------------
gnome-panel/4944 is trying to acquire lock:
 (&mm->mmap_sem){----}, at: [<ffffffff806aa40f>] do_page_fault+0x12f/0xae0

but task is already holding lock:
 (&dev->ev_mutex){--..}, at: [<ffffffff80320ac3>] inotify_read+0xe3/0x200

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 (&dev->ev_mutex){--..}:
       [<ffffffff80271889>] __lock_acquire+0xd49/0x1190
       [<ffffffff80271d61>] lock_acquire+0x91/0xc0
       [<ffffffff806a5069>] __mutex_lock_common+0xb9/0x430
       [<ffffffff806a54bf>] mutex_lock_nested+0x3f/0x50
       [<ffffffff80321216>] inotify_dev_queue_event+0x46/0x1c0
       [<ffffffff803200e6>] inotify_inode_queue_event+0xc6/0x110
       [<ffffffff802f424c>] fsnotify_create+0x3c/0x70
       [<ffffffff802f4c5d>] vfs_create+0xbd/0xd0
       [<ffffffff802f7dfe>] do_filp_open+0x80e/0x910
       [<ffffffff802e82b0>] do_sys_open+0x80/0x110
       [<ffffffff802e8380>] sys_open+0x20/0x30
       [<ffffffff8020c86b>] system_call_fastpath+0x16/0x1b
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #2 (&ih->mutex){--..}:
       [<ffffffff80271889>] __lock_acquire+0xd49/0x1190
       [<ffffffff80271d61>] lock_acquire+0x91/0xc0
       [<ffffffff806a5069>] __mutex_lock_common+0xb9/0x430
       [<ffffffff806a54bf>] mutex_lock_nested+0x3f/0x50
       [<ffffffff8031fe73>] inotify_find_update_watch+0x53/0xe0
       [<ffffffff80320d85>] sys_inotify_add_watch+0x115/0x1d0
       [<ffffffff8020c86b>] system_call_fastpath+0x16/0x1b
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #1 ...
From: Nick Piggin
Date: Tuesday, September 9, 2008 - 9:07 pm

Thanks for the report. I've attached a patch you could test. It compiles
(and boots a UML here) but I don't think I've actually tested the inotify
path at all, so it may explode on you.

Peter, this copy_*_user stuff is quite a nightmare... Well actually it
isn't, if the code is designed with it in mind from the start, but it is
easy for people to forget it can take mmap_sem and filesystem locks... Is
there a way to annotate it and say "might take mmap_sem for read" for
example? So that these LORs will _always_ trigger rather than just once
in a million times when the reclaim gods frown on us?

From: Peter Zijlstra
Date: Wednesday, September 10, 2008 - 12:57 am

Sure, how about the below - untested - uncompiled, might eat kittens,
etc..

Just sprinkle something like:

  might_lock_read(&mm->mmap_sem);

in the right places.

---
Subject: lockdep: might_lock annotation

useful to establish a lock dependency in case the actual dependency is
rare or hard to trigger.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 331e5f1..0aa657a 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -480,4 +480,22 @@ static inline void print_irqtrace_events(struct task_struct *curr)
 # define lock_map_release(l)			do { } while (0)
 #endif
 
+#ifdef CONFIG_PROVE_LOCKING
+# define might_lock(lock) 						\
+do {									\
+	typecheck(struct lockdep_map *, &(lock)->dep_map);		\
+	lock_acquire(&(lock)->dep_map, 0, 0, 0, 2, NULL, _THIS_IP_);	\
+	lock_release(&(lock)->dep_map, 0, _THIS_IP_);			\
+} while (0)
+# define might_lock_read(lock) 						\
+do {									\
+	typecheck(struct lockdep_map *, &(lock)->dep_map);		\
+	lock_acquire(&(lock)->dep_map, 0, 0, 1, 2, NULL, _THIS_IP_);	\
+	lock_release(&(lock)->dep_map, 0, _THIS_IP_);			\
+} while (0)
+#else
+# define might_lock(lock) do { } while (0)
+# define might_lock_read(lock) do { } while (0)
+#endif
+
 #endif /* __LINUX_LOCKDEP_H */


--

From: Nick Piggin
Date: Wednesday, September 10, 2008 - 1:03 am

From: Ingo Molnar
Date: Wednesday, September 10, 2008 - 1:37 am

cool! Please send in an RFC patch once you have something that boots - 
we can stick it into tip/core/locking and see whether there's any new 
messages on a wide range of systems and workloads. (and we'd also check 
whether the number of kittens is an invariant.)

	Ingo
--

From: Nick Piggin
Date: Wednesday, September 10, 2008 - 2:50 am

Well I have verified it boots, and have used the annotation in some of
x86-64's user copy routines (luckily no flood of bugs I was scared of,
phew!)

So I would like to request you merge Peter's patch, and we'll hopefully
start seeing the annotations being used. FWIW, I don't suppose lockdep
can determine that it is a sleeping lock, and do the appropriate
might_sleep checks at this point as well?

Thanks,
Nick


--

From: Peter Zijlstra
Date: Wednesday, September 10, 2008 - 2:59 am

Humm, no - we don't actually have that information there - I guess one
could add it to lockdep_map and set it from the various init routines,
but I'm not sure its worth it - just add might_sleep() along with
might_lock() :-)

--

From: Nick Piggin
Date: Wednesday, September 10, 2008 - 1:11 pm

I guess so... seems like lockdep should know about "locks", however :)
Spinlock isn't always a spinlock, and ordinary kernel code usually
should not care about that.

But I'll add it in this case.
--

From: Nick Piggin
Date: Wednesday, September 10, 2008 - 1:12 pm

Indeed it was wrong, I messed up the waitqueue handling. This one
survives longer for me...
From: Daniel J Blueman
Date: Wednesday, September 10, 2008 - 1:10 pm

So far, so good with that last patch. I'll continue testing this and
follow up if I find any problem.

Thanks for the patching, Nick!

Daniel
-- 
Daniel J Blueman
--

Previous thread: [PATCH] sh: intc_prio_data() test before subtraction on unsigned by roel kluin on Tuesday, September 9, 2008 - 2:02 pm. (2 messages)

Next thread: none