Hi,
include/linux/spinlock.h shows:
#define spin_lock_irq(lock) _spin_lock_irq(lock)
unconditionally, i.e. irrespectible of config options, we always (on SMP)
call kernel/spinlock.c:_spin_lock_irq(), which is even not inlined.
Contrary to that, unlocks are written as one would expect, i.e:
#if defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) || \
!defined(CONFIG_SMP)
# define spin_unlock_irq(lock) _spin_unlock_irq(lock)
#else
# define spin_unlock_irq(lock) \
do { \
__raw_spin_unlock(&(lock)->raw_lock); \
__release(lock); \
local_irq_enable(); \
} while (0)
and __raw_spin_unlock() is of course properly inlined.
What is the reason for this asymetry? Shouldn't the acquiring functions be
implemented in the very same way? Or at least, shouldn't all the
__lockfunc functions be inlined?
Thanks,
--
Jiri Kosina
SUSE Labs
--
i.e. is there any particular reason why we don't have something like the
patch below (implemented for all the lock variants of course, this is
just to demonstrate what I mean)?
spinlocks: inline spin_lock() in non-debug cases
inline spin_lock() in the non-debug case.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 576a5f7..30922ba 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -178,7 +178,12 @@ do { \
#define read_trylock(lock) __cond_lock(lock, _read_trylock(lock))
#define write_trylock(lock) __cond_lock(lock, _write_trylock(lock))
+#if defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) || \
+ !defined(CONFIG_SMP)
#define spin_lock(lock) _spin_lock(lock)
+#else
+#define spin_lock(lock) \
+ do { __raw_spin_lock(&(lock)->raw_lock); __acquire(lock); } while (0)
#ifdef CONFIG_DEBUG_LOCK_ALLOC
# define spin_lock_nested(lock, subclass) _spin_lock_nested(lock, subclass)
--
IIRC the main reason we decided to uninline them was image size. So i'd suggest for you to check how this change impacts vmlinux size (on both 64-bit and 32-bit), a typical distro config (or allyesconfig with lock debugging disabled). If you do the test on x86.git/latest you'll also have the CONFIG_OPTIMIZE_INLINING=y and CONFIG_CC_OPTIMIZE_FOR_SIZE=y combination as well, which generates the most compact x86 kernel image ever. Ingo --
In fact we have received report from one of our users that he is seeing approximately 15% performance degradation of mmap() when spinlocks are not inlined. I am going to do some performance measurements myself shortly, as it seems quite strange, but while at it, I have noticed the aforementioned asymetry in spinlock.h, so I just wanted to know if there is any particular reason behind that. -- Jiri Kosina SUSE Labs --
inlining decisions almost never have effects of that order of magnitude - especially on new CPUs, so that 15% looks quite suspicious to me. If it's real then an easy-to-run testcase would be nice. Ingo --
Definitely, that's why I am a little bit suspicious. Thanks, -- Jiri Kosina SUSE Labs --
At some point -- but that was before queued locks -- I noticed that for i386 spin unlocks the call sequence for the sub function is actually larger in code than the actual spin unlock operation and for x86-64 it was about the same. That was not even counting any negative register allocation effects the call has on the caller. Spinlocks don't clobber a lot of registers, but the compiler doesn't know that when calling the function so it has to assume all ABI callee clobbered are gone. I didn't do anything back then because at this point Ingo was reorganizing the spinlock code hourly[1] for his lockdep etc. merge and wanted to wait for it to settle down and then it dropped from the radar. Anyways without queued spinlocks that has probably changed again, might be still worth rechecking. -Andi [1] ok I'm exaggerating... --
spin unlocks seem to be properly inlined anyway, so that should be fine. My concern here is the non-inlining of spin locks, for which I don't think your argument above is also valid, right? Thanks, -- Jiri Kosina --
When I did the original numbers (again before queued locks) the call was slightly shorter than the actual spin lock operation. This was not counting register allocator effects though which are difficult to measure generally. -Andi --
| Jesse Barnes | Re: [stable] [BUG][PATCH] cpqphp: fix kernel NULL pointer dereference |
| Greg KH | [003/136] p54usb: add Zcomax XG-705A usbid |
| Magnus Damm | [PATCH 03/07] ARM: Use shared GIC entry macros on Realview |
| Oliver Neukum | Re: [Bug #13682] The webcam stopped working when upgrading from 2.6.29 to 2.6.30 |
| Martin Schwidefsky | <
