Re: [PATCH] kexec jump: fix compiling warning on xchg(&kexec_lock, 0) in kernel_kexec()

Previous thread: sysprof in 2.6.27 kernel by Lukas Hejtmanek on Wednesday, August 13, 2008 - 4:39 am. (9 messages)

Next thread: Proposal: kernel config option for default UFS filesystem by barry bouwsma on Wednesday, August 13, 2008 - 5:37 am. (2 messages)
To: Eric W. Biederman <ebiederm@...>, Pavel Machek <pavel@...>, <nigel@...>, Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, Vivek Goyal <vgoyal@...>, <mingo@...>, Linus Torvalds <torvalds@...>
Cc: <linux-kernel@...>, Kexec Mailing List <kexec@...>
Date: Wednesday, August 13, 2008 - 5:12 am

Fix compiling warning on xchg(&kexec_lock, 0) in kernel_kexec().

Signed-off-by: Huang Ying <ying.huang@intel.com>

---
kernel/kexec.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1433,6 +1433,7 @@ module_init(crash_save_vmcoreinfo_init)
int kernel_kexec(void)
{
int error = 0;
+ int locked;

if (xchg(&kexec_lock, 1))
return -EBUSY;
@@ -1498,7 +1499,8 @@ int kernel_kexec(void)
#endif

Unlock:
- xchg(&kexec_lock, 0);
+ locked = xchg(&kexec_lock, 0);
+ BUG_ON(!locked);

return error;
}

--

To: Huang Ying <ying.huang@...>
Cc: Eric W. Biederman <ebiederm@...>, Pavel Machek <pavel@...>, <nigel@...>, Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, Vivek Goyal <vgoyal@...>, <mingo@...>, <linux-kernel@...>, Kexec Mailing List <kexec@...>
Date: Wednesday, August 13, 2008 - 12:57 pm

Why do you want to do this at all?

And why do you implement your locks with xchg() in the first place? That's
total and utter crap.

Hint: we have _real_ locking primitives in the kernel.

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: Huang Ying <ying.huang@...>, Pavel Machek <pavel@...>, <nigel@...>, Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, Vivek Goyal <vgoyal@...>, <mingo@...>, <linux-kernel@...>, Kexec Mailing List <kexec@...>
Date: Wednesday, August 13, 2008 - 2:12 pm

This part certainly.

The way the code should work, and the way it has in the past is:
image = xchg(&kexec_image, NULL)
if (!image)
return -EINVAL;

Very simple and very obvious and very easy to get right, and it has
been that way for years.

Eric
--

To: Eric W. Biederman <ebiederm@...>
Cc: <torvalds@...>, <ying.huang@...>, <pavel@...>, <nigel@...>, <rjw@...>, <vgoyal@...>, <mingo@...>, <linux-kernel@...>, <kexec@...>
Date: Wednesday, August 13, 2008 - 3:44 pm

On Wed, 13 Aug 2008 11:12:48 -0700

- We're talking about kexec_lock here, not kexec_image

- afacit all manipulations of kexec_image happen under kexec_lock, so
they don't need to be atomic, do they?

- Is xchg() guaranteed to be atomic? That's what atomic_xchg() is for.

- xchg() isn't guaranteed to exist on all architectures. atomic_xchg() is.

Could someone please review and test this? It's on top of

kexec-jump-clean-up-ifdef-and-comments.patch
kexec-jump-rename-kexec_control_code_size-to-kexec_control_page_size.patch
kexec-jump-check-code-size-in-control-page.patch
kexec-jump-check-code-size-in-control-page-fix.patch
kexec-jump-remove-duplication-of-kexec_restart_prepare.patch
kexec-jump-in-sync-with-hibernation-implementation.patch
kexec-jump-__ftrace_enabled_save-restore.patch
kexec-jump-fix-for-ftrace.patch

Subject: kexec: use a bitop for locking rather than xchg()
From: Andrew Morton <akpm@linux-foundation.org>

Functionally the same, but more conventional.

Cc: Huang Ying <ying.huang@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

kernel/kexec.c | 41 +++++++++++++++++++++--------------------
1 file changed, 21 insertions(+), 20 deletions(-)

diff -puN kernel/kexec.c~kexec-use-a-bitop-for-locking-rather-than-xchg kernel/kexec.c
--- a/kernel/kexec.c~kexec-use-a-bitop-for-locking-rather-than-xchg
+++ a/kernel/kexec.c
@@ -9,6 +9,7 @@
#include <linux/capability.h>
#include <linux/mm.h>
#include <linux/file.h>
+#include <linux/bitops.h>
#include <linux/slab.h>
#include <linux/fs.h>
#include <linux/kexec.h>
@@ -924,19 +925,28 @@ static int kimage_load_segment(struct ki
*/
struct kimage *kexec_image;
struct kimage *kexec_crash_image;
+
+static unsigned long kexec_bitlock;
+
/*
- * A home grown binary mutex.
- * Nothing can wait so this mutex is safe ...

To: Andrew Morton <akpm@...>
Cc: Eric W. Biederman <ebiederm@...>, <torvalds@...>, <ying.huang@...>, <pavel@...>, <nigel@...>, <rjw@...>, <vgoyal@...>, <mingo@...>, <linux-kernel@...>, <kexec@...>
Date: Wednesday, August 13, 2008 - 4:15 pm

Yes, xchg() is guaranteed to be atomic. atomic_xchg() applies only to

You appear to be confusing xchg() with cmpxchg(). AFAIK, xchg() exists
on all architectures, and is used in several instances of generic code.
It is particularly extensively used in the networking layer.

Trond

--

To: Andrew Morton <akpm@...>
Cc: Eric W. Biederman <ebiederm@...>, <ying.huang@...>, <pavel@...>, <nigel@...>, <rjw@...>, <vgoyal@...>, <mingo@...>, <linux-kernel@...>, <kexec@...>
Date: Wednesday, August 13, 2008 - 3:50 pm

Nope. That needs to be an "unsigned long".

But more importantl, why not just make it a lock in the first place?

static DEFINE_SPINLOCK(kexec_lock);

#define kexec_trylock() spin_trylock(&kexec_lock)
#define kexec_unlock() spin_unlock(&kexec_lock)

and then you get it all right and clear and obvious.

Yeah, and I didn't check whether there is anything that is supposed to be
able to sleep. If there is, use a mutex instead of a spinlock, of course.

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: <ebiederm@...>, <ying.huang@...>, <pavel@...>, <nigel@...>, <rjw@...>, <vgoyal@...>, <mingo@...>, <linux-kernel@...>, <kexec@...>
Date: Wednesday, August 13, 2008 - 4:07 pm

On Wed, 13 Aug 2008 12:50:57 -0700 (PDT)

Used a bitop to preserve the runtime checking in there. spin_unlock()
doesn't return the previous lockedness.

Yes, it does sleepy things inside the lock.

A bitop seems a better fit to me. We never spin on that lock (it
always uses test_and_set), so why use a "spin"lock?
--

To: Andrew Morton <akpm@...>
Cc: <ebiederm@...>, <ying.huang@...>, <pavel@...>, <nigel@...>, <rjw@...>, <vgoyal@...>, <mingo@...>, <linux-kernel@...>, <kexec@...>
Date: Wednesday, August 13, 2008 - 4:13 pm

Umm. spin_unlock does a lot more when you have lock debugging on, and

..because an atomic bitop is not the same as a lock.

The memory ordering guarantees are different. Yes, they are sufficient,
but that's because we've had to make them so to account for CRAP CODE that
uses bit operations as if they were locks.

Don't continue that. It's WRONG.

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: <ebiederm@...>, <ying.huang@...>, <pavel@...>, <nigel@...>, <rjw@...>, <vgoyal@...>, <mingo@...>, <linux-kernel@...>, <kexec@...>
Date: Wednesday, August 13, 2008 - 4:25 pm

On Wed, 13 Aug 2008 13:13:13 -0700 (PDT)

#2:

(The xchg(kexec_crash_image) stuff is still in there)

--- a/kernel/kexec.c~kexec-use-a-bitop-for-locking-rather-than-xchg
+++ a/kernel/kexec.c
@@ -924,19 +924,14 @@ static int kimage_load_segment(struct ki
*/
struct kimage *kexec_image;
struct kimage *kexec_crash_image;
-/*
- * A home grown binary mutex.
- * Nothing can wait so this mutex is safe to use
- * in interrupt context :)
- */
-static int kexec_lock;
+
+static DEFINE_SPINLOCK(kexec_lock);

asmlinkage long sys_kexec_load(unsigned long entry, unsigned long nr_segments,
struct kexec_segment __user *segments,
unsigned long flags)
{
struct kimage **dest_image, *image;
- int locked;
int result;

/* We only trust the superuser with rebooting the system. */
@@ -972,8 +967,7 @@ asmlinkage long sys_kexec_load(unsigned
*
* KISS: always take the mutex.
*/
- locked = xchg(&kexec_lock, 1);
- if (locked)
+ if (!spin_trylock(&kexec_lock))
return -EBUSY;

dest_image = &kexec_image;
@@ -1015,8 +1009,7 @@ asmlinkage long sys_kexec_load(unsigned
image = xchg(dest_image, image);

out:
- locked = xchg(&kexec_lock, 0); /* Release the mutex */
- BUG_ON(!locked);
+ spin_unlock(&kexec_lock);
kimage_free(image);

return result;
@@ -1063,9 +1056,6 @@ asmlinkage long compat_sys_kexec_load(un

void crash_kexec(struct pt_regs *regs)
{
- int locked;
-
-
/* Take the kexec_lock here to prevent sys_kexec_load
* running on one cpu from replacing the crash kernel
* we are using after a panic on a different cpu.
@@ -1074,8 +1064,7 @@ void crash_kexec(struct pt_regs *regs)
* of memory the xchg(&kexec_crash_image) would be
* sufficient. But since I reuse the memory...
*/
- locked = xchg(&kexec_lock, 1);
- if (!locked) {
+ if (spin_trylock(&kexec_lock)) {
if (kexec_crash_image) {
struct pt_regs fixed_regs;
crash_setup_regs(&fixed_regs, regs);
@@ -1083,8 +1072,7 @...

To: Andrew Morton <akpm@...>
Cc: <ebiederm@...>, <ying.huang@...>, <pavel@...>, <nigel@...>, <rjw@...>, <vgoyal@...>, <mingo@...>, <linux-kernel@...>, <kexec@...>
Date: Wednesday, August 13, 2008 - 4:31 pm

I thought you said there were things that want to sleep in the region?

If so, spinlocks will work as long as you don't have CONFIG_PREEMPT or
lock validation (there's no way to deadlock thanks to all the lock getters
using the "trylock" variant), but will blow up because a successful
trylock will obviously also disable preemption and/or trigger all the lock
detection.

So if there are potential sleepers, you'd need the mutex instead.

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: <ebiederm@...>, <ying.huang@...>, <pavel@...>, <nigel@...>, <rjw@...>, <vgoyal@...>, <mingo@...>, <linux-kernel@...>, <kexec@...>
Date: Wednesday, August 13, 2008 - 4:41 pm

On Wed, 13 Aug 2008 13:31:24 -0700 (PDT)

be reasonable - that was over five minutes ago.

--- a/kernel/kexec.c~kexec-use-a-bitop-for-locking-rather-than-xchg
+++ a/kernel/kexec.c
@@ -12,7 +12,7 @@
#include <linux/slab.h>
#include <linux/fs.h>
#include <linux/kexec.h>
-#include <linux/spinlock.h>
+#include <linux/mutex.h>
#include <linux/list.h>
#include <linux/highmem.h>
#include <linux/syscalls.h>
@@ -924,19 +924,14 @@ static int kimage_load_segment(struct ki
*/
struct kimage *kexec_image;
struct kimage *kexec_crash_image;
-/*
- * A home grown binary mutex.
- * Nothing can wait so this mutex is safe to use
- * in interrupt context :)
- */
-static int kexec_lock;
+
+static DEFINE_MUTEX(kexec_mutex);

asmlinkage long sys_kexec_load(unsigned long entry, unsigned long nr_segments,
struct kexec_segment __user *segments,
unsigned long flags)
{
struct kimage **dest_image, *image;
- int locked;
int result;

/* We only trust the superuser with rebooting the system. */
@@ -972,8 +967,7 @@ asmlinkage long sys_kexec_load(unsigned
*
* KISS: always take the mutex.
*/
- locked = xchg(&kexec_lock, 1);
- if (locked)
+ if (!mutex_trylock(&kexec_mutex))
return -EBUSY;

dest_image = &kexec_image;
@@ -1015,8 +1009,7 @@ asmlinkage long sys_kexec_load(unsigned
image = xchg(dest_image, image);

out:
- locked = xchg(&kexec_lock, 0); /* Release the mutex */
- BUG_ON(!locked);
+ mutex_unlock(&kexec_mutex);
kimage_free(image);

return result;
@@ -1063,10 +1056,7 @@ asmlinkage long compat_sys_kexec_load(un

void crash_kexec(struct pt_regs *regs)
{
- int locked;
-
-
- /* Take the kexec_lock here to prevent sys_kexec_load
+ /* Take the kexec_mutex here to prevent sys_kexec_load
* running on one cpu from replacing the crash kernel
* we are using after a panic on a different cpu.
*
@@ -1074,8 +1064,7 @@ void crash_kexec(struct pt_regs *regs)
...

To: Andrew Morton <akpm@...>
Cc: <ebiederm@...>, <ying.huang@...>, <pavel@...>, <nigel@...>, <rjw@...>, <vgoyal@...>, <mingo@...>, <linux-kernel@...>, <kexec@...>
Date: Wednesday, August 13, 2008 - 6:17 pm

LOL. Anyway, I can't find anything wrong with this last one, however
difficult I try to be.

Linus
--

To: Andrew Morton <akpm@...>
Cc: Linus Torvalds <torvalds@...>, <ebiederm@...>, <ying.huang@...>, <pavel@...>, <nigel@...>, <rjw@...>, <mingo@...>, <linux-kernel@...>, <kexec@...>
Date: Wednesday, August 13, 2008 - 5:21 pm

I tested kexec and kudmp on 2.6.27-rc3 with this patch. Works for me on a
32 bit machine.

Thanks
Vivek
--

To: Eric W. Biederman <ebiederm@...>
Cc: Linus Torvalds <torvalds@...>, Huang Ying <ying.huang@...>, Pavel Machek <pavel@...>, <nigel@...>, Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, <mingo@...>, <linux-kernel@...>, Kexec Mailing List <kexec@...>
Date: Wednesday, August 13, 2008 - 2:31 pm

Hi Eric,

Are there any issues with usage of test_and_set_bit() or usage of spinlock
primitives?

Thanks
Vivek
--

To: Huang Ying <ying.huang@...>
Cc: Eric W. Biederman <ebiederm@...>, Pavel Machek <pavel@...>, <nigel@...>, Rafael J. Wysocki <rjw@...>, Vivek Goyal <vgoyal@...>, <mingo@...>, Linus Torvalds <torvalds@...>, <linux-kernel@...>, Kexec Mailing List <kexec@...>
Date: Wednesday, August 13, 2008 - 5:27 am

Would prefer that thi code not use such a peculair idiom. I don't
believe that it needs to.

Please always quote the compiler output in the changelog when fixing
warnings and build errors.

The patch is titled "kexec jump: ..." whereas this is just a plain old
kexec fix, which is applicable to mainline.

We don't need to create that local. I queued this:

Subject: kexec: fix compilation warning on xchg(&kexec_lock, 0) in kernel_kexec()
From: Huang Ying <ying.huang@intel.com>

kernel/kexec.c: In function 'kernel_kexec':
kernel/kexec.c:1506: warning: value computed is not used

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

kernel/kexec.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff -puN kernel/kexec.c~kexec-jump-fix-compiling-warning-on-xchgkexec_lock-0-in-kernel_kexec kernel/kexec.c
--- a/kernel/kexec.c~kexec-jump-fix-compiling-warning-on-xchgkexec_lock-0-in-kernel_kexec
+++ a/kernel/kexec.c
@@ -1503,7 +1503,8 @@ int kernel_kexec(void)
}

Unlock:
- xchg(&kexec_lock, 0);
+ if (!xchg(&kexec_lock, 0))
+ BUG();

return error;
}
_

--

To: Andrew Morton <akpm@...>
Cc: Huang Ying <ying.huang@...>, Eric W. Biederman <ebiederm@...>, Pavel Machek <pavel@...>, <nigel@...>, Rafael J. Wysocki <rjw@...>, Vivek Goyal <vgoyal@...>, <mingo@...>, <linux-kernel@...>, Kexec Mailing List <kexec@...>
Date: Wednesday, August 13, 2008 - 1:01 pm

No, please don't.

Just don't take this whole patch-series until it's cleaned up. There is
absolutely no excuse for using xchg as a locking primitive. Nothing like
this should be queued anywhere, it should be burned and the ashes should
be scattered over the atlantic so that nobody will ever see them again.

F*ck me with a spoon, if you have to use xchg() to do a trylock, why the
hell isn't the unlock sequence then

smp_mb();
var = 0;

instead? Not that that's really right either, but at least it avoids the
_ridiculous_ crap. The real solution is probably to use a spinlock and
trylock/unlock.

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: Huang Ying <ying.huang@...>, Eric W. Biederman <ebiederm@...>, Pavel Machek <pavel@...>, <nigel@...>, Rafael J. Wysocki <rjw@...>, Vivek Goyal <vgoyal@...>, <mingo@...>, <linux-kernel@...>, Kexec Mailing List <kexec@...>
Date: Wednesday, August 13, 2008 - 1:25 pm

Or test_and_set_bit(). That's what I've been saying too, only
differently ;)

But cleaning up the long-standing silly usage of xchg() is a different
activity from suppressing this recently-added compile warning.

--

To: Andrew Morton <akpm@...>
Cc: Linus Torvalds <torvalds@...>, Huang Ying <ying.huang@...>, Eric W. Biederman <ebiederm@...>, Pavel Machek <pavel@...>, <nigel@...>, Rafael J. Wysocki <rjw@...>, Vivek Goyal <vgoyal@...>, <linux-kernel@...>, Kexec Mailing List <kexec@...>
Date: Wednesday, August 13, 2008 - 1:59 pm

actually, in this case i disagree: the warning here is a canary that
there's something wrong about this code - i.e. gcc is _right_ about
warning us. The warning is also totally harmless - the warning shows us
the suckiness of the code structure - and squashing the warning doesnt
fix that.

So im coal-mine analogies, i disagree with squashing the canary, we
should find and fix the shaft that emits the smelly methane instead ;-)

Ingo
--

Previous thread: sysprof in 2.6.27 kernel by Lukas Hejtmanek on Wednesday, August 13, 2008 - 4:39 am. (9 messages)

Next thread: Proposal: kernel config option for default UFS filesystem by barry bouwsma on Wednesday, August 13, 2008 - 5:37 am. (2 messages)