Re: [PATCH] sysfs: Add lockdep annotations for the sysfs active reference

Previous thread: RE: Tmem [PATCH 0/5] (Take 3): Transcendent memory by Dan Magenheimer on Thursday, December 24, 2009 - 1:51 pm. (1 message)

Next thread: OOM killer unexpectedly called with kernel 2.6.32 by A. Boulan on Thursday, December 24, 2009 - 4:42 pm. (2 messages)
From: Linus Torvalds
Date: Thursday, December 24, 2009 - 3:00 pm

.. or wahetever it is you'll be celebrating today/tomorrow.

And if you aren't celebrating anything at all, but instead sitting in your 
dark basement feeling lonely and bored, you can at least try out the 
latest -rc kernel. Because it's better than moping around doing nothing.

And if you _are_ celebrating, but are having a bit of an overload of ham 
(or are getting fed up with losing all your clothes in strip dreidel and 
having people point and snigger at you, if that's what gets you through 
the holidays), take a break, compile a kernel, and try it out for size. 

Because while your pants may not fit you for the next month, the new 
kernel might just fit your computer perfectly.

The dirstat says it all: mostly drivers (much of it staging updates, 
especially as nouveau also counts as staging even if it's not physically 
there). But a smattering of changes all over (scheduler, vfs, arch, kfifo 
etc):

   2.9% Documentation/
   2.4% arch/arm/
   3.4% arch/powerpc/
   2.7% arch/x86/
   9.4% arch/
   9.8% drivers/gpu/drm/nouveau/
   2.8% drivers/gpu/drm/radeon/
  15.4% drivers/gpu/drm/
   6.4% drivers/platform/x86/
  12.1% drivers/staging/dst/
  13.4% drivers/staging/sm7xx/
  27.1% drivers/staging/
  10.3% drivers/usb/serial/
  11.9% drivers/usb/
  69.3% drivers/
   3.7% fs/
   6.4% include/linux/
   6.5% include/
   3.4% kernel/
   2.4% sound/pci/hda/
   3.3% sound/

I'd have been happier with a smaller -rc2, but I've seen worse too, so I'm 
not going to complain. 

And now I'm off to peel potatoes. And then to eat them.

Have fun,

		Linus

---
Al Viro (4):
      fix braindamage in audit_tree.c untag_chunk()
      fix more leaks in audit_tree.c tag_chunk()
      Fix f_flags/f_mode in case of lookup_instantiate_filp() from open(pathname, 3)
      Sanitize f_flags helpers

Alan Cox (1):
      jfs: Fix 32bit build warning

Alan Stern (5):
      PM: Use pm_runtime_put_sync in system resume
      PM: Runtime PM documentation update
      USB: ...
From: Ingo Molnar
Date: Friday, December 25, 2009 - 3:27 am

Today's -tip crashed during bootup on one of my testsystems:

[   28.616777] initcall eeepc_laptop_init+0x0/0x4d returned -19 after 29853 usecs
[   28.624002] calling  dell_wmi_init+0x0/0x126 @ 1
[   28.629128] device: 'input1': device_add
[   28.633113] PM: Adding info for No Bus:input1
[   28.637678] input: Dell WMI hotkeys as /class/input/input1
[   28.643216] evbug.c: Connected device: input1 (Dell WMI hotkeys at wmi/input0)
[   28.650449] BUG: unable to handle kernel NULL pointer dereference at 00000014
[   28.654439] IP: [<c17f7f21>] wmi_install_notify_handler+0x31/0x70
[   28.654439] *pde = 00000000 
[   28.654439] Oops: 0000 [#1] DEBUG_PAGEALLOC
[   28.654439] last sysfs file: 
[   28.654439] Modules linked in:
[   28.654439] 
[   28.654439] Pid: 1, comm: swapper Not tainted 2.6.33-rc2-tip-00158-g2bd999a-dirty #2238 A8N-E/System Product Name
[   28.654439] EIP: 0060:[<c17f7f21>] EFLAGS: 00010282 CPU: 0
[   28.654439] EIP is at wmi_install_notify_handler+0x31/0x70
[   28.654439] EAX: 00000006 EBX: c17f55d0 ECX: 0000009d EDX: fffffff4
[   28.654439] ESI: 00000000 EDI: 00000000 EBP: f7052f90 ESP: f7052f84
[   28.654439]  DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
[   28.654439] Process swapper (pid: 1, ti=f7052000 task=f7070000 task.ti=f7052000)
[   28.654439] Stack:
[   28.654439]  fffffff4 00000000 00000006 f7052fa4 c2165992 aa66398a 00000006 aa66398a
[   28.654439] <0> f7052fd0 c1001122 c1d58799 c2165874 00000001 0000749d 00000000 c2165874
[   28.654439] <0> c219a214 c21271d6 00000000 f7052fe4 c2127262 00000000 000000e0 00000000
[   28.654439] Call Trace:
[   28.654439]  [<c2165992>] ? dell_wmi_init+0x11e/0x126
[   28.654439]  [<c1001122>] ? do_one_initcall+0x32/0x1d0
[   28.654439]  [<c2165874>] ? dell_wmi_init+0x0/0x126
[   28.654439]  [<c2165874>] ? dell_wmi_init+0x0/0x126
[   28.654439]  [<c21271d6>] ? kernel_init+0x0/0xe0
[   28.654439]  [<c2127262>] ? kernel_init+0x8c/0xe0
[   28.654439]  [<c10033c6>] ? kernel_thread_helper+0x6/0x10
[   28.654439] Code: 89 ...
From: Dmitry Torokhov
Date: Friday, December 25, 2009 - 12:49 pm

Hmm, the patch is busted in one way, but it should not be crashing like
that still... I wonder what is going on. Still, the patch below should
help it a bit.

-- 
Dmitry

dell-wmi - fix condition to abort driver loading

From: Dmitry Torokhov <dmitry.torokhov@gmail.com>

The commit 1fdd407f4e3f2ecb453954cbebb6c22491c61853 incorrectly made driver
abort loading when known GUID is present when it should have done exactly
the opposite.

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
---

 drivers/platform/x86/dell-wmi.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)


diff --git a/drivers/platform/x86/dell-wmi.c b/drivers/platform/x86/dell-wmi.c
index 916ccb2..c980782 100644
--- a/drivers/platform/x86/dell-wmi.c
+++ b/drivers/platform/x86/dell-wmi.c
@@ -324,7 +324,7 @@ static int __init dell_wmi_init(void)
 {
 	int err;
 
-	if (wmi_has_guid(DELL_EVENT_GUID)) {
+	if (!wmi_has_guid(DELL_EVENT_GUID)) {
 		printk(KERN_WARNING "dell-wmi: No known WMI GUID found\n");
 		return -ENODEV;
 	}
--

From: Len Brown
Date: Saturday, December 26, 2009 - 1:19 pm

> dell-wmi - fix condition to abort driver loading


applied to acpi-test

thanks,
--

From: Len Brown
Date: Saturday, December 26, 2009 - 1:17 pm

Rather than reverting the broken patch that caused wmi to load,
does applying this patch to deal with the broken error handling
cause the oops to go away?

thanks,
-Len

From d11e073ee3e3091d9190dace97ce480e960cca1b Mon Sep 17 00:00:00 2001
From: Len Brown <len.brown@intel.com>
Date: Fri, 25 Dec 2009 23:14:26 -0500
Subject: [PATCH] Revert "wmi: Free the allocated acpi objects through wmi_get_event_data"
X-Patchwork-Hint: ignore

This reverts commit 3e9b988e4edf065d39c1343937f717319b1c1065.

Reported-by: Sedat Dilek <sedat.dilek@googlemail.com>
Tested-by Maciej Rutecki <maciej.rutecki@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
---
 drivers/platform/x86/dell-wmi.c |    1 -
 drivers/platform/x86/hp-wmi.c   |    2 --
 drivers/platform/x86/wmi.c      |    4 ++--
 3 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/platform/x86/dell-wmi.c b/drivers/platform/x86/dell-wmi.c
index 916ccb2..46244c6 100644
--- a/drivers/platform/x86/dell-wmi.c
+++ b/drivers/platform/x86/dell-wmi.c
@@ -238,7 +238,6 @@ static void dell_wmi_notify(u32 value, void *context)
 			input_sync(dell_wmi_input_dev);
 		}
 	}
-	kfree(obj);
 }
 
 
diff --git a/drivers/platform/x86/hp-wmi.c b/drivers/platform/x86/hp-wmi.c
index 8781d8f..222ab57 100644
--- a/drivers/platform/x86/hp-wmi.c
+++ b/drivers/platform/x86/hp-wmi.c
@@ -388,8 +388,6 @@ static void hp_wmi_notify(u32 value, void *context)
 	} else
 		printk(KERN_INFO "HP WMI: Unknown key pressed - %x\n",
 			eventcode);
-
-	kfree(obj);
 }
 
 static int __init hp_wmi_input_setup(void)
diff --git a/drivers/platform/x86/wmi.c b/drivers/platform/x86/wmi.c
index 9f93d6c..e425a86 100644
--- a/drivers/platform/x86/wmi.c
+++ b/drivers/platform/x86/wmi.c
@@ -540,8 +540,8 @@ EXPORT_SYMBOL_GPL(wmi_remove_notify_handler);
 /**
  * wmi_get_event_data - Get WMI data associated with an event
  *
- * @event: Event to find
- * @out: Buffer to hold event data. out->pointer should be freed with kfree()
+ * @event - Event ...
From: Len Brown
Date: Saturday, December 26, 2009 - 9:20 pm

These kfree's look correct, assuming we properly check
the return status.  So perhaps instead you can test
the patch below?

thanks,
-Len

From 5caa3ab36da77d59017cff9b9d1e910862b489e7 Mon Sep 17 00:00:00 2001
Message-Id: <5caa3ab36da77d59017cff9b9d1e910862b489e7.1261887124.git.len.brown@intel.com>
In-Reply-To: <51b0f1c2b8c32ee44ff01ef74599a1f17e4fc565.1261887124.git.len.brown@intel.com>
References: <51b0f1c2b8c32ee44ff01ef74599a1f17e4fc565.1261887124.git.len.brown@intel.com>
From: Len Brown <len.brown@intel.com>
Date: Sat, 26 Dec 2009 23:02:24 -0500
Subject: [PATCH 3/3] dell-wmi, hp-wmi, msi-wmi: check wmi_get_event_data() return value
X-Patchwork-Hint: ignore

When acpi_evaluate_object() is passed ACPI_ALLOCATE_BUFFER,
the caller must kfree the returned buffer if AE_OK is returned.

The callers of wmi_get_event_data() pass ACPI_ALLOCATE_BUFFER,
and thus must check its return value before accessing
or kfree() on the buffer.

Signed-off-by: Len Brown <len.brown@intel.com>
---
 drivers/platform/x86/dell-wmi.c |    7 ++++++-
 drivers/platform/x86/hp-wmi.c   |    7 ++++++-
 drivers/platform/x86/msi-wmi.c  |    7 ++++++-
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/platform/x86/dell-wmi.c b/drivers/platform/x86/dell-wmi.c
index 4c7e702..500af8c 100644
--- a/drivers/platform/x86/dell-wmi.c
+++ b/drivers/platform/x86/dell-wmi.c
@@ -202,8 +202,13 @@ static void dell_wmi_notify(u32 value, void *context)
 	struct acpi_buffer response = { ACPI_ALLOCATE_BUFFER, NULL };
 	static struct key_entry *key;
 	union acpi_object *obj;
+	acpi_status status;
 
-	wmi_get_event_data(value, &response);
+	status = wmi_get_event_data(value, &response);
+	if (status != AE_OK) {
+		printk(KERN_INFO "dell-wmi: bad event status 0x%x\n", status);
+		return;
+	}
 
 	obj = (union acpi_object *)response.pointer;
 
diff --git a/drivers/platform/x86/hp-wmi.c b/drivers/platform/x86/hp-wmi.c
index 18bf741..5b648f0 100644
--- a/drivers/platform/x86/hp-wmi.c
+++ ...
From: Ingo Molnar
Date: Monday, December 28, 2009 - 2:44 am

Applied it to tip:out-of-tree for testing, and have dropped the revert as 
well. Will let you know how it goes. (if you dont hear from me later today you 
an assume it's all fixed.)

Thanks,

	Ingo
--

From: Ingo Molnar
Date: Monday, December 28, 2009 - 5:01 am

Still a very similar looking crash (attached). I went for the plain revert in 
tip:out-of-tree again.

(Note that the system does not have this hardware, and that it's booted with 
the driver built-in. So the relevant codepath should be very simple. Config 
attached.)

	Ingo

[   27.447053] initcall compal_init+0x0/0xf7 returned -19 after 3 usecs
[   27.453409] calling  dell_wmi_init+0x0/0x129 @ 1
[   27.458255] PM: Adding info for No Bus:input3
[   27.462676] input: Dell WMI hotkeys as /class/input/input3
[   27.468179] BUG: unable to handle kernel NULL pointer dereference at 00000014
[   27.472165] IP: [<c1f26aa8>] wmi_install_notify_handler+0x28/0x80
[   27.472165] *pde = 00000000 
[   27.472165] Oops: 0000 [#1] PREEMPT SMP 
[   27.472165] last sysfs file: 
[   27.472165] 
[   27.472165] Pid: 1, comm: swapper Not tainted 2.6.33-rc2-tip-00212-g3c2365e-dirty #3297 A8N-E/System Product Name
[   27.472165] EIP: 0060:[<c1f26aa8>] EFLAGS: 00010282 CPU: 0
[   27.472165] EIP is at wmi_install_notify_handler+0x28/0x80
[   27.472165] EAX: fffffff4 EBX: c1f1e200 ECX: 22b612b0 EDX: c2b2d1e0
[   27.472165] ESI: 00000000 EDI: 00000001 EBP: f64b3f84 ESP: f64b3f78
[   27.472165]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[   27.472165] Process swapper (pid: 1, ti=f64b3000 task=f64d0000 task.ti=f64b3000)
[   27.472165] Stack:
[   27.472165]  fffffff4 00000000 00000006 f64b3f98 c28d70d7 64a0689d 00000006 64a0689d
[   27.472165] <0> f64b3fc4 c100112b c255aa54 c28d6fb6 00000001 00000003 00000000 c28d6fb6
[   27.472165] <0> c294a50c c2882310 00000000 f64b3fd0 c28822fd c29493e8 f64b3fe4 c2882392
[   27.472165] Call Trace:
[   27.472165]  [<c28d70d7>] ? dell_wmi_init+0x121/0x129
[   27.472165]  [<c100112b>] ? do_one_initcall+0x2b/0x1c0
[   27.472165]  [<c28d6fb6>] ? dell_wmi_init+0x0/0x129
[   27.472165]  [<c28d6fb6>] ? dell_wmi_init+0x0/0x129
[   27.472165]  [<c2882310>] ? kernel_init+0x0/0xc9
[   27.472165]  [<c28822fd>] ? do_basic_setup+0x44/0x57
[   27.472165]  [<c2882392>] ? ...
From: Paul Rolland
Date: Monday, December 28, 2009 - 8:02 am

Hello,

On Mon, 28 Dec 2009 13:01:25 +0100

Looks like it is very similar to what I have when I run :
modprobe dell-wmi
on my Dell laptop (Vostro 1520).

Message from syslogd@tux at Dec 28 16:00:22 ...
 kernel:Oops: 0000 [#1] SMP 

Message from syslogd@tux at Dec 28 16:00:22 ...
 kernel:last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq

Message from syslogd@tux at Dec 28 16:00:22 ...
 kernel:Stack:

Message from syslogd@tux at Dec 28 16:00:22 ...
 kernel:Call Trace:

Message from syslogd@tux at Dec 28 16:00:22 ...
 kernel:Code: 65 f8 48 89 f3 49 89 d4 48 85 f6 74 35 48 85 ff 74 30 48 8d 75 e8 e8 27 fd ff ff 48 8b 55 e8 b8 06 00 00 00 48 85 d2 74 09 b0 15 <48> 83 7a 30 00 74 20 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 0f 1f 44 

Message from syslogd@tux at Dec 28 16:00:22 ...
 kernel:CR2: 0000000100000024
Dec 28 16:00:22 tux kernel: ACPI: WMI: Mapper loaded
Dec 28 16:00:22 tux kernel: input: Dell WMI hotkeys as /devices/virtual/input/input13
Dec 28 16:00:22 tux kernel: BUG: unable to handle kernel paging request at 0000000100000024
Dec 28 16:00:22 tux kernel: IP: [<ffffffffa0003789>] wmi_install_notify_handler+0x39/0x80 [wmi]
Dec 28 16:00:22 tux kernel: PGD 565f3067 PUD 0 
Dec 28 16:00:22 tux kernel: Oops: 0000 [#1] SMP 
Dec 28 16:00:22 tux kernel: last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
Dec 28 16:00:22 tux kernel: CPU 1 
Dec 28 16:00:22 tux kernel: Pid: 20215, comm: modprobe Not tainted 2.6.33-rc2 #1 0T816J/Vostro 1520
Dec 28 16:00:22 tux kernel: RIP: 0010:[<ffffffffa0003789>]  [<ffffffffa0003789>] wmi_install_notify_handler+0x39/0x80 [wmi]
Dec 28 16:00:22 tux kernel: RSP: 0018:ffff88009434bed8  EFLAGS: 00010202
Dec 28 16:00:22 tux kernel: RAX: 0000000000000015 RBX: ffffffffa014f0f0 RCX: 000000000000009d
Dec 28 16:00:22 tux kernel: RDX: 00000000fffffff4 RSI: ffff88009434be98 RDI: ffff88009434be8c
Dec 28 16:00:22 tux kernel: RBP: ffff88009434bef8 R08: ffff88009434be88 R09: 0000000000000010
Dec 28 16:00:22 tux kernel: R10: ...
From: Paul Rolland
Date: Monday, December 28, 2009 - 9:15 am

Hello,

On Mon, 28 Dec 2009 16:02:12 +0100


I've fixed this one by changing wmi.c :

--- wmi.c       2009-12-28 17:11:52.000000000 +0100
+++ wmi.c.changed       2009-12-28 17:12:17.000000000 +0100
@@ -488,12 +488,13 @@
 {
        struct wmi_block *block;
        acpi_status status;
+       bool err;
 
        if (!guid || !handler)
                return AE_BAD_PARAMETER;
 
-       find_guid(guid, &block);
-       if (!block)
+       err = find_guid(guid, &block);
+       if (!err)
                return AE_NOT_EXIST;
 
        if (block->handler)

Signed-off-by: rol@as2917.net <Paul Rolland>

but now I have :
input: Dell WMI hotkeys as /devices/virtual/input/input13
dell-wmi: Unable to register notify handler - 6
sys_init_module: 'dell_wmi'->init suspiciously returned 6, it should follow 0/-E convention
sys_init_module: loading module anyway...
Pid: 2220, comm: modprobe Not tainted 2.6.33-rc2 #1
Call Trace:
 [<ffffffff8107450a>] sys_init_module+0x1da/0x250
 [<ffffffff810024ab>] system_call_fastpath+0x16/0x1b


Then, I tried to remove dell-wmi to fix the warning, and then I have :
BUG: unable to handle kernel NULL pointer dereference at 00000000000008b0
IP: [<ffffffffa024971b>] wmi_remove_notify_handler+0x2b/0x60 [wmi]
PGD bb506067 PUD bd170067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
CPU 1 
Pid: 3833, comm: rmmod Not tainted 2.6.33-rc2 #1 0T816J/Vostro 1520
RIP: 0010:[<ffffffffa024971b>]  [<ffffffffa024971b>] wmi_remove_notify_handler+0x2b/0x60 [wmi]
RSP: 0000:ffff8800bd3e7ea8  EFLAGS: 00010202
RAX: 000000000000000a RBX: 0000000000000880 RCX: 000000000000009d
RDX: 0000000000000000 RSI: ffff8800bd3e7e68 RDI: 0000000000000880
RBP: ffff8800bd3e7eb8 R08: ffff8800bd3e7e58 R09: 0000000000000010
R10: ffff8800bd3e7eb0 R11: 0000000000000006 R12: ffffffffa0251780
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
FS:  00007f628c4c46f0(0000) GS:ffff880028300000(0000) ...
From: Paul Rolland
Date: Monday, December 28, 2009 - 9:53 am

Fixed this completely with the following two patches on top of 2.6.33-rc2 :

--- wmi.c.orig  2009-12-28 17:27:15.000000000 +0100
+++ wmi.c       2009-12-28 17:39:01.000000000 +0100
@@ -488,12 +488,13 @@
 {
        struct wmi_block *block;
        acpi_status status;
+       bool err;
 
        if (!guid || !handler)
                return AE_BAD_PARAMETER;
 
-       find_guid(guid, &block);
-       if (!block)
+       err = find_guid(guid, &block);
+       if (!err)
                return AE_NOT_EXIST;
 
        if (block->handler)
@@ -517,12 +518,13 @@
 {
        struct wmi_block *block;
        acpi_status status;
+       bool err;
 
        if (!guid)
                return AE_BAD_PARAMETER;
 
-       find_guid(guid, &block);
-       if (!block)
+       err = find_guid(guid, &block);
+       if (!err)
                return AE_NOT_EXIST;
 
        if (!block->handler)

--- dell-wmi.c.orig     2009-12-28 17:49:09.000000000 +0100
+++ dell-wmi.c  2009-12-28 17:41:36.000000000 +0100
@@ -343,7 +343,7 @@
                printk(KERN_ERR
                        "dell-wmi: Unable to register notify handler - %d\n",
                        err);
-               return err;
+               return -err;
        }
 
        return 0;


Signed-off-by: rol@as2917.net <Paul Rolland>

Paul
--

From: Dmitry Torokhov
Date: Monday, December 28, 2009 - 1:17 pm

Hi Paul,


This still leaks AE_* error codes to the upper layers which do not care
for them and expect the errors from Exxxx namespace.

I wonder why wmi is not using standard error codes but instead decided
to come up with it's own.

-- 
Dmitry
--

From: Len Brown
Date: Tuesday, December 29, 2009 - 11:14 pm

Unfortunately, find_guid() returns 1 for success
and 0 for failure, making this code look backwards.

I've applied this, but changed it to if (!find_guid())

This change is not quite correct,
but the correct fix is already in the acpi-test tree in the patch
"dell-wmi: sys_init_module: 'dell_wmi'->init suspiciously returned 21,
it should follow 0/-E convention"

I'll reply with both of the patches in a sec.
Please let me know if they are sufficient to fix your system.

thanks,
--

From: Paul Rolland
Date: Wednesday, December 30, 2009 - 12:13 am

Hi Len,

On Wed, 30 Dec 2009 01:14:48 -0500 (EST)


Yes, Dmitry sent a mail indicating that yesterday. I must admit I've been

They are. You can add a :
Tested-by: Paul Rolland <rol@as2917.net>
for what it matters ;)

Paul
--

From: Len Brown
Date: Tuesday, December 29, 2009 - 11:19 pm

From: rol@as2917.net <Paul Rolland>

Signed-off-by: rol@as2917.net <Paul Rolland>
Signed-off-by: Len Brown <len.brown@intel.com>
---
 drivers/platform/x86/wmi.c |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/platform/x86/wmi.c b/drivers/platform/x86/wmi.c
index 9f93d6c..cc9ad74 100644
--- a/drivers/platform/x86/wmi.c
+++ b/drivers/platform/x86/wmi.c
@@ -492,8 +492,7 @@ wmi_notify_handler handler, void *data)
 	if (!guid || !handler)
 		return AE_BAD_PARAMETER;
 
-	find_guid(guid, &block);
-	if (!block)
+	if (!find_guid(guid, &block))
 		return AE_NOT_EXIST;
 
 	if (block->handler)
@@ -521,8 +520,7 @@ acpi_status wmi_remove_notify_handler(const char *guid)
 	if (!guid)
 		return AE_BAD_PARAMETER;
 
-	find_guid(guid, &block);
-	if (!block)
+	if (!find_guid(guid, &block))
 		return AE_NOT_EXIST;
 
 	if (!block->handler)
-- 
1.6.6.rc4.11.g129a5

--


From: Len Brown <len.brown@intel.com>

wmi_install_notify_handler() returns an acpi_error,
but dell_wmi_init() needs return a -errno style error.

Signed-off-by: Len Brown <len.brown@intel.com>
---
 drivers/platform/x86/dell-wmi.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/platform/x86/dell-wmi.c b/drivers/platform/x86/dell-wmi.c
index 916ccb2..4c7e702 100644
--- a/drivers/platform/x86/dell-wmi.c
+++ b/drivers/platform/x86/dell-wmi.c
@@ -323,6 +323,7 @@ static int __init dell_wmi_input_setup(void)
 static int __init dell_wmi_init(void)
 {
 	int err;
+	acpi_status status;
 
 	if (wmi_has_guid(DELL_EVENT_GUID)) {
 		printk(KERN_WARNING "dell-wmi: No known WMI GUID found\n");
@@ -336,14 +337,14 @@ static int __init dell_wmi_init(void)
 	if (err)
 		return err;
 
-	err = wmi_install_notify_handler(DELL_EVENT_GUID,
+	status = wmi_install_notify_handler(DELL_EVENT_GUID,
 					 dell_wmi_notify, NULL);
-	if (err) {
+	if (ACPI_FAILURE(status)) {
 		input_unregister_device(dell_wmi_input_dev);
 		printk(KERN_ERR
 			"dell-wmi: Unable to register notify handler - %d\n",
-			err);
-		return err;
+			status);
+		return -ENODEV;
 	}
 
 	return 0;
-- 
1.6.6.rc4.11.g129a5


--

From: Miguel Calleja
Date: Friday, December 25, 2009 - 6:10 am

El Thu, 24 Dec 2009 14:00:24 -0800, Linus Torvalds escribió:

Hi.

I've upgraded from 2.6.32 and I get a blank screen when the system is 
starting KMS. My motherboard is an Intel DG43NB with onboard video. I 
traced back to the patch that stopped working on my system and it is 
patch-2.6.32-git7. patch-2.6.32-git6 works fine. From dmesg of git7 I see 
this:

i915 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
i915 0000:00:02.0: setting latency timer to 64
mtrr: type mismatch for c0000000,10000000 old: write-back new: write-
combining
[drm] MTRR allocation failed.  Graphics performance may suffer.
i915 0000:00:02.0: irq 28 for MSI/MSI-X
[drm] set up 31M of stolen space
------------[ cut here ]------------
WARNING: at drivers/gpu/drm/drm_crtc_helper.c:1035 
drm_helper_initial_config+0x5e/0x60 [drm_kms_helper]()
Hardware name:         
No connectors reported connected with modes
Modules linked in: i915(+) snd_hda_codec bttv(+) v4l2_common videodev 
ir_common videobuf_dma_sg videobuf_core drm_kms_helper snd_pcm btcx_risc 
uhci_hcd i2c_i801 tveeprom drm snd_timer i2c_algo_bit snd cfbcopyarea 
ehci_hcd psmouse video i2c_core soundcore e1000e sr_mod intel_agp usbcore 
backlight 8139too output cfbimgblt cfbfillrect snd_page_alloc cdrom 
agpgart evdev thermal rtc_cmos rtc_core rtc_lib button
Pid: 628, comm: modprobe Not tainted 2.6.32-git7 #1
Call Trace:
 [<c103215e>] ? warn_slowpath_common+0x6e/0xb0
 [<f8875abe>] ? drm_helper_initial_config+0x5e/0x60 [drm_kms_helper]
 [<c10321eb>] ? warn_slowpath_fmt+0x2b/0x30
 [<f8875abe>] ? drm_helper_initial_config+0x5e/0x60 [drm_kms_helper]
 [<f89a3a91>] ? i915_driver_load+0x1321/0x14c0 [i915]
 [<f89a2750>] ? i915_vga_set_decode+0x0/0x20 [i915]
 [<f87b8eaf>] ? drm_get_dev+0x29f/0x470 [drm]
 [<c10db051>] ? sysfs_do_create_link+0xa1/0x140
 [<f89cdc15>] ? i915_pci_probe+0x0/0x130 [i915]
 [<c112d6db>] ? local_pci_probe+0xb/0x10
 [<c112df29>] ? pci_device_probe+0x69/0x90
 [<c118ffeb>] ? driver_probe_device+0x7b/0x170
 [<c112d848>] ? ...
From: Miguel Calleja
Date: Tuesday, December 29, 2009 - 2:50 am

I've bisected the kernel and have found that my problem is due to this 
commit:

commit fc816655236cd9da162356e96e74c7cfb0834d92

drm/i915: Don't set up HDMI ports that aren't in the BIOS device table.

author	Zhao Yakui <yakui.zhao@intel.com>	
	Tue, 24 Nov 2009 01:48:45 +0000 (09:48 +0800)
committer	Eric Anholt <eric@anholt.net>	
	Tue, 1 Dec 2009 00:41:48 +0000 (16:41 -0800)
Use the child device array to decide whether the given HDMI output should 
be
initialized. If the given HDMI port can't be found in child device array,
it is not present and won't be initialized.

Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>

diff --git a/drivers/gpu/drm/i915/intel_hdmi.c  b/drivers/gpu/drm/i915/
intel_hdmi.c
index c33451a..2ff5d03 100644 (file)
--- a/drivers/gpu/drm/i915/intel_hdmi.c
+++ b/drivers/gpu/drm/i915/intel_hdmi.c
@@ -225,7 +225,52 @@ static const struct drm_encoder_funcs 
intel_hdmi_enc_funcs = {
        .destroy = intel_hdmi_enc_destroy,
 };
 
-
+/*
+ * Enumerate the child dev array parsed from VBT to check whether
+ * the given HDMI is present.
+ * If it is present, return 1.
+ * If it is not present, return false.
+ * If no child dev is parsed from VBT, it assumes that the given
+ * HDMI is present.
+ */
+int hdmi_is_present_in_vbt(struct drm_device *dev, int hdmi_reg)
+{
+       struct drm_i915_private *dev_priv = dev->dev_private;
+       struct child_device_config *p_child;
+       int i, hdmi_port, ret;
+
+       if (!dev_priv->child_dev_num)
+               return 1;
+
+       if (hdmi_reg == SDVOB)
+               hdmi_port = DVO_B;
+       else if (hdmi_reg == SDVOC)
+               hdmi_port = DVO_C;
+       else if (hdmi_reg == HDMIB)
+               hdmi_port = DVO_B;
+       else if (hdmi_reg == HDMIC)
+               hdmi_port = DVO_C;
+       else if (hdmi_reg == HDMID)
+               hdmi_port = DVO_D;
+       else
+               return 0;
+
+       ret = 0;
+       for (i = 0; i < ...
From: Rafael J. Wysocki
Date: Tuesday, December 29, 2009 - 7:01 am

I can reproduce this issue on one of my test boxes.

Rafael
--

From: Borislav Petkov
Date: Friday, December 25, 2009 - 1:00 pm

Hi,

the r8169 driver fails loading here with the following message:

[    0.353955] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[    0.354258] r8169 0000:02:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[    0.354391] r8169 0000:02:00.0: PCI INT A disabled
[    0.354527] r8169: probe of 0000:02:00.0 failed with error -22

Machine is Acer Aspire One, Atom N270 CPU.

Actually, the breakage seems to have appeared a bit earlier, sometime
between .32 and .33-rc1 as the bisection result shows:

ac1aa47b131416a6ff37eb1005a0a1d2541aad6c is the first bad commit
commit ac1aa47b131416a6ff37eb1005a0a1d2541aad6c
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Mon Oct 26 13:20:44 2009 -0700

    PCI: determine CLS more intelligently
    
    Till now, CLS has been determined either by arch code or as
    L1_CACHE_BYTES.  Only x86 and ia64 set CLS explicitly and x86 doesn't
    always get it right.  On most configurations, the chance is that
    firmware configures the correct value during boot.
    
    This patch makes pci_init() determine CLS by looking at what firmware
    has configured.  It scans all devices and if all non-zero values
    agree, the value is used.  If none is configured or there is a
    disagreement, pci_dfl_cache_line_size is used.  arch can set the dfl
    value (via PCI_CACHE_LINE_BYTES or pci_dfl_cache_line_size) or
    override the actual one.
    
    ia64, x86 and sparc64 updated to set the default cls instead of the
    actual one.
    
    While at it, declare pci_cache_line_size and pci_dfl_cache_line_size
    in pci.h and drop private declarations from arch code.
    
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Acked-by: David Miller <davem@davemloft.net>
    Acked-by: Greg KH <gregkh@suse.de>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tony Luck <tony.luck@intel.com>

And since this result looks unrelated to my eye I did the whole
bisection search twice just to make sure. In ...
From: Borislav Petkov
Date: Friday, December 25, 2009 - 2:50 pm

Ok here's what happens: 

pci_apply_final_quirks() dumps on the console

[    0.369252] PCI: CLS 0 bytes, default 64

which means that it hasn't fallen back to setting the default cache line
size. Also, the call

pci_read_config_byte(dev, PCI_CACHE_LINE_SIZE, &tmp);

sets tmp = 0 and the following condition hits everytime

if (!cls)
         cls = tmp;
if (!tmp || cls == tmp)
         continue;

Which means that we never get around the set the default CLS.

The following dirty fix solves the issue on my machine:

--
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 7cfa7c3..9854c26 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2629,7 +2629,10 @@ static int __init pci_apply_final_quirks(void)
 	if (!pci_cache_line_size) {
 		printk(KERN_DEBUG "PCI: CLS %u bytes, default %u\n",
 		       cls << 2, pci_dfl_cache_line_size << 2);
-		pci_cache_line_size = cls;
+		if (!cls)
+			pci_cache_line_size = pci_dfl_cache_line_size;
+		else
+			pci_cache_line_size = cls;
 	}
 
 	return 0;


-- 
Regards/Gruss,
    Boris.
--

From: Jesse Barnes
Date: Friday, December 25, 2009 - 11:00 pm

On Fri, 25 Dec 2009 22:50:32 +0100

There should be a fix queued in my for-linus branch (I've been out this
week so haven't sent the pull req yet), can you give it a try and make
sure?

Jesse
--

From: Borislav Petkov
Date: Saturday, December 26, 2009 - 1:02 am

Yep, 2820f333e3b4ad96590093efbed7b3400bcf492b fixes it.

Thanks.

-- 
Regards/Gruss,
    Boris.
--

From: Borislav Petkov
Date: Saturday, December 26, 2009 - 2:36 am

Hi,

I'm getting the following warning upon resume.
Kernel is 2.6.33-rc2 + one unrelated PCI fix¹

------------[ cut here ]------------
WARNING: at fs/sysfs/dir.c:477 sysfs_add_one+0x107/0x121()
Hardware name: System Product Name
sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:12.2/usb1/1-2/1-2:1.0/ep_81'
Modules linked in: binfmt_misc kvm_amd kvm powernow_k8 cpufreq_ondemand cpufreq_powersave cpufreq_userspace freq_table c
pufreq_conservative ipv6 vfat fat 8250_pnp 8250 pcspkr serial_core ohci_hcd k10temp edac_core
Pid: 279, comm: khubd Not tainted 2.6.33-rc2-00001-g6d7daec #1
Call Trace:
 [<ffffffff8103b010>] warn_slowpath_common+0x7c/0x94
 [<ffffffff8103b07f>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff8112c448>] sysfs_add_one+0x107/0x121
 [<ffffffff8118b8c1>] ? kobject_add_internal+0x9e/0x1a0
 [<ffffffff8112c9d8>] create_dir+0x5d/0x98
 [<ffffffff8112ca50>] sysfs_create_dir+0x3d/0x50
 [<ffffffff813eb98b>] ? _raw_spin_unlock+0x35/0x52
 [<ffffffff8118b902>] kobject_add_internal+0xdf/0x1a0
 [<ffffffff8118ba99>] kobject_add_varg+0x41/0x50
 [<ffffffff8118bb63>] kobject_add+0x64/0x66
 [<ffffffff8118b79a>] ? kobject_get+0x1a/0x22
 [<ffffffff812930dc>] device_add+0xad/0x4e6
 [<ffffffff81195a7b>] ? __raw_spin_lock_init+0x31/0x52
 [<ffffffff81293533>] device_register+0x1e/0x22
 [<ffffffff812ea5ae>] usb_create_ep_devs+0xfd/0x127
 [<ffffffff812e3604>] create_intf_ep_devs+0x50/0x6c
 [<ffffffff812e4a81>] usb_set_interface+0x228/0x23a
 [<ffffffff813eb938>] ? _raw_spin_unlock_irqrestore+0x5b/0x79
 [<ffffffff812de26d>] usb_reset_and_verify_device+0x412/0x497
 [<ffffffff812de577>] usb_port_resume+0x164/0x25b
 [<ffffffff813e9404>] ? mutex_unlock+0xe/0x10
 [<ffffffff812edc84>] generic_resume+0x1c/0x1e
 [<ffffffff812e5b1c>] usb_resume_device+0x3a/0x3c
 [<ffffffff812e61a9>] usb_resume_both+0x70/0x10a
 [<ffffffff812e6e6f>] usb_external_resume_device+0x39/0x77
 [<ffffffff812dfd51>] hub_thread+0x4eb/0xfc6
 [<ffffffff813eb8c0>] ? _raw_spin_unlock_irq+0x41/0x5e
 ...
From: Borislav Petkov
Date: Saturday, December 26, 2009 - 2:45 am

Hi,

this jumped into dmesg upon resume (.config and dmesg are attached in
the previous "EHCI resume sysfs duplicates..." message in this thread):

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.33-rc2-00001-g6d7daec #1
-------------------------------------------------------
Xorg/3076 is trying to acquire lock:
 (&dev->struct_mutex){+.+.+.}, at: [<ffffffff81223fd4>] drm_mmap+0x38/0x5c

but task is already holding lock:
 (&mm->mmap_sem){++++++}, at: [<ffffffff810b7509>] sys_mmap_pgoff+0xd6/0x1b4

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 (&mm->mmap_sem){++++++}:
       [<ffffffff810694c0>] __lock_acquire+0x1373/0x16fd
       [<ffffffff8106993c>] lock_acquire+0xf2/0x116
       [<ffffffff810bb2b5>] might_fault+0x95/0xb8
       [<ffffffff810e87d6>] filldir+0x75/0xd0
       [<ffffffff8112be2a>] sysfs_readdir+0x10f/0x149
       [<ffffffff810e895b>] vfs_readdir+0x6b/0xa8
       [<ffffffff810e8ae1>] sys_getdents+0x81/0xd1
       [<ffffffff810022f2>] system_call_fastpath+0x16/0x1b

-> #2 (sysfs_mutex){+.+.+.}:
       [<ffffffff810694c0>] __lock_acquire+0x1373/0x16fd
       [<ffffffff8106993c>] lock_acquire+0xf2/0x116
       [<ffffffff813e9e5c>] mutex_lock_nested+0x63/0x354
       [<ffffffff8112c488>] sysfs_addrm_start+0x26/0x28
       [<ffffffff8112c940>] sysfs_remove_dir+0x52/0x8d
       [<ffffffff8118b6f9>] kobject_del+0x16/0x37
       [<ffffffff8118b758>] kobject_release+0x3e/0x66
       [<ffffffff8118c5b5>] kref_put+0x43/0x4d
       [<ffffffff8118b674>] kobject_put+0x47/0x4b
       [<ffffffff813e11c1>] cacheinfo_cpu_callback+0xa2/0xdb
       [<ffffffff8105c317>] notifier_call_chain+0x37/0x63
       [<ffffffff8105c3c7>] raw_notifier_call_chain+0x14/0x16
       [<ffffffff813d58ec>] _cpu_down+0x1a5/0x29a
       [<ffffffff8103c851>] disable_nonboot_cpus+0x74/0x10d
       [<ffffffff8107793e>] hibernation_snapshot+0x99/0x1d3
       ...

Hi


This output seems to suggest need to fix drm.



--


Actually, this painful dependency may technically be a bug in drm, but at 
the same time, it's really filldir() that makes it _very_ hard for certain 
locking scenarios because it has that callback to the VFS layer that takes 
the mmap_sem, and sysfs itself wants the sysfs_mutex in paths where it is 
not at all unreasonable to hold the mmap_sem.

We've seen it several times (yes, mostly with drm, but it's been seen with 
others too), and it's very annoying. It can be fixed by having very 
careful readdir implementations, but I really would blame sysfs in 
particular for having a very annoying lock reversal issue when used 
reasonably.

So the optimal situation would be for sysfs to not have that annoying lock 
dependency, and it would really have to be sysfs_readdir() that drops the 
sysfs_mutex around the filldir call (and that obviously implies having to 
re-validate and be really careful).

Added Eric and Greg to the cc, in case the sysfs people want to solve it.

The other alternative would be to fix this on a VFS layer by changing how 
'readdir/filldir' interacts, and instead make it fill in a kernel buffer - 
avoiding the mmap_sem issue entirely. And than later (in readdir) that 
kernel buffer could be copied to user space without holding any other 
locks.

I like the VFS approach because I think we could possibly use that as a 
first approach to eventually try to think about caching readdir() results 
at a VFS level - readdir() is currently the _only_ main filesystem 
callback that always calls into the low-level filesystem, and always takes 
a lot of locks. I'm adding Al to the Cc for that - he knows about this 
issue from me previously thinking aloud along these lines.

And yes, one option would be to just fix drm - by avoiding calling any 
sysfs functions while holding the mmap_lock (either in the mmap callback 
or the page fault paths). However, as mentioned, I really do think that 
the blame can be laid on sysfs for trying to be a nice generic interface, ...
From: Eric W. Biederman
Date: Wednesday, December 30, 2009 - 2:34 pm

Maybe.  The mnmap_sem has some interesting issues all of it's own.

There are scalability reasons for dropping the sysfs_mutex in sysfs_readdir
and I have some tenative patches for that.  I will take a look after I
come back from the holidays, in a couple of days.  I don't understand

Could be.  I have simplified the sysfs locking quite a bit this last
round.  I don't know if there is much more than corner cases left to
improve.

Eric

--


The details are in the original thread on lkml, but it boils down to 
basically (the below may not be the exact sequence, but it's close)

 - drm_mmap (called with mmap_sem) takes 'dev->struct_mutex' to protect 
   it's own device data (very reasonable)

 - drm_release takes 'dev->struct_mutex' again to protect its own data, 
   and calls "mtrr_del_page()" which ends up taking cpu_hotplug.lock.

   Again, that doesn't sound "wrong" in any way.

 - hibernate ends up with the sequence: _cpu_down (cpu_hotplug.lock) ->  ..
   kref_put .. -> sysfs_addrm_start (sysfs_mutex)

   Again, nothing suspicious or "bad", and this part of the dependency 
   chain has nothing to do with the DRM code itself.

 - sysfs_readdir() (and this is the big problem) holds sysfs_mutex in its
   readdir implementation over the call to filldir. And filldir copies the 
   data to user space, so now you have sysfs_mutex -> mmap_sem.

See? None of the chains look bad. Except sysfs_readdir() obviously has 
that sysfs_mutex -> mmap_sem thing, which is _very_ annoying, because now 
you end up with a chain like

   mmap_sem -> dev->struct_mutex -> cpu_hotplug.lock -> sysfs_mutex -> mmap_sem

and I think you'll agree that of all the lock chains, the place to break 
the association is at sysfs_mutex. And the obvious place to break it would 

Ok, hopefully the above chain explains it to you, and also makes it clear 
that it's rather hard to break anywhere else, and it's not somebody else 
doing anything "obviously bogus".

Btw, the scalability issues with readdir() in general is why I'd be open 
to try to change the rules for filldir(), and always make it a kernel 
space copy. I suspect a number of users would like being able to use 
spinlocks over filldir, but it's currently impossible. 

However, we have a lot of filldir implementations (knfsd and several 
different system call interfaces), and while some of them already use 
kernel buffers (eg knfsd) others would need to allocate temporary storage ...
From: Eric W. Biederman
Date: Thursday, December 31, 2009 - 1:40 am

kobject_del with a lock held scares me.

There is a possible deadlock (that lockdep is ignorant of) if you hold
a lock over sysfs_deactivate() and if any sysfs file takes that lock.

I won't argue with a claim of inconvenient locking semantics here, and
this is different to the problem you are seeing (except that fixing this

I agree that fixing sysfs_readdir to not hold the sysfs_mutex over filldir
is useful to reduce the lock hold time if nothing else.

The cheap fix here is mostly a matter of grabbing a reference to the
sysfs_dirent and then revalidating that the reference is still useful
after we reacquire the sysfs_mutex.  If not we already have the code for
restarting from just an offset.  We just don't want to use it too much as
that will give us O(n^2) times for sysfs readdir.


We very definitely have an ABBA deadlock with sysfs_deactivate and the
cpu_hotplug.lock.  arch/x86/kernel/microcode_core.c:reload_store() is the
code for a sysfs file that when written to calls get_online_cpus().

Regardless of what we do with sysfs_readdir we need to see if we can
fix cpu_down(), to remove this nasty deadlock.

Eric
--


I would not object at _all_ if sysfs fixed the locking here instead of in 
filldir.

The problem is that releasing objects (with kref_put() and friends) is 
something that is _commonly_ done with various locks held.

Btw, that "cpu_down()" is by no means the only case. I would suggest you 
just google for

	sysfs_mutex lockdep

and you'll find a _lot_ of cases, most of them not involving drm at all, 
but ext4 and btrfs.

(Side note: almost all of them tend to _also_ have mmap_sem in the chain: 

I suspect that filldir is almost always implicated because mmap_sem is so 
hard to do just one way: both page faulting and mmap have it held, and so 
a lot of locks need to be gotten _after_ it, while filldir very often has 
the exact reverse requirement. 

So that's why filldir is kind of special (and the fundamental _reason_ it 
is special is exactly because pretty much all other VFS operations work 
with generic caches, and the actual filesystem only fills in the caches, 
it doesn't copy to user space directly while holding any locks - although 
ioctl's sometimes have the same issue as filldir for all the same 
reasons).

Anyway, I'm in _no_ way saying that you need to break it at filldir: the 
reason I pick on filldir is because I hate it, and think that it's a 
really annoying special case at the VFS level. But from a sysfs 
standpoint, I could well see that there are worse problems than that kind 
of annoying VFS problem.

So if you can break it at that kref_put layer (which leads to releasing a 
sysfs object etc), then that would be great. In fact, it would be better, 
since kref_put and friends are in many ways "more fundamental" than some 

Well, the _trivial_ fix is to just move the mutex_lock/unlock _inside_ the 
loop instead of of outside. Something like the appended might just work, 
and is the really stupid approach.

Totally untested. And it will do a _lot_ more sysfs mutex accesses, since 
now it will lock/unlock around each entry we return.

A smarter ...
From: Eric W. Biederman
Date: Friday, January 1, 2010 - 6:58 am

When sysfs_readdir stops short we now cache the next sysfs_dirent to
return to user space in filp->private_data.  There is no impact on the
rest of sysfs by doing this and in the common case it allows us to
pick up exactly where we left off with no seeking.

Additionally I drop and regrab the sysfs_mutex around filldir to avoid
a page fault arbitrarily increasing the hold time on the sysfs_mutex.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
---

I haven't stressed this patch but it is sound in principle, and is a
general sysfs improvement, regardless of any locking issues.

 fs/sysfs/dir.c |   85 ++++++++++++++++++++++++++++++++++++++++---------------
 1 files changed, 62 insertions(+), 23 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index f05f230..ad8a57e 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -827,19 +827,51 @@ static inline unsigned char dt_type(struct sysfs_dirent *sd)
 	return (sd->s_mode >> 12) & 15;
 }
 
+static int sysfs_dir_release(struct inode *inode, struct file *filp)
+{
+	sysfs_put(filp->private_data);
+	return 0;
+}
+
+static struct sysfs_dirent *sysfs_dir_pos(struct sysfs_dirent *parent_sd,
+	ino_t ino, struct sysfs_dirent *pos)
+{
+	if (pos) {
+		int valid = !(pos->s_flags & SYSFS_FLAG_REMOVED) &&
+			pos->s_parent == parent_sd &&
+			ino == pos->s_ino;
+		sysfs_put(pos);
+		if (valid)
+			return pos;
+	}
+	pos = parent_sd->s_dir.children;
+	while (pos && (ino > pos->s_ino))
+		pos = pos->s_sibling;
+	return pos;
+}
+
+static struct sysfs_dirent *sysfs_dir_next_pos(struct sysfs_dirent *parent_sd,
+	ino_t ino, struct sysfs_dirent *pos)
+{
+	pos = sysfs_dir_pos(parent_sd, ino, pos);
+	if (pos)
+		pos = pos->s_sibling;
+	return pos;
+}
+
 static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
 {
 	struct dentry *dentry = filp->f_path.dentry;
 	struct sysfs_dirent * parent_sd = dentry->d_fsdata;
-	struct sysfs_dirent *pos;
+	struct sysfs_dirent *pos = ...
From: Borislav Petkov
Date: Friday, January 1, 2010 - 8:33 am

I've slapped it ontop of v2.6.33-rc2-249-gcd6e125 here and the circular
locking warning is gone. I'll keep an eye on it in the next couple of
days, just in case.

Tested-by: Borislav Petkov <petkovbb@gmail.com>

-- 
Regards/Gruss,
    Boris.
--

From: Linus Torvalds
Date: Friday, January 1, 2010 - 11:56 am

These are all incorrect in the presense of 'lseek'. You can't do that

	if (!pos && "test filp->f_pos")

thing, because you get all the wrong results for both the case of an lseek 
before doing any readdir (which is undefined behavior, so I guess that's 
technically ok) _and_ for the 'lseek back to zero _after_ doing a readdir' 
case (which is _not_ undefined behavior!

It looks like it might be easy to fix by making a sysfs_llseek() function 
that does something like

	.. sysfs_llseek(..)
	{
		mutex_lock(&sysfs_mutex);
		sysfs_release();
		filp->private_data = NULL;
		mutex_unlock(&sysfs_mutex);

		return generic_file_llseek(..);
	}

or similar.  Except themn you'll need to change the EOF condition testing 
and turn it into a re-validation event. Or maybe do the re-validation in 
sysfs_llseek() itself, rather than just dropping the cached data.

Hmm? I haven't thought it through very deeply, so maybe I'm missing 
something.

		Linus
--

From: Eric W. Biederman
Date: Friday, January 1, 2010 - 3:43 pm

When sysfs_readdir stops short we now cache the next
sysfs_dirent to return to user space in filp->private_data.
There is no impact on the rest of sysfs by doing this and
in the common case it allows us to pick up exactly where
we left off with no seeking.

Additionally I drop and regrab the sysfs_mutex around
filldir to avoid a page fault abritrarily increasing the
hold time on the sysfs_mutex.

v2: Returned to using INT_MAX as the EOF condition.
    seekdir is ambiguous unless all directory entries have
    a unique f_pos value.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
---

Linus good catch.

 fs/sysfs/dir.c |   82 +++++++++++++++++++++++++++++++++++++++++---------------
 1 files changed, 60 insertions(+), 22 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index f05f230..5d88e30 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -827,11 +827,46 @@ static inline unsigned char dt_type(struct sysfs_dirent *sd)
 	return (sd->s_mode >> 12) & 15;
 }
 
+static int sysfs_dir_release(struct inode *inode, struct file *filp)
+{
+	sysfs_put(filp->private_data);
+	return 0;
+}
+
+static struct sysfs_dirent *sysfs_dir_pos(struct sysfs_dirent *parent_sd,
+	ino_t ino, struct sysfs_dirent *pos)
+{
+	if (pos) {
+		int valid = !(pos->s_flags & SYSFS_FLAG_REMOVED) &&
+			pos->s_parent == parent_sd &&
+			ino == pos->s_ino;
+		sysfs_put(pos);
+		if (valid)
+			return pos;
+	}
+	pos = NULL;
+	if ((ino > 1) && (ino < INT_MAX)) {
+		pos = parent_sd->s_dir.children;
+		while (pos && (ino > pos->s_ino))
+			pos = pos->s_sibling;
+	}
+	return pos;
+}
+
+static struct sysfs_dirent *sysfs_dir_next_pos(struct sysfs_dirent *parent_sd,
+	ino_t ino, struct sysfs_dirent *pos)
+{
+	pos = sysfs_dir_pos(parent_sd, ino, pos);
+	if (pos)
+		pos = pos->s_sibling;
+	return pos;
+}
+
 static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
 {
 	struct dentry *dentry = filp->f_path.dentry;
 	struct sysfs_dirent * parent_sd ...
From: Linus Torvalds
Date: Friday, January 1, 2010 - 4:10 pm

That

	mutex_lock(&sysfs_mutex);
	if (ret < 0)
		break;

looks just silly. We know 'pos' is non-NULL, so the break will effectively 
just be a "mutex_unlock + return 0", and we just did the mutex_lock, so 
why not instead do

	if (ret < 0)
		return 0;
	mutex_lock(&sysfs_mutex);

there?

Not that it really _matters_, but it seems way clearer, no?

But other than that mindless nit, I can't see anything wrong with your 
logic, and it looks ok to me from just reading the patch itself.

So I guess that's an "Ack", although I'd prefer it to get some more 
testing and perhaps go through Greg's tree as sysfs patches usually go.

And by "testing" I mean both the "yes, this second version also breaks the 
lockdep chain and avoids the warning", but also some kind of actual 
testing of /sysfs itself. If there is any.

		Linus
--

From: Greg KH
Date: Friday, January 1, 2010 - 10:59 pm

I'll queue this up in my tree on Monday to get it some testing.

thanks,

greg k-h
--

From: Borislav Petkov
Date: Saturday, January 2, 2010 - 8:40 am

Yes, it survived a couple of suspend/resume cycles so far.

-- 
Regards/Gruss,
    Boris.
--


I just sent you my sysfs filldir scalability patch, so we can take that
red-herring off the plate.

The problem as I see it is that kobject_del is convenient.
kobject_del waits until all of the sysfs show and store methods for
that kobject have stopped executing.  Which imposes the rule that
kobject_del can not be called with any locks held that are taken in a
sysfs show or store method.  This is all invisible to lockdep as the
wait is done with a completion and not a lock.

Which unfortunately means fixing filldir only removes some noise from
the picture, and completely hides the problem from lockdep.


....

Looking at the case I am familiar with in the networking layer I think
I have stumbled on a way to sort out this locking problem.

Today the network layer effectively does:
rtnl_lock();
device_del(dev);
rtnl_unlock();
kobject_put(dev);


sysfs_deactivate happens in the device_del(), but if we were to move
sysfs_deactivate into the final kobject_put then in theory we can
continue to block and be friendly but not need to be called with
locations where locks are held.

The core idea is to allow unlisting devices from sysfs under a lock
while still waiting for all users to complete after it is safe to
drop the lock.

Does that work for the cpu hotplug case?  Doing everything from
notifiers makes me suspect it will fail.


....


Either way we will need some lockdep warnings for sysfs_deactivate
so that the problem does not continue to hide and silently foul
things up.  So I will see if I can cook something.

Eric
--

From: Tejun Heo
Date: Friday, January 1, 2010 - 7:59 pm

Hello, Eric.


The synchronization against read/write ops is in sysfs_deactivate on
purpose so that drivers (most common users) don't have to worry about
sysfs ops accessing different parts of data structures once
device_del() is complete.  Implementing the exlusion at the driver
level is possible but not easy because some hardware devices are
represented with complex data structures, some of them are reused when
devices are exchanged and some sysfs ops end up accessing the
hardware.  So, it's often not possible to simply disassociate the data
structure and float it till the last reference goes away.  There needs
to be a synchronization point where the driver can tell that nothing
is accessing released data structure or hardware resource after it and

Nobody would know when that final put will actually happen.  In
progress sysfs ops might access the hardware after the hardware is

I don't think this is really relevant to the problem at hand but
adding lockdep annotations would definitely be beneficial, which BTW
is another reason to leave the synchronization in sysfs_deactivate as
the trade off is between deadlocks which can be detected somewhat
reliably with lockdep and scary race conditions which may involve
hardware in mysterious ways.

Thanks.

-- 
tejun
--

From: Eric W. Biederman
Date: Saturday, January 2, 2010 - 2:37 pm

Holding locks over device_del -> kobject_del -> sysfs_deactivate can
cause deadlocks if those same locks are grabbed in sysfs show or store
methods.

The I model s_active count + completion as a sleeping read/write lock.
I describe to lockdep sysfs_get_active as a read_trylock,
sysfs_put_active as a read_unlock, and sysfs_deactivate as a
write_lock and write_unlock pair.  This seems to capture the essence
for purposes of finding deadlocks, and in my testing gives finds real
issues and ignores non-issues.

This brings us back to holding locks over kobject_del is a problem
that ideally we should find a way of addressing, but at least lockdep
can tell us about the problems instead of requiring developers to debug
rare strange system deadlocks, that happen when sysfs files are removed
while being written to.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 fs/sysfs/dir.c   |   14 ++++++++++++--
 fs/sysfs/sysfs.h |   15 +++++++++++++++
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 5d88e30..5c4703d 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -106,8 +106,10 @@ static struct sysfs_dirent *sysfs_get_active(struct sysfs_dirent *sd)
 			return NULL;
 
 		t = atomic_cmpxchg(&sd->s_active, v, v + 1);
-		if (likely(t == v))
+		if (likely(t == v)) {
+			rwsem_acquire_read(&sd->dep_map, 0, 1, _RET_IP_);
 			return sd;
+		}
 		if (t < 0)
 			return NULL;
 
@@ -130,6 +132,7 @@ static void sysfs_put_active(struct sysfs_dirent *sd)
 	if (unlikely(!sd))
 		return;
 
+	rwsem_release(&sd->dep_map, 1, _RET_IP_);
 	v = atomic_dec_return(&sd->s_active);
 	if (likely(v != SD_DEACTIVATED_BIAS))
 		return;
@@ -194,15 +197,21 @@ static void sysfs_deactivate(struct sysfs_dirent *sd)
 	BUG_ON(sd->s_sibling || !(sd->s_flags & SYSFS_FLAG_REMOVED));
 	sd->s_sibling = (void *)&wait;
 
+	rwsem_acquire(&sd->dep_map, 0, 0, _RET_IP_);
 	/* atomic_add_return() is a mb(), put_active() will always see
 	 * the updated ...
From: Tejun Heo
Date: Saturday, January 2, 2010 - 5:02 pm

Looks good to me.

Acked-by: Tejun Heo <tj@kernel.org>

Thanks for doing this.

-- 
tejun
--

From: Ming Lei
Date: Sunday, January 17, 2010 - 9:26 am

On Sat, 02 Jan 2010 13:37:12 -0800

The model has hit a possible deadlock in pcmcia, and the lockdep warning
comes when I unplug my wlan card from pcmcia slot.

Looks like socket->skt_mutex is held in remove path, and it is also
grabbed in .stor method.


[ 9207.755883] pcmcia_socket pcmcia_socket0: pccard: card ejected from slot 0
[ 9207.786583] 
[ 9207.786586] =======================================================
[ 9207.786595] [ INFO: possible circular locking dependency detected ]
[ 9207.786602] 2.6.33-rc4-wl #8
[ 9207.786607] -------------------------------------------------------
[ 9207.786614] pccardd/841 is trying to acquire lock:
[ 9207.786620]  (s_active){++++.+}, at: [<ffffffff811637f1>] sysfs_addrm_finish+0x36/0x55
[ 9207.786643] 
[ 9207.786645] but task is already holding lock:
[ 9207.786651]  (&socket->skt_mutex){+.+.+.}, at: [<ffffffff812ed04c>] pccardd+0x15d/0x25f
[ 9207.786669] 
[ 9207.786671] which lock already depends on the new lock.
[ 9207.786674] 
[ 9207.786679] 
[ 9207.786680] the existing dependency chain (in reverse order) is:
[ 9207.786687] 
[ 9207.786688] -> #1 (&socket->skt_mutex){+.+.+.}:
[ 9207.786702]        [<ffffffff810796c0>] __lock_acquire+0xb73/0xd2b
[ 9207.786716]        [<ffffffff8107a36b>] lock_acquire+0xe1/0x105
[ 9207.786726]        [<ffffffff813b5ac5>] __mutex_lock_common+0x59/0x49d
[ 9207.786741]        [<ffffffff813b5fbe>] mutex_lock_nested+0x39/0x3e
[ 9207.786752]        [<ffffffff812ef3ed>] pccard_store_resource+0x6b/0xc5
[ 9207.786763]        [<ffffffff812a55da>] dev_attr_store+0x20/0x22
[ 9207.786775]        [<ffffffff8116259a>] sysfs_write_file+0x108/0x144
[ 9207.786787]        [<ffffffff8110cf48>] vfs_write+0xae/0x10b
[ 9207.786798]        [<ffffffff8110d065>] sys_write+0x4a/0x6e
[ 9207.786808]        [<ffffffff81009bc2>] system_call_fastpath+0x16/0x1b
[ 9207.786822] 
[ 9207.786824] -> #0 (s_active){++++.+}:
[ 9207.786835]        [<ffffffff8107956a>] __lock_acquire+0xa1d/0xd2b
[ 9207.786847]        [<ffffffff8107a36b>] ...
From: Eric W. Biederman
Date: Sunday, January 17, 2010 - 10:18 am

Looking a little closer this is simultaneously a legitimate problem
and also a false positive.

This is only legitimate if you add/remove a cardbus bridge, plugged into
another cardbus bridge, which I think is unlikely but physically possible.

Eric
--

From: Dominik Brodowski
Date: Sunday, January 17, 2010 - 11:03 am

Hey,


Unfortunately, it is not a false positive, as removing a PCMCIA device
racing with "pccardctl eject" seems to trigger this path as well. Patch is
being prepared...

Best,
	Dominik
--

From: Eric W. Biederman
Date: Saturday, January 2, 2010 - 2:49 pm

Alright than that is a bad possible split of the functionality.  Which
is all I was suggesting splitting the functionality not doing away
with the wait or moving it to a point where the wait would not work.
It was simply my bad assumption that the final kobject_put would
happen before the module that controlled that kobject could be
removed.

I still think it might make sense to separate kobject_del into two
parts.  One that we call with the locks held and one without, but that
does seem to be applicable to only a very small set of cases and our
problems appear to be much larger than that.

For the moment I have generated a patch that does the lockdep
annotations, and I have found that a simple:

   find /sys -type f | xargs cat {} > /dev/null

trivially generates lockdep warnings.  In particular:

[  165.049042] 
[  165.049044] =======================================================
[  165.052761] [ INFO: possible circular locking dependency detected ]
[  165.052761] 2.6.33-rc2x86_64 #3
[  165.052761] -------------------------------------------------------
[  165.052761] cat/5026 is trying to acquire lock:
[  165.052761]  (&serio->drv_mutex){+.+.+.}, at: [<ffffffff8132ecaa>] atkbd_attr_show_helper+0x28/0x6e
[  165.052761] 
[  165.052761] but task is already holding lock:
[  165.089443]  (s_active){++++.+}, at: [<ffffffff810e84dd>] sysfs_get_active_two+0x2c/0x43
[  165.089443] 
[  165.089443] which lock already depends on the new lock.
[  165.089443] 
[  165.089443] 
[  165.089443] the existing dependency chain (in reverse order) is:
[  165.089443] 
[  165.089443] -> #1 (s_active){++++.+}:
[  165.089443]        [<ffffffff81054956>] validate_chain+0xa25/0xd1d
[  165.089443]        [<ffffffff810553d3>] __lock_acquire+0x785/0x7dc
[  165.089443]        [<ffffffff81056112>] lock_acquire+0x5a/0x74
[  165.089443]        [<ffffffff810e8202>] sysfs_addrm_finish+0xba/0x125
[  165.089443]        [<ffffffff810e68b0>] sysfs_hash_and_remove+0x4f/0x6b
[  165.089443]        ...
From: Tejun Heo
Date: Saturday, January 2, 2010 - 5:32 pm

Hello,


The module should stay around.  The severing is necessary to protect
driver internal data structures and possibly removed or reattached (to

If such separation is necessary, we can implement the split interface
while leaving kobject_del() as is feature-wise and convert the
offending ones to use the split interface but I think it would be
better to simply fix the offending ones if there aren't too many and


Ummm... read of an input sysfs node can trigger
serio_disconnect_port() under serio->drv_mutex, which unfortunately
would need to wait for completion of in-progress sysfs ops thus
creating possibility for AB-BA deadlock.  Dmitry, is it possible to
make serio_disconnect_port() asynchronous from the sysfs ops (ie. put
it in a work or something)?

Thanks.

-- 
tejun
--

From: Eric W. Biederman
Date: Saturday, January 2, 2010 - 7:06 pm

Removed driver hardware isn't something sysfs can really guard
against, although it can help to make the window of vulnerability
smaller.  Protecting driver internal data structures if we can does
seem reasonable.

The case I was thinking of in particular is when someone does:
"rmmod driver" I think device_del protects from the code going away

- We have the network stack.
  I have hacked around that (when I thought it was a singleton)
  by introducing the idiom:

	if (!rtnl_trylock())
		return restart_sysscall();

  But that isn't sustainable, as there is already one new entry that
  just does rntl_lock unconditionally.

  Maybe we can move the device_del out from under the rtnl_lock, but I
  have my doubts.  Certainly the proc and sysctl bits (which have the
  same issue look more difficult.

- We almost have an issue in ext4.
  Device_del is certainly called under lock_kernel() and lock_super().

- We have what a cpu_hotplug.lock issue with
  /sys/devices/system/cpu/cpuN/microcode/reload, a variant of the problem
  that triggered this discussion and it looks very non-trivial to solve.

So I'm not certain what to say except that we have longstanding problems.

Eric
--

From: Tejun Heo
Date: Saturday, January 2, 2010 - 10:01 pm

Hello,


It can't protect against removal itself per-se but it does give the
driver a boundary which it can depend on while implementing hot
unplugging.  Hardwares which support hot unplugging can cope with
surprise removal and has mechanisms to detect and handle them but
software part still is tricky and driver needs to have a boundary


Nope, that's protected by reference counting via fops and/or other

It's interesting that the above cases arn't common drivers.  AFAICS,
the problem cases would usually be cases like above where the user is
a rather complex software entity or drivers which implement some form
of self detaching via sysfs.  For the former group, I agree that
splitting deleting and draining (or simply skipping the draining part
or active reference counting both of which basically achieve the same
thing) would be an easy way out as it would be generally easy to leave
the data structures dangling till the references go away.

How about simply introducing an interface to mark sysfs nodes which
don't require active reference counting and using them on those nodes?

Thanks.

-- 
tejun
--

From: Eric W. Biederman
Date: Saturday, January 2, 2010 - 10:38 pm

That might work.  However it does not seem to address the case of
bond_sysfs, especially with someone doing rmmod bonding.

I think the brainstorm is on the right track.  I think we just need to look
at a few more cases in depth so that we can see a pattern and generalize
what can be done.

Eric
--

From: Tejun Heo
Date: Saturday, January 2, 2010 - 11:05 pm

Hello,


Ah... okay, now I remember this.  Yeah, I ripped off module ref
counting from sysfs ops.  I completely forgot about that and was
thinking we still had module ref counting on sysfs ops.  The logical
thing to do would be restoring module ref counting on sysfs ops which
won't go through active ref counting.  ie. Let the interface which
switch off active ref counting require module owner so that it uses
module ref counting instead of active ref counting.

Thanks.

-- 
tejun
--

From: Dmitry Torokhov
Date: Sunday, January 3, 2010 - 12:47 am

Read? I checked and I do not see where read would cause disconnect.
Also, disconnect only involves unbinding driver from the port, not the
destruction of the port itself (children may be destroyed but they have


I am not sure it is needed. Also in the trace presented
serio_disconnect_port() is called from kseriod which certainly does not
access sysfs...

Overall I am not concerned about lockdep bitching about serio because it
still bitches if you simply reload psmouse on a box with Synaptics with a
pass-through port even though there are nested annotations and it is
silent first time around.

Out of curiosity, do yo uknow what caused psmouse disconnect and what
kind of mouse is in the box?

-- 
Dmitry
--

From: Eric W. Biederman
Date: Sunday, January 3, 2010 - 3:57 am

This is a new lockdep annotation, and looking into it this appears to
be a true possible deadlock in the serio/sysfs interactions.

We have serio_pin_driver() called from all of the sysfs attributes
which does:
   mutex_lock(&serio->drv_mutex);

We have serio_disconnect_driver() called on an unplug which does:
   mutex_lock(&serio->drv_mutex);

The deadlock potential is if someone reads say the psmouse rate
sysfs file while the mouse is being unplugged.  There is a race
such that we can have:

						  sysfs_read_file()
                                                    fill_read_buffer()
						       sysfs_get_active_two()
							 psmouse_attr_show_helper()
                                        		   serio_pin_driver()
serio_disconnect_driver()		
  mutex_lock(&serio->drv_mutex);		
				<----------------->	   mutex_lock(&serio_drv_mutex);
    psmouse_disconnect()
      sysfs_remove_group(... psmouse_attr_group);
	....
	sysfs_deactivate(); 
	  wait_for_completion();


So it is unlikely but possible to deadlock by accessing a serio
attribute of a serio device that is being removed.

What to do about it is another question.   It has just recently come to my

It is a simple ps2mouse connected through a kvm, and the kvm was not
switched to the machine in question during the run.

I am trying to wrap my head around what to do with this sysfs_deactivate
deadlock scenario, (other drivers also hold unfortunate locks over the
removal of sysfs files,  and it just happens that the ps2mouse case was the first
one I reproduced), and it was interesting because I had not seen it before.

Eric
--

From: Eric W. Biederman
Date: Sunday, January 3, 2010 - 4:14 am

In the specific case of serio what gets us in trouble is
the call to sysfs_remove_group.

If instead of independent calls to sysfs_create_group/sysfs_remove_group,
you could move the groups into a list on dev->groups than we could solve
two problems.

- Userspace would see all of the attributes when the hotplug event is
  fired remove races.

- We would not hold serio->drv_mutex over sysfs_remove_group so there
  would not be a possible deadlock on device removal.

Does that change sound possible?

Eric


--

From: Dmitry Torokhov
Date: Monday, January 4, 2010 - 12:16 pm

No, because attributes in question belong to driver+device combo. The
device will not go away when driver is unbound but we do want to remove
attributes.

-- 
Dmitry
--

From: Dmitry Torokhov
Date: Monday, January 4, 2010 - 11:57 am

Hmm, I guess I was too quick dismissing lockdep complaints here. Now
that sysfs remove waits deadlock indeed is possible. Actually the locks
on serio->drv_mutex in attributes were added to make sure we don't
access device that was unbound from the driver through stale sysfs

I think we should simply not take serio->drv_mutex in attributes and use
driver-private mutex to serialize "set" methods that may alter device
state.

-- 
Dmitry
--

From: Eric W. Biederman
Date: Monday, January 4, 2010 - 12:43 pm

Cool.  So we have solved the problem generically but we have left over

Do you have any ideas what those might be?  It looks like we are only
talking about psmouse and atkbd.  So the audit for this chunk should
not be too bad.

The psmouse code already has a mutex on it's set operations only the
atkbd does not.  The atkbd code does do a driver stop/start, which is
similar (but race prone without the serio->drv_mutex).

Except for the lack of atkbd_enable/disable locking the patch below should
be good.  Opinions from someone who knows the serio code better than I do
would be helpful.

Eric


---

From: Eric W. Biederman <ebiederm@xmission.com>
Subject: [PATCH] serio:  Remove uneeded and deadlock prone serio_pin_driver

sysfs_remove_group waits for sysfs attributes to be removed
so we don't need to take a mutex in each of the attributes to
prevent remove while the code in the attribute is running.

This removes a theoretical deadlock possibility of a keyboard
or mouse hotplug and someone accessing a sysfs attribute.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 drivers/input/keyboard/atkbd.c     |   27 +--------------------------
 drivers/input/mouse/psmouse-base.c |   32 +++-----------------------------
 include/linux/serio.h              |   19 -------------------
 3 files changed, 4 insertions(+), 74 deletions(-)

diff --git a/drivers/input/keyboard/atkbd.c b/drivers/input/keyboard/atkbd.c
index a357357..7200100 100644
--- a/drivers/input/keyboard/atkbd.c
+++ b/drivers/input/keyboard/atkbd.c
@@ -1234,22 +1234,8 @@ static ssize_t atkbd_attr_show_helper(struct device *dev, char *buf,
 				ssize_t (*handler)(struct atkbd *, char *))
 {
 	struct serio *serio = to_serio_port(dev);
-	int retval;
-
-	retval = serio_pin_driver(serio);
-	if (retval)
-		return retval;
 
-	if (serio->drv != &atkbd_drv) {
-		retval = -ENODEV;
-		goto out;
-	}
-
-	retval = handler((struct atkbd *)serio_get_drvdata(serio), ...
From: Dmitry Torokhov
Date: Monday, January 4, 2010 - 2:12 pm

Thanks Eric, this looks good. I'll add the missing mutex to atkbd and
apply. I think it can wait for .34 though - the window is quite small.

-- 
Dmitry
--

From: Tejun Heo
Date: Monday, January 4, 2010 - 4:09 pm

Hello,


This is way too cool.  Only if we can have more moments like this.  :-)

-- 
tejun
--

From: Eric W. Biederman
Date: Thursday, December 31, 2009 - 1:40 am

kobject_del with a lock held scares me.

There is a possible deadlock (that lockdep is ignorant of) if you hold
a lock over sysfs_deactivate() and if any sysfs file takes that lock.

I won't argue with a claim of inconvenient locking semantics here, and
this is different to the problem you are seeing (except that fixing this

I agree that fixing sysfs_readdir to not hold the sysfs_mutex over filldir
is useful to reduce the lock hold time if nothing else.

The cheap fix here is mostly a matter of grabbing a reference to the
sysfs_dirent and then revalidating that the reference is still useful
after we reacquire the sysfs_mutex.  If not we already have the code for
restarting from just an offset.  We just don't want to use it too much as
that will give us O(n^2) times for sysfs readdir.


We very definitely have an ABBA deadlock with sysfs_deactivate and the
cpu_hotplug.lock.  arch/x86/kernel/microcode_core.c:reload_store() is the
code for a sysfs file that when written to calls get_online_cpus().

Regardless of what we do with sysfs_readdir we need to see if we can
fix cpu_down(), to remove this nasty deadlock.

Eric
--

Previous thread: RE: Tmem [PATCH 0/5] (Take 3): Transcendent memory by Dan Magenheimer on Thursday, December 24, 2009 - 1:51 pm. (1 message)

Next thread: OOM killer unexpectedly called with kernel 2.6.32 by A. Boulan on Thursday, December 24, 2009 - 4:42 pm. (2 messages)