.. or wahetever it is you'll be celebrating today/tomorrow.
And if you aren't celebrating anything at all, but instead sitting in your
dark basement feeling lonely and bored, you can at least try out the
latest -rc kernel. Because it's better than moping around doing nothing.
And if you _are_ celebrating, but are having a bit of an overload of ham
(or are getting fed up with losing all your clothes in strip dreidel and
having people point and snigger at you, if that's what gets you through
the holidays), take a break, compile a kernel, and try it out for size.
Because while your pants may not fit you for the next month, the new
kernel might just fit your computer perfectly.
The dirstat says it all: mostly drivers (much of it staging updates,
especially as nouveau also counts as staging even if it's not physically
there). But a smattering of changes all over (scheduler, vfs, arch, kfifo
etc):
2.9% Documentation/
2.4% arch/arm/
3.4% arch/powerpc/
2.7% arch/x86/
9.4% arch/
9.8% drivers/gpu/drm/nouveau/
2.8% drivers/gpu/drm/radeon/
15.4% drivers/gpu/drm/
6.4% drivers/platform/x86/
12.1% drivers/staging/dst/
13.4% drivers/staging/sm7xx/
27.1% drivers/staging/
10.3% drivers/usb/serial/
11.9% drivers/usb/
69.3% drivers/
3.7% fs/
6.4% include/linux/
6.5% include/
3.4% kernel/
2.4% sound/pci/hda/
3.3% sound/
I'd have been happier with a smaller -rc2, but I've seen worse too, so I'm
not going to complain.
And now I'm off to peel potatoes. And then to eat them.
Have fun,
Linus
---
Al Viro (4):
fix braindamage in audit_tree.c untag_chunk()
fix more leaks in audit_tree.c tag_chunk()
Fix f_flags/f_mode in case of lookup_instantiate_filp() from open(pathname, 3)
Sanitize f_flags helpers
Alan Cox (1):
jfs: Fix 32bit build warning
Alan Stern (5):
PM: Use pm_runtime_put_sync in system resume
PM: Runtime PM documentation update
USB: ...Today's -tip crashed during bootup on one of my testsystems: [ 28.616777] initcall eeepc_laptop_init+0x0/0x4d returned -19 after 29853 usecs [ 28.624002] calling dell_wmi_init+0x0/0x126 @ 1 [ 28.629128] device: 'input1': device_add [ 28.633113] PM: Adding info for No Bus:input1 [ 28.637678] input: Dell WMI hotkeys as /class/input/input1 [ 28.643216] evbug.c: Connected device: input1 (Dell WMI hotkeys at wmi/input0) [ 28.650449] BUG: unable to handle kernel NULL pointer dereference at 00000014 [ 28.654439] IP: [<c17f7f21>] wmi_install_notify_handler+0x31/0x70 [ 28.654439] *pde = 00000000 [ 28.654439] Oops: 0000 [#1] DEBUG_PAGEALLOC [ 28.654439] last sysfs file: [ 28.654439] Modules linked in: [ 28.654439] [ 28.654439] Pid: 1, comm: swapper Not tainted 2.6.33-rc2-tip-00158-g2bd999a-dirty #2238 A8N-E/System Product Name [ 28.654439] EIP: 0060:[<c17f7f21>] EFLAGS: 00010282 CPU: 0 [ 28.654439] EIP is at wmi_install_notify_handler+0x31/0x70 [ 28.654439] EAX: 00000006 EBX: c17f55d0 ECX: 0000009d EDX: fffffff4 [ 28.654439] ESI: 00000000 EDI: 00000000 EBP: f7052f90 ESP: f7052f84 [ 28.654439] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 [ 28.654439] Process swapper (pid: 1, ti=f7052000 task=f7070000 task.ti=f7052000) [ 28.654439] Stack: [ 28.654439] fffffff4 00000000 00000006 f7052fa4 c2165992 aa66398a 00000006 aa66398a [ 28.654439] <0> f7052fd0 c1001122 c1d58799 c2165874 00000001 0000749d 00000000 c2165874 [ 28.654439] <0> c219a214 c21271d6 00000000 f7052fe4 c2127262 00000000 000000e0 00000000 [ 28.654439] Call Trace: [ 28.654439] [<c2165992>] ? dell_wmi_init+0x11e/0x126 [ 28.654439] [<c1001122>] ? do_one_initcall+0x32/0x1d0 [ 28.654439] [<c2165874>] ? dell_wmi_init+0x0/0x126 [ 28.654439] [<c2165874>] ? dell_wmi_init+0x0/0x126 [ 28.654439] [<c21271d6>] ? kernel_init+0x0/0xe0 [ 28.654439] [<c2127262>] ? kernel_init+0x8c/0xe0 [ 28.654439] [<c10033c6>] ? kernel_thread_helper+0x6/0x10 [ 28.654439] Code: 89 ...
Hmm, the patch is busted in one way, but it should not be crashing like
that still... I wonder what is going on. Still, the patch below should
help it a bit.
--
Dmitry
dell-wmi - fix condition to abort driver loading
From: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The commit 1fdd407f4e3f2ecb453954cbebb6c22491c61853 incorrectly made driver
abort loading when known GUID is present when it should have done exactly
the opposite.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
---
drivers/platform/x86/dell-wmi.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/platform/x86/dell-wmi.c b/drivers/platform/x86/dell-wmi.c
index 916ccb2..c980782 100644
--- a/drivers/platform/x86/dell-wmi.c
+++ b/drivers/platform/x86/dell-wmi.c
@@ -324,7 +324,7 @@ static int __init dell_wmi_init(void)
{
int err;
- if (wmi_has_guid(DELL_EVENT_GUID)) {
+ if (!wmi_has_guid(DELL_EVENT_GUID)) {
printk(KERN_WARNING "dell-wmi: No known WMI GUID found\n");
return -ENODEV;
}
--
> dell-wmi - fix condition to abort driver loading applied to acpi-test thanks, --
Rather than reverting the broken patch that caused wmi to load, does applying this patch to deal with the broken error handling cause the oops to go away? thanks, -Len From d11e073ee3e3091d9190dace97ce480e960cca1b Mon Sep 17 00:00:00 2001 From: Len Brown <len.brown@intel.com> Date: Fri, 25 Dec 2009 23:14:26 -0500 Subject: [PATCH] Revert "wmi: Free the allocated acpi objects through wmi_get_event_data" X-Patchwork-Hint: ignore This reverts commit 3e9b988e4edf065d39c1343937f717319b1c1065. Reported-by: Sedat Dilek <sedat.dilek@googlemail.com> Tested-by Maciej Rutecki <maciej.rutecki@gmail.com> Signed-off-by: Len Brown <len.brown@intel.com> --- drivers/platform/x86/dell-wmi.c | 1 - drivers/platform/x86/hp-wmi.c | 2 -- drivers/platform/x86/wmi.c | 4 ++-- 3 files changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/platform/x86/dell-wmi.c b/drivers/platform/x86/dell-wmi.c index 916ccb2..46244c6 100644 --- a/drivers/platform/x86/dell-wmi.c +++ b/drivers/platform/x86/dell-wmi.c @@ -238,7 +238,6 @@ static void dell_wmi_notify(u32 value, void *context) input_sync(dell_wmi_input_dev); } } - kfree(obj); } diff --git a/drivers/platform/x86/hp-wmi.c b/drivers/platform/x86/hp-wmi.c index 8781d8f..222ab57 100644 --- a/drivers/platform/x86/hp-wmi.c +++ b/drivers/platform/x86/hp-wmi.c @@ -388,8 +388,6 @@ static void hp_wmi_notify(u32 value, void *context) } else printk(KERN_INFO "HP WMI: Unknown key pressed - %x\n", eventcode); - - kfree(obj); } static int __init hp_wmi_input_setup(void) diff --git a/drivers/platform/x86/wmi.c b/drivers/platform/x86/wmi.c index 9f93d6c..e425a86 100644 --- a/drivers/platform/x86/wmi.c +++ b/drivers/platform/x86/wmi.c @@ -540,8 +540,8 @@ EXPORT_SYMBOL_GPL(wmi_remove_notify_handler); /** * wmi_get_event_data - Get WMI data associated with an event * - * @event: Event to find - * @out: Buffer to hold event data. out->pointer should be freed with kfree() + * @event - Event ...
These kfree's look correct, assuming we properly check
the return status. So perhaps instead you can test
the patch below?
thanks,
-Len
From 5caa3ab36da77d59017cff9b9d1e910862b489e7 Mon Sep 17 00:00:00 2001
Message-Id: <5caa3ab36da77d59017cff9b9d1e910862b489e7.1261887124.git.len.brown@intel.com>
In-Reply-To: <51b0f1c2b8c32ee44ff01ef74599a1f17e4fc565.1261887124.git.len.brown@intel.com>
References: <51b0f1c2b8c32ee44ff01ef74599a1f17e4fc565.1261887124.git.len.brown@intel.com>
From: Len Brown <len.brown@intel.com>
Date: Sat, 26 Dec 2009 23:02:24 -0500
Subject: [PATCH 3/3] dell-wmi, hp-wmi, msi-wmi: check wmi_get_event_data() return value
X-Patchwork-Hint: ignore
When acpi_evaluate_object() is passed ACPI_ALLOCATE_BUFFER,
the caller must kfree the returned buffer if AE_OK is returned.
The callers of wmi_get_event_data() pass ACPI_ALLOCATE_BUFFER,
and thus must check its return value before accessing
or kfree() on the buffer.
Signed-off-by: Len Brown <len.brown@intel.com>
---
drivers/platform/x86/dell-wmi.c | 7 ++++++-
drivers/platform/x86/hp-wmi.c | 7 ++++++-
drivers/platform/x86/msi-wmi.c | 7 ++++++-
3 files changed, 18 insertions(+), 3 deletions(-)
diff --git a/drivers/platform/x86/dell-wmi.c b/drivers/platform/x86/dell-wmi.c
index 4c7e702..500af8c 100644
--- a/drivers/platform/x86/dell-wmi.c
+++ b/drivers/platform/x86/dell-wmi.c
@@ -202,8 +202,13 @@ static void dell_wmi_notify(u32 value, void *context)
struct acpi_buffer response = { ACPI_ALLOCATE_BUFFER, NULL };
static struct key_entry *key;
union acpi_object *obj;
+ acpi_status status;
- wmi_get_event_data(value, &response);
+ status = wmi_get_event_data(value, &response);
+ if (status != AE_OK) {
+ printk(KERN_INFO "dell-wmi: bad event status 0x%x\n", status);
+ return;
+ }
obj = (union acpi_object *)response.pointer;
diff --git a/drivers/platform/x86/hp-wmi.c b/drivers/platform/x86/hp-wmi.c
index 18bf741..5b648f0 100644
--- a/drivers/platform/x86/hp-wmi.c
+++ ...Applied it to tip:out-of-tree for testing, and have dropped the revert as well. Will let you know how it goes. (if you dont hear from me later today you an assume it's all fixed.) Thanks, Ingo --
Still a very similar looking crash (attached). I went for the plain revert in tip:out-of-tree again. (Note that the system does not have this hardware, and that it's booted with the driver built-in. So the relevant codepath should be very simple. Config attached.) Ingo [ 27.447053] initcall compal_init+0x0/0xf7 returned -19 after 3 usecs [ 27.453409] calling dell_wmi_init+0x0/0x129 @ 1 [ 27.458255] PM: Adding info for No Bus:input3 [ 27.462676] input: Dell WMI hotkeys as /class/input/input3 [ 27.468179] BUG: unable to handle kernel NULL pointer dereference at 00000014 [ 27.472165] IP: [<c1f26aa8>] wmi_install_notify_handler+0x28/0x80 [ 27.472165] *pde = 00000000 [ 27.472165] Oops: 0000 [#1] PREEMPT SMP [ 27.472165] last sysfs file: [ 27.472165] [ 27.472165] Pid: 1, comm: swapper Not tainted 2.6.33-rc2-tip-00212-g3c2365e-dirty #3297 A8N-E/System Product Name [ 27.472165] EIP: 0060:[<c1f26aa8>] EFLAGS: 00010282 CPU: 0 [ 27.472165] EIP is at wmi_install_notify_handler+0x28/0x80 [ 27.472165] EAX: fffffff4 EBX: c1f1e200 ECX: 22b612b0 EDX: c2b2d1e0 [ 27.472165] ESI: 00000000 EDI: 00000001 EBP: f64b3f84 ESP: f64b3f78 [ 27.472165] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 [ 27.472165] Process swapper (pid: 1, ti=f64b3000 task=f64d0000 task.ti=f64b3000) [ 27.472165] Stack: [ 27.472165] fffffff4 00000000 00000006 f64b3f98 c28d70d7 64a0689d 00000006 64a0689d [ 27.472165] <0> f64b3fc4 c100112b c255aa54 c28d6fb6 00000001 00000003 00000000 c28d6fb6 [ 27.472165] <0> c294a50c c2882310 00000000 f64b3fd0 c28822fd c29493e8 f64b3fe4 c2882392 [ 27.472165] Call Trace: [ 27.472165] [<c28d70d7>] ? dell_wmi_init+0x121/0x129 [ 27.472165] [<c100112b>] ? do_one_initcall+0x2b/0x1c0 [ 27.472165] [<c28d6fb6>] ? dell_wmi_init+0x0/0x129 [ 27.472165] [<c28d6fb6>] ? dell_wmi_init+0x0/0x129 [ 27.472165] [<c2882310>] ? kernel_init+0x0/0xc9 [ 27.472165] [<c28822fd>] ? do_basic_setup+0x44/0x57 [ 27.472165] [<c2882392>] ? ...
Hello, On Mon, 28 Dec 2009 13:01:25 +0100 Looks like it is very similar to what I have when I run : modprobe dell-wmi on my Dell laptop (Vostro 1520). Message from syslogd@tux at Dec 28 16:00:22 ... kernel:Oops: 0000 [#1] SMP Message from syslogd@tux at Dec 28 16:00:22 ... kernel:last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq Message from syslogd@tux at Dec 28 16:00:22 ... kernel:Stack: Message from syslogd@tux at Dec 28 16:00:22 ... kernel:Call Trace: Message from syslogd@tux at Dec 28 16:00:22 ... kernel:Code: 65 f8 48 89 f3 49 89 d4 48 85 f6 74 35 48 85 ff 74 30 48 8d 75 e8 e8 27 fd ff ff 48 8b 55 e8 b8 06 00 00 00 48 85 d2 74 09 b0 15 <48> 83 7a 30 00 74 20 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 0f 1f 44 Message from syslogd@tux at Dec 28 16:00:22 ... kernel:CR2: 0000000100000024 Dec 28 16:00:22 tux kernel: ACPI: WMI: Mapper loaded Dec 28 16:00:22 tux kernel: input: Dell WMI hotkeys as /devices/virtual/input/input13 Dec 28 16:00:22 tux kernel: BUG: unable to handle kernel paging request at 0000000100000024 Dec 28 16:00:22 tux kernel: IP: [<ffffffffa0003789>] wmi_install_notify_handler+0x39/0x80 [wmi] Dec 28 16:00:22 tux kernel: PGD 565f3067 PUD 0 Dec 28 16:00:22 tux kernel: Oops: 0000 [#1] SMP Dec 28 16:00:22 tux kernel: last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq Dec 28 16:00:22 tux kernel: CPU 1 Dec 28 16:00:22 tux kernel: Pid: 20215, comm: modprobe Not tainted 2.6.33-rc2 #1 0T816J/Vostro 1520 Dec 28 16:00:22 tux kernel: RIP: 0010:[<ffffffffa0003789>] [<ffffffffa0003789>] wmi_install_notify_handler+0x39/0x80 [wmi] Dec 28 16:00:22 tux kernel: RSP: 0018:ffff88009434bed8 EFLAGS: 00010202 Dec 28 16:00:22 tux kernel: RAX: 0000000000000015 RBX: ffffffffa014f0f0 RCX: 000000000000009d Dec 28 16:00:22 tux kernel: RDX: 00000000fffffff4 RSI: ffff88009434be98 RDI: ffff88009434be8c Dec 28 16:00:22 tux kernel: RBP: ffff88009434bef8 R08: ffff88009434be88 R09: 0000000000000010 Dec 28 16:00:22 tux kernel: R10: ...
Hello,
On Mon, 28 Dec 2009 16:02:12 +0100
I've fixed this one by changing wmi.c :
--- wmi.c 2009-12-28 17:11:52.000000000 +0100
+++ wmi.c.changed 2009-12-28 17:12:17.000000000 +0100
@@ -488,12 +488,13 @@
{
struct wmi_block *block;
acpi_status status;
+ bool err;
if (!guid || !handler)
return AE_BAD_PARAMETER;
- find_guid(guid, &block);
- if (!block)
+ err = find_guid(guid, &block);
+ if (!err)
return AE_NOT_EXIST;
if (block->handler)
Signed-off-by: rol@as2917.net <Paul Rolland>
but now I have :
input: Dell WMI hotkeys as /devices/virtual/input/input13
dell-wmi: Unable to register notify handler - 6
sys_init_module: 'dell_wmi'->init suspiciously returned 6, it should follow 0/-E convention
sys_init_module: loading module anyway...
Pid: 2220, comm: modprobe Not tainted 2.6.33-rc2 #1
Call Trace:
[<ffffffff8107450a>] sys_init_module+0x1da/0x250
[<ffffffff810024ab>] system_call_fastpath+0x16/0x1b
Then, I tried to remove dell-wmi to fix the warning, and then I have :
BUG: unable to handle kernel NULL pointer dereference at 00000000000008b0
IP: [<ffffffffa024971b>] wmi_remove_notify_handler+0x2b/0x60 [wmi]
PGD bb506067 PUD bd170067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
CPU 1
Pid: 3833, comm: rmmod Not tainted 2.6.33-rc2 #1 0T816J/Vostro 1520
RIP: 0010:[<ffffffffa024971b>] [<ffffffffa024971b>] wmi_remove_notify_handler+0x2b/0x60 [wmi]
RSP: 0000:ffff8800bd3e7ea8 EFLAGS: 00010202
RAX: 000000000000000a RBX: 0000000000000880 RCX: 000000000000009d
RDX: 0000000000000000 RSI: ffff8800bd3e7e68 RDI: 0000000000000880
RBP: ffff8800bd3e7eb8 R08: ffff8800bd3e7e58 R09: 0000000000000010
R10: ffff8800bd3e7eb0 R11: 0000000000000006 R12: ffffffffa0251780
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
FS: 00007f628c4c46f0(0000) GS:ffff880028300000(0000) ...Fixed this completely with the following two patches on top of 2.6.33-rc2 :
--- wmi.c.orig 2009-12-28 17:27:15.000000000 +0100
+++ wmi.c 2009-12-28 17:39:01.000000000 +0100
@@ -488,12 +488,13 @@
{
struct wmi_block *block;
acpi_status status;
+ bool err;
if (!guid || !handler)
return AE_BAD_PARAMETER;
- find_guid(guid, &block);
- if (!block)
+ err = find_guid(guid, &block);
+ if (!err)
return AE_NOT_EXIST;
if (block->handler)
@@ -517,12 +518,13 @@
{
struct wmi_block *block;
acpi_status status;
+ bool err;
if (!guid)
return AE_BAD_PARAMETER;
- find_guid(guid, &block);
- if (!block)
+ err = find_guid(guid, &block);
+ if (!err)
return AE_NOT_EXIST;
if (!block->handler)
--- dell-wmi.c.orig 2009-12-28 17:49:09.000000000 +0100
+++ dell-wmi.c 2009-12-28 17:41:36.000000000 +0100
@@ -343,7 +343,7 @@
printk(KERN_ERR
"dell-wmi: Unable to register notify handler - %d\n",
err);
- return err;
+ return -err;
}
return 0;
Signed-off-by: rol@as2917.net <Paul Rolland>
Paul
--
Hi Paul, This still leaks AE_* error codes to the upper layers which do not care for them and expect the errors from Exxxx namespace. I wonder why wmi is not using standard error codes but instead decided to come up with it's own. -- Dmitry --
Unfortunately, find_guid() returns 1 for success and 0 for failure, making this code look backwards. I've applied this, but changed it to if (!find_guid()) This change is not quite correct, but the correct fix is already in the acpi-test tree in the patch "dell-wmi: sys_init_module: 'dell_wmi'->init suspiciously returned 21, it should follow 0/-E convention" I'll reply with both of the patches in a sec. Please let me know if they are sufficient to fix your system. thanks, --
Hi Len, On Wed, 30 Dec 2009 01:14:48 -0500 (EST) Yes, Dmitry sent a mail indicating that yesterday. I must admit I've been They are. You can add a : Tested-by: Paul Rolland <rol@as2917.net> for what it matters ;) Paul --
From: rol@as2917.net <Paul Rolland> Signed-off-by: rol@as2917.net <Paul Rolland> Signed-off-by: Len Brown <len.brown@intel.com> --- drivers/platform/x86/wmi.c | 6 ++---- 1 files changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/platform/x86/wmi.c b/drivers/platform/x86/wmi.c index 9f93d6c..cc9ad74 100644 --- a/drivers/platform/x86/wmi.c +++ b/drivers/platform/x86/wmi.c @@ -492,8 +492,7 @@ wmi_notify_handler handler, void *data) if (!guid || !handler) return AE_BAD_PARAMETER; - find_guid(guid, &block); - if (!block) + if (!find_guid(guid, &block)) return AE_NOT_EXIST; if (block->handler) @@ -521,8 +520,7 @@ acpi_status wmi_remove_notify_handler(const char *guid) if (!guid) return AE_BAD_PARAMETER; - find_guid(guid, &block); - if (!block) + if (!find_guid(guid, &block)) return AE_NOT_EXIST; if (!block->handler) -- 1.6.6.rc4.11.g129a5 --
From: Len Brown <len.brown@intel.com>
wmi_install_notify_handler() returns an acpi_error,
but dell_wmi_init() needs return a -errno style error.
Signed-off-by: Len Brown <len.brown@intel.com>
---
drivers/platform/x86/dell-wmi.c | 9 +++++----
1 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/platform/x86/dell-wmi.c b/drivers/platform/x86/dell-wmi.c
index 916ccb2..4c7e702 100644
--- a/drivers/platform/x86/dell-wmi.c
+++ b/drivers/platform/x86/dell-wmi.c
@@ -323,6 +323,7 @@ static int __init dell_wmi_input_setup(void)
static int __init dell_wmi_init(void)
{
int err;
+ acpi_status status;
if (wmi_has_guid(DELL_EVENT_GUID)) {
printk(KERN_WARNING "dell-wmi: No known WMI GUID found\n");
@@ -336,14 +337,14 @@ static int __init dell_wmi_init(void)
if (err)
return err;
- err = wmi_install_notify_handler(DELL_EVENT_GUID,
+ status = wmi_install_notify_handler(DELL_EVENT_GUID,
dell_wmi_notify, NULL);
- if (err) {
+ if (ACPI_FAILURE(status)) {
input_unregister_device(dell_wmi_input_dev);
printk(KERN_ERR
"dell-wmi: Unable to register notify handler - %d\n",
- err);
- return err;
+ status);
+ return -ENODEV;
}
return 0;
--
1.6.6.rc4.11.g129a5
--
El Thu, 24 Dec 2009 14:00:24 -0800, Linus Torvalds escribió: Hi. I've upgraded from 2.6.32 and I get a blank screen when the system is starting KMS. My motherboard is an Intel DG43NB with onboard video. I traced back to the patch that stopped working on my system and it is patch-2.6.32-git7. patch-2.6.32-git6 works fine. From dmesg of git7 I see this: i915 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16 i915 0000:00:02.0: setting latency timer to 64 mtrr: type mismatch for c0000000,10000000 old: write-back new: write- combining [drm] MTRR allocation failed. Graphics performance may suffer. i915 0000:00:02.0: irq 28 for MSI/MSI-X [drm] set up 31M of stolen space ------------[ cut here ]------------ WARNING: at drivers/gpu/drm/drm_crtc_helper.c:1035 drm_helper_initial_config+0x5e/0x60 [drm_kms_helper]() Hardware name: No connectors reported connected with modes Modules linked in: i915(+) snd_hda_codec bttv(+) v4l2_common videodev ir_common videobuf_dma_sg videobuf_core drm_kms_helper snd_pcm btcx_risc uhci_hcd i2c_i801 tveeprom drm snd_timer i2c_algo_bit snd cfbcopyarea ehci_hcd psmouse video i2c_core soundcore e1000e sr_mod intel_agp usbcore backlight 8139too output cfbimgblt cfbfillrect snd_page_alloc cdrom agpgart evdev thermal rtc_cmos rtc_core rtc_lib button Pid: 628, comm: modprobe Not tainted 2.6.32-git7 #1 Call Trace: [<c103215e>] ? warn_slowpath_common+0x6e/0xb0 [<f8875abe>] ? drm_helper_initial_config+0x5e/0x60 [drm_kms_helper] [<c10321eb>] ? warn_slowpath_fmt+0x2b/0x30 [<f8875abe>] ? drm_helper_initial_config+0x5e/0x60 [drm_kms_helper] [<f89a3a91>] ? i915_driver_load+0x1321/0x14c0 [i915] [<f89a2750>] ? i915_vga_set_decode+0x0/0x20 [i915] [<f87b8eaf>] ? drm_get_dev+0x29f/0x470 [drm] [<c10db051>] ? sysfs_do_create_link+0xa1/0x140 [<f89cdc15>] ? i915_pci_probe+0x0/0x130 [i915] [<c112d6db>] ? local_pci_probe+0xb/0x10 [<c112df29>] ? pci_device_probe+0x69/0x90 [<c118ffeb>] ? driver_probe_device+0x7b/0x170 [<c112d848>] ? ...
I've bisected the kernel and have found that my problem is due to this
commit:
commit fc816655236cd9da162356e96e74c7cfb0834d92
drm/i915: Don't set up HDMI ports that aren't in the BIOS device table.
author Zhao Yakui <yakui.zhao@intel.com>
Tue, 24 Nov 2009 01:48:45 +0000 (09:48 +0800)
committer Eric Anholt <eric@anholt.net>
Tue, 1 Dec 2009 00:41:48 +0000 (16:41 -0800)
Use the child device array to decide whether the given HDMI output should
be
initialized. If the given HDMI port can't be found in child device array,
it is not present and won't be initialized.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
diff --git a/drivers/gpu/drm/i915/intel_hdmi.c b/drivers/gpu/drm/i915/
intel_hdmi.c
index c33451a..2ff5d03 100644 (file)
--- a/drivers/gpu/drm/i915/intel_hdmi.c
+++ b/drivers/gpu/drm/i915/intel_hdmi.c
@@ -225,7 +225,52 @@ static const struct drm_encoder_funcs
intel_hdmi_enc_funcs = {
.destroy = intel_hdmi_enc_destroy,
};
-
+/*
+ * Enumerate the child dev array parsed from VBT to check whether
+ * the given HDMI is present.
+ * If it is present, return 1.
+ * If it is not present, return false.
+ * If no child dev is parsed from VBT, it assumes that the given
+ * HDMI is present.
+ */
+int hdmi_is_present_in_vbt(struct drm_device *dev, int hdmi_reg)
+{
+ struct drm_i915_private *dev_priv = dev->dev_private;
+ struct child_device_config *p_child;
+ int i, hdmi_port, ret;
+
+ if (!dev_priv->child_dev_num)
+ return 1;
+
+ if (hdmi_reg == SDVOB)
+ hdmi_port = DVO_B;
+ else if (hdmi_reg == SDVOC)
+ hdmi_port = DVO_C;
+ else if (hdmi_reg == HDMIB)
+ hdmi_port = DVO_B;
+ else if (hdmi_reg == HDMIC)
+ hdmi_port = DVO_C;
+ else if (hdmi_reg == HDMID)
+ hdmi_port = DVO_D;
+ else
+ return 0;
+
+ ret = 0;
+ for (i = 0; i < ...I can reproduce this issue on one of my test boxes. Rafael --
Hi,
the r8169 driver fails loading here with the following message:
[ 0.353955] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[ 0.354258] r8169 0000:02:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[ 0.354391] r8169 0000:02:00.0: PCI INT A disabled
[ 0.354527] r8169: probe of 0000:02:00.0 failed with error -22
Machine is Acer Aspire One, Atom N270 CPU.
Actually, the breakage seems to have appeared a bit earlier, sometime
between .32 and .33-rc1 as the bisection result shows:
ac1aa47b131416a6ff37eb1005a0a1d2541aad6c is the first bad commit
commit ac1aa47b131416a6ff37eb1005a0a1d2541aad6c
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date: Mon Oct 26 13:20:44 2009 -0700
PCI: determine CLS more intelligently
Till now, CLS has been determined either by arch code or as
L1_CACHE_BYTES. Only x86 and ia64 set CLS explicitly and x86 doesn't
always get it right. On most configurations, the chance is that
firmware configures the correct value during boot.
This patch makes pci_init() determine CLS by looking at what firmware
has configured. It scans all devices and if all non-zero values
agree, the value is used. If none is configured or there is a
disagreement, pci_dfl_cache_line_size is used. arch can set the dfl
value (via PCI_CACHE_LINE_BYTES or pci_dfl_cache_line_size) or
override the actual one.
ia64, x86 and sparc64 updated to set the default cls instead of the
actual one.
While at it, declare pci_cache_line_size and pci_dfl_cache_line_size
in pci.h and drop private declarations from arch code.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Miller <davem@davemloft.net>
Acked-by: Greg KH <gregkh@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
And since this result looks unrelated to my eye I did the whole
bisection search twice just to make sure. In ...Ok here's what happens:
pci_apply_final_quirks() dumps on the console
[ 0.369252] PCI: CLS 0 bytes, default 64
which means that it hasn't fallen back to setting the default cache line
size. Also, the call
pci_read_config_byte(dev, PCI_CACHE_LINE_SIZE, &tmp);
sets tmp = 0 and the following condition hits everytime
if (!cls)
cls = tmp;
if (!tmp || cls == tmp)
continue;
Which means that we never get around the set the default CLS.
The following dirty fix solves the issue on my machine:
--
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 7cfa7c3..9854c26 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2629,7 +2629,10 @@ static int __init pci_apply_final_quirks(void)
if (!pci_cache_line_size) {
printk(KERN_DEBUG "PCI: CLS %u bytes, default %u\n",
cls << 2, pci_dfl_cache_line_size << 2);
- pci_cache_line_size = cls;
+ if (!cls)
+ pci_cache_line_size = pci_dfl_cache_line_size;
+ else
+ pci_cache_line_size = cls;
}
return 0;
--
Regards/Gruss,
Boris.
--
On Fri, 25 Dec 2009 22:50:32 +0100 There should be a fix queued in my for-linus branch (I've been out this week so haven't sent the pull req yet), can you give it a try and make sure? Jesse --
Yep, 2820f333e3b4ad96590093efbed7b3400bcf492b fixes it.
Thanks.
--
Regards/Gruss,
Boris.
--
Hi, I'm getting the following warning upon resume. Kernel is 2.6.33-rc2 + one unrelated PCI fix¹ ------------[ cut here ]------------ WARNING: at fs/sysfs/dir.c:477 sysfs_add_one+0x107/0x121() Hardware name: System Product Name sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:12.2/usb1/1-2/1-2:1.0/ep_81' Modules linked in: binfmt_misc kvm_amd kvm powernow_k8 cpufreq_ondemand cpufreq_powersave cpufreq_userspace freq_table c pufreq_conservative ipv6 vfat fat 8250_pnp 8250 pcspkr serial_core ohci_hcd k10temp edac_core Pid: 279, comm: khubd Not tainted 2.6.33-rc2-00001-g6d7daec #1 Call Trace: [<ffffffff8103b010>] warn_slowpath_common+0x7c/0x94 [<ffffffff8103b07f>] warn_slowpath_fmt+0x41/0x43 [<ffffffff8112c448>] sysfs_add_one+0x107/0x121 [<ffffffff8118b8c1>] ? kobject_add_internal+0x9e/0x1a0 [<ffffffff8112c9d8>] create_dir+0x5d/0x98 [<ffffffff8112ca50>] sysfs_create_dir+0x3d/0x50 [<ffffffff813eb98b>] ? _raw_spin_unlock+0x35/0x52 [<ffffffff8118b902>] kobject_add_internal+0xdf/0x1a0 [<ffffffff8118ba99>] kobject_add_varg+0x41/0x50 [<ffffffff8118bb63>] kobject_add+0x64/0x66 [<ffffffff8118b79a>] ? kobject_get+0x1a/0x22 [<ffffffff812930dc>] device_add+0xad/0x4e6 [<ffffffff81195a7b>] ? __raw_spin_lock_init+0x31/0x52 [<ffffffff81293533>] device_register+0x1e/0x22 [<ffffffff812ea5ae>] usb_create_ep_devs+0xfd/0x127 [<ffffffff812e3604>] create_intf_ep_devs+0x50/0x6c [<ffffffff812e4a81>] usb_set_interface+0x228/0x23a [<ffffffff813eb938>] ? _raw_spin_unlock_irqrestore+0x5b/0x79 [<ffffffff812de26d>] usb_reset_and_verify_device+0x412/0x497 [<ffffffff812de577>] usb_port_resume+0x164/0x25b [<ffffffff813e9404>] ? mutex_unlock+0xe/0x10 [<ffffffff812edc84>] generic_resume+0x1c/0x1e [<ffffffff812e5b1c>] usb_resume_device+0x3a/0x3c [<ffffffff812e61a9>] usb_resume_both+0x70/0x10a [<ffffffff812e6e6f>] usb_external_resume_device+0x39/0x77 [<ffffffff812dfd51>] hub_thread+0x4eb/0xfc6 [<ffffffff813eb8c0>] ? _raw_spin_unlock_irq+0x41/0x5e ...
Hi,
this jumped into dmesg upon resume (.config and dmesg are attached in
the previous "EHCI resume sysfs duplicates..." message in this thread):
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.33-rc2-00001-g6d7daec #1
-------------------------------------------------------
Xorg/3076 is trying to acquire lock:
(&dev->struct_mutex){+.+.+.}, at: [<ffffffff81223fd4>] drm_mmap+0x38/0x5c
but task is already holding lock:
(&mm->mmap_sem){++++++}, at: [<ffffffff810b7509>] sys_mmap_pgoff+0xd6/0x1b4
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (&mm->mmap_sem){++++++}:
[<ffffffff810694c0>] __lock_acquire+0x1373/0x16fd
[<ffffffff8106993c>] lock_acquire+0xf2/0x116
[<ffffffff810bb2b5>] might_fault+0x95/0xb8
[<ffffffff810e87d6>] filldir+0x75/0xd0
[<ffffffff8112be2a>] sysfs_readdir+0x10f/0x149
[<ffffffff810e895b>] vfs_readdir+0x6b/0xa8
[<ffffffff810e8ae1>] sys_getdents+0x81/0xd1
[<ffffffff810022f2>] system_call_fastpath+0x16/0x1b
-> #2 (sysfs_mutex){+.+.+.}:
[<ffffffff810694c0>] __lock_acquire+0x1373/0x16fd
[<ffffffff8106993c>] lock_acquire+0xf2/0x116
[<ffffffff813e9e5c>] mutex_lock_nested+0x63/0x354
[<ffffffff8112c488>] sysfs_addrm_start+0x26/0x28
[<ffffffff8112c940>] sysfs_remove_dir+0x52/0x8d
[<ffffffff8118b6f9>] kobject_del+0x16/0x37
[<ffffffff8118b758>] kobject_release+0x3e/0x66
[<ffffffff8118c5b5>] kref_put+0x43/0x4d
[<ffffffff8118b674>] kobject_put+0x47/0x4b
[<ffffffff813e11c1>] cacheinfo_cpu_callback+0xa2/0xdb
[<ffffffff8105c317>] notifier_call_chain+0x37/0x63
[<ffffffff8105c3c7>] raw_notifier_call_chain+0x14/0x16
[<ffffffff813d58ec>] _cpu_down+0x1a5/0x29a
[<ffffffff8103c851>] disable_nonboot_cpus+0x74/0x10d
[<ffffffff8107793e>] hibernation_snapshot+0x99/0x1d3
...Hi This output seems to suggest need to fix drm. --
Actually, this painful dependency may technically be a bug in drm, but at the same time, it's really filldir() that makes it _very_ hard for certain locking scenarios because it has that callback to the VFS layer that takes the mmap_sem, and sysfs itself wants the sysfs_mutex in paths where it is not at all unreasonable to hold the mmap_sem. We've seen it several times (yes, mostly with drm, but it's been seen with others too), and it's very annoying. It can be fixed by having very careful readdir implementations, but I really would blame sysfs in particular for having a very annoying lock reversal issue when used reasonably. So the optimal situation would be for sysfs to not have that annoying lock dependency, and it would really have to be sysfs_readdir() that drops the sysfs_mutex around the filldir call (and that obviously implies having to re-validate and be really careful). Added Eric and Greg to the cc, in case the sysfs people want to solve it. The other alternative would be to fix this on a VFS layer by changing how 'readdir/filldir' interacts, and instead make it fill in a kernel buffer - avoiding the mmap_sem issue entirely. And than later (in readdir) that kernel buffer could be copied to user space without holding any other locks. I like the VFS approach because I think we could possibly use that as a first approach to eventually try to think about caching readdir() results at a VFS level - readdir() is currently the _only_ main filesystem callback that always calls into the low-level filesystem, and always takes a lot of locks. I'm adding Al to the Cc for that - he knows about this issue from me previously thinking aloud along these lines. And yes, one option would be to just fix drm - by avoiding calling any sysfs functions while holding the mmap_lock (either in the mmap callback or the page fault paths). However, as mentioned, I really do think that the blame can be laid on sysfs for trying to be a nice generic interface, ...
Maybe. The mnmap_sem has some interesting issues all of it's own. There are scalability reasons for dropping the sysfs_mutex in sysfs_readdir and I have some tenative patches for that. I will take a look after I come back from the holidays, in a couple of days. I don't understand Could be. I have simplified the sysfs locking quite a bit this last round. I don't know if there is much more than corner cases left to improve. Eric --
The details are in the original thread on lkml, but it boils down to basically (the below may not be the exact sequence, but it's close) - drm_mmap (called with mmap_sem) takes 'dev->struct_mutex' to protect it's own device data (very reasonable) - drm_release takes 'dev->struct_mutex' again to protect its own data, and calls "mtrr_del_page()" which ends up taking cpu_hotplug.lock. Again, that doesn't sound "wrong" in any way. - hibernate ends up with the sequence: _cpu_down (cpu_hotplug.lock) -> .. kref_put .. -> sysfs_addrm_start (sysfs_mutex) Again, nothing suspicious or "bad", and this part of the dependency chain has nothing to do with the DRM code itself. - sysfs_readdir() (and this is the big problem) holds sysfs_mutex in its readdir implementation over the call to filldir. And filldir copies the data to user space, so now you have sysfs_mutex -> mmap_sem. See? None of the chains look bad. Except sysfs_readdir() obviously has that sysfs_mutex -> mmap_sem thing, which is _very_ annoying, because now you end up with a chain like mmap_sem -> dev->struct_mutex -> cpu_hotplug.lock -> sysfs_mutex -> mmap_sem and I think you'll agree that of all the lock chains, the place to break the association is at sysfs_mutex. And the obvious place to break it would Ok, hopefully the above chain explains it to you, and also makes it clear that it's rather hard to break anywhere else, and it's not somebody else doing anything "obviously bogus". Btw, the scalability issues with readdir() in general is why I'd be open to try to change the rules for filldir(), and always make it a kernel space copy. I suspect a number of users would like being able to use spinlocks over filldir, but it's currently impossible. However, we have a lot of filldir implementations (knfsd and several different system call interfaces), and while some of them already use kernel buffers (eg knfsd) others would need to allocate temporary storage ...
kobject_del with a lock held scares me. There is a possible deadlock (that lockdep is ignorant of) if you hold a lock over sysfs_deactivate() and if any sysfs file takes that lock. I won't argue with a claim of inconvenient locking semantics here, and this is different to the problem you are seeing (except that fixing this I agree that fixing sysfs_readdir to not hold the sysfs_mutex over filldir is useful to reduce the lock hold time if nothing else. The cheap fix here is mostly a matter of grabbing a reference to the sysfs_dirent and then revalidating that the reference is still useful after we reacquire the sysfs_mutex. If not we already have the code for restarting from just an offset. We just don't want to use it too much as that will give us O(n^2) times for sysfs readdir. We very definitely have an ABBA deadlock with sysfs_deactivate and the cpu_hotplug.lock. arch/x86/kernel/microcode_core.c:reload_store() is the code for a sysfs file that when written to calls get_online_cpus(). Regardless of what we do with sysfs_readdir we need to see if we can fix cpu_down(), to remove this nasty deadlock. Eric --
I would not object at _all_ if sysfs fixed the locking here instead of in filldir. The problem is that releasing objects (with kref_put() and friends) is something that is _commonly_ done with various locks held. Btw, that "cpu_down()" is by no means the only case. I would suggest you just google for sysfs_mutex lockdep and you'll find a _lot_ of cases, most of them not involving drm at all, but ext4 and btrfs. (Side note: almost all of them tend to _also_ have mmap_sem in the chain: I suspect that filldir is almost always implicated because mmap_sem is so hard to do just one way: both page faulting and mmap have it held, and so a lot of locks need to be gotten _after_ it, while filldir very often has the exact reverse requirement. So that's why filldir is kind of special (and the fundamental _reason_ it is special is exactly because pretty much all other VFS operations work with generic caches, and the actual filesystem only fills in the caches, it doesn't copy to user space directly while holding any locks - although ioctl's sometimes have the same issue as filldir for all the same reasons). Anyway, I'm in _no_ way saying that you need to break it at filldir: the reason I pick on filldir is because I hate it, and think that it's a really annoying special case at the VFS level. But from a sysfs standpoint, I could well see that there are worse problems than that kind of annoying VFS problem. So if you can break it at that kref_put layer (which leads to releasing a sysfs object etc), then that would be great. In fact, it would be better, since kref_put and friends are in many ways "more fundamental" than some Well, the _trivial_ fix is to just move the mutex_lock/unlock _inside_ the loop instead of of outside. Something like the appended might just work, and is the really stupid approach. Totally untested. And it will do a _lot_ more sysfs mutex accesses, since now it will lock/unlock around each entry we return. A smarter ...
When sysfs_readdir stops short we now cache the next sysfs_dirent to
return to user space in filp->private_data. There is no impact on the
rest of sysfs by doing this and in the common case it allows us to
pick up exactly where we left off with no seeking.
Additionally I drop and regrab the sysfs_mutex around filldir to avoid
a page fault arbitrarily increasing the hold time on the sysfs_mutex.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
---
I haven't stressed this patch but it is sound in principle, and is a
general sysfs improvement, regardless of any locking issues.
fs/sysfs/dir.c | 85 ++++++++++++++++++++++++++++++++++++++++---------------
1 files changed, 62 insertions(+), 23 deletions(-)
diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index f05f230..ad8a57e 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -827,19 +827,51 @@ static inline unsigned char dt_type(struct sysfs_dirent *sd)
return (sd->s_mode >> 12) & 15;
}
+static int sysfs_dir_release(struct inode *inode, struct file *filp)
+{
+ sysfs_put(filp->private_data);
+ return 0;
+}
+
+static struct sysfs_dirent *sysfs_dir_pos(struct sysfs_dirent *parent_sd,
+ ino_t ino, struct sysfs_dirent *pos)
+{
+ if (pos) {
+ int valid = !(pos->s_flags & SYSFS_FLAG_REMOVED) &&
+ pos->s_parent == parent_sd &&
+ ino == pos->s_ino;
+ sysfs_put(pos);
+ if (valid)
+ return pos;
+ }
+ pos = parent_sd->s_dir.children;
+ while (pos && (ino > pos->s_ino))
+ pos = pos->s_sibling;
+ return pos;
+}
+
+static struct sysfs_dirent *sysfs_dir_next_pos(struct sysfs_dirent *parent_sd,
+ ino_t ino, struct sysfs_dirent *pos)
+{
+ pos = sysfs_dir_pos(parent_sd, ino, pos);
+ if (pos)
+ pos = pos->s_sibling;
+ return pos;
+}
+
static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
{
struct dentry *dentry = filp->f_path.dentry;
struct sysfs_dirent * parent_sd = dentry->d_fsdata;
- struct sysfs_dirent *pos;
+ struct sysfs_dirent *pos = ...I've slapped it ontop of v2.6.33-rc2-249-gcd6e125 here and the circular
locking warning is gone. I'll keep an eye on it in the next couple of
days, just in case.
Tested-by: Borislav Petkov <petkovbb@gmail.com>
--
Regards/Gruss,
Boris.
--
These are all incorrect in the presense of 'lseek'. You can't do that
if (!pos && "test filp->f_pos")
thing, because you get all the wrong results for both the case of an lseek
before doing any readdir (which is undefined behavior, so I guess that's
technically ok) _and_ for the 'lseek back to zero _after_ doing a readdir'
case (which is _not_ undefined behavior!
It looks like it might be easy to fix by making a sysfs_llseek() function
that does something like
.. sysfs_llseek(..)
{
mutex_lock(&sysfs_mutex);
sysfs_release();
filp->private_data = NULL;
mutex_unlock(&sysfs_mutex);
return generic_file_llseek(..);
}
or similar. Except themn you'll need to change the EOF condition testing
and turn it into a re-validation event. Or maybe do the re-validation in
sysfs_llseek() itself, rather than just dropping the cached data.
Hmm? I haven't thought it through very deeply, so maybe I'm missing
something.
Linus
--
When sysfs_readdir stops short we now cache the next
sysfs_dirent to return to user space in filp->private_data.
There is no impact on the rest of sysfs by doing this and
in the common case it allows us to pick up exactly where
we left off with no seeking.
Additionally I drop and regrab the sysfs_mutex around
filldir to avoid a page fault abritrarily increasing the
hold time on the sysfs_mutex.
v2: Returned to using INT_MAX as the EOF condition.
seekdir is ambiguous unless all directory entries have
a unique f_pos value.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
---
Linus good catch.
fs/sysfs/dir.c | 82 +++++++++++++++++++++++++++++++++++++++++---------------
1 files changed, 60 insertions(+), 22 deletions(-)
diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index f05f230..5d88e30 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -827,11 +827,46 @@ static inline unsigned char dt_type(struct sysfs_dirent *sd)
return (sd->s_mode >> 12) & 15;
}
+static int sysfs_dir_release(struct inode *inode, struct file *filp)
+{
+ sysfs_put(filp->private_data);
+ return 0;
+}
+
+static struct sysfs_dirent *sysfs_dir_pos(struct sysfs_dirent *parent_sd,
+ ino_t ino, struct sysfs_dirent *pos)
+{
+ if (pos) {
+ int valid = !(pos->s_flags & SYSFS_FLAG_REMOVED) &&
+ pos->s_parent == parent_sd &&
+ ino == pos->s_ino;
+ sysfs_put(pos);
+ if (valid)
+ return pos;
+ }
+ pos = NULL;
+ if ((ino > 1) && (ino < INT_MAX)) {
+ pos = parent_sd->s_dir.children;
+ while (pos && (ino > pos->s_ino))
+ pos = pos->s_sibling;
+ }
+ return pos;
+}
+
+static struct sysfs_dirent *sysfs_dir_next_pos(struct sysfs_dirent *parent_sd,
+ ino_t ino, struct sysfs_dirent *pos)
+{
+ pos = sysfs_dir_pos(parent_sd, ino, pos);
+ if (pos)
+ pos = pos->s_sibling;
+ return pos;
+}
+
static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
{
struct dentry *dentry = filp->f_path.dentry;
struct sysfs_dirent * parent_sd ...That mutex_lock(&sysfs_mutex); if (ret < 0) break; looks just silly. We know 'pos' is non-NULL, so the break will effectively just be a "mutex_unlock + return 0", and we just did the mutex_lock, so why not instead do if (ret < 0) return 0; mutex_lock(&sysfs_mutex); there? Not that it really _matters_, but it seems way clearer, no? But other than that mindless nit, I can't see anything wrong with your logic, and it looks ok to me from just reading the patch itself. So I guess that's an "Ack", although I'd prefer it to get some more testing and perhaps go through Greg's tree as sysfs patches usually go. And by "testing" I mean both the "yes, this second version also breaks the lockdep chain and avoids the warning", but also some kind of actual testing of /sysfs itself. If there is any. Linus --
I'll queue this up in my tree on Monday to get it some testing. thanks, greg k-h --
Yes, it survived a couple of suspend/resume cycles so far.
--
Regards/Gruss,
Boris.
--
I just sent you my sysfs filldir scalability patch, so we can take that red-herring off the plate. The problem as I see it is that kobject_del is convenient. kobject_del waits until all of the sysfs show and store methods for that kobject have stopped executing. Which imposes the rule that kobject_del can not be called with any locks held that are taken in a sysfs show or store method. This is all invisible to lockdep as the wait is done with a completion and not a lock. Which unfortunately means fixing filldir only removes some noise from the picture, and completely hides the problem from lockdep. .... Looking at the case I am familiar with in the networking layer I think I have stumbled on a way to sort out this locking problem. Today the network layer effectively does: rtnl_lock(); device_del(dev); rtnl_unlock(); kobject_put(dev); sysfs_deactivate happens in the device_del(), but if we were to move sysfs_deactivate into the final kobject_put then in theory we can continue to block and be friendly but not need to be called with locations where locks are held. The core idea is to allow unlisting devices from sysfs under a lock while still waiting for all users to complete after it is safe to drop the lock. Does that work for the cpu hotplug case? Doing everything from notifiers makes me suspect it will fail. .... Either way we will need some lockdep warnings for sysfs_deactivate so that the problem does not continue to hide and silently foul things up. So I will see if I can cook something. Eric --
Hello, Eric. The synchronization against read/write ops is in sysfs_deactivate on purpose so that drivers (most common users) don't have to worry about sysfs ops accessing different parts of data structures once device_del() is complete. Implementing the exlusion at the driver level is possible but not easy because some hardware devices are represented with complex data structures, some of them are reused when devices are exchanged and some sysfs ops end up accessing the hardware. So, it's often not possible to simply disassociate the data structure and float it till the last reference goes away. There needs to be a synchronization point where the driver can tell that nothing is accessing released data structure or hardware resource after it and Nobody would know when that final put will actually happen. In progress sysfs ops might access the hardware after the hardware is I don't think this is really relevant to the problem at hand but adding lockdep annotations would definitely be beneficial, which BTW is another reason to leave the synchronization in sysfs_deactivate as the trade off is between deadlocks which can be detected somewhat reliably with lockdep and scary race conditions which may involve hardware in mysterious ways. Thanks. -- tejun --
Holding locks over device_del -> kobject_del -> sysfs_deactivate can
cause deadlocks if those same locks are grabbed in sysfs show or store
methods.
The I model s_active count + completion as a sleeping read/write lock.
I describe to lockdep sysfs_get_active as a read_trylock,
sysfs_put_active as a read_unlock, and sysfs_deactivate as a
write_lock and write_unlock pair. This seems to capture the essence
for purposes of finding deadlocks, and in my testing gives finds real
issues and ignores non-issues.
This brings us back to holding locks over kobject_del is a problem
that ideally we should find a way of addressing, but at least lockdep
can tell us about the problems instead of requiring developers to debug
rare strange system deadlocks, that happen when sysfs files are removed
while being written to.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
fs/sysfs/dir.c | 14 ++++++++++++--
fs/sysfs/sysfs.h | 15 +++++++++++++++
2 files changed, 27 insertions(+), 2 deletions(-)
diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 5d88e30..5c4703d 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -106,8 +106,10 @@ static struct sysfs_dirent *sysfs_get_active(struct sysfs_dirent *sd)
return NULL;
t = atomic_cmpxchg(&sd->s_active, v, v + 1);
- if (likely(t == v))
+ if (likely(t == v)) {
+ rwsem_acquire_read(&sd->dep_map, 0, 1, _RET_IP_);
return sd;
+ }
if (t < 0)
return NULL;
@@ -130,6 +132,7 @@ static void sysfs_put_active(struct sysfs_dirent *sd)
if (unlikely(!sd))
return;
+ rwsem_release(&sd->dep_map, 1, _RET_IP_);
v = atomic_dec_return(&sd->s_active);
if (likely(v != SD_DEACTIVATED_BIAS))
return;
@@ -194,15 +197,21 @@ static void sysfs_deactivate(struct sysfs_dirent *sd)
BUG_ON(sd->s_sibling || !(sd->s_flags & SYSFS_FLAG_REMOVED));
sd->s_sibling = (void *)&wait;
+ rwsem_acquire(&sd->dep_map, 0, 0, _RET_IP_);
/* atomic_add_return() is a mb(), put_active() will always see
* the updated ...Looks good to me. Acked-by: Tejun Heo <tj@kernel.org> Thanks for doing this. -- tejun --
On Sat, 02 Jan 2010 13:37:12 -0800
The model has hit a possible deadlock in pcmcia, and the lockdep warning
comes when I unplug my wlan card from pcmcia slot.
Looks like socket->skt_mutex is held in remove path, and it is also
grabbed in .stor method.
[ 9207.755883] pcmcia_socket pcmcia_socket0: pccard: card ejected from slot 0
[ 9207.786583]
[ 9207.786586] =======================================================
[ 9207.786595] [ INFO: possible circular locking dependency detected ]
[ 9207.786602] 2.6.33-rc4-wl #8
[ 9207.786607] -------------------------------------------------------
[ 9207.786614] pccardd/841 is trying to acquire lock:
[ 9207.786620] (s_active){++++.+}, at: [<ffffffff811637f1>] sysfs_addrm_finish+0x36/0x55
[ 9207.786643]
[ 9207.786645] but task is already holding lock:
[ 9207.786651] (&socket->skt_mutex){+.+.+.}, at: [<ffffffff812ed04c>] pccardd+0x15d/0x25f
[ 9207.786669]
[ 9207.786671] which lock already depends on the new lock.
[ 9207.786674]
[ 9207.786679]
[ 9207.786680] the existing dependency chain (in reverse order) is:
[ 9207.786687]
[ 9207.786688] -> #1 (&socket->skt_mutex){+.+.+.}:
[ 9207.786702] [<ffffffff810796c0>] __lock_acquire+0xb73/0xd2b
[ 9207.786716] [<ffffffff8107a36b>] lock_acquire+0xe1/0x105
[ 9207.786726] [<ffffffff813b5ac5>] __mutex_lock_common+0x59/0x49d
[ 9207.786741] [<ffffffff813b5fbe>] mutex_lock_nested+0x39/0x3e
[ 9207.786752] [<ffffffff812ef3ed>] pccard_store_resource+0x6b/0xc5
[ 9207.786763] [<ffffffff812a55da>] dev_attr_store+0x20/0x22
[ 9207.786775] [<ffffffff8116259a>] sysfs_write_file+0x108/0x144
[ 9207.786787] [<ffffffff8110cf48>] vfs_write+0xae/0x10b
[ 9207.786798] [<ffffffff8110d065>] sys_write+0x4a/0x6e
[ 9207.786808] [<ffffffff81009bc2>] system_call_fastpath+0x16/0x1b
[ 9207.786822]
[ 9207.786824] -> #0 (s_active){++++.+}:
[ 9207.786835] [<ffffffff8107956a>] __lock_acquire+0xa1d/0xd2b
[ 9207.786847] [<ffffffff8107a36b>] ...Looking a little closer this is simultaneously a legitimate problem and also a false positive. This is only legitimate if you add/remove a cardbus bridge, plugged into another cardbus bridge, which I think is unlikely but physically possible. Eric --
Hey, Unfortunately, it is not a false positive, as removing a PCMCIA device racing with "pccardctl eject" seems to trigger this path as well. Patch is being prepared... Best, Dominik --
Alright than that is a bad possible split of the functionality. Which
is all I was suggesting splitting the functionality not doing away
with the wait or moving it to a point where the wait would not work.
It was simply my bad assumption that the final kobject_put would
happen before the module that controlled that kobject could be
removed.
I still think it might make sense to separate kobject_del into two
parts. One that we call with the locks held and one without, but that
does seem to be applicable to only a very small set of cases and our
problems appear to be much larger than that.
For the moment I have generated a patch that does the lockdep
annotations, and I have found that a simple:
find /sys -type f | xargs cat {} > /dev/null
trivially generates lockdep warnings. In particular:
[ 165.049042]
[ 165.049044] =======================================================
[ 165.052761] [ INFO: possible circular locking dependency detected ]
[ 165.052761] 2.6.33-rc2x86_64 #3
[ 165.052761] -------------------------------------------------------
[ 165.052761] cat/5026 is trying to acquire lock:
[ 165.052761] (&serio->drv_mutex){+.+.+.}, at: [<ffffffff8132ecaa>] atkbd_attr_show_helper+0x28/0x6e
[ 165.052761]
[ 165.052761] but task is already holding lock:
[ 165.089443] (s_active){++++.+}, at: [<ffffffff810e84dd>] sysfs_get_active_two+0x2c/0x43
[ 165.089443]
[ 165.089443] which lock already depends on the new lock.
[ 165.089443]
[ 165.089443]
[ 165.089443] the existing dependency chain (in reverse order) is:
[ 165.089443]
[ 165.089443] -> #1 (s_active){++++.+}:
[ 165.089443] [<ffffffff81054956>] validate_chain+0xa25/0xd1d
[ 165.089443] [<ffffffff810553d3>] __lock_acquire+0x785/0x7dc
[ 165.089443] [<ffffffff81056112>] lock_acquire+0x5a/0x74
[ 165.089443] [<ffffffff810e8202>] sysfs_addrm_finish+0xba/0x125
[ 165.089443] [<ffffffff810e68b0>] sysfs_hash_and_remove+0x4f/0x6b
[ 165.089443] ...Hello, The module should stay around. The severing is necessary to protect driver internal data structures and possibly removed or reattached (to If such separation is necessary, we can implement the split interface while leaving kobject_del() as is feature-wise and convert the offending ones to use the split interface but I think it would be better to simply fix the offending ones if there aren't too many and Ummm... read of an input sysfs node can trigger serio_disconnect_port() under serio->drv_mutex, which unfortunately would need to wait for completion of in-progress sysfs ops thus creating possibility for AB-BA deadlock. Dmitry, is it possible to make serio_disconnect_port() asynchronous from the sysfs ops (ie. put it in a work or something)? Thanks. -- tejun --
Removed driver hardware isn't something sysfs can really guard against, although it can help to make the window of vulnerability smaller. Protecting driver internal data structures if we can does seem reasonable. The case I was thinking of in particular is when someone does: "rmmod driver" I think device_del protects from the code going away - We have the network stack. I have hacked around that (when I thought it was a singleton) by introducing the idiom: if (!rtnl_trylock()) return restart_sysscall(); But that isn't sustainable, as there is already one new entry that just does rntl_lock unconditionally. Maybe we can move the device_del out from under the rtnl_lock, but I have my doubts. Certainly the proc and sysctl bits (which have the same issue look more difficult. - We almost have an issue in ext4. Device_del is certainly called under lock_kernel() and lock_super(). - We have what a cpu_hotplug.lock issue with /sys/devices/system/cpu/cpuN/microcode/reload, a variant of the problem that triggered this discussion and it looks very non-trivial to solve. So I'm not certain what to say except that we have longstanding problems. Eric --
Hello, It can't protect against removal itself per-se but it does give the driver a boundary which it can depend on while implementing hot unplugging. Hardwares which support hot unplugging can cope with surprise removal and has mechanisms to detect and handle them but software part still is tricky and driver needs to have a boundary Nope, that's protected by reference counting via fops and/or other It's interesting that the above cases arn't common drivers. AFAICS, the problem cases would usually be cases like above where the user is a rather complex software entity or drivers which implement some form of self detaching via sysfs. For the former group, I agree that splitting deleting and draining (or simply skipping the draining part or active reference counting both of which basically achieve the same thing) would be an easy way out as it would be generally easy to leave the data structures dangling till the references go away. How about simply introducing an interface to mark sysfs nodes which don't require active reference counting and using them on those nodes? Thanks. -- tejun --
That might work. However it does not seem to address the case of bond_sysfs, especially with someone doing rmmod bonding. I think the brainstorm is on the right track. I think we just need to look at a few more cases in depth so that we can see a pattern and generalize what can be done. Eric --
Hello, Ah... okay, now I remember this. Yeah, I ripped off module ref counting from sysfs ops. I completely forgot about that and was thinking we still had module ref counting on sysfs ops. The logical thing to do would be restoring module ref counting on sysfs ops which won't go through active ref counting. ie. Let the interface which switch off active ref counting require module owner so that it uses module ref counting instead of active ref counting. Thanks. -- tejun --
Read? I checked and I do not see where read would cause disconnect. Also, disconnect only involves unbinding driver from the port, not the destruction of the port itself (children may be destroyed but they have I am not sure it is needed. Also in the trace presented serio_disconnect_port() is called from kseriod which certainly does not access sysfs... Overall I am not concerned about lockdep bitching about serio because it still bitches if you simply reload psmouse on a box with Synaptics with a pass-through port even though there are nested annotations and it is silent first time around. Out of curiosity, do yo uknow what caused psmouse disconnect and what kind of mouse is in the box? -- Dmitry --
This is a new lockdep annotation, and looking into it this appears to
be a true possible deadlock in the serio/sysfs interactions.
We have serio_pin_driver() called from all of the sysfs attributes
which does:
mutex_lock(&serio->drv_mutex);
We have serio_disconnect_driver() called on an unplug which does:
mutex_lock(&serio->drv_mutex);
The deadlock potential is if someone reads say the psmouse rate
sysfs file while the mouse is being unplugged. There is a race
such that we can have:
sysfs_read_file()
fill_read_buffer()
sysfs_get_active_two()
psmouse_attr_show_helper()
serio_pin_driver()
serio_disconnect_driver()
mutex_lock(&serio->drv_mutex);
<-----------------> mutex_lock(&serio_drv_mutex);
psmouse_disconnect()
sysfs_remove_group(... psmouse_attr_group);
....
sysfs_deactivate();
wait_for_completion();
So it is unlikely but possible to deadlock by accessing a serio
attribute of a serio device that is being removed.
What to do about it is another question. It has just recently come to my
It is a simple ps2mouse connected through a kvm, and the kvm was not
switched to the machine in question during the run.
I am trying to wrap my head around what to do with this sysfs_deactivate
deadlock scenario, (other drivers also hold unfortunate locks over the
removal of sysfs files, and it just happens that the ps2mouse case was the first
one I reproduced), and it was interesting because I had not seen it before.
Eric
--
In the specific case of serio what gets us in trouble is the call to sysfs_remove_group. If instead of independent calls to sysfs_create_group/sysfs_remove_group, you could move the groups into a list on dev->groups than we could solve two problems. - Userspace would see all of the attributes when the hotplug event is fired remove races. - We would not hold serio->drv_mutex over sysfs_remove_group so there would not be a possible deadlock on device removal. Does that change sound possible? Eric --
No, because attributes in question belong to driver+device combo. The device will not go away when driver is unbound but we do want to remove attributes. -- Dmitry --
Hmm, I guess I was too quick dismissing lockdep complaints here. Now that sysfs remove waits deadlock indeed is possible. Actually the locks on serio->drv_mutex in attributes were added to make sure we don't access device that was unbound from the driver through stale sysfs I think we should simply not take serio->drv_mutex in attributes and use driver-private mutex to serialize "set" methods that may alter device state. -- Dmitry --
Cool. So we have solved the problem generically but we have left over
Do you have any ideas what those might be? It looks like we are only
talking about psmouse and atkbd. So the audit for this chunk should
not be too bad.
The psmouse code already has a mutex on it's set operations only the
atkbd does not. The atkbd code does do a driver stop/start, which is
similar (but race prone without the serio->drv_mutex).
Except for the lack of atkbd_enable/disable locking the patch below should
be good. Opinions from someone who knows the serio code better than I do
would be helpful.
Eric
---
From: Eric W. Biederman <ebiederm@xmission.com>
Subject: [PATCH] serio: Remove uneeded and deadlock prone serio_pin_driver
sysfs_remove_group waits for sysfs attributes to be removed
so we don't need to take a mutex in each of the attributes to
prevent remove while the code in the attribute is running.
This removes a theoretical deadlock possibility of a keyboard
or mouse hotplug and someone accessing a sysfs attribute.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
drivers/input/keyboard/atkbd.c | 27 +--------------------------
drivers/input/mouse/psmouse-base.c | 32 +++-----------------------------
include/linux/serio.h | 19 -------------------
3 files changed, 4 insertions(+), 74 deletions(-)
diff --git a/drivers/input/keyboard/atkbd.c b/drivers/input/keyboard/atkbd.c
index a357357..7200100 100644
--- a/drivers/input/keyboard/atkbd.c
+++ b/drivers/input/keyboard/atkbd.c
@@ -1234,22 +1234,8 @@ static ssize_t atkbd_attr_show_helper(struct device *dev, char *buf,
ssize_t (*handler)(struct atkbd *, char *))
{
struct serio *serio = to_serio_port(dev);
- int retval;
-
- retval = serio_pin_driver(serio);
- if (retval)
- return retval;
- if (serio->drv != &atkbd_drv) {
- retval = -ENODEV;
- goto out;
- }
-
- retval = handler((struct atkbd *)serio_get_drvdata(serio), ...Thanks Eric, this looks good. I'll add the missing mutex to atkbd and apply. I think it can wait for .34 though - the window is quite small. -- Dmitry --
Hello, This is way too cool. Only if we can have more moments like this. :-) -- tejun --
kobject_del with a lock held scares me. There is a possible deadlock (that lockdep is ignorant of) if you hold a lock over sysfs_deactivate() and if any sysfs file takes that lock. I won't argue with a claim of inconvenient locking semantics here, and this is different to the problem you are seeing (except that fixing this I agree that fixing sysfs_readdir to not hold the sysfs_mutex over filldir is useful to reduce the lock hold time if nothing else. The cheap fix here is mostly a matter of grabbing a reference to the sysfs_dirent and then revalidating that the reference is still useful after we reacquire the sysfs_mutex. If not we already have the code for restarting from just an offset. We just don't want to use it too much as that will give us O(n^2) times for sysfs readdir. We very definitely have an ABBA deadlock with sysfs_deactivate and the cpu_hotplug.lock. arch/x86/kernel/microcode_core.c:reload_store() is the code for a sysfs file that when written to calls get_online_cpus(). Regardless of what we do with sysfs_readdir we need to see if we can fix cpu_down(), to remove this nasty deadlock. Eric --
