Re: Oops on -rc2-git1, possibly md_raid1 or xfs related. (Was: Re: Linux 2.6.26-rc2)

Previous thread: [PATCH][INFINIBAND]: Make ipath_portdata work with struct pid * not pid_t. by Pavel Emelyanov on Monday, May 12, 2008 - 7:43 am. (4 messages)

Next thread: [PATCH] NTP: Fix calculation of the next jiffie to trigger RTC sync by Maciej W. Rozycki on Monday, May 12, 2008 - 7:58 am. (1 message)
From: Linus Torvalds
Date: Monday, May 12, 2008 - 7:55 am

About 45% architecture updates (counting the include files too), about 30% 
drivers, and about 25% odds-and-ends. The odds-and-ends are mainly 
Documentation, filesystems (mostly cifs) and core kernel (scheduler 
updates etc).

The dirstat and shortlog are appended, because while not exactly tiny it 
should still fit easily in the lkml size limits. And if you read the 
shortlog and get the feeling that most of it is pretty boring small 
details, you'd be right. There is little exciting there.

A fairly small part of it, but quite possibly the most noticeable one, is 
how the semaphore changes impacted the BKL (the old "big kernel lock" that 
is still used for some legacy code, for you non-core people out there), 
which in the past had different versions ("regular", "preemptable").

A few months ago we dropped the regular BKL version, but in 2.6.25-rc1 we 
then had performance (and then correctness) issues with the interaction 
between the semaphore implementation and the preemptable BKL, so we're 
back to the old regular version for now.

Let's see if anybody notices. It looks likely that some latency issues 
have regressed since (cond_resched()), and we'll need to see what we can 
do about that whole thing.

			Linus

---
   3.2% Documentation/scheduler/
   5.1% Documentation/
   3.4% arch/arm/
   6.1% arch/blackfin/mach-common/
   7.2% arch/blackfin/
   3.3% arch/powerpc/
   7.5% arch/sh/boards/mpc1211/
   7.9% arch/sh/boards/
   3.0% arch/sh/kernel/
  11.2% arch/sh/
   3.7% arch/x86/
  35.3% arch/
   8.4% drivers/ata/
   3.3% drivers/infiniband/hw/ipath/
   5.0% drivers/infiniband/hw/
   5.2% drivers/infiniband/
   8.3% drivers/net/
   3.1% drivers/s390/cio/
   3.2% drivers/s390/
  30.3% drivers/
   4.6% fs/cifs/
   5.8% fs/
   8.1% include/asm-sh/mpc1211/
   8.4% include/asm-sh/
  12.2% include/
   7.6% kernel/


Adrian Bunk (10):
      udf: fs/udf/partition.c:udf_get_pblock() mustn't be inline
      x86: make start_secondary() static
      x86: ...
From: Bart Van Assche
Date: Monday, May 12, 2008 - 12:32 pm

On Mon, May 12, 2008 at 4:55 PM, Linus Torvalds

Hello Linus,

Sorry if it's my fault that I do not understand the above message
completely. But from the above it's not completely clear to me which
kernel versions (2.6.2?.? releases) are affected and which are not
affected by the performance and correctness issues due to the
interaction between the semaphore implementation and the preemptable
BKL. Can someone please be so kind to post the affected kernel
versions ? And whether or not this issue was triggered probably
depends on the CONFIG_... options with which the kernel was built ?

Bart.
--

From: Linus Torvalds
Date: Monday, May 12, 2008 - 12:55 pm

No released kernels are affected. It's purely a matter that has happened 
after 2.6.25. The semaphore simplification in -rc1 caused a huge 
performance regression on some benchmarks, and the fix to that in turn 
caused a semaphore correctness issue, so I just rolled back to the 
original BKL code that doesn't have any of those interactions.

In a historical context, the issues involved would only have happened with 
CONFIG_PREEMPT_BKL. That config option was made the only one in January, 
and as a result of these issues, we effectively switched it off.

So you can *think* of the effect of the changes as having gone from 
CONFIG_PREEMPT_BKL=y to CONFIG_PREEMPT_BKL=n, even though technically we 
had removed the actual config option to let people choose (so the config 
option has basically become a static code change).

We may end up having to re-instate the config option due to this. 
Personally, I hope not. It would be nicer if we could just avoid 
PREEMPT_BKL entirely. 

(To make things somewhat more confusing, some non-PREEMPT_BKL code has 
then bitrotted since, so if you actually see latency issues, you might 
want to try the patch here at the end of this email to see if it fixes 
the worst of them. "cond_resched()" has regressed since the PREEMPT_BKL 
config option went away).

			Linus
---
 include/linux/sched.h |    7 -------
 1 files changed, 0 insertions(+), 7 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5a63f2d..75c284f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2038,17 +2038,10 @@ static inline int need_resched(void)
  * cond_resched_softirq() will enable bhs before scheduling.
  */
 extern int _cond_resched(void);
-#ifdef CONFIG_PREEMPT
-static inline int cond_resched(void)
-{
-	return 0;
-}
-#else
 static inline int cond_resched(void)
 {
 	return _cond_resched();
 }
-#endif
 extern int cond_resched_lock(spinlock_t * lock);
 extern int cond_resched_softirq(void);
 static inline int ...
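
[Editor's sketch, not part of the original message.] The patch above removes the CONFIG_PREEMPT stub, so cond_resched() always calls _cond_resched(). The user-space toy model below illustrates the difference under simple assumptions: the stub never yields, while the real path yields whenever a reschedule is pending. All names (resched_pending, yields, model_cond_resched_real, model_cond_resched_stub, run_loop) are hypothetical stand-ins and do not exist in the kernel; the real _cond_resched() performs additional checks that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for kernel state; every name here is hypothetical. */
static bool resched_pending;	/* loosely models need_resched() */
static int yields;		/* counts simulated voluntary reschedules */

/* Loosely models _cond_resched(): yield only if a reschedule is pending. */
static int model_cond_resched_real(void)
{
	if (!resched_pending)
		return 0;
	resched_pending = false;
	yields++;
	return 1;
}

/* Models the removed CONFIG_PREEMPT stub: it never yields. */
static int model_cond_resched_stub(void)
{
	return 0;
}

/*
 * Run a long loop in which another task becomes runnable every
 * `period` iterations, and return how many times the loop yielded.
 */
static int run_loop(int (*cond_resched_fn)(void), int iterations, int period)
{
	assert(period > 0);
	resched_pending = false;
	yields = 0;
	for (int i = 1; i <= iterations; i++) {
		if (i % period == 0)
			resched_pending = true;	/* a wakeup happened */
		cond_resched_fn();
	}
	return yields;
}
```

In this model, a loop driven by model_cond_resched_stub never gives up the CPU no matter how many wakeups occur, which is the latency problem the patch is meant to address once the BKL is no longer preemptable.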
From: Kasper Sandberg
Date: Monday, May 12, 2008 - 4:22 pm

uhm.. but .25 doesn't have PREEMPT_BKL either.. does that mean it's on or

you mean avoid preempting the bkl, or avoid having the option to choose

--

From: Kasper Sandberg
Date: Monday, May 12, 2008 - 6:49 pm

Just booted -rc2-git1, and got an oops (entire dmesg):
Linux version 2.6.26-rc2-git1 (root@quadstation) (gcc version 4.3.0
(Gentoo 4.3.0 p1.1) ) #1 SMP PREEMPT Tue May 13 03:32:40 CEST 2008
Command line: root=/dev/md1
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009e800 (usable)
 BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000cfee0000 (usable)
 BIOS-e820: 00000000cfee0000 - 00000000cfee3000 (ACPI NVS)
 BIOS-e820: 00000000cfee3000 - 00000000cfef0000 (ACPI data)
 BIOS-e820: 00000000cfef0000 - 00000000cff00000 (reserved)
 BIOS-e820: 00000000e0000000 - 00000000e4000000 (reserved)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000130000000 (usable)
Entering add_active_range(0, 0, 158) 0 entries of 256 used
Entering add_active_range(0, 256, 851680) 1 entries of 256 used
Entering add_active_range(0, 1048576, 1245184) 2 entries of 256 used
max_pfn_mapped = 1245184
x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
init_memory_mapping
DMI 2.4 present.
ACPI: RSDP 000F6EB0, 0014 (r0 GBT   )
ACPI: RSDT CFEE3040, 0034 (r1 GBT    GBTUACPI 42302E31 GBTU  1010101)
ACPI: FACP CFEE30C0, 0074 (r1 GBT    GBTUACPI 42302E31 GBTU  1010101)
ACPI: DSDT CFEE3180, 4C2E (r1 GBT    GBTUACPI     1000 MSFT  100000C)
ACPI: FACS CFEE0000, 0040
ACPI: HPET CFEE7F00, 0038 (r1 GBT    GBTUACPI 42302E31 GBTU       98)
ACPI: MCFG CFEE7F80, 003C (r1 GBT    GBTUACPI 42302E31 GBTU  1010101)
ACPI: APIC CFEE7E00, 0084 (r1 GBT    GBTUACPI 42302E31 GBTU  1010101)
Entering add_active_range(0, 0, 158) 0 entries of 256 used
Entering add_active_range(0, 256, 851680) 1 entries of 256 used
Entering add_active_range(0, 1048576, 1245184) 2 entries of 256 used
  early res: 0 [0-fff] BIOS data page
  early res: 1 [6000-7fff] TRAMPOLINE
  early res: 2 [200000-64dfb3] TEXT DATA BSS
  early res: 3 [9e800-fffff] BIOS reserved
  early res: 4 ...
From: Kasper Sandberg
Date: Monday, May 12, 2008 - 6:55 pm

Just a thought, could it be the x86 PAT? The reason I don't just test it
is that I don't want to unnecessarily risk the data on my disks, if
you may easily be able to rule that out or something, given the call
--
