Re: [bug] crash when reading /proc/mounts (was: Re: Linux 2.6.23-rc9 and a heads-up for the 2.6.24 series..)

From: Linus Torvalds
Date: Monday, October 1, 2007 - 8:41 pm

I said I was hoping that -rc8 was the last -rc, and I hate doing this, but 
we've had more changes since -rc8 than we had in -rc8. And while most of 
them are pretty trivial, I really couldn't face doing a 2.6.23 release and 
take the risk of some really stupid brown-paper-bag thing.

So there's a final -rc out there, and right now my plan is to make this 
series really short, and release 2.6.23 in a few days. So please do give 
it a last good testing, and holler about any issues you find!

This is also a good time to warn about the fact that we're doing the x86 
merge very soon (as in the next day or two) after 2.6.23 is out, so if you 
have pending patches for the next series that touch arch/i386 or x86-64, 
you should get in touch with Thomas Gleixner and Ingo Molnar, who are the 
keepers of the merge scripts, and will help you prepare..

Doing it as early as possible in the 2.6.24-rc4 series (basically I'll do 
it first thing) will mean that we'll have the maximum amount of time to 
sort out any issues, and the thing is, Thomas and Ingo already have a tree 
ready to go, so people can check their work against that, and don't need 
to think that they have to do any fixups after it hits *my* tree. It would 
be much better if everybody was just ready for it, and not taken by 
surprise.

In other words, people who know they may be affected and would want to 
prepare can look at (for example)

	git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86.git x86

and generally get ready for the switch-over. 

			Linus
-

From: Mel Gorman
Date: Tuesday, October 2, 2007 - 5:07 am

Dirt. Booting with "profile=sleep,2" is broken in 2.6.23-rc9 and
2.6.23-rc8 but working in 2.6.22. I was checking it out as part of a
discussion in another thread and noticed it broken in -mm as well
(2.6.23-rc8-mm2). Bisect is in progress but suggestions as to the prime
candidates are welcome or preferably, pointing out that I'm an idiot
because I missed twiddling some config change.

2.6.22 output
gringo:~# readprofile | sort -rn
 69604 total                                      0.0309
 27287 m_start                                  243.6339
 16430 sync_page                                205.3750
 13161 sync_buffer                              205.6406
  4035 sys_init_module                            0.6121
  2842 msleep                                    88.8125
  2573 call_usermodehelper_keys                  10.7208
  1554 ps2_sendbyte                               6.0703
   803 log_wait_commit                            2.7882
   378 do_lookup                                  0.9844
   160 do_get_write_access                        0.1111
    89 synchronize_rcu                            1.3906
    76 ps2_command                                0.0792
    66 ide_do_drive_cmd                           0.2292
    59 do_fork                                    0.1085
    54 congestion_wait                            0.3750
    29 __rtnl_unlock                              1.8125
     4 kthread                                    0.0357
     2 *unknown*
     2 journal_stop                               0.0038
     1 kthreadd                                   0.0035
     1 kthread_create                             0.0063

latest git output
gringo:~# readprofile
     0 *unknown*
     0 total                                      0.0000

I checked the obvious stuff like DEBUG options being set,

-

From: Ingo Molnar
Date: Tuesday, October 2, 2007 - 5:15 am

Mel, does the patch below fix this bug for you? (Note: you will need to 
enable CONFIG_SCHEDSTATS=y too.)

if yes, then Linus please pull this single fix from:

  git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git

  | Ingo Molnar (1):
  |      sched: fix profile=sleep
  |
  |  sched_fair.c |   10 ++++++++++
  |  1 file changed, 10 insertions(+)

risk is low: the new code only runs with CONFIG_SCHEDSTATS=y 
(default:off) and profile=sleep (default:off), so it ought to be fairly 
safe to add at this point. (and we had very similar code in v2.6.22 
anyway)

	Ingo

------------------------->
Subject: sched: fix profile=sleep
From: Ingo Molnar <mingo@elte.hu>

fix sleep profiling - we lost this chunk in the CFS merge.

Found-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched_fair.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -639,6 +639,16 @@ static void enqueue_sleeper(struct cfs_r
 
 		se->block_start = 0;
 		se->sum_sleep_runtime += delta;
+
+		/*
+		 * Blocking time is in units of nanosecs, so shift by 20 to
+		 * get a milliseconds-range estimation of the amount of
+		 * time that the task spent sleeping:
+		 */
+		if (unlikely(prof_on == SLEEP_PROFILING)) {
+			profile_hits(SLEEP_PROFILING, (void *)get_wchan(tsk),
+				     delta >> 20);
+		}
 	}
 #endif
 }
-
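
A side note on the "delta >> 20" in the patch above: a right shift by 20
divides by 2^20 = 1,048,576, which is close to the 1,000,000 nanoseconds in
a millisecond, so the result is a slight underestimate (a little under 5%
low). A minimal user-space sketch of that arithmetic follows; the helper
names are made up for illustration and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* One shifted unit is 2^20 ns, not exactly 10^6 ns. */
static_assert((1u << 20) == 1048576u, "2^20 ns per shifted unit");

/* Exact conversion: 1 ms = 1,000,000 ns. */
static uint64_t ns_to_ms_exact(uint64_t ns)
{
	return ns / 1000000u;
}

/* Shift-based estimate, as in the patch: cheap, slightly low. */
static uint64_t ns_to_ms_shift(uint64_t ns)
{
	return ns >> 20;
}
```

For one second of sleep (1,000,000,000 ns) the exact helper gives 1000 and
the shifted one gives 953, which is good enough for a profile histogram.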

From: Mel Gorman
Date: Tuesday, October 2, 2007 - 10:21 am

Nice one Ingo - got it first try. The problem commit was
dd41f596cda0d7d6e4a8b139ffdfabcefdd46528 and it's clear that the code removed
in this commit is put back by this latest patch.  When applied, profile=sleep
works as long as CONFIG_SCHEDSTATS is set.


Tested-by: Mel Gorman <mel@csn.ul.ie>

That said, I am not super-keen on this only working when SCHEDSTATS is set
without telling the user about it. It's not urgent enough to pick up as a
late-late fix but perhaps something like this?

=============

profile=sleep only works if CONFIG_SCHEDSTATS is set. This patch notes the
limitation in Documentation/kernel-parameters.txt and prints a warning at
boot-time if profile=sleep is used without CONFIG_SCHEDSTATS.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
--- 
 Documentation/kernel-parameters.txt |    3 ++-
 kernel/profile.c                    |    5 +++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc9-005_ingo_profile_fix/Documentation/kernel-parameters.txt linux-2.6.23-rc9-010_document_profilesleep/Documentation/kernel-parameters.txt
--- linux-2.6.23-rc9-005_ingo_profile_fix/Documentation/kernel-parameters.txt	2007-10-02 04:24:52.000000000 +0100
+++ linux-2.6.23-rc9-010_document_profilesleep/Documentation/kernel-parameters.txt	2007-10-02 16:43:41.000000000 +0100
@@ -1395,7 +1395,8 @@ and is between 256 and 4096 characters. 
 			Param: "schedule" - profile schedule points.
 			Param: <number> - step/bucket size as a power of 2 for
 				statistical time based profiling.
-			Param: "sleep" - profile D-state sleeping (millisecs)
+			Param: "sleep" - profile D-state sleeping (millisecs).
+				Requires CONFIG_SCHEDSTATS to work
 
 	processor.max_cstate=	[HW,ACPI]
 			Limit processor to maximum C-state
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc9-005_ingo_profile_fix/kernel/profile.c linux-2.6.23-rc9-010_document_profilesleep/kernel/profile.c
--- ...
From: Ingo Molnar
Date: Wednesday, October 3, 2007 - 1:19 am

great - thanks for testing it. I'm glad you caught it as profile=sleep 
is pretty useful in "why is my system so slow" tests. (which problems 
are usually reported _after_ a stable kernel is released ...)

	Ingo
-

From: Bill Davidsen
Date: Tuesday, October 2, 2007 - 3:09 pm

And if it isn't set? I can easily see building a new kernel with stats 
off and forgetting to change the boot options.

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
-

From: Mel Gorman
Date: Tuesday, October 2, 2007 - 5:37 pm

If CONFIG_SCHEDSTATS is off and profile=sleep is set, readprofile with
Ingo's patch applied shows:

     0 *unknown*
     0 total                                      0.0000

That is a tad confusing, hence my follow-up patch, which makes readprofile
report that "/proc/profile" doesn't exist and adds a warning to dmesg.

-- 
Mel Gorman

-

From: Ingo Molnar
Date: Wednesday, October 3, 2007 - 1:21 am

yep - that's the best we can do for the stable release.

We could improve quality of behavior here by not offering /proc/profile 
in that case and by printk-ing something if profile=sleep is specified 
on a !CONFIG_SCHEDSTATS kernel. I'm willing to apply patches that do 
that :)

	Ingo
-
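
The behaviour Ingo describes (refuse profile=sleep and print something when
the kernel lacks CONFIG_SCHEDSTATS) can be sketched in user space roughly as
below. This is only an illustration under stated assumptions: the function
name, the enum values and the message text are hypothetical, and a run-time
flag stands in for what would be a compile-time #ifdef in the kernel.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical profiling modes for this sketch only. */
enum { PROF_OFF = 0, PROF_SLEEP = 1 };

/*
 * Parse the value of a "profile=" option. If "sleep" is requested but
 * scheduler statistics are unavailable, warn and leave profiling off
 * instead of silently producing an empty profile.
 */
static int parse_profile_option(const char *str, int have_schedstats,
				FILE *log)
{
	static const char sleepstr[] = "sleep";

	if (!strncmp(str, sleepstr, strlen(sleepstr))) {
		if (!have_schedstats) {
			fprintf(log,
				"profile=sleep requires CONFIG_SCHEDSTATS\n");
			return PROF_OFF;
		}
		return PROF_SLEEP;
	}
	return PROF_OFF;
}
```

The point of the design is that the user gets an explicit message at boot
rather than a readprofile listing full of zeroes.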

From: Mel Gorman
Date: Wednesday, October 3, 2007 - 5:51 am

I included a candidate patch in the last mail but it was shoved down at
the bottom so it could easily have been missed.

==============
Subject: Document profile=sleep requiring CONFIG_SCHEDSTATS

profile=sleep only works if CONFIG_SCHEDSTATS is set. This patch notes the
limitation in Documentation/kernel-parameters.txt and prints a warning at
boot-time if profile=sleep is used without CONFIG_SCHEDSTATS.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 Documentation/kernel-parameters.txt |    3 ++-
 kernel/profile.c                    |    5 +++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc9-005_ingo_profile_fix/Documentation/kernel-parameters.txt linux-2.6.23-rc9-010_document_profilesleep/Documentation/kernel-parameters.txt
--- linux-2.6.23-rc9-005_ingo_profile_fix/Documentation/kernel-parameters.txt	2007-10-02 04:24:52.000000000 +0100
+++ linux-2.6.23-rc9-010_document_profilesleep/Documentation/kernel-parameters.txt	2007-10-02 16:43:41.000000000 +0100
@@ -1395,7 +1395,8 @@ and is between 256 and 4096 characters. 
 			Param: "schedule" - profile schedule points.
 			Param: <number> - step/bucket size as a power of 2 for
 				statistical time based profiling.
-			Param: "sleep" - profile D-state sleeping (millisecs)
+			Param: "sleep" - profile D-state sleeping (millisecs).
+				Requires CONFIG_SCHEDSTATS
 
 	processor.max_cstate=	[HW,ACPI]
 			Limit processor to maximum C-state
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc9-005_ingo_profile_fix/kernel/profile.c linux-2.6.23-rc9-010_document_profilesleep/kernel/profile.c
--- linux-2.6.23-rc9-005_ingo_profile_fix/kernel/profile.c	2007-10-02 04:24:52.000000000 +0100
+++ linux-2.6.23-rc9-010_document_profilesleep/kernel/profile.c	2007-10-02 16:44:50.000000000 +0100
@@ -60,6 +60,7 @@ static int __init profile_setup(char * s
 	int par;
 
 	if (!strncmp(str, sleepstr, strlen(sleepstr))) {
+#ifdef CONFIG_SCHEDSTATS
 		prof_on = ...
From: Ingo Molnar
Date: Monday, October 22, 2007 - 8:56 am

thanks, applied.

	Ingo
-

From: John Stoffel
Date: Tuesday, October 2, 2007 - 7:44 am

Linus> I said I was hoping that -rc8 was the last -rc, and I hate
Linus> doing this, but we've had more changes since -rc8 than we had
Linus> in -rc8. And while most of them are pretty trivial, I really
Linus> couldn't face doing a 2.6.23 release and take the risk of some
Linus> really stupid brown-paper-bag thing.

Linus> So there's a final -rc out there, and right now my plan is to
Linus> make this series really short, and release 2.6.23 in a few
Linus> days. So please do give it a last good testing, and holler
Linus> about any issues you find!

Just to let people know, I was running 2.6.23-rc for over 53 days
without any issues.  Mix of SCSI, Sata, tape drives, disks, MD, LVM,
SMP, etc.  I suspect we've got a pretty darn stable release coming out
soon.

John
-

From: John Stoffel
Date: Tuesday, October 2, 2007 - 8:45 am

Linus> I said I was hoping that -rc8 was the last -rc, and I hate
Linus> doing this, but we've had more changes since -rc8 than we had
Linus> in -rc8. And while most of them are pretty trivial, I really
Linus> couldn't face doing a 2.6.23 release and take the risk of some
Linus> really stupid brown-paper-bag thing.

Linus> So there's a final -rc out there, and right now my plan is to
Linus> make this series really short, and release 2.6.23 in a few
Linus> days. So please do give it a last good testing, and holler
Linus> about any issues you find!

John> Just to let people know, I was running 2.6.23-rc for over 53
John> days without any issues.  Mix of SCSI, Sata, tape drives, disks,
John> MD, LVM, SMP, etc.  I suspect we've got a pretty darn stable
John> release coming out soon.

2.6.23-rc2 is what I meant.  Oops...
-

From: Ingo Molnar
Date: Tuesday, October 2, 2007 - 8:03 am

that's pretty impressive! v2.6.23-rc2, right?

	Ingo
-

From: Bill Davidsen
Date: Tuesday, October 2, 2007 - 3:13 pm

I've been running rc8-git3 since it came out, and while I've built git-5 
and will build rc9, I probably will continue testing until I find a bug 
or have to boot for some other reason. Running really well, even with a 
lot of kvm stuff going on, kernel builds for other machines, etc.

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
-

From: Willy Tarreau
Date: Tuesday, October 2, 2007 - 3:44 pm

Looks pretty good at first glance. Dual-K7, adaptec 29160, NFS, e1000,
root on /dev/sda*. Not even one bad thing to report yet.

Cheers,
Willy

-

From: Alistair John Strachan
Date: Tuesday, October 2, 2007 - 3:51 pm

On Tuesday 02 October 2007 04:41:49 Linus Torvalds wrote:

This is certainly a tool issue, but if I use Debian's kernel-image "make-kpkg" 
wrapper around the kernel build system, it fails with:

cp: cannot stat `arch/x86_64/boot/bzImage': No such file or directory

Obviously, this file has moved to arch/x86/boot, but it seems like possibly 
unnecessary breakage. I've been copying bzImage for years from 
arch/x86_64/boot, and I'm sure there's a handful of scripts (other than 
Debian's kernel-image) doing this too.

For now, I hacked the tool[1]. Maybe, if we care, a symlink could be set up 
between arch/x86/boot and arch/$ARCH/boot ? Or would papering over this be 
more trouble than it's worth?

[1] http://devzero.co.uk/~alistair/kernel-package-changes.diff

-- 
Cheers,
Alistair.

137/1 Warrender Park Road, Edinburgh, UK.

-

From: Glauber de Oliveira Costa
Date: Tuesday, October 2, 2007 - 4:00 pm

I believe most sane tools would be using the output of uname -m, so a
possible way to fix this would be fixing the data passed to userspace
from uname. However, it might be the case that this creates a new set
of problems too, with tools relying on the output of uname -m to
determine whether the machine is 32 or 64 bit, and so on.

-- 
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."
-

From: Ingo Molnar
Date: Thursday, October 4, 2007 - 10:41 pm

there are two problems with the use of uname -m:

- the build machine architecture is not necessarily the same as the
  target architecture. (for example i cross-compile all my 32-bit
  kernels on a 64-bit box.)

- we kept uname -m compatible. multilib depends on it, and other pieces
  of userspace as well. So uname -m still outputs 'i386' on 32-bit and
  'x86_64' on 64-bit - not 'x86'.

a symlink looks like the best solution to me.

	Ingo
-
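
To make the second point concrete: after the merge the boot image lives
under arch/x86/boot (as Alistair found), while uname -m keeps reporting
'i386' or 'x86_64'. A tool that derives the directory from a machine string
therefore needs an explicit mapping. The sketch below is one hypothetical
way to write it; arch_dir_for_machine() is an invented helper, it only knows
about x86, and the string passed in should describe the target rather than
the build host (the first point above).

```c
#include <stddef.h>
#include <string.h>

/*
 * Map a machine string (in the style of "uname -m") to the kernel
 * source directory name under arch/. Returns NULL for anything this
 * sketch does not know about.
 */
static const char *arch_dir_for_machine(const char *machine)
{
	if (!strcmp(machine, "x86_64"))
		return "x86";

	/* i386, i486, i586, i686 */
	if (strlen(machine) == 4 && machine[0] == 'i' &&
	    machine[1] >= '3' && machine[1] <= '6' &&
	    !strcmp(machine + 2, "86"))
		return "x86";

	return NULL;
}
```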

From: Ingo Molnar
Date: Thursday, October 4, 2007 - 10:38 pm

yeah, a symlink is the right solution i think. Our first-step goal is to 
make the switchover seamless for all practical purposes, and a 
compatibility symlink in arch/i386/boot/ will not hurt. (we shouldn't 
worry about the really old zImage target though)

	Ingo
-

From: Sam Ravnborg
Date: Thursday, October 4, 2007 - 11:11 pm

But when can we then get rid of it?
This is a simple question about when we take the noise..
And right now people know we are shifting to x86 - so it makes
sense to let the dependent userspace tools take the pain now and not later.

Starting to fill up a build kernel with symlinks for compatibility with
random programs seems to be the wrong approach.

	Sam - who especially dislikes the asm symlink
-

From: Thomas Gleixner
Date: Friday, October 5, 2007 - 1:32 am

Sam,

I completely agree with you, but we want to keep the migration noise
as low as possible. Adding the symlink right now along with an entry
into features-removal.txt (6 month grace period) allows a smoother
transition. The distro folks had better get their gear together
by then.

	tglx
-

From: Alistair John Strachan
Date: Sunday, October 7, 2007 - 4:44 pm

I'll certainly file a bug report with the Debian BTS, but the fix will 
probably involve something as abortive as my original patch.

How did the PPC merge handle this? I can't see any similar hacks in 
kernel-image for these architectures.

-- 
Cheers,
Alistair.

137/1 Warrender Park Road, Edinburgh, UK.
-

From: Diego Calleja
Date: Tuesday, October 2, 2007 - 4:07 pm

Also...if someone dislikes something in http://kernelnewbies.org/Linux_2_6_23 ,
or wants to fix my english, do it soon :)
-

From: Linus Torvalds
Date: Tuesday, October 2, 2007 - 4:32 pm

Heh. The "remove sk98lin driver" bullet is sadly wrong. We had to 
reinstate it because it supported some cards that the skge driver doesn't 
handle.

		Linus
-

From: Diego Calleja
Date: Wednesday, October 3, 2007 - 8:28 am

Thanks, fixed
-

From: Ingo Molnar
Date: Wednesday, October 3, 2007 - 1:46 am

hm, i just triggered the procfs crash below with -rc9 on a testbox. 
Config attached. It's easy to reproduce it via 'service sshd restart'. 
The crash site is:

 (gdb) list *0xc017599d
 0xc017599d is in seq_path (fs/seq_file.c:354).
 349             if (m->count < m->size) {
 350                     char *s = m->buf + m->count;
 351                     char *p = d_path(dentry, mnt, s, m->size - m->count);
 352                     if (!IS_ERR(p)) {
 353                             while (s <= p) {
 354                                     char c = *p++;
 355                                     if (!c) {
 356                                             p = m->buf + m->count;
 357                                             m->count = s - m->buf;
 358                                             return s - p;
 (gdb)

any ideas? Fortunately i was able to do an strace of the incident:

 3247  munmap(0xb7f3e000, 4096)          = 0
 3247  open("/proc/mounts", O_RDONLY|O_LARGEFILE) = 3
 3247  fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
 3247  mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f3e000
 3247  read(3,  <unfinished ...>
 3247  +++ killed by SIGSEGV +++

and doing "cat /proc/mounts" triggers the crash reliably.

	Ingo

---------------->
BUG: unable to handle kernel paging request at virtual address f2a40000
 printing eip:
c017599d
*pdpt = 0000000000001001
*pde = 0000000000aee067
*pte = 0000000032a40000
Oops: 0000 [#1]
PREEMPT DEBUG_PAGEALLOC
Modules linked in:
CPU:    0
EIP:    0060:[<c017599d>]    Not tainted VLI
EFLAGS: 00010297   (2.6.23-rc9 #89)
EIP is at seq_path+0x60/0xca
eax: f2a3fffe   ebx: c290c8d4   ecx: f6e341f0   edx: f2a3fffe
esi: f2a3f007   edi: c29097f0   ebp: ec5ddf1c   esp: ec5ddf04
ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
Process sshd (pid: 2743, ti=ec5dc000 task=f6e341f0 task.ti=ec5dc000)
Stack: 00000ff9 c2bf6b40 f2a3fffe c29097c0 c2bf6b40 c29097f0 ec5ddf34 c0173c41 
       c05ffe64 ...
From: Ingo Molnar
Date: Wednesday, October 3, 2007 - 1:50 am

update: occasionally the reading of /proc/mounts succeeds, and it's:

 open("/proc/mounts", O_RDONLY|O_LARGEFILE) = 3
 fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
 read(3, "rootfs / rootfs rw 0 0\n/dev/root"..., 4096) = 290
 write(1, "rootfs / rootfs rw 0 0\n/dev/root"..., 290rootfs / rootfs rw 0 0
 /dev/root / ext3 rw,noatime,nodiratime,data=ordered 0 0
 /proc /proc proc rw 0 0
 /proc/bus/usb /proc/bus/usb usbfs rw 0 0
 /sys /sys sysfs rw 0 0
 /dev/devpts /dev/pts devpts rw 0 0
 /dev/sda2 /home ext3 rw,noatime,nodiratime,data=ordered 0 0
 nodev /debug debugfs rw 0 0
 ) = 290
 read(3, "", 4096)                       = 0
 close(3)                                = 0

there's nothing particularly interesting in it. (perhaps debugfs)

	Ingo
-

From: Ingo Molnar
Date: Wednesday, October 3, 2007 - 2:12 am

disabling debugfs makes the crash go away so it's debugfs related. The 
.config delta is below.

	Ingo

--- .config.broken.000	2007-10-03 10:28:14.000000000 +0200
+++ .config.good.000	2007-10-03 11:11:18.000000000 +0200
@@ -85,7 +85,7 @@ CONFIG_MODULE_SRCVERSION_ALL=y
 # CONFIG_KMOD is not set
 CONFIG_BLOCK=y
 # CONFIG_LBD is not set
-CONFIG_BLK_DEV_IO_TRACE=y
+# CONFIG_BLK_DEV_IO_TRACE is not set
 CONFIG_LSF=y
 CONFIG_BLK_DEV_BSG=y
 
@@ -631,7 +631,6 @@ CONFIG_CFG80211=y
 CONFIG_WIRELESS_EXT=y
 CONFIG_MAC80211=y
 CONFIG_MAC80211_LEDS=y
-CONFIG_MAC80211_DEBUGFS=y
 # CONFIG_MAC80211_DEBUG is not set
 CONFIG_IEEE80211=m
 CONFIG_IEEE80211_DEBUG=y
@@ -1689,7 +1688,7 @@ CONFIG_TRACE_IRQFLAGS_SUPPORT=y
 CONFIG_ENABLE_MUST_CHECK=y
 # CONFIG_MAGIC_SYSRQ is not set
 CONFIG_UNUSED_SYMBOLS=y
-CONFIG_DEBUG_FS=y
+# CONFIG_DEBUG_FS is not set
 # CONFIG_HEADERS_CHECK is not set
 CONFIG_DEBUG_KERNEL=y
 CONFIG_DEBUG_SHIRQ=y
@@ -1724,8 +1724,6 @@ CONFIG_FAULT_INJECTION=y
 # CONFIG_FAILSLAB is not set
 CONFIG_FAIL_PAGE_ALLOC=y
 CONFIG_FAIL_MAKE_REQUEST=y
-CONFIG_FAULT_INJECTION_DEBUG_FS=y
-CONFIG_FAULT_INJECTION_STACKTRACE_FILTER=y
 CONFIG_EARLY_PRINTK=y
 CONFIG_DEBUG_STACKOVERFLOW=y
 # CONFIG_DEBUG_STACK_USAGE is not set
-

From: Ingo Molnar
Date: Wednesday, October 3, 2007 - 2:23 am

it's CONFIG_MAC80211_DEBUGFS=y causing the crash.

	Ingo
-


Charming...  So we get d_path() either returning junk or we get something
that isn't NUL-terminated.  Which one is it?  I.e. what does p look like
and what's in s?
-

From: Ingo Molnar
Date: Wednesday, October 3, 2007 - 7:08 am

could be use-after-free as well, as CONFIG_DEBUG_PAGEALLOC was enabled.

	Ingo
-


Umm...  d_path() had just written there, so use-after-free is not too
likely to trigger page fault on read immediately afterwards - you'd
need a pretty tight race to hit it.
-

From: Arjan van de Ven
Date: Wednesday, October 3, 2007 - 8:12 am

On Wed, 3 Oct 2007 15:26:01 +0100

I suspect we want the following patch out of general principles; Ingo,
can you see if this one helps?
(if not, it's still worth considering; it looks like we're first
destroying the device object (which holds the name of the directory)
before we unregister the directory... if that fails then we have a mess.)

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>


--- linux-2.6.23-rc2/net/wireless/core.c~	2007-10-03 08:04:45.000000000 -0700
+++ linux-2.6.23-rc2/net/wireless/core.c	2007-10-03 08:04:45.000000000 -0700
@@ -133,8 +133,8 @@ void wiphy_unregister(struct wiphy *wiph
 	mutex_unlock(&drv->mtx);
 
 	list_del(&drv->list);
-	device_del(&drv->wiphy.dev);
 	debugfs_remove(drv->wiphy.debugfsdir);
+	device_del(&drv->wiphy.dev);
 
 	mutex_unlock(&cfg80211_drv_mutex);
 }
-

From: Linus Torvalds
Date: Wednesday, October 3, 2007 - 8:11 am

You have a terminally buggy piece of shit compiler.

Lookie here:

 - the bug happens on this:

	char c = *p++;

 - which has been compiled into

	8b 3a		mov    (%edx),%edi

   which is a *word* access.

 - the pointer is at the end of a page (very much on purpose):

	edx: f2a3fffe	

 - and as a result you get an exception on the *next* page:

	BUG: unable to handle kernel paging request at virtual address f2a40000

and btw, there is no question what-so-ever about whether your compiler 
might be doing a legal optimization - the compiler really is wrong, and is 
total shit. You need to make a gcc bug-report. Because this is not a 
question of "the standard is ambiguous", this is a question of "the 
compiler turned good code into code that could SIGSEGV in user space too, 
if 'malloc()' happened to return a pointer at the end of an allocation".

			Linus
-

From: Ingo Molnar
Date: Wednesday, October 3, 2007 - 8:40 am

hm, it's 4.0.2. Not the latest & greatest but i've been using it for 2 
years and this would be the first time it miscompiles a 32-bit kernel.

Hm, are you sure? This is a CONFIG_DEBUG_PAGEALLOC=y kernel, so even a 
slight overrun of a non-NUL-terminated string (as suspected by Al) could 
run into a non-mapped kernel page. (which would indicate not a compiler 
bug but use-after-free)

i just found another config under which i get similar crashes, config 
attached. One common theme is CONFIG_DEBUG_FS and DEBUG_PAGEALLOC - and 
CONFIG_MAC80211_DEBUGFS is not enabled in this one so it's off the hook 
i think. (the crashes are attached below)

(my serial log on this box goes back about 6 months, and that alone 
shows more than 3500 successful kernel bootups on that particular 
testsystem, each kernel built by this compiler - and there's another 
testsystem that i use even more frequently. Despite that, a compiler bug 
is still possible of course.)

	Ingo

--------------->
kobject_uevent_env
fill_kobj_path: path = '/class/vc/vcsa8'
kobject vcsa8: cleaning up
BUG: unable to handle kernel paging request at virtual address f6207000
 printing eip:
c016ecf1
*pdpt = 0000000000003001
*pde = 0000000000ac1067
*pte = 0000000036207000
Oops: 0000 [#1]
SMP DEBUG_PAGEALLOC
Modules linked in:
CPU:    1
EIP:    0060:[<c016ecf1>]    Not tainted VLI
EFLAGS: 00010297   (2.6.23-rc9 #20)
EIP is at seq_path+0x60/0xca
eax: f6206ffe   ebx: c2de0f50   ecx: 0000002b   edx: f6206ffe
esi: f6206007   edi: c2dddfb0   ebp: f6503f18   esp: f6503f00
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process awk (pid: 1160, ti=f6503000 task=f73a8390 task.ti=f6503000)
Stack: 00000ff9 f6e5cf70 f6206ffe c2dddf80 f6e5cf70 c2dddfb0 f6503f30 c016ce40 
       c05d71b5 f6730f38 f6e5cf70 c2dddfb0 f6503f70 c016f05d 00000400 08098f18 
       f6730f38 f6e5cf90 00000000 0806bc2e 00000003 08094320 f6503fb0 00000000 
Call Trace:
 [<c0103c8d>] show_trace_log_lvl+0x19/0x2e
 [<c0103d3f>] ...
From: Linus Torvalds
Date: Wednesday, October 3, 2007 - 9:07 am

I am 100% sure. I can look at the disassembly, and point to the fact that 
your Oops happens on code that is simply totally bogus.

That string is NUL-terminated, which is why the access is to f2a3fffe in 
the first place: we explicitly asked d_path() to create us a string at the 
end of the page (it creates them backwards), so the path string has a NUL 
at the end at address f2a3ffff, which is exactly what we'd expect.
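
For readers unfamiliar with the "backwards" construction: the idea is to
start at the end of the buffer, place the NUL there, and prepend path
components leaf-first, so the finished string always ends on the buffer's
last byte. The following is a simplified user-space sketch of that idea,
not the kernel's d_path(); both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Prepend "/name" in front of *end; fail if it does not fit in buf. */
static int prepend_component(char *buf, char **end, const char *name)
{
	size_t len = strlen(name);

	if ((size_t)(*end - buf) < len + 1)
		return -1;
	*end -= len;
	memcpy(*end, name, len);
	(*end)--;
	**end = '/';
	return 0;
}

/*
 * Build a path from components given leaf-first. The terminating NUL
 * is written to buf[size - 1] and the string grows towards buf[0].
 * Returns a pointer to the start of the path, or NULL if it overflows.
 */
static char *build_path_backwards(char *buf, size_t size,
				  const char *const *leaf_first, size_t n)
{
	char *end = buf + size;

	assert(size > 0);
	*--end = '\0';
	for (size_t i = 0; i < n; i++)
		if (prepend_component(buf, &end, leaf_first[i]) < 0)
			return NULL;
	return end;
}
```

With a 16-byte buffer and the components "pts" then "dev", the result is
"/dev/pts" starting at offset 7, with its NUL in the buffer's final byte.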

Your compiler really does seem to be total crap.

Do a "make fs/seq_file.s" (and make sure you *disable* CONFIG_DEBUG_INFO 
first, otherwise the result will be unreadable crud), and look at 
seq_path(). It's going to be more readable than the disassembly that I got 

.. of *course* DEBUG_PAGEALLOC is going to be implied in the problem. If 
you don't have DEBUG_PAGEALLOC, you'll never see this, because you'll have 
all pages mapped, and the only page that it could happen to is the very 

It's not about "possible". It's a fact. Send me your "seq_file.s" output 
for that function to be sure - it *could* be memory corruption that 
changes a "movb" into a "movl", and maybe the compiler did a byte move to 
start with, but quite frankly, that is such a remote possibility that I 

This looks like *exactly* the same thing, except you're in 
"show_vfsmnt()" this time.

Again: the oopsing instruction (8b 3a) is "movl". And again, the address 
is f6206ffe, and it oopses because the (incorrect) 32-bit access will 
touch the next page, so you get a paging request fault on f6207000 - which 
is some *totally* different allocation, and one that isn't mapped because 
it doesn't exist, so DEBUG_PAGEALLOC has removed it.



And I can even tell you exactly what path it is:

 - it's going to be the first path that shows up in the path list, since 
   the seq_file interface will re-use that page, so if you hit it, you'll 
   hit it on the first entry (unless seq_file has *lots* of data and needs 
   more than a single-page allocation)

 - it must be a single-byte path, ...

Pedant: valid. Almost all optimizations are legal, nobody has yet written
laws about compilers. Sorry but I'm forever fixing misuse of the word

Agreed - the standard is not ambiguous here. (For reference the standard
says that a valid pointer must point at an object _OR_ one past the end
of the object (in the latter case it is not dereferenceable)). So it's a
compiler bug.

Alan


-
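
The rule Alan cites can be illustrated with a short loop: a pointer one past
the end of an object may be formed and compared against, but only bytes
inside the object are ever read. This is a generic sketch with an invented
function name, not code from the thread.

```c
#include <stddef.h>

/* Sum the n bytes of buf without ever reading outside the object. */
static size_t sum_bytes(const unsigned char *buf, size_t n)
{
	/* One past the end: valid to compute and compare, not to read. */
	const unsigned char *end = buf + n;
	size_t sum = 0;

	for (const unsigned char *p = buf; p != end; p++)
		sum += *p;	/* p always points inside the object here */
	return sum;
}
```

A compiler that widens the single-byte read of *p into a 4-byte load breaks
this guarantee for the last few bytes, which is the miscompilation under
discussion.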

From: Linus Torvalds
Date: Wednesday, October 3, 2007 - 9:09 am

Heh.

When I'm ruler of the universe, it *will* be illegal. I'm just getting a 
bit ahead of myself.

			Linus
-

From: Jan Engelhardt
Date: Wednesday, October 3, 2007 - 9:25 am

Any time frame when that will happen?
-

From: Linus Torvalds
Date: Wednesday, October 3, 2007 - 10:07 am

I'm working on it, I'm working on it. I'm just as frustrated as you are. 
It turns out to be a non-trivial problem. 

			Linus
-


Ingo can't send a gcc bug-report since gcc 4.0 is no longer supported 
upstream and a 4.1.2 compiler was confirmed to work.

Our only options are to either stop supporting the broken gcc versions 
as compiler for the kernel or to work around this compiler bug in the 

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

-

From: Alexey Dobriyan
Date: Thursday, October 4, 2007 - 8:08 am

A distro can backport a fix for a miscompilation while leaving, say,
__GNUC_MINOR__ intact, so banning version numbers isn't terribly
useful.

Perhaps someone should write a script/test program to check for known
miscompilations, to cure himself of deprecation disease.

    Alexey "make cc_check" Dobriyan
-

From: Linus Torvalds
Date: Wednesday, October 3, 2007 - 8:47 am

Btw, this definitely doesn't happen for me, either on x86-64 or plain x86. 
The x86 thing I tested was Fedora 8 testing (ie not even some stable 
setup), so I wonder what experimental compiler you have.

Your compiler generates

	movl    -16(%ebp),%edx
	movl    (%edx),%edi		/* this is _totally_ bogus! */
	incl    %edx
	movl    %edx,-16(%ebp)
	movl    %edi,%ecx
	testb   %cl,%cl
	je      ...

while I get (gcc version 4.1.2 20070925 (Red Hat 4.1.2-28)):

        movl    -16(%ebp), %eax # p,
        movzbl  (%eax), %edi    #, c	/* not bogus! */
        movl    %edi, %edx      # c,
        testb   %dl, %dl        #
        je      .L64    #,
        incl    %eax    #
        movsbl  %dl,%ebx        #, D.12414
        movl    %eax, -16(%ebp) #, p

where the difference (apart from doing the increment differently and 
different register allocation) is that I have a "movzbl" (correct), while 
you have a "movl" (pure and utter crap).

I *suspect* that the compiler bug is along the lines of:
 (a) start off with movzbl
 (b) notice that the higher bits don't matter, because nobody subsequently 
     uses them
 (c) turn the thing into just a byte move. 
 (d) make the totally incorrect optimization of using a full 32-bit move 
     in order to avoid a partial register access stall

and the thing is, that final optimization can actually speed things up 
(although it can also slow things down for any access that crosses a cache 
sector boundary - 8/16 bytes), but it's seriously bogus, exactly because 
it can cause an invalid access to the three next bytes that may not even 
exist.

			Linus
-
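
The situation Linus describes can be reproduced in user space with a guard
page. The sketch below assumes a Linux/POSIX system; walk_string_at_page_end()
and count_bytes() are names invented here. It places a string so that its NUL
is the last byte of an accessible page, makes the following page inaccessible
(standing in for the page DEBUG_PAGEALLOC unmapped), and walks the string one
byte at a time. Correct byte loads never touch the guard page; a 32-bit load
from the last two bytes would.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Walk a NUL-terminated string bytewise. The volatile qualifier keeps
 * the compiler from replacing the loop with a library call.
 */
static size_t count_bytes(const volatile char *p)
{
	size_t n = 0;

	while (*p++)
		n++;
	return n;
}

/*
 * Copy s so that its terminating NUL is the last byte before an
 * inaccessible page, then walk it. Returns the length seen.
 */
static size_t walk_string_at_page_end(const char *s)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t len = strlen(s);
	char *base, *str;
	size_t n;
	int rc;

	assert(page > 0 && len < (size_t)page);
	base = mmap(NULL, 2 * (size_t)page, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	assert(base != MAP_FAILED);

	/* Second page becomes the guard: any access to it faults. */
	rc = mprotect(base + page, (size_t)page, PROT_NONE);
	assert(rc == 0);
	(void)rc;

	str = base + page - (len + 1);
	memcpy(str, s, len + 1);
	n = count_bytes(str);

	munmap(base, 2 * (size_t)page);
	return n;
}
```

Calling it with "/" mirrors the oops: the string starts two bytes before the
page boundary, the same offset as the faulting pointer f2a3fffe.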

From: Ingo Molnar
Date: Wednesday, October 3, 2007 - 8:49 am

i'll try with another compiler in a minute.

	Ingo
-

From: Ingo Molnar
Date: Wednesday, October 3, 2007 - 9:07 am

i just tried:

  gcc version 4.1.2 20070626 (Red Hat 4.1.2-13)

and indeed the crash is gone. So you are completely right, it's a 
compiler bug in 4.0.2 (it's vanilla gcc 4.0.2 built by me, not a distro 
compiler). It should not affect normal kernels too much: this bug needs 
CONFIG_DEBUG_PAGEALLOC. (or it needs a _really_ unlucky allocation being 
at the far upper end of RAM - but those are usually taken up by 
boot-time allocations anyway).

i also just re-tried the other config as well - and crash is gone there 
too. (not surprisingly)

	Ingo
-

From: Ingo Molnar
Date: Thursday, October 4, 2007 - 4:55 am

Subject: net, 9p: build fix with !CONFIG_SYSCTL
From: Ingo Molnar <mingo@elte.hu>

found via make randconfig build testing: 

 net/built-in.o: In function `init_p9':
 mod.c:(.init.text+0x3b39): undefined reference to `p9_sysctl_register'
 net/built-in.o: In function `exit_p9':
 mod.c:(.exit.text+0x36b): undefined reference to `p9_sysctl_unregister'

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 include/net/9p/9p.h |   12 ++++++++++++
 1 file changed, 12 insertions(+)

Index: linux/include/net/9p/9p.h
===================================================================
--- linux.orig/include/net/9p/9p.h
+++ linux/include/net/9p/9p.h
@@ -412,6 +412,18 @@ int p9_idpool_check(int id, struct p9_id
 
 int p9_error_init(void);
 int p9_errstr2errno(char *, int);
+
+#ifdef CONFIG_SYSCTL
 int __init p9_sysctl_register(void);
 void __exit p9_sysctl_unregister(void);
+#else
+static inline int p9_sysctl_register(void)
+{
+	return 0;
+}
+static inline void p9_sysctl_unregister(void)
+{
+}
+#endif
+
 #endif /* NET_9P_H */
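
The stub pattern used by this patch can be sketched outside the kernel as follows. This is only an illustration: DEMO_HAVE_SYSCTL and the demo_* names are hypothetical stand-ins for CONFIG_SYSCTL and the p9 functions. When the feature is compiled out, empty static inline versions keep the callers free of #ifdefs and avoid the undefined references at link time.

```c
#include <assert.h>

/* Hypothetical feature switch standing in for CONFIG_SYSCTL. */
#ifdef DEMO_HAVE_SYSCTL
/* Real implementations would live in a separate source file. */
int demo_sysctl_register(void);
void demo_sysctl_unregister(void);
#else
/* Feature compiled out: callers still build and link. */
static inline int demo_sysctl_register(void)
{
	return 0;
}
static inline void demo_sysctl_unregister(void)
{
}
#endif

/* Callers need no #ifdef of their own. */
static int demo_init(void)
{
	int err = demo_sysctl_register();

	if (err)
		return err;
	return 0;
}

static void demo_exit(void)
{
	demo_sysctl_unregister();
}
```

Built without DEMO_HAVE_SYSCTL, demo_init() simply succeeds and demo_exit() does nothing.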
-

From: Jeff Garzik
Date: Wednesday, October 3, 2007 - 12:07 pm

I'm a bit confused...  you typed 2.6.24-rc4; I'm guessing you meant 
2.6.24-rc1, but did you mean

	"x86 merge as soon as 2.6.23 is released" (merge window opens)

or

	"x86 merge as soon as 2.6.24-rc1 is released"

?

Thanks,

	Jeff


-

From: Linus Torvalds
Date: Wednesday, October 3, 2007 - 12:25 pm

> "x86 merge as soon as 2.6.23 is released" (merge window opens)

is the correct interpretation.

It will probably be a day or two after the 2.6.23 release, just because I 
like everybody to look at the release tree after a release, but it would 
be the first set of changes that get merged (assuming there are no stupid 
brown-paper-bag issues that would get priority).

		Linus
-

From: Thomas Gleixner
Date: Tuesday, October 2, 2007 - 2:17 am

I have uploaded an update of the arch/x86 tree based on -rc9 to

	git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86.git x86

For convenience there is a patch fixup script which helps you to
convert pending patches against this tree.

	http://userweb.kernel.org/~tglx/x86/x86-fixup-patches.py

It's generated from the merge script and fixes the namespace of
patches. There will still be some rejects which can not be fixed up
automatically, but this should be rare.

I did a test with Andrew's -mm series and only ~10 arch/x86 related
patches had rejects, out of 230+ patches, so the 100%-painless
conversion ratio is better than 95%. Those patches with rejects were
trivial to fix.

Usage: x86-fixup-patches.py sourcepatch destpatch

source and dest can be the same.

A helper script to convert complete quilt series is here:
	http://userweb.kernel.org/~tglx/x86/fixupseries.sh

If there is anything we can help with in the transition, please do not
hesitate to ask.

Thanks,

	Thomas, Ingo
-

From: Eric St-Laurent
Date: Tuesday, October 2, 2007 - 8:53 pm

On Tue, 2007-10-02 at 11:17 +0200, Thomas Gleixner wrote:



Hi Thomas,

This latest x86 branch builds and boots without problem with my usual
x86_64 config.

If you remember our conversation one month ago, I was unable to build
your tree.

I've upgraded my Ubuntu distribution from 7.04 to 7.10 beta this week,
maybe this fixed it.

But I still had to do some manual fixes to get the packaging steps
working:

mkdir arch/x86_64/boot/
ln -s ../../../arch/x86/boot/bzImage arch/x86_64/boot/bzImage


Best regards,

- Eric


-

From: Andi Kleen
Date: Tuesday, October 2, 2007 - 2:21 am

Yes I have ~100 patches for arch/x86_64, arch/i386

Should I just drop them?

-Andi
-

From: Jeff Garzik
Date: Tuesday, October 2, 2007 - 3:37 am

Why don't you work with Thomas and Ingo to make sure everything is in 
sync and prepped for 2.6.24?

	Jeff



-

From: Andi Kleen
Date: Tuesday, October 2, 2007 - 3:48 am

The easiest way to do that would be to first merge all the queued and
collected patches from the last months. Once they are in people
can then create whatever mess they like.

The other way round (adapting 100+ patches to a possibly completely
different tree) will be a huge amount of work which I am
frankly not very motivated to do because I think it's quite unnecessary. 

I would probably just push the work back to all the patch submitters -- that is 
what I meant with dropping the patches.

I assume mess up first would be also a minor catastrophe for Andrew --
in addition to my patches he also has a large number of patches
touching {x86_64,i386}

-Andi
-

From: Thomas Gleixner
Date: Tuesday, October 2, 2007 - 4:05 am

I picked up your queue at

	ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt-current/current.tar.gz

and converted it with the fixup script to:

	http://www.tglx.de/~tglx/patches-ak.tar.bz2

Hope that helps,

	tglx
-

From: Ingo Molnar
Date: Tuesday, October 2, 2007 - 7:07 am

thanks Thomas - i have applied this queue ontop of the unified arch/x86 
tree (i skipped vdso-text-offset which change is already upstream) and 
it built and booted fine on a couple of x86 systems - 32-bit and 64-bit 
alike. So your script worked like a charm.

Andi, could you please send us the list of patches from the 
current.tar.gz queue above that you consider 2.6.24 candidates? (and 
please add to the list if there's anything else pending) Thanks,

	Ingo
-

From: Andi Kleen
Date: Tuesday, October 2, 2007 - 7:23 am

I'm still merging/fixing etc. so the list is not final yet.

-Andi
-

From: Ingo Molnar
Date: Tuesday, October 2, 2007 - 7:31 am

ok, the ones marked TBD are:

 cflags-probe
 cpa-clflush
 sched-clock-share
 svm-disabled

please merge it ontop of the arch/x86 tree so that we can start 
reviewing and testing it based on the unified tree ASAP. (but sending us 
a queue to the old layout is fine too - whichever variant you can do 
fastest) Thanks,

	Ingo
-

From: Sam Ravnborg
Date: Tuesday, October 2, 2007 - 7:58 am

Hi Andi/Ingo.

I plan to integrate cflags-probe in kbuild.git if there is no objection.
And I will address any x86 issues when I do so.
On top of that I will most likely do the same change for i386.

	Sam
-

From: Ingo Molnar
Date: Tuesday, October 2, 2007 - 8:27 am

great. I think that's the most straightforward merge path for such 
Makefile updates. Andi, Thomas, any objections?

	Ingo
-

From: Andi Kleen
Date: Tuesday, October 2, 2007 - 8:24 am

Fine for me. Please take what you want.

-Andi
-

From: Thomas Gleixner
Date: Tuesday, October 2, 2007 - 8:30 am

Sam,


Makes sense. While you are at it, can you please have an eye on the
Build system changes I did to make arch/x86 with the two stub
directories arch/i386 and arch/x86_64 work.

Thanks,

	tglx

-

From: Sam Ravnborg
Date: Tuesday, October 2, 2007 - 8:40 am

By the way - that patch depends on a few other patches in Andi's queue.
That's on my TODO list - but I do not think I will find time before the merge.

	Sam
-

From: Andi Kleen
Date: Tuesday, October 2, 2007 - 7:54 am

It will be uploaded to the usual location.

-Andi

-

From: Jiri Kosina
Date: Tuesday, October 2, 2007 - 5:04 am

I assume that Andrew is periodically pulling your queue into -mm, isn't 
he? If so, Thomas explicitly stated that -mm can be converted easily with 
just a few rejects, right?

-- 
Jiri Kosina
-

From: Rafael J. Wysocki
Date: Tuesday, October 2, 2007 - 1:12 pm

Well, there are several arch-dependent power management patches in -mm queued
up for merging.  Do I need to take care of converting them myself, or will that
be done automatically, or ...?

Greetings,
Rafael
-

From: Andrew Morton
Date: Tuesday, October 2, 2007 - 1:11 pm

On Tue, 2 Oct 2007 22:12:13 +0200

It should be OK.  I'll wait until this lot hits Linus's tree and then I'll
redo the whole -mm patch queue.

The one problem with this is that I will have trouble repulling and remerging
the 81 subsystem trees which are part of -mm until their owners have fixed
everything up - I'll either need to temporarily drop them or will need to
fix them up with Thomas's script each time I fetch them.

But whatever - I'll sort it out..
-

From: Rafael J. Wysocki
Date: Tuesday, October 2, 2007 - 1:31 pm

Many thanks!
-

From: Roland Dreier
Date: Tuesday, October 2, 2007 - 1:32 pm

> The one problem with this is that I will have trouble repulling and remerging
> the 81 subsystem trees which are part of -mm until their owners have fixed
> everything up - I'll either need to temporarily drop them or will need to
> fix them up with Thomas's script each time I fetch them.

FWIW, I just pulled Thomas's x86 branch into my for-2.6.24 branch and
test-booted that on one of my systems with no obvious problems.  (Hey,
it compiled, ship it...)

 - R.
-

From: Mathieu Chouquet-Stringer
Date: Thursday, October 4, 2007 - 10:05 am

Hey there,

I've seen the changes you made in commit b6a2fea39318 and I guess they
might be responsible for my xargs breakage...

In the kernel source tree, if I run a stupid find | xargs ls, I now get
this:
xargs: ls: Argument list too long

Which is kind of annoying, but I can work around it. However, make distclean
in my kernel tree dies with the same symptom (aka -E2BIG).

I run a vanilla 2.6.23-rc9 (Linux version 2.6.23-rc9 (mchouque@shookaylt)
(gcc version 4.1.2 20070925 (Red Hat 4.1.2-27)) #1 Tue Oct 2 08:13:47 EDT
2007) on FC7...

Let me know if I can do anything.  I'm going to try to bisect the problem
after I recompile the kernel without this patch...

Best,
Mathieu


-- 
Mathieu Chouquet-Stringer                           mchouque@free.fr
            The sun itself sees not till heaven clears.
	             -- William Shakespeare --
-

From: Peter Zijlstra
Date: Thursday, October 4, 2007 - 10:17 am

/me tries

yep works like a charm, and that is a tree with a full git repo and

what happens if you up the stack limit to say 128M ?

Also, do you happen to have execve syscall audit stuff enabled?

-

From: Mathieu Chouquet-Stringer
Date: Thursday, October 4, 2007 - 1:47 pm

Nope.

-- 
Mathieu Chouquet-Stringer                           mchouque@free.fr
            The sun itself sees not till heaven clears.
	             -- William Shakespeare --
-

From: Mathieu Chouquet-Stringer
Date: Thursday, October 4, 2007 - 2:58 pm

Actually, you were right: not only is it enabled, it's also the
culprit.  If I stop it, all is well...

Sorry for the noise.

-- 
Mathieu Chouquet-Stringer                           mchouque@free.fr
            The sun itself sees not till heaven clears.
	             -- William Shakespeare --
-

From: Linus Torvalds
Date: Thursday, October 4, 2007 - 10:27 am

What does your "ulimit -s" say?

I suspect that you might hit the code that limits execve() arguments to 
one quarter of the maximum stack size.

We could change that from 25% to something else (half? three quarters?), 
but if you really are hitting that limit, it sounds like you may have a 
really small stack size to begin with (ie if 25% is smaller than the old 
argument size limit of 128kB, you're running with a stack limit of less 
than half a meg, which sounds pretty dang small).
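
The arithmetic in that paragraph, as a small userspace sketch. It only models the "one quarter of the stack limit" rule as described above and is not the kernel's code; the helper names are made up for illustration, and an unlimited stack (as reported later in the thread) is not modelled.

```c
#include <assert.h>
#include <sys/resource.h>

/* The old fixed argument size limit mentioned above: 128kB. */
#define OLD_ARG_LIMIT (128UL * 1024)

/* Hypothetical helper: one quarter of the stack limit, in bytes. */
static unsigned long arg_limit_for_stack(unsigned long stack_limit_bytes)
{
	return stack_limit_bytes / 4;
}

/* Nonzero if the quarter-of-stack rule is tighter than the old 128kB. */
static int tighter_than_old_limit(unsigned long stack_limit_bytes)
{
	return arg_limit_for_stack(stack_limit_bytes) < OLD_ARG_LIMIT;
}

/*
 * Read the current soft stack limit, i.e. what "ulimit -s" reports
 * (in bytes here rather than kbytes). Returns 0 on success, -1 on error.
 * The value may be RLIM_INFINITY, which the helpers above do not handle.
 */
static int current_stack_limit(rlim_t *out)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_STACK, &rl) != 0)
		return -1;
	*out = rl.rlim_cur;
	return 0;
}
```

With a 512kB stack limit the quarter is exactly 128kB, so only stack limits below half a meg end up with less room than the old limit gave.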

So I'd like to verify that the stack limit really is the issue, and not 
something else.

		Linus
-

From: Mathieu Chouquet-Stringer
Date: Thursday, October 4, 2007 - 1:44 pm

Thank you for getting back to me.


That's actually the first thing I checked.

mchouque - /usr/src/kernel/linux %ulimit -s
unlimited

And for the record, ulimit -a yields:
-t: cpu time (seconds)         unlimited
-f: file size (blocks)         unlimited
-d: data seg size (kbytes)     unlimited
-s: stack size (kbytes)        unlimited
-c: core file size (blocks)    0
-m: resident set size (kbytes) unlimited
-u: processes                  16375
-n: file descriptors           1024
-l: locked-in-memory size (kb) 32
-v: address space (kb)         unlimited
-x: file locks                 unlimited
-i: pending signals            16375
-q: bytes in POSIX msg queues  819200
-N 13:                         0

Anything else you'd like me to try?

-- 
Mathieu Chouquet-Stringer                           mchouque@free.fr
            The sun itself sees not till heaven clears.
	             -- William Shakespeare --
-

From: Linus Torvalds
Date: Thursday, October 4, 2007 - 2:21 pm

Well, since others definitely don't see this, including me, and I can do 
things like 62MB exec arrays:

	[torvalds@woody linux]$ echo $(find /home/torvalds/) | wc
	      1  883304 63000962

without getting any overflows (much less just on the kernel sources, which 
is less than a megabyte of pathnames), I think it would be good if you 
were to just instrument the kernel and make it do a "printk()" when it 
returns E2BIG in fs/exec.c (or the NULL returns from get_arg_page()).

Just to figure out *which* test fails for you but apparently nobody else.
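
A userspace stand-in for that kind of instrumentation, as a rough sketch only: the checks and limits below are hypothetical and are not the ones in the kernel, and fprintf() takes the place of printk(). The idea is simply that every failure site reports its own file and line before returning -E2BIG, so the log shows which test tripped.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Log where the failure happened, then yield -E2BIG. */
#define FAIL_E2BIG(why) \
	(fprintf(stderr, "%s:%d: E2BIG: %s\n", __FILE__, __LINE__, (why)), \
	 -E2BIG)

/*
 * Hypothetical argument checks; the limits are parameters here, not
 * the kernel's real ones.
 */
static int check_args(unsigned long count, unsigned long bytes,
		      unsigned long max_count, unsigned long max_bytes)
{
	if (count > max_count)
		return FAIL_E2BIG("too many argument strings");
	if (bytes > max_bytes)
		return FAIL_E2BIG("argument bytes exceed the limit");
	return 0;
}
```

Each failing call prints one line naming the check that rejected the arguments, which is the information asked for above.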

		Linus
-

From: Paul Mackerras
Date: Thursday, October 4, 2007 - 3:27 pm

That wouldn't actually do an exec, assuming you're using bash, since
echo is a shell builtin in bash.  You'd need to do /bin/echo.

Paul.
-

From: Linus Torvalds
Date: Thursday, October 4, 2007 - 5:12 pm

Right you are, silly me. But yes, it works for me even with that (and 
since I downloaded the gcc source tree, it now has six more megs of 
arguments).

I also tested that "ulimit -s" seems to do the right thing for me.

I'm also assuming Mathieu is running x86 (or x86-64): HP-PA has a stack 
that grows upwards, and that has traditionally been exciting.

IA64 also has some strange things for the register backing store.

			Linus
-

From: Mathieu Chouquet-Stringer
Date: Thursday, October 4, 2007 - 8:22 pm

Correct, x86 it is but as I said it's this stupid auditd thing that
breaks the whole process.  I'm gonna file a bug against it.

Thanks for the help though.
-- 
Mathieu Chouquet-Stringer                           mchouque@free.fr
            The sun itself sees not till heaven clears.
	             -- William Shakespeare --
-

From: Peter Zijlstra
Date: Friday, October 5, 2007 - 12:43 am

Eric Paris just posted patches to solve this.
-

From: Chuck Ebbert
Date: Thursday, October 4, 2007 - 2:50 pm

Can you strace it to see what syscall is failing?
-

From: Mathieu Chouquet-Stringer
Date: Thursday, October 4, 2007 - 2:54 pm

Sure:
25789 <... execve resumed> )            = -1 E2BIG (Argument list too long)

I'm going to reboot to a kernel that has Linus' printks...

-- 
Mathieu Chouquet-Stringer                           mchouque@free.fr
            The sun itself sees not till heaven clears.
	             -- William Shakespeare --
-

From: Hans-Peter Jansen
Date: Saturday, October 6, 2007 - 1:29 am

Have you tried to remove xargs from the equation above, just in case that it 
stumbles upon the elimination of the reason for its existence in the first 
place..

Pete
-

From: Hans-Peter Jansen
Date: Saturday, October 6, 2007 - 4:29 am

Sorry guys, filtering by "linus" in kmail suppressed the solution messages 
in this thread.

Pete
-

From: Bill Davidsen
Date: Saturday, October 6, 2007 - 10:36 am

You can work around it many ways, using the options provided for xargs 
or using ls directly being among them.
    find . -ls

I don't see it with 2.6.23-rc8-git3 so it may be related to xargs 


-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
-

From: Timo Jantunen
Date: Wednesday, October 3, 2007 - 7:21 am

The r8169 nic performance regression is still there.

2.6.22: send 82MB/s, receive 86MB/s
2.6.23-rc9: send 32MB/s, receive 98MB/s

I debugged this with Francois Romieu but haven't heard from him since 
testing his fixes.

I attached a patch from him which is a partial revert of commit 
6dccd16b7c2703e8bbf8bca62b5cf248332afbe2.

With this patch I get 93MB/s send and 97MB/s receive and I have been running it 
for a week but I don't know if the patch has any downsides on other 
systems.





From 34875931ba2e473e2867d941980131edd609dbe4 Mon Sep 17 00:00:00 2001
From: Francois Romieu <romieu@fr.zoreil.com>
Date: Wed, 26 Sep 2007 23:44:03 +0200
Subject: [PATCH] r8169: more revert

Part of 6dccd16b7c2703e8bbf8bca62b5cf248332afbe2.

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
---
 drivers/net/r8169.c |   16 +++++++++++++---
 1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index cb4c412..6d8611c 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -1905,7 +1905,11 @@ static void rtl_hw_start_8169(struct net_device *dev)
 
 	rtl_set_rx_max_size(ioaddr);
 
-	rtl_set_rx_tx_config_registers(tp);
+	if ((tp->mac_version == RTL_GIGA_MAC_VER_01) ||
+	    (tp->mac_version == RTL_GIGA_MAC_VER_02) ||
+	    (tp->mac_version == RTL_GIGA_MAC_VER_03) ||
+	    (tp->mac_version == RTL_GIGA_MAC_VER_04))
+		rtl_set_rx_tx_config_registers(tp);
 
 	tp->cp_cmd |= rtl_rw_cpluscmd(ioaddr) | PCIMulRW;
 
@@ -1926,6 +1930,14 @@ static void rtl_hw_start_8169(struct net_device *dev)
 
 	rtl_set_rx_tx_desc_registers(tp, ioaddr);
 
+	if ((tp->mac_version != RTL_GIGA_MAC_VER_01) &&
+	    (tp->mac_version != RTL_GIGA_MAC_VER_02) &&
+	    (tp->mac_version != RTL_GIGA_MAC_VER_03) &&
+	    (tp->mac_version != RTL_GIGA_MAC_VER_04)) {
+		RTL_W8(ChipCmd, CmdTxEnb | CmdRxEnb);
+		rtl_set_rx_tx_config_registers(tp);
+	}
+
 	RTL_W8(Cfg9346, Cfg9346_Lock);
 
 	/* Initially a 10 us delay. Turned it into a PCI commit. - FR ...
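
The two conditions added by the patch above are exact complements of each other, so for any chip exactly one of the two call sites runs rtl_set_rx_tx_config_registers(). A minimal standalone sketch of that logic follows; the enum and function names are illustrative stand-ins, not the driver's own definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the driver's mac_version values. */
enum demo_mac_version {
	DEMO_MAC_VER_01 = 1,
	DEMO_MAC_VER_02,
	DEMO_MAC_VER_03,
	DEMO_MAC_VER_04,
	DEMO_MAC_VER_05	/* any later chip */
};

/* Condition of the first hunk: configure before the descriptor setup. */
static bool demo_config_before_desc(enum demo_mac_version v)
{
	return v == DEMO_MAC_VER_01 || v == DEMO_MAC_VER_02 ||
	       v == DEMO_MAC_VER_03 || v == DEMO_MAC_VER_04;
}

/* Condition of the second hunk, written as in the patch. */
static bool demo_config_after_desc(enum demo_mac_version v)
{
	return v != DEMO_MAC_VER_01 && v != DEMO_MAC_VER_02 &&
	       v != DEMO_MAC_VER_03 && v != DEMO_MAC_VER_04;
}
```

By De Morgan's law the second predicate is the negation of the first, so it could equally be written as !demo_config_before_desc(v).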