I said I was hoping that -rc8 was the last -rc, and I hate doing this, but
we've had more changes since -rc8 than we had in -rc8. And while most of
them are pretty trivial, I really couldn't face doing a 2.6.23 release and
take the risk of some really stupid brown-paper-bag thing.

So there's a final -rc out there, and right now my plan is to make this
series really short, and release 2.6.23 in a few days. So please do give
it a last good testing, and holler about any issues you find!

This is also a good time to warn about the fact that we're doing the x86
merge very soon (as in the next day or two) after 2.6.23 is out, so if you
have pending patches for the next series that touch arch/i386 or x86-64,
you should get in touch with Thomas Gleixner and Ingo Molnar, who are the
keepers of the merge scripts, and will help you prepare..

Doing it as early as possible in the 2.6.24-rc4 series (basically I'll do
it first thing) will mean that we'll have the maximum amount of time to
sort out any issues, and the thing is, Thomas and Ingo already have a tree
ready to go, so people can check their work against that, and don't need
to think that they have to do any fixups after it hits *my* tree. It would
be much better if everybody was just ready for it, and not taken by
surprise.

In other words, people who know they may be affected and would want to
prepare can look at (for example)

git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86.git x86

and generally get ready for the switch-over.

Linus
-
The r8169 nic performance regression is still there.
2.6.22: send 82MB/s, receive 86MB/s
2.6.23-rc9: send 32MB/s, receive 98MB/s

I debugged this with Francois Romieu but haven't heard from him since
testing his fixes.

I attached a patch from him which is a partial revert of commit
6dccd16b7c2703e8bbf8bca62b5cf248332afbe2.

With this patch I get 93MB/s send and 97MB/s receive and I have been running it
for a week but I don't know if the patch has any downsides on other
systems.

From 34875931ba2e473e2867d941980131edd609dbe4 Mon Sep 17 00:00:00 2001
From: Francois Romieu <romieu@fr.zoreil.com>
Date: Wed, 26 Sep 2007 23:44:03 +0200
Subject: [PATCH] r8169: more revert

Part of 6dccd16b7c2703e8bbf8bca62b5cf248332afbe2.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
---
drivers/net/r8169.c | 16 +++++++++++++---
1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index cb4c412..6d8611c 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -1905,7 +1905,11 @@ static void rtl_hw_start_8169(struct net_device *dev)
rtl_set_rx_max_size(ioaddr);
- rtl_set_rx_tx_config_registers(tp);
+ if ((tp->mac_version == RTL_GIGA_MAC_VER_01) ||
+ (tp->mac_version == RTL_GIGA_MAC_VER_02) ||
+ (tp->mac_version == RTL_GIGA_MAC_VER_03) ||
+ (tp->mac_version == RTL_GIGA_MAC_VER_04))
+ rtl_set_rx_tx_config_registers(tp);
tp->cp_cmd |= rtl_rw_cpluscmd(ioaddr) | PCIMulRW;
@@ -1926,6 +1930,14 @@ static void rtl_hw_start_8169(struct net_device *dev)
rtl_set_rx_tx_desc_registers(tp, ioaddr);
+ if ((tp->mac_version != RTL_GIGA_MAC_VER_01) &&
+ (tp->mac_version != RTL_GIGA_MAC_VER_02) &&
+ (tp->mac_version != RTL_GIGA_MAC_VER_03) &&
+ (tp->mac_version != RTL_GIGA_MAC_VER_04)) {
+ RTL_W8(ChipCmd, CmdTxEnb | CmdRxEnb);
+ rtl_set_rx_tx_config_registers(tp);
+ }
+
RTL_W8(Cfg9346, Cfg9346_Lock);
/* Ini...
Hey there,
I've seen the changes you made in commit b6a2fea39318 and I guess they
might be responsible for my xargs breakage...

In the kernel source tree, if I run a stupid find | xargs ls, I now get
this:
xargs: ls: Argument list too long

Which is kind of annoying but I can work around it, though make distclean in
my kernel tree dies with the same symptom (aka -E2BIG).

I run a vanilla 2.6.23-rc9 (Linux version 2.6.23-rc9 (mchouque@shookaylt)
(gcc version 4.1.2 20070925 (Red Hat 4.1.2-27)) #1 Tue Oct 2 08:13:47 EDT
2007) on FC7...

Let me know if I can do anything. I'm going to try to bisect the problem
after I recompile the kernel without this patch...

Best,
Mathieu
--
Mathieu Chouquet-Stringer mchouque@free.fr
The sun itself sees not till heaven clears.
-- William Shakespeare --
-
You can work around it many ways, using the options provided for xargs
or using ls directly being among them.
find . -ls

I don't see it with 2.6.23-rc8-git3 so it may be related to xargs
--
Bill Davidsen <davidsen@tmr.com>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
-
Have you tried to remove xargs from the equation above, just in case that it
stumbles upon the elimination of the reason for its existence in the first
place..

Pete
-
Sorry guys, filtering by "linus" in kmail suppressed the solution messages
in this thread.

Pete
-
Can you strace it to see what syscall is failing?
-
Sure:
25789 <... execve resumed> ) = -1 E2BIG (Argument list too long)

I'm going to reboot to a kernel that has Linus' printks...
--
Mathieu Chouquet-Stringer mchouque@free.fr
The sun itself sees not till heaven clears.
-- William Shakespeare --
-
What does your "ulimit -s" say?
I suspect that you might hit the code that limits execve() arguments to
one quarter of the maximum stack size.

We could change that from 25% to something else (half? three quarters?),
but if you really are hitting that limit, it sounds like you may have a
really small stack size to begin with (ie if 25% is smaller than the old
argument size limit of 128kB, you're running with a stack limit of less
than half a meg, which sounds pretty dang small).

So I'd like to verify that the stack limit really is the issue, and not
something else.

Linus
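
For anyone following the arithmetic, here is a minimal sketch in C of the rule described above, assuming the cap really is one quarter of the stack limit. The helper names and the `OLD_ARG_LIMIT` constant are made up for illustration and are not taken from the kernel source.

```c
/* Old fixed limit on execve() argument space mentioned above: 128kB. */
#define OLD_ARG_LIMIT (128UL * 1024UL)

/*
 * Hypothetical helper: the argument-space cap implied by a stack limit,
 * assuming the "one quarter of the maximum stack size" rule.
 */
static unsigned long arg_limit_for_stack(unsigned long stack_limit_bytes)
{
    return stack_limit_bytes / 4;
}

/* Nonzero if the quarter-of-stack cap is tighter than the old 128kB limit. */
static int tighter_than_old_limit(unsigned long stack_limit_bytes)
{
    return arg_limit_for_stack(stack_limit_bytes) < OLD_ARG_LIMIT;
}
```

With a 256kB stack limit the cap comes out at 64kB, below the old 128kB. At 512kB and above the cap is at least as large as the old limit, which is the "half a meg" threshold mentioned above.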
-
Thank you for getting back to me.
That's actually the first thing I checked.
mchouque - /usr/src/kernel/linux %ulimit -s
unlimited

And for the record, ulimit -a yields:
-t: cpu time (seconds) unlimited
-f: file size (blocks) unlimited
-d: data seg size (kbytes) unlimited
-s: stack size (kbytes) unlimited
-c: core file size (blocks) 0
-m: resident set size (kbytes) unlimited
-u: processes 16375
-n: file descriptors 1024
-l: locked-in-memory size (kb) 32
-v: address space (kb) unlimited
-x: file locks unlimited
-i: pending signals 16375
-q: bytes in POSIX msg queues 819200
-N 13: 0

Anything else you'd like me to try?
--
Mathieu Chouquet-Stringer mchouque@free.fr
The sun itself sees not till heaven clears.
-- William Shakespeare --
-
Well, since others definitely don't see this, including me, and I can do
things like 62MB exec arrays:

[torvalds@woody linux]$ echo $(find /home/torvalds/) | wc
1 883304 63000962

without getting any overflows (much less just on the kernel sources, which
is less than a megabyte of pathnames), I think it would be good if you
were to just instrument the kernel and make it do a "printk()" when it
returns E2BIG in fs/exec.c (or the NULL returns from get_arg_page()).

Just to figure out *which* test fails for you but apparently nobody else.
Linus
-
That wouldn't actually do an exec, assuming you're using bash, since
echo is a shell builtin in bash. You'd need to do /bin/echo.

Paul.
-
Right you are, silly me. But yes, it works for me even with that (and
since I downloaded the gcc source tree, it now has six more megs of
arguments).

I also tested that "ulimit -s" seems to do the right thing for me.
I'm also assuming Mathieu is running x86 (or x86-64): HP-PA has a stack
that grows upwards, and that has traditionally been exciting.

IA64 also has some strange things for the register backing store.
Linus
-
Correct, x86 it is but as I said it's this stupid auditd thing that
breaks the whole process. I'm gonna file a bug against it.

Thanks for the help though.
--
Mathieu Chouquet-Stringer mchouque@free.fr
The sun itself sees not till heaven clears.
-- William Shakespeare --
-
Eric Paris just posted patches to solve this.
/me tries
yep works like a charm, and that is a tree with a full git repo and
what happens if you up the stack limit to say 128M ?
Also, do you happen to have execve syscall audit stuff enabled?
-
Actually, you were right, not only it's enabled but it's also the
culprit. If I stop it, all is well...

Sorry for the noise.
--
Mathieu Chouquet-Stringer mchouque@free.fr
The sun itself sees not till heaven clears.
-- William Shakespeare --
-
Nope.
--
Mathieu Chouquet-Stringer mchouque@free.fr
The sun itself sees not till heaven clears.
-- William Shakespeare --
-
I have uploaded an update of the arch/x86 tree based on -rc9 to
git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86.git x86
For convenience there is a patch fixup script which helps you to
convert pending patches against this tree.

http://userweb.kernel.org/~tglx/x86/x86-fixup-patches.py

It's generated from the merge script and fixes the namespace of
patches. There will still be some rejects which can not be fixed up
automatically, but this should be rare.

I did a test with Andrew's -mm series and only ~10 arch/x86 related
patches had rejects, out of 230+ patches, so the 100%-painless
conversion ratio is better than 95%. Those patches with rejects were
trivial to fix.

Usage: x86-fixup-patches.py sourcepatch destpatch
source and dest can be the same.

A helper script to convert complete quilt series is here:
http://userweb.kernel.org/~tglx/x86/fixupseries.sh

If there is anything we can help with in the transition, please do not
hesitate to ask.

Thanks,
Thomas, Ingo
-
Well, there are several arch-dependent power management patches in -mm queued
up for merging. Do I need to take care of converting them myself, or will that
be done automatically, or ...?

Greetings,
Rafael
-
On Tue, 2 Oct 2007 22:12:13 +0200
It should be OK. I'll wait until this lot hits Linus's tree and then I'll
redo the whole -mm patch queue.

The one problem with this is that I will have trouble repulling and remerging
the 81 subsystem tree which are part of -mm until their owners have fixed
everything up - I'll either need to temporarily drop them or will need to
fix them up with Thomas's script each time I fetch them.

But whatever - I'll sort it out..
-
> The one problem with this is that I will have trouble repulling and remerging
> the 81 subsystem tree which are part of -mm until their owners have fixed
> everything up - I'll either need to temporarily drop them or will need to
> fix them up with Thomas's script each time I fetch them.

FWIW, I just pulled Thomas's x86 branch into my for-2.6.24 branch and
test-booted that on one of my systems with no obvious problems. (Hey,
it compiled, ship it...)

- R.
-
Many thanks!
-
Yes I have ~100 patches for arch/x86_64, arch/i386
Should I just drop them?
-Andi
-
I assume that Andrew is periodically pulling your queue into -mm, isn't
he? If so, Thomas explicitly stated that -mm can be converted easily with
just a few rejects, right?

--
Jiri Kosina
-
Why don't you work with Thomas and Ingo to make sure everything is in
sync and prepped for 2.6.24?

Jeff
-
The easiest way to do that would be to first merge all the queued and
collected patches from the last months. Once they are in people
can then create whatever mess they like.

The other way round (adapting 100+ patches to a possibly completely
different tree) will be a huge amount of work which I am
frankly not very motivated to do because I think it's quite unnecessary.

I would probably just push the work back to all the patch submitters -- that is
what I meant with dropping the patches.

I assume mess up first would be also a minor catastrophe for Andrew --
in addition to my patches he also has a large number of patches
touching {x86_64,i386}

-Andi
-
I picked up your queue at
ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt-current/current.tar.gz
and converted it with the fixup script to:
http://www.tglx.de/~tglx/patches-ak.tar.bz2
Hope that helps,
tglx
-
thanks Thomas - i have applied this queue ontop of the unified arch/x86
tree (i skipped vdso-text-offset which change is already upstream) and
it built and booted fine on a couple of x86 systems - 32-bit and 64-bit
alike. So your script worked like a charm.

Andi, could you please send us the list of patches from the
current.tar.gz queue above that you consider 2.6.24 candidates? (and
please add to the list if there's anything else pending) Thanks,

Ingo
-
I'm still merging/fixing etc. so the list is not final yet.
-Andi
-
ok, the ones marked TBD are:
cflags-probe
cpa-clflush
sched-clock-share
svm-disabled

please merge it ontop of the arch/x86 tree so that we can start
reviewing and testing it based on the unified tree ASAP. (but sending us
a queue to the old layout is fine too - whichever variant you can do
fastest) Thanks,

Ingo
-
It will be uploaded to the usual location.
-Andi
-
Hi Andi/Ingo.
I plan to integrate cflags-probe in kbuild.git if there is no objection.
And I will address any x86 issues when I do so.
On top of that I will most likely do the same change for i386.

Sam
-
Sam,
Makes sense. While you are at it, can you please have an eye on the
build system changes I did to make arch/x86 with the two stub
directories arch/i386 and arch/x86_64 work.

Thanks,
tglx
-
By the way - that patch depends on a few other patches in Andi's queue
That's on my TODO list - but I do not think I will find time before the merge.

Sam
-
Fine for me. Please take what you want.
-Andi
-
great. I think that's the most straightforward merge path for such
Makefile updates. Andi, Thomas, any objections?

Ingo
-
On Tue, 2007-10-02 at 11:17 +0200, Thomas Gleixner wrote:
Hi Thomas,
This latest x86 branch builds and boots without problem with my usual
x86_64 config.

If you remember our conversation one month ago, I was unable to build
your tree.

I've upgraded my Ubuntu distribution from 7.04 to 7.10 beta this week,
maybe this fixed it.

But I still had to do some manual fixes to get the packaging steps
working:

mkdir arch/x86_64/boot/
ln -s ../../../arch/x86/boot/bzImage arch/x86_64/boot/bzImage

Best regards,
- Eric
-
I'm a bit confused... you typed 2.6.24-rc4; I'm guessing you meant
2.6.24-rc1, but did you mean

"x86 merge as soon as 2.6.23 is released" (merge window opens)
or
"x86 merge as soon as 2.6.24-rc1 is released"
?
Thanks,
Jeff
-
is the correct interpretation.
It will probably be a day or two after the 2.6.23 release, just because I
like everybody to look at the release tree after a release, but it would
be the first set of changes that get merged (assuming there are no stupid
brown-paper-bag issues that would get priority).

Linus
-
Subject: net, 9p: build fix with !CONFIG_SYSCTL
From: Ingo Molnar <mingo@elte.hu>

found via make randconfig build testing:
net/built-in.o: In function `init_p9':
mod.c:(.init.text+0x3b39): undefined reference to `p9_sysctl_register'
net/built-in.o: In function `exit_p9':
mod.c:(.exit.text+0x36b): undefined reference to `p9_sysctl_unregister'

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
include/net/9p/9p.h | 12 ++++++++++++
1 file changed, 12 insertions(+)

Index: linux/include/net/9p/9p.h
===================================================================
--- linux.orig/include/net/9p/9p.h
+++ linux/include/net/9p/9p.h
@@ -412,6 +412,18 @@ int p9_idpool_check(int id, struct p9_id
int p9_error_init(void);
int p9_errstr2errno(char *, int);
+
+#ifdef CONFIG_SYSCTL
int __init p9_sysctl_register(void);
void __exit p9_sysctl_unregister(void);
+#else
+static inline int p9_sysctl_register(void)
+{
+ return 0;
+}
+static inline void p9_sysctl_unregister(void)
+{
+}
+#endif
+
#endif /* NET_9P_H */
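
The same stub pattern in isolation, as a sketch: `CONFIG_DEMO_FEATURE`, `demo_register()` and `demo_unregister()` are hypothetical names used only for illustration, not kernel symbols. When the option is off, callers get inline no-ops and need no #ifdef of their own.

```c
/*
 * Illustrative only: CONFIG_DEMO_FEATURE, demo_register() and
 * demo_unregister() are hypothetical names, not kernel symbols.
 */
#ifdef CONFIG_DEMO_FEATURE
int demo_register(void);      /* real implementation lives elsewhere */
void demo_unregister(void);
#else
/* Feature compiled out: provide no-op stubs so callers still link. */
static inline int demo_register(void)
{
    return 0;
}
static inline void demo_unregister(void)
{
}
#endif

/* The callers look the same whether or not the feature is built in. */
static int demo_init(void)
{
    return demo_register();
}

static void demo_exit(void)
{
    demo_unregister();
}
```

The design choice is to keep the conditional in the header, so the undefined-reference errors shown above cannot come back when a new caller is added.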
-
hm, i just triggered the procfs crash below with -rc9 on a testbox.
Config attached. It's easy to reproduce it via 'service sshd restart'.
The crash site is:

(gdb) list *0xc017599d
0xc017599d is in seq_path (fs/seq_file.c:354).
349 if (m->count < m->size) {
350 char *s = m->buf + m->count;
351 char *p = d_path(dentry, mnt, s, m->size - m->count);
352 if (!IS_ERR(p)) {
353 while (s <= p) {
354 char c = *p++;
355 if (!c) {
356 p = m->buf + m->count;
357 m->count = s - m->buf;
358 return s - p;
(gdb)

any ideas? Fortunately i was able to do an strace of the incident:
3247 munmap(0xb7f3e000, 4096) = 0
3247 open("/proc/mounts", O_RDONLY|O_LARGEFILE) = 3
3247 fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
3247 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f3e000
3247 read(3, <unfinished ...>
3247 +++ killed by SIGSEGV +++

and doing "cat /proc/mounts" triggers the crash reliably.
Ingo
---------------->
BUG: unable to handle kernel paging request at virtual address f2a40000
printing eip:
c017599d
*pdpt = 0000000000001001
*pde = 0000000000aee067
*pte = 0000000032a40000
Oops: 0000 [#1]
PREEMPT DEBUG_PAGEALLOC
Modules linked in:
CPU: 0
EIP: 0060:[<c017599d>] Not tainted VLI
EFLAGS: 00010297 (2.6.23-rc9 #89)
EIP is at seq_path+0x60/0xca
eax: f2a3fffe ebx: c290c8d4 ecx: f6e341f0 edx: f2a3fffe
esi: f2a3f007 edi: c29097f0 ebp: ec5ddf1c esp: ec5ddf04
ds: 007b es: 007b fs: 0000 gs: 0033 ss: 0068
Process sshd (pid: 2743, ti=ec5dc000 task=f6e341f0 task.ti=ec5dc000)
Stack: 00000ff9 c2bf6b40 f2a3fffe c29097c0 c2bf6b40 c290...
You have a terminally buggy piece of shit compiler.
Lookie here:
- the bug happens on this:
char c = *p++;
- which has been compiled into
8b 3a mov (%edx),%edi
which is a *word* access.
- the pointer is at the end of a page (very much on purpose):
edx: f2a3fffe
- and as a result you get an exception on the *next* page:
BUG: unable to handle kernel paging request at virtual address f2a40000
and btw, there is no question what-so-ever about whether your compiler
might be doing a legal optimization - the compiler really is wrong, and is
total shit. You need to make a gcc bug-report. Because this is not a
question of "the standard is ambiguous", this is a question of "the
compiler turned good code into code that could SIGSEGV in user space too,
if 'malloc()' happened to return a pointer at the end of an allocation".

Linus
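
The address arithmetic behind this can be checked with a small sketch, assuming 4kB pages as on 32-bit x86. The function names are hypothetical; the addresses in the usage note are the ones from the oops above.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u   /* assumption: 4kB pages, as on 32-bit x86 */

/*
 * Nonzero if an access of `width` bytes starting at `addr` touches
 * a byte on the page after the one containing `addr`.
 */
static int access_crosses_page(uint32_t addr, uint32_t width)
{
    uint32_t offset = addr % PAGE_SIZE_BYTES;

    return offset + width > PAGE_SIZE_BYTES;
}

/* First address of the page following the one that contains `addr`. */
static uint32_t next_page_start(uint32_t addr)
{
    return (addr / PAGE_SIZE_BYTES + 1u) * PAGE_SIZE_BYTES;
}
```

For edx = f2a3fffe, a 1-byte access stays inside the page, while a 4-byte access spills into the page starting at f2a40000, which matches the faulting address in the oops.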
-
Btw, this definitely doesn't happen for me, either on x86-64 or plain x86.
The x86 thing I tested was Fedora 8 testing (ie not even some stable
setup), so I wonder what experimental compiler you have.

Your compiler generates
movl -16(%ebp),%edx
movl (%edx),%edi /* this is _totally_ bogus! */
incl %edx
movl %edx,-16(%ebp)
movl %edi,%ecx
testb %cl,%cl
je ...

while I get (gcc version 4.1.2 20070925 (Red Hat 4.1.2-28)):
movl -16(%ebp), %eax # p,
movzbl (%eax), %edi #, c /* not bogus! */
movl %edi, %edx # c,
testb %dl, %dl #
je .L64 #,
incl %eax #
movsbl %dl,%ebx #, D.12414
movl %eax, -16(%ebp) #, p

where the difference (apart from doing the increment differently and
different register allocation) is that I have a "movzbl" (correct), while
you have a "movl" (pure and utter crap).

I *suspect* that the compiler bug is along the lines of:
(a) start off with movzbl
(b) notice that the higher bits don't matter, because nobody subsequently
uses them
(c) turn the thing into just a byte move.
(d) make the totally incorrect optimization of using a full 32-bit move
in order to avoid a partial register access stall

and the thing is, that final optimization can actually speed things up
(although it can also slow things down for any access that crosses a cache
sector boundary - 8/16 bytes), but it's seriously bogus, exactly because
it can cause an invalid access to the three next bytes that may not even
exist.

Linus
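
As a rough illustration of why step (d) looks harmless but is not: a little-endian 32-bit load followed by truncation yields the same value as a byte load, yet it needs three more readable bytes. The sketch below models this in portable C. The function names are invented for this example, and it does not reproduce the compiler's actual transformation.

```c
#include <stdint.h>

/* Byte-sized load: reads exactly one byte at p. */
static uint8_t load_byte(const uint8_t *p)
{
    return p[0];
}

/*
 * Models a little-endian 32-bit load at p followed by truncation to the
 * low byte. It needs four readable bytes at p, three more than load_byte().
 */
static uint8_t load_word_then_truncate(const uint8_t *p)
{
    uint32_t w = (uint32_t)p[0]
               | ((uint32_t)p[1] << 8)
               | ((uint32_t)p[2] << 16)
               | ((uint32_t)p[3] << 24);

    return (uint8_t)(w & 0xffu);
}
```

Both functions return the same value whenever four bytes are readable, so the result alone cannot expose the problem. The difference only shows when the bytes after p[0] are not mapped.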
-
i'll try with another compiler in a minute.
Ingo
-
i just tried:
gcc version 4.1.2 20070626 (Red Hat 4.1.2-13)
and indeed the crash is gone. So you are completely right, it's a
compiler bug in 4.0.2 (it's vanilla gcc 4.0.2 built by me, not a distro
compiler). It should not affect normal kernels too much: this bug needs
CONFIG_DEBUG_PAGEALLOC. (or it needs a _really_ unlucky allocation being
at the far upper end of RAM - but those are usually taken up by
boot-time allocations anyway).

i also just re-tried the other config as well - and crash is gone there
too. (not surprisingly)

Ingo
-
Ingo can't send a gcc bug-report since gcc 4.0 is no longer supported
upstream and a 4.1.2 compiler was confirmed to work.

Our only options are to either stop supporting the broken gcc versions
as compiler for the kernel or to work around this compiler bug in the

cu
Adrian

--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
-
Distro can backport a fix for miscompilation while leaving, say,
__GNUC_MINOR__ intact, so banning version numbers isn't terribly
useful.

Perhaps, someone should write a script/test program to check for known
miscompilations. to cure himself from deprecation disease.

Alexey "make cc_check" Dobriyan
-
Pedant: valid. Almost all optimizations are legal, nobody has yet written
laws about compilers. Sorry but I'm forever fixing misuse of the word

Agreed - the standard is not ambiguous here. (For reference the standard
says that a valid pointer must point at an object _OR_ one past the end
of the object (in the latter case it is not dereferencable)). So it's a
compiler bug.

Alan
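
A small sketch of that rule, not taken from the kernel: a one-past-the-end pointer may be formed and compared against, but the loop never reads through it. `count_byte` is a name made up here.

```c
#include <stddef.h>

/*
 * Counts bytes equal to `c` in [begin, end). `end` may point one past the
 * last element: it is compared against but never dereferenced.
 */
static size_t count_byte(const char *begin, const char *end, char c)
{
    size_t n = 0;

    for (const char *p = begin; p != end; ++p) {
        if (*p == c)
            n++;
    }
    return n;
}
```

Calling it as `count_byte(path, path + sizeof(path), '/')` on a char array is valid C, because `path + sizeof(path)` is exactly one past the end and is only used as the loop bound.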
-
Heh.
When I'm ruler of the universe, it *will* be illegal. I'm just getting a
bit ahead of myself.

Linus
-
Any time frame when that will happen?
-
I'm working on it, I'm working on it. I'm just as frustrated as you are.
It turns out to be a non-trivial problem.

Linus
-
hm, it's 4.0.2. Not the latest & greatest but i've been using it for 2
years and this would be the first time it miscompiles a 32-bit kernel

Hm, are you sure? This is a CONFIG_DEBUG_PAGEALLOC=y kernel, so even a
slight overrun of a non-NUL terminated string (as suspected by Al) could
run into a non-mapped kernel page. (which would indicate not a compiler
bug but use-after free)

i just found another config under which i get similar crashes, config
attached. One common theme is CONFIG_DEBUG_FS and DEBUG_PAGEALLOC - and
CONFIG_MAC80211_DEBUGFS is not enabled in this one so it's off the hook
i think. (the crashes are attached below)

(my serial log on this box goes back about 6 months, and that alone
shows more than 3500 successful kernel bootups on that particular
testsystem, each kernel built by this compiler - and there's another
testsystem that i use even more frequently. Despite that, a compiler bug
is still possible of course.)

Ingo
--------------->
kobject_uevent_env
fill_kobj_path: path = '/class/vc/vcsa8'
kobject vcsa8: cleaning up
BUG: unable to handle kernel paging request at virtual address f6207000
printing eip:
c016ecf1
*pdpt = 0000000000003001
*pde = 0000000000ac1067
*pte = 0000000036207000
Oops: 0000 [#1]
SMP DEBUG_PAGEALLOC
Modules linked in:
CPU: 1
EIP: 0060:[<c016ecf1>] Not tainted VLI
EFLAGS: 00010297 (2.6.23-rc9 #20)
EIP is at seq_path+0x60/0xca
eax: f6206ffe ebx: c2de0f50 ecx: 0000002b edx: f6206ffe
esi: f6206007 edi: c2dddfb0 ebp: f6503f18 esp: f6503f00
ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
Process awk (pid: 1160, ti=f6503000 task=f73a8390 task.ti=f6503000)
Stack: 00000ff9 f6e5cf70 f6206ffe c2dddf80 f6e5cf70 c2dddfb0 f6503f30 c016ce40
c05d71b5 f6730f38 f6e5cf70 c2dddfb0 f6503f70 c016f05d 00000400 08098f18
f6730f38 f6e5cf90 00000000 0806bc2e 00000003 08094320 f6503fb0 00000000
Call Trace:
[<c0103c8d>] show_trace_log_lvl+0x19/0x2e
[<c0103d3f>] show_s...
I am 100% sure. I can look at the disassembly, and point to the fact that
your Oops happens on code that is simply totally bogus.

That string is NUL-terminated, which is why the access is to f2a3fffe in
the first place: we explicitly asked d_path() to create us a string at the
end of the page (it creates them backwards), so the path string has a NUL
at the end at address f2a3ffff, which is exactly what we'd expect.

Your compiler really does seem to be total crap.
Do a "make fs/seq_file.s" (and make sure you *disable* CONFIG_DEBUG_INFO
first, otherwise the result will be unreadable crud), and look at
seq_path(). It's going to be more readable than the disassembly that I got.

.. of *course* DEBUG_PAGEALLOC is going to be implied in the problem. If
you don't have DEBUG_PAGEALLOC, you'll never see this, because you'll have
all pages mapped, and the only page that it could happen to is the very

It's not about "possible". It's a fact. Send me your "seq_file.s" output
for that function to be sure - it *could* be memory corruption that
changes a "movb" into a "movl", and maybe the compiler did a byte move to
start with, but quite frankly, that is such a remote possibility that I

This looks like *exactly* the same thing, except you're in
"show_vfsmnt()" this time.

Again: the oopsing instruction (8b 3a) is "movl". And again, the address
is f6206ffe, and it oopses because the (incorrect) 32-bit access will
touch the next page, so you get a paging request fault on f6207000 - which
is some *totally* different allocation, and one that isn't mapped because
it doesn't exist, so DEBUG_PAGEALLOC has removed it.

And I can even tell you exactly what path it is:
- it's going to be the first path that shows up in the path list, since
the seq_file interface will re-use that page, so if you hit it, you'll
hit it on the first entry (unless seq_file has *lots* of data and needs
more than a single-page allocation)
- it must be a single-byte path, bec...
Charming... So we get d_path() either returning junk or we get something
that isn't NUL-terminated. Which one it is? I.e. what does p look like
and what's in s?
-
could be use-after-free as well, as CONFIG_DEBUG_PAGEALLOC was enabled.
Ingo
-
Umm... d_path() had just written there, so use-after-free is not too
likely to trigger page fault on read immediately afterwards - you'd
need a pretty tight race to hit it.
-
On Wed, 3 Oct 2007 15:26:01 +0100
I suspect we want the following patch out of general principles; Ingo,
can you see if this one helps?
(if not, it's still worth considering; it looks like we're first
destroying the device object (which holds the name of the directory)
before we unregister the directory... if that fails then we have a mess.)

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
--- linux-2.6.23-rc2/net/wireless/core.c~ 2007-10-03 08:04:45.000000000 -0700
+++ linux-2.6.23-rc2/net/wireless/core.c 2007-10-03 08:04:45.000000000 -0700
@@ -133,8 +133,8 @@ void wiphy_unregister(struct wiphy *wiph
mutex_unlock(&drv->mtx);
list_del(&drv->list);
- device_del(&drv->wiphy.dev);
debugfs_remove(drv->wiphy.debugfsdir);
+ device_del(&drv->wiphy.dev);
mutex_unlock(&cfg80211_drv_mutex);
}
-
update: occasionally the reading of /proc/mounts succeeds, and it's:
open("/proc/mounts", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(3, "rootfs / rootfs rw 0 0\n/dev/root"..., 4096) = 290
write(1, "rootfs / rootfs rw 0 0\n/dev/root"..., 290rootfs / rootfs rw 0 0
/dev/root / ext3 rw,noatime,nodiratime,data=ordered 0 0
/proc /proc proc rw 0 0
/proc/bus/usb /proc/bus/usb usbfs rw 0 0
/sys /sys sysfs rw 0 0
/dev/devpts /dev/pts devpts rw 0 0
/dev/sda2 /home ext3 rw,noatime,nodiratime,data=ordered 0 0
nodev /debug debugfs rw 0 0
) = 290
read(3, "", 4096) = 0
close(3) = 0

there's nothing particularly interesting in it. (perhaps debugfs)
Ingo
-
disabling debugfs makes the crash go away so it's debugfs related. The
.config delta is below.

Ingo
--- .config.broken.000 2007-10-03 10:28:14.000000000 +0200
+++ .config.good.000 2007-10-03 11:11:18.000000000 +0200
@@ -85,7 +85,7 @@ CONFIG_MODULE_SRCVERSION_ALL=y
# CONFIG_KMOD is not set
CONFIG_BLOCK=y
# CONFIG_LBD is not set
-CONFIG_BLK_DEV_IO_TRACE=y
+# CONFIG_BLK_DEV_IO_TRACE is not set
CONFIG_LSF=y
CONFIG_BLK_DEV_BSG=y
@@ -631,7 +631,6 @@ CONFIG_CFG80211=y
CONFIG_WIRELESS_EXT=y
CONFIG_MAC80211=y
CONFIG_MAC80211_LEDS=y
-CONFIG_MAC80211_DEBUGFS=y
# CONFIG_MAC80211_DEBUG is not set
CONFIG_IEEE80211=m
CONFIG_IEEE80211_DEBUG=y
@@ -1689,7 +1688,7 @@ CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_ENABLE_MUST_CHECK=y
# CONFIG_MAGIC_SYSRQ is not set
CONFIG_UNUSED_SYMBOLS=y
-CONFIG_DEBUG_FS=y
+# CONFIG_DEBUG_FS is not set
# CONFIG_HEADERS_CHECK is not set
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_SHIRQ=y
@@ -1724,8 +1724,6 @@ CONFIG_FAULT_INJECTION=y
# CONFIG_FAILSLAB is not set
CONFIG_FAIL_PAGE_ALLOC=y
CONFIG_FAIL_MAKE_REQUEST=y
-CONFIG_FAULT_INJECTION_DEBUG_FS=y
-CONFIG_FAULT_INJECTION_STACKTRACE_FILTER=y
CONFIG_EARLY_PRINTK=y
CONFIG_DEBUG_STACKOVERFLOW=y
# CONFIG_DEBUG_STACK_USAGE is not set
-
it's CONFIG_MAC80211_DEBUGFS=y causing the crash.
Ingo
-
Also...if someone dislikes something in http://kernelnewbies.org/Linux_2_6_23 ,
or wants to fix my english, do it soon :)
-
Heh. The "remove sk98lin driver" bullet is sadly wrong. We had to
reinstate it because it supported some cards that the skge driver doesn't
handle.

Linus
-
Thanks, fixed
-
On Tuesday 02 October 2007 04:41:49 Linus Torvalds wrote:
This is certainly a tool issue, but if I use Debian's kernel-image "make-kpkg"
wrapper around the kernel build system, it fails with:

cp: cannot stat `arch/x86_64/boot/bzImage': No such file or directory

Obviously, this file has moved to arch/x86/boot, but it seems like possibly
unnecessary breakage. I've been copying bzImage for years from
arch/x86_64/boot, and I'm sure there's a handful of scripts (other than
Debian's kernel-image) doing this too.

For now, I hacked the tool[1]. Maybe, if we care, a symlink could be set up
between arch/x86/boot and arch/$ARCH/boot ? Or would papering over this be
more trouble than it's worth?

[1] http://devzero.co.uk/~alistair/kernel-package-changes.diff
--
Cheers,
Alistair.

137/1 Warrender Park Road, Edinburgh, UK.
-
yeah, a symlink is the right solution i think. Our first-step goal is to
make the switchover seamless for all practical purposes, and a
compatibility symlink in arch/i386/boot/ will not hurt. (we shouldn't
worry about the really old zImage target though)

Ingo
-
But when can we then get rid of it?
This is a simple question about when we take the noise..
And right now people know we are shifting to x86 - so it makes
sense to let the dependent userspace tools take the pain now and not later.

Starting to fill up a build kernel with symlinks for compatibility with
random programs seems to be the wrong approach.

Sam - that dislike especially the asm symlink
-
Sam,
I completely agree with you, but we want to keep the migration noise
as low as possible. Adding the symlink right now along with an entry
into features-removal.txt (6 month grace period) allows a smoother
transition. The distro folks should better get their gear together
until then.

tglx
-
I'll certainly file a bug report with the Debian BTS, but the fix will
probably involve something as abortive as my original patch.

How did the PPC merge handle this? I can't see any similar hacks in
kernel-image for these architectures.

--
Cheers,
Alistair.

137/1 Warrender Park Road, Edinburgh, UK.
-
I believe most sane tools would be using the output of uname -m, so a
possible way to fix this would be fixing the data passed to userspace
from uname. However, it might be the case that this creates a new set
of problems too, with tools relying on the output of uname -m to
determine whether the machine is 32 or 64 bit, and so on.

--
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."
-
there are two problems with the use of uname -m:
- the build machine architecture is not necessarily the same as the
target architecture. (for example i cross-compile all my 32-bit
kernels on a 64-bit box.)
- we kept uname -m compatible. multilib depends on it, and other pieces
of userspace as well. So uname -m still outputs 'i386' on 32-bit and
'x86_64' on 64-bit - not 'x86'.

a symlink looks like the best solution to me.
Ingo
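
For tools that locate the image by architecture name, the lookup could be sketched as below, assuming both 'i386' and 'x86_64' builds now put bzImage under arch/x86/boot as reported earlier in this thread. `bzimage_dir_for_arch` is a hypothetical helper, not part of any existing tool, and it takes the target ARCH name rather than the output of uname -m, since the build machine need not match the target.

```c
#include <string.h>

/*
 * Hypothetical helper: maps the ARCH name a build was configured for to
 * the directory holding bzImage after the merge. Returns NULL for names
 * this sketch does not know about.
 */
static const char *bzimage_dir_for_arch(const char *arch)
{
    if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0)
        return "arch/x86/boot";
    return NULL;
}
```

A compatibility symlink makes this kind of lookup unnecessary for existing scripts, which is the trade-off being argued about above.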
-
Looks pretty good at first glance. Dual-K7, adaptec 29160, NFS, e1000,
root on /dev/sda*. Not even one bad thing to report yet.

Cheers,
Willy
-
Linus> I said I was hoping that -rc8 was the last -rc, and I hate
Linus> doing this, but we've had more changes since -rc8 than we had
Linus> in -rc8. And while most of them are pretty trivial, I really
Linus> couldn't face doing a 2.6.23 release and take the risk of some
Linus> really stupid brown-paper-bag thing.

Linus> So there's a final -rc out there, and right now my plan is to
Linus> make this series really short, and release 2.6.23 in a few
Linus> days. So please do give it a last good testing, and holler
Linus> about any issues you find!

Just to let people know, I was running 2.6.23-rc for over 53 days
without any issues. Mix of SCSI, Sata, tape drives, disks, MD, LVM,
SMP, etc. I suspect we've got a pretty darn stable release coming out
soon.

John
-
I've been running rc8-git3 since it came out, and while I've built git-5
and will build rc9, I probably will continue testing until I find a bug
or have to boot for some other reason. Running really well, even with a
lot of kvm stuff going on, kernel builds for other machines, etc.

--
Bill Davidsen <davidsen@tmr.com>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
-
that's pretty impressive! v2.6.23-rc2, right?
Ingo
-
Linus> I said I was hoping that -rc8 was the last -rc, and I hate
Linus> doing this, but we've had more changes since -rc8 than we had
Linus> in -rc8. And while most of them are pretty trivial, I really
Linus> couldn't face doing a 2.6.23 release and take the risk of some
Linus> really stupid brown-paper-bag thing.

Linus> So there's a final -rc out there, and right now my plan is to
Linus> make this series really short, and release 2.6.23 in a few
Linus> days. So please do give it a last good testing, and holler
Linus> about any issues you find!

John> Just to let people know, I was running 2.6.23-rc for over 53
John> days without any issues. Mix of SCSI, Sata, tape drives, disks,
John> MD, LVM, SMP, etc. I suspect we've got a pretty darn stable
John> release coming out soon.

2.6.23-rc2 is what I meant. Oops...
-
Dirt. Booting with "profile=sleep,2" is broken in 2.6.23-rc9 and
2.6.23-rc8 but working in 2.6.22. I was checking it out as part of a
discussion in another thread and noticed it broken in -mm as well
(2.6.23-rc8-mm2). Bisect is in progress but suggestions as to the prime
candidates are welcome or preferably, pointing out that I'm an idiot
because I missed twiddling some config change.

2.6.22 output
gringo:~# readprofile | sort -rn
69604 total 0.0309
27287 m_start 243.6339
16430 sync_page 205.3750
13161 sync_buffer 205.6406
4035 sys_init_module 0.6121
2842 msleep 88.8125
2573 call_usermodehelper_keys 10.7208
1554 ps2_sendbyte 6.0703
803 log_wait_commit 2.7882
378 do_lookup 0.9844
160 do_get_write_access 0.1111
89 synchronize_rcu 1.3906
76 ps2_command 0.0792
66 ide_do_drive_cmd 0.2292
59 do_fork 0.1085
54 congestion_wait 0.3750
29 __rtnl_unlock 1.8125
4 kthread 0.0357
2 *unknown*
2 journal_stop 0.0038
1 kthreadd 0.0035
1 kthread_create 0.0063

latest git output
gringo:~# readprofile
0 *unknown*
0 total 0.0000

I checked the obvious stuff like DEBUG options being set,
-
Mel, does the patch below fix this bug for you? (Note: you will need to
enable CONFIG_SCHEDSTATS=y too.)

if yes, then Linus please pull this single fix from:
git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git
| Ingo Molnar (1):
| sched: fix profile=sleep
|
| sched_fair.c | 10 ++++++++++
| 1 file changed, 10 insertions(+)

risk is low: the new code only runs with CONFIG_SCHEDSTATS=y
(default:off) and profile=sleep (default:off), so it ought to be fairly
safe to add at this point. (and we had very similar code in v2.6.22
anyway)

Ingo
------------------------->
Subject: sched: fix profile=sleep
From: Ingo Molnar <mingo@elte.hu>

fix sleep profiling - we lost this chunk in the CFS merge.

Found-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/sched_fair.c | 10 ++++++++++
1 file changed, 10 insertions(+)

Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -639,6 +639,16 @@ static void enqueue_sleeper(struct cfs_r
se->block_start = 0;
se->sum_sleep_runtime += delta;
+
+ /*
+ * Blocking time is in units of nanosecs, so shift by 20 to
+ * get a milliseconds-range estimation of the amount of
+ * time that the task spent sleeping:
+ */
+ if (unlikely(prof_on == SLEEP_PROFILING)) {
+ profile_hits(SLEEP_PROFILING, (void *)get_wchan(tsk),
+ delta >> 20);
+ }
}
#endif
}
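
As a rough illustration of the shift described in the patch comment (a user-space sketch, not kernel code; the helper name ns_to_approx_ms is made up for this example): shifting right by 20 divides by 2^20 = 1,048,576 rather than by 1,000,000, so the result is a cheap estimate that comes out about 4.6% below the true millisecond count.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the "delta >> 20" in the patch above.
 * A right shift by 20 divides by 2^20 (1,048,576), which is close to
 * the 1,000,000 ns in a millisecond but avoids a 64-bit division.
 */
static inline uint64_t ns_to_approx_ms(uint64_t delta_ns)
{
    return delta_ns >> 20;
}
```

For example, one second of blocking time (1,000,000,000 ns) maps to 953 rather than 1000, and anything shorter than 1,048,576 ns maps to 0.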
-
Nice one Ingo - got it first try. The problem commit was
dd41f596cda0d7d6e4a8b139ffdfabcefdd46528 and it's clear that the code removed
in this commit is put back by this latest patch. When applied, profile=sleep
works as long as CONFIG_SCHEDSTATS is set.

Tested-by: Mel Gorman <mel@csn.ul.ie>

That said, I am not super-keen on this only working when SCHEDSTATS is set
without telling the user about it. It's not urgent enough to pick up as a
late-late fix but perhaps something like this?

=============
profile=sleep only works if CONFIG_SCHEDSTATS is set. This patch notes the
limitation in Documentation/kernel-parameters.txt and prints a warning at
boot-time if profile=sleep is used without CONFIG_SCHEDSTATS.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
Documentation/kernel-parameters.txt | 3 ++-
kernel/profile.c | 5 +++++
2 files changed, 7 insertions(+), 1 deletion(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc9-005_ingo_profile_fix/Documentation/kernel-parameters.txt linux-2.6.23-rc9-010_document_profilesleep/Documentation/kernel-parameters.txt
--- linux-2.6.23-rc9-005_ingo_profile_fix/Documentation/kernel-parameters.txt 2007-10-02 04:24:52.000000000 +0100
+++ linux-2.6.23-rc9-010_document_profilesleep/Documentation/kernel-parameters.txt 2007-10-02 16:43:41.000000000 +0100
@@ -1395,7 +1395,8 @@ and is between 256 and 4096 characters.
Param: "schedule" - profile schedule points.
Param: <number> - step/bucket size as a power of 2 for
statistical time based profiling.
- Param: "sleep" - profile D-state sleeping (millisecs)
+ Param: "sleep" - profile D-state sleeping (millisecs).
+ Requires CONFIG_SCHEDSTATS to work

processor.max_cstate= [HW,ACPI]
Limit processor to maximum C-state
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc9-005_ingo_profile_fix/kernel/profile.c linux-2.6.23-rc9-010_document_profilesleep/kernel/profile.c
--- linux-2.6.23-rc9-005_ingo_profil...
And if it isn't set? I can easily see building a new kernel with stats
off and forgetting to change the boot options.

--
Bill Davidsen <davidsen@tmr.com>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
-
If CONFIG_SCHEDSTATS is off and profile=sleep is set, then with Ingo's
patch readprofile shows:

0 *unknown*
0 total 0.0000

That is a tad confusing, hence my follow-up patch: with it, readprofile
reports that /proc/profile doesn't exist, and a warning appears
in dmesg.

--
Mel Gorman
-
yep - that's the best we can do for the stable release.
We could improve quality of behavior here by not offering /proc/profile
in that case and by printk-ing something if profile=sleep is specified
on a !CONFIG_SCHEDSTATS kernel. I'm willing to apply patches that do
that :)

Ingo
-
I included a candidate patch in the last mail but it was shoved down at
the bottom so it could easily have been missed.

==============
Subject: Document profile=sleep requiring CONFIG_SCHEDSTATS

profile=sleep only works if CONFIG_SCHEDSTATS is set. This patch notes the
limitation in Documentation/kernel-parameters.txt and prints a warning at
boot-time if profile=sleep is used without CONFIG_SCHEDSTATS.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
Documentation/kernel-parameters.txt | 3 ++-
kernel/profile.c | 5 +++++
2 files changed, 7 insertions(+), 1 deletion(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc9-005_ingo_profile_fix/Documentation/kernel-parameters.txt linux-2.6.23-rc9-010_document_profilesleep/Documentation/kernel-parameters.txt
--- linux-2.6.23-rc9-005_ingo_profile_fix/Documentation/kernel-parameters.txt 2007-10-02 04:24:52.000000000 +0100
+++ linux-2.6.23-rc9-010_document_profilesleep/Documentation/kernel-parameters.txt 2007-10-02 16:43:41.000000000 +0100
@@ -1395,7 +1395,8 @@ and is between 256 and 4096 characters.
Param: "schedule" - profile schedule points.
Param: <number> - step/bucket size as a power of 2 for
statistical time based profiling.
- Param: "sleep" - profile D-state sleeping (millisecs)
+ Param: "sleep" - profile D-state sleeping (millisecs).
+ Requires CONFIG_SCHEDSTATS

processor.max_cstate= [HW,ACPI]
Limit processor to maximum C-state
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc9-005_ingo_profile_fix/kernel/profile.c linux-2.6.23-rc9-010_document_profilesleep/kernel/profile.c
--- linux-2.6.23-rc9-005_ingo_profile_fix/kernel/profile.c 2007-10-02 04:24:52.000000000 +0100
+++ linux-2.6.23-rc9-010_document_profilesleep/kernel/profile.c 2007-10-02 16:44:50.000000000 +0100
@@ -60,6 +60,7 @@ static int __init profile_setup(char * s
int par;

if (!strncmp(str, sleepstr, strlen(sleepstr))) {
+#ifdef CONFIG_SCHEDSTATS
prof_on = SLE...
thanks, applied.
Ingo
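
The behaviour discussed above (warn when profile=sleep is requested but scheduler statistics are not built in) can be sketched in user space as follows. This is not the text of the truncated patch: the function parse_profile_arg, the enum, and the have_schedstats parameter are all hypothetical, and the configuration check is modelled as a runtime flag so the example can be exercised without rebuilding. Only the strncmp() prefix match against "sleep" is taken from the visible part of the diff.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result type for this illustration. */
enum prof_mode { PROF_OFF, PROF_SLEEP, PROF_OTHER };

/*
 * Hypothetical parser for the value of a "profile=" boot argument.
 * have_schedstats stands in for a CONFIG_SCHEDSTATS=y build; in the
 * kernel this would be a compile-time #ifdef, not a parameter.
 */
static enum prof_mode parse_profile_arg(const char *str, int have_schedstats)
{
    static const char sleepstr[] = "sleep";

    /* Prefix match, so "sleep,2" is accepted as well as "sleep". */
    if (strncmp(str, sleepstr, strlen(sleepstr)) == 0) {
        if (!have_schedstats) {
            /* Tell the user instead of silently recording nothing. */
            fprintf(stderr,
                    "profile=sleep needs scheduler statistics; "
                    "sleep profiling disabled\n");
            return PROF_OFF;
        }
        return PROF_SLEEP;
    }

    /* Other profiling modes are out of scope for this sketch. */
    return PROF_OTHER;
}
```

With the flag off, parse_profile_arg("sleep,2", 0) prints the warning and leaves profiling disabled, which is the user-visible hint that was missing when readprofile showed only zeros.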
-
great - thanks for testing it. I'm glad you caught it as profile=sleep
is pretty useful in "why is my system so slow" tests. (which problems
are usually reported _after_ a stable kernel is released ...)

Ingo
-
