ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.27-rc1...
- Something in linux-next has broken my X server
- My Vaio is vacationing on the other side of the continent. Ignorance
is bliss.- Lots of people are vacationing at present (certain x86 people, for
example). I'll still be around so please be sure to Cc me on things.But I am unlikely to want to be buried in x86 patches, so please just
give those an extra week or two's testing.Boilerplate:
- See the `hot-fixes' directory for any important updates to this patchset.
- To fetch an -mm tree using git, use (for example)
git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1- -mm kernel commit activity can be reviewed by subscribing to the
mm-commits mailing list.echo "subscribe mm-commits" | mail majordomo@vger.kernel.org
- If you hit a bug in -mm and it is not obvious which patch caused it, it is
most valuable if you can perform a bisection search to identify which patch
introduced the bug. Instructions for this process are athttp://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt
But beware that this process takes some time (around ten rebuilds and
reboots), so consider reporting the bug first and if we cannot immediately
identify the faulty patch, then perform the bisection search.- When reporting bugs, please try to Cc: the relevant maintainer and mailing
list on any email.- When reporting bugs in this kernel via email, please also rewrite the
email Subject: in some manner to reflect the nature of the bug. Some
developers filter by Subject: when looking for messages to read.- Occasional snapshots of the -mm lineup are uploaded to
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ and are announced on
the mm-commits list. These probably are at least compilable.- More-than-daily -mm sna...
Hello Peter,
I'm seeing similar GCOV problems as with 2.6.26-rc5-mm1 that you fixed.
This is the same x86_64 box and again it was unable to boot with gcov enabled.
A quick look revealed that arch/x86/tsc_64.c and arch/x86/tsc_32.c code was
unified. Unfortunately simple change ofGCOV_tsc_32.o := n
GCOV_tsc_64.o := nto
GCOV_tsc.o := n
did not help. Given the amount of combinations of which set of files with GCOV
might cause failures I was rather fortunate and after a few hours I was able
to pinpoint exactly two files which need GCOV disabled to make my x86_64 boot.If you want to try to figure out what is wrong with them please feel free to send
me patches to test. If not then how about this patch? Compile and run tested.Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
--- linux-2.6.27-rc1-mm1/arch/x86/kernel/Makefile 2008-08-01 18:05:04.000000000 +0200
+++ linux-2.6.27-rc1-mm1-dirty/arch/x86/kernel/Makefile 2008-08-05 21:49:21.000000000 +0200
@@ -13,8 +13,8 @@ CFLAGS_REMOVE_rtc.o = -pg
CFLAGS_REMOVE_paravirt.o = -pg
endif-GCOV_tsc_32.o := n
-GCOV_tsc_64.o := n
+GCOV_vsyscall_64.o := n
+GCOV_tsc.o := n#
# vsyscalls (which work on the user stack) should haveMariusz
--
Your patch looks good. I don't think I will be able to refine those
list of files to be excluded any better than you already did so this
should go into -mm with the other gcov patches.For future reference, there are other object files which "stand out" in
the respective Makefile, namely rtc.o, hpet.o and paravirt.o. Just like
the two files that you identified as causing problems with gcov
profiling, these are explicitly excluded from either FTRACE profiling
or stack-protector checks or both. If there should be further run-time
problems, these are good candidates to check, though I'd like to refrain
from removing them at this point in time without them causing any
apparent problems.Regards,
Peter
--
Make that arch/x86/kernel/tsc_64.c and arch/x86/kernel/tsc_32.c
Mariusz
--
Hello,
$ uname -a
Linux sparc64 2.6.27-rc1-mm1 #1 SMP PREEMPT Sat Aug 2 15:51:55 CEST 2008 sparc64 sun4u TI UltraSparc II (BlackBird) GNU/LinuxLogs get a little flooded with:
BUG: using smp_processor_id() in preemptible [00000000] code: emerge/3217
caller is smp_call_function_mask+0x1c/0x180
Call Trace:
[0000000000486374] smp_call_function_mask+0x14/0x180
[0000000000447f94] tsb_grow+0x2d4/0x420
[000000000040796c] sparc64_realfault_common+0x10/0x20
[000000000045b604] schedule_tail+0x64/0xa0
[0000000000406150] ret_from_syscall+0x8/0x48
BUG: using smp_processor_id() in preemptible [00000000] code: emerge/3220
caller is smp_call_function_mask+0x1c/0x180
Call Trace:
[0000000000486374] smp_call_function_mask+0x14/0x180
[0000000000447f94] tsb_grow+0x2d4/0x420
[000000000040796c] sparc64_realfault_common+0x10/0x20
[000000000045b604] schedule_tail+0x64/0xa0
[0000000000406150] ret_from_syscall+0x8/0x48
BUG: using smp_processor_id() in preemptible [00000000] code: rsync/3220
caller is smp_call_function_mask+0x1c/0x180
Call Trace:
[0000000000486374] smp_call_function_mask+0x14/0x180
[0000000000447f94] tsb_grow+0x2d4/0x420
[000000000040796c] sparc64_realfault_common+0x10/0x20
BUG: using smp_processor_id() in preemptible [00000000] code: rsync/3220
caller is smp_call_function_mask+0x1c/0x180
Call Trace:
[0000000000486374] smp_call_function_mask+0x14/0x180
[0000000000447f94] tsb_grow+0x2d4/0x420
[000000000040796c] sparc64_realfault_common+0x10/0x20
BUG: using smp_processor_id() in preemptible [00000000] code: rsync/3224
caller is smp_call_function_mask+0x1c/0x180
Call Trace:
[0000000000486374] smp_call_function_mask+0x14/0x180
[0000000000447f94] tsb_grow+0x2d4/0x420
[000000000040796c] sparc64_realfault_common+0x10/0x20
[000000000045b604] schedule_tail+0x64/0xa0
[0000000000406150] ret_from_syscall+0x8/0x48
BUG: using smp_processor_id() in preemptible [00000000] code: file/3246
caller is smp_call_function_mask+0x1c/0x180
Call Trace:
[0000000000486374] smp_call...
From: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Thenk for the report and sample patch.
I've decided to put the preemption disabled call at the smp_tsb_sync() call
site so that smp_tsb_sync() can still invoke smp_call_function_mask() as
a tail-call.Thanks again!
sparc64: Need to disable preemption around smp_tsb_sync().
Based upon a bug report by Mariusz Kozlowski
It uses smp_call_function_masked() now, which has a preemption-disabled
requirement.Signed-off-by: David S. Miller <davem@davemloft.net>
---
arch/sparc64/mm/tsb.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)diff --git a/arch/sparc64/mm/tsb.c b/arch/sparc64/mm/tsb.c
index 3547937..587f8ef 100644
--- a/arch/sparc64/mm/tsb.c
+++ b/arch/sparc64/mm/tsb.c
@@ -1,9 +1,10 @@
/* arch/sparc64/mm/tsb.c
*
- * Copyright (C) 2006 David S. Miller <davem@davemloft.net>
+ * Copyright (C) 2006, 2008 David S. Miller <davem@davemloft.net>
*/#include <linux/kernel.h>
+#include <linux/preempt.h>
#include <asm/system.h>
#include <asm/page.h>
#include <asm/tlbflush.h>
@@ -415,7 +416,9 @@ retry_tsb_alloc:
tsb_context_switch(mm);/* Now force other processors to do the same. */
+ preempt_disable();
smp_tsb_sync(mm);
+ preempt_enable();/* Now it is safe to free the old tsb. */
kmem_cache_free(tsb_caches[old_cache_index], old_tsb);
--
1.5.6.GIT--
Hi,
rmmod on ide-cd_mod causes this oops:
BUG: unable to handle kernel paging request at 83535683
IP: [<c0246ffa>] ide_device_put+0xc/0x33
*pde = 00000000
Oops: 0000 [#1] PREEMPT
last sysfs file: /sys/devices/pci0000:00/0000:00:01.0/0000:01:05.0/resource
Modules linked in: radeon drm nfsd lockd sunrpc exportfs pcmcia uhci_hcd ehci_hcd usbcore snd_ali5451 yenta_socket pcspkr snd_ac97_codec ac97_bus rsrc_nonstatic snd_pcm snd_timer ati_agp agpgart snd soundcore snd_page_alloc ide_cd_mod(-) cdrom 8139too psmouse sony_laptop backlight floppy rtcPid: 3890, comm: rmmod Not tainted (2.6.27-rc1-mm1 #2)
EIP: 0060:[<c0246ffa>] EFLAGS: 00010286 CPU: 0
EIP is at ide_device_put+0xc/0x33
EAX: 83535657 EBX: dc927a00 ECX: 00000003 EDX: 00000001
ESI: dec34e34 EDI: dec34e34 EBP: d9f46ee0 ESP: d9f46edc
DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Process rmmod (pid: 3890, ti=d9f46000 task=dd88e780 task.ti=d9f46000)
Stack: dc927c00 d9f46eec dec2e202 dc927c00 d9f46ef8 dec2e225 dd9138dc d9f46f00
c02469e0 d9f46f10 c024156f dd9138dc dd9139f4 d9f46f24 c024162c 00000880
dec34e34 c0397dc0 d9f46f38 c0240a33 00000880 dec34e34 00000000 d9f46f48
Call Trace:
[<dec2e202>] ? ide_cd_put+0x26/0x33 [ide_cd_mod]
[<dec2e225>] ? ide_cd_remove+0x16/0x19 [ide_cd_mod]
[<c02469e0>] ? generic_ide_remove+0x1a/0x1e
[<c024156f>] ? __device_release_driver+0x59/0x7f
[<c024162c>] ? driver_detach+0x97/0x99
[<c0240a33>] ? bus_remove_driver+0x6f/0x8b
[<c02419f1>] ? driver_unregister+0x2f/0x33
[<dec31331>] ? ide_cdrom_exit+0xd/0xf [ide_cd_mod]
[<c014265a>] ? sys_delete_module+0x10d/0x1e2
[<c015fedc>] ? do_munmap+0x1d7/0x234
[<c01e8684>] ? trace_hardirqs_on_thunk+0xc/0x10
[<c0103015>] ? sysenter_do_call+0x12/0x35
=======================
Code: ff ff 89 44 24 08 c7 44 24 04 a7 de 35 c0 89 34 24 e8 cb ce f9 ff 31 c0 83 c4 0c 5b 5e 5d c3 55 89 e5 53 89 c3 8b 40 24 8b 40 10 <8b> 40 2c 85 c0 74 12 8b 80 44 01 00 0...
Hi,
Unfortunately, I'm unable to reproduce it here with 2.6.27-rc1-mm1.
Could you please check whether it is drive->hwif or hwif->host exploding?
--
It's ALI M15x3 chipset. .config is attached.
# lspci
00:00.0 Host bridge: ATI Technologies Inc RS200/RS200M AGP Bridge [IGP 340M] (rev 02)
00:01.0 PCI bridge: ATI Technologies Inc PCI Bridge [IGP 340M]
00:03.0 Modem: ALi Corporation M5457 AC'97 Modem Controller
00:04.0 Multimedia audio controller: ALi Corporation M5451 PCI AC-Link Controller Audio Device (rev 02)
00:06.0 Bridge: ALi Corporation M7101 Power Management Controller [PMU]
00:07.0 ISA bridge: ALi Corporation M1533/M1535 PCI to ISA Bridge [Aladdin IV/V/V+]
00:0a.0 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev aa)
00:0a.1 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev aa)
00:0a.2 FireWire (IEEE 1394): Ricoh Co Ltd R5C552 IEEE 1394 Controller (rev 02)
00:0c.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 50)
00:0c.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 50)
00:0c.2 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 51)
00:0f.0 IDE interface: ALi Corporation M5229 IDE (rev c4)
00:12.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)I saw it exploding in two ways. I added simple debugging stuff:
--- linux-2.6.27-rc1-mm1/drivers/ide/ide.c 2008-08-02 11:42:05.000000000 +0200
+++ linux-2.6.27-rc1-mm1-dirty/drivers/ide/ide.c 2008-08-02 23:26:52.000000000 +0200
@@ -714,6 +714,21 @@ EXPORT_SYMBOL_GPL(ide_device_get);
void ide_device_put(ide_drive_t *drive)
{
#ifdef CONFIG_MODULE_UNLOAD
+ void *tmp;
+
+ tmp = drive;
+ printk("drive: 0x%p\n", tmp);
+ tmp = drive->hwif;
+ printk("drive->hwif: 0x%p\n", tmp);
+ tmp = drive->hwif->host;
+ printk("drive->hwif->host: 0x%p\n", tmp);
+ tmp = drive->hwif->host->dev;
+ printk("drive->hwif->host->dev: 0x%p\n", tmp);
+ tmp = drive->hwif->host->dev[0];
+ printk("drive->hwif->host->dev[0]: 0x%p\n", tmp);
+ tmp = drive->hwif->host->dev...
[...]
Thanks for debugging this. I see the problem now: previous reference
counting fix was totally fscked up and introduced access to cd->drive
after putting last reference on cd (time to re-supply brown paper bag
stock). The incremental fix (for 2.6.27-rc1-mm1) attached, the fixedDoes it still happen with the 1) fixed?
---
drivers/ide/ide-cd.c | 4 +++-
drivers/ide/ide-disk.c | 4 +++-
drivers/ide/ide-floppy.c | 4 +++-
drivers/ide/ide-tape.c | 4 +++-
drivers/scsi/ide-scsi.c | 4 +++-
5 files changed, 15 insertions(+), 5 deletions(-)Index: b/drivers/ide/ide-cd.c
===================================================================
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -78,9 +78,11 @@ static struct cdrom_info *ide_cd_get(strstatic void ide_cd_put(struct cdrom_info *cd)
{
+ ide_drive_t *drive = cd->drive;
+
mutex_lock(&idecd_ref_mutex);
kref_put(&cd->kref, ide_cd_release);
- ide_device_put(cd->drive);
+ ide_device_put(drive);
mutex_unlock(&idecd_ref_mutex);
}Index: b/drivers/ide/ide-disk.c
===================================================================
--- a/drivers/ide/ide-disk.c
+++ b/drivers/ide/ide-disk.c
@@ -74,9 +74,11 @@ static struct ide_disk_obj *ide_disk_getstatic void ide_disk_put(struct ide_disk_obj *idkp)
{
+ ide_drive_t *drive = idkp->drive;
+
mutex_lock(&idedisk_ref_mutex);
kref_put(&idkp->kref, ide_disk_release);
- ide_device_put(idkp->drive);
+ ide_device_put(drive);
mutex_unlock(&idedisk_ref_mutex);
}Index: b/drivers/ide/ide-floppy.c
===================================================================
--- a/drivers/ide/ide-floppy.c
+++ b/drivers/ide/ide-floppy.c
@@ -179,9 +179,11 @@ static struct ide_floppy_obj *ide_floppystatic void ide_floppy_put(struct ide_floppy_obj *floppy)
{
+ ide_drive_t *drive = floppy->drive;
+
mutex_lock(&idefloppy_ref_mutex);
kref_put(&floppy->kref, idef...
No. I applied your incremental fix and tested it for some time. It doesn't
oops anymore in any way in spite of my best efforts :)Tested-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Thanks,
--
Hi Andrew,
make allyesconfig with 2.6.27-rc1-mm1 kernel on powerpc fails with build error
LD .tmp_vmlinux1
ld: drivers/built-in.o section .devexit.text exceeds stub group size
ld: sound/built-in.o section .devinit.text exceeds stub group size
ld: drivers/built-in.o section .devinit.text exceeds stub group size
ld: net/built-in.o section .exit.text exceeds stub group size
ld: drivers/built-in.o section .exit.text exceeds stub group size
ld: net/built-in.o section .init.text exceeds stub group size
ld: sound/built-in.o section .init.text exceeds stub group size
ld: drivers/built-in.o section .init.text exceeds stub group size
ld: fs/built-in.o section .init.text exceeds stub group size
ld: mm/built-in.o section .init.text exceeds stub group size
ld: kernel/built-in.o section .init.text exceeds stub group size
ld: arch/powerpc/platforms/built-in.o section .init.text exceeds stub group size
ld: arch/powerpc/kernel/built-in.o section .init.text exceeds stub group size
ld: init/built-in.o section .init.text exceeds stub group size
ld: kernel/built-in.o section .sched.text exceeds stub group size
ld: net/built-in.o section .text exceeds stub group size
ld: arch/powerpc/oprofile/built-in.o section .text exceeds stub group size
ld: sound/built-in.o section .text exceeds stub group size
ld: drivers/built-in.o section .text exceeds stub group size
ld: lib/built-in.o section .text exceeds stub group size
ld: tests/built-in.o section .text exceeds stub group size
ld: block/built-in.o section .text exceeds stub group size
ld: crypto/built-in.o section .text exceeds stub group size
ld: security/built-in.o section .text exceeds stub group size
ld: ipc/built-in.o section .text exceeds stub group size
ld: fs/built-in.o section .text exceeds stub group size
ld: mm/built-in.o section .text exceeds stub group size
ld: kernel/built-in.o section .text exceeds stub group size
ld: arch/powerpc/xmon/built-in.o section .text exceeds stub group size
ld: arch/powerpc/platforms/built-in.o section .te...
<snip>
Turning off GCOV "fixes" this. Not really the best solution but at
least it narrows doen the search effort.Peter,
Can you have a look at how this can be fixed, if at all?Yours Tony
linux.conf.au http://www.marchsouth.org/
Jan 19 - 24 2009 The Australian Linux Technical Conference!--
Peter,
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
--
I did some testing with a cross-compiler myself and I don't think
there is a general solution to this problem. It's not one particular
file that is causing the problem but seemingly the sheer size of the
resulting vmlinux file - even though the toal vmlinux.o size is
"merely" up about 100MiB (from around 1,03Gib to 1,13Gib).I think I'll need help from people with knowledge of the powerpc
toolchain here.Regards,
Peter
--
Am not terribly happy with the state of the gcov patches. They STILL
leave thousands of dead symlinks lying around after `make mrproper' and
generally seem to muck up the kbuild system a bit, although nothing
that a bit of Sam love wouldn't fix.Plus it breaks the build on a few architectures (branch out of range,
mainly), but that's a fairly minor thing which could even be worked
around in Kconfig (disable the offending code if gcov is enabled)--
Have not had time / energy to get aroud to it.
Other things continue to pop up and time is limited at the moment
as in more limited than usual).Sam
--
This is caused by patch
gcov-create-links-to-gcda-files-in-build-directory.patch
which can be simply removed as it is no longer needed since patch
gcov-add-gcov-profiling-infrastructure-revert-link-changes.patch
Hm, by now the only change to kbuild is the addition of gcc options
-fprofile-arcs/-ftest-coverage depending on the respective config
symbols. If there is anything else that should be changed, pleaseSome of the problems caused/uncovered by enabling gcov profiling for
a kernel build on some architectures simply cannot be fixed by a change
to the kernel patch itself. I'm wondering if it would be possible
to disable this configuration option when specifying allyesconfig. That
way at least generic testing wouldn't be affected.Regards,
Peter
--
The only suspicious thing so far is 100% CPU during "time-schedule" test
from LTP:[...]
pth_str03 0 INFO : thread 0 exiting, depth=4, status=0, addr=0xf2b010
pth_str03 0 INFO : The sum of tree (breadth 4, depth 3) is 3570
pth_str03 1 PASS : Test passed
<<<execution_status>>>
duration=0 termination_type=exited termination_id=0 corefile=no
cutime=0 cstime=3
<<<test_end>>>
<<<test_start>>>
tag=time-schedule01 stime=1217504927
cmdline=" time-schedule"[100% CPU here, reproducible]
"strace -p" kicks the test to completion, and shows plenty of
sched_yield() calls, IIRC.--
Andrew,
Please don't do so. We did discuss this and while Paul and Hugh have
opposed the patches, there is no alternative to memory overcommit
handling for cgroups. Claiming that no one supports overcommit is not
a valid argument. Apache (of what I've seen can decide rlimits for
each of it's children). Without the overcommit feature, a cgroup would
be prone to either excessive swapping for OOM (if badly configured). A
friendly feature that allows us to control and fail allocations is
much nicer.I've resolved most of the issues reported, except for the last one by
Hugh. The infrastructure also allows me to build a mlock controller. I
am just back from Canada, I hope to get cracking at the problem soon.Balbir
--
| H. Peter Anvin | Re: [RFC 00/15] x86_64: Optimize percpu accesses |
| Tarkan Erimer | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Eric W. Biederman | Remaining straight forward kthread API conversions... |
| Greg Kroah-Hartman | [PATCH 001/196] Chinese: Add the known_regression URI to the HOWTO |
| David Miller | [GIT]: Networking |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| David Miller | Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Frans Pop | svc: failed to register lockdv1 RPC service (errno 97). |
git: | |
