ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23/2.6.23-mm1/

- I've been largely avoiding applying anything since rc8-mm2 in an attempt
  to stabilise things for the 2.6.23 merge.

  But that didn't stop all the subsystem maintainers from going nuts, with
  the usual accuracy.

  We're up to a 37MB diff now, but it seems to be working a bit better.

Boilerplate:

- See the `hot-fixes' directory for any important updates to this patchset.

- To fetch an -mm tree using git, use (for example)

  git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
  git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1

- -mm kernel commit activity can be reviewed by subscribing to the
  mm-commits mailing list.

  echo "subscribe mm-commits" | mail majordomo@vger.kernel.org

- If you hit a bug in -mm and it is not obvious which patch caused it, it is
  most valuable if you can perform a bisection search to identify which patch
  introduced the bug. Instructions for this process are at

  http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt

  But beware that this process takes some time (around ten rebuilds and
  reboots), so consider reporting the bug first and if we cannot immediately
  identify the faulty patch, then perform the bisection search.

- When reporting bugs, please try to Cc: the relevant maintainer and mailing
  list on any email.

- When reporting bugs in this kernel via email, please also rewrite the
  email Subject: in some manner to reflect the nature of the bug. Some
  developers filter by Subject: when looking for messages to read.

- Occasional snapshots of the -mm lineup are uploaded to
  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ and are announced on
  the mm-commits list.

Changes since 2.6.23-rc8-mm2:

git-acpi.patch
git-alsa.patch
git-arm.patch
git-audit-master.patch
git-avr32.patch
git-cifs.patch
git-cpufreq.patch
...
I get funny SIGBUS' like so:
fault
  if (->page_mkwrite() < 0)
    nfs_vm_page_mkwrite()
      nfs_write_begin()
        nfs_flush_incompatible()
          nfs_wb_page()
            nfs_wb_page_priority()
              nfs_sync_mapping_wait()
                nfs_wait_on_request_locked()
                  nfs_wait_on_request()
                    nfs_wait_bit_interruptible()
                      return -ERESTARTSYS
  SIGBUS
trying to figure out what to do about this...
-
Why? If someone is interrupting the write, then a SIGBUS is pretty much expected. Trond -
Hmmm... It sounds like the fault handler should deliver the appropriate signal, should ->page_mkwrite() return ERESTARTSYS, and then retry the access instruction that caused the fault when the signal handler has finished running. David -
If you signal the process before msync() has completed, or before you have completed unmapping the region then your writes can potentially be lost. Why should we be providing any guarantees beyond that? Trond -
Good point, I'm trying to figure out where my signal is coming from. -
I don't think the fault handler is currently in any position to do that ATM. It is possible to make it interruptible in some contexts, but faults from kernel code may not be able to cope. -
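The two positions in this sub-thread can be put side by side in a small user-space model. This is only a sketch under stated assumptions: the enum, the function name and the retry_on_restart flag are hypothetical and are not kernel API. The only things taken from the thread are that a negative return from ->page_mkwrite() currently ends in SIGBUS, and that David suggests treating ERESTARTSYS as "deliver the signal, then retry the faulting access".

```c
#include <stdbool.h>

/* Numeric value of the kernel-internal ERESTARTSYS code. */
#define MODEL_ERESTARTSYS 512

/* Hypothetical outcomes of a write fault on a shared mapping. */
enum fault_outcome {
	FAULT_OK,	/* page made writable, the access proceeds */
	FAULT_SIGBUS,	/* the process receives SIGBUS */
	FAULT_RETRY	/* signal handled first, then the access is re-run */
};

/*
 * mkwrite_ret:      value returned by the ->page_mkwrite() stand-in
 * retry_on_restart: false models the behaviour reported above,
 *                   true models David's proposal
 */
enum fault_outcome model_handle_mkwrite(int mkwrite_ret, bool retry_on_restart)
{
	if (mkwrite_ret >= 0)
		return FAULT_OK;
	if (retry_on_restart && mkwrite_ret == -MODEL_ERESTARTSYS)
		return FAULT_RETRY;
	return FAULT_SIGBUS;	/* any other failure still ends in SIGBUS */
}
```

With retry_on_restart set to false, an interrupted wait in nfs_wait_bit_interruptible() ends in FAULT_SIGBUS, which is the behaviour reported at the top of this sub-thread. The model says nothing about whether faults from kernel code could cope with a retry, which is the objection raised in the last reply.
-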
Add missing parenthesis in cfe_writeblk() macro.

Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>

 include/asm-mips/fw/cfe/cfe_api.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.23-mm1-a/include/asm-mips/fw/cfe/cfe_api.h 2007-10-12 08:25:46.000000000 +0200
+++ linux-2.6.23-mm1-b/include/asm-mips/fw/cfe/cfe_api.h 2007-10-12 08:37:42.000000000 +0200
@@ -154,7 +154,7 @@ int64_t cfe_getticks(void);
 #define cfe_readblk(a, b, c, d) __cfe_readblk(a, b, c, d)
 #define cfe_setenv(a, b) __cfe_setenv(a, b)
 #define cfe_write(a, b, c) __cfe_write(a, b, c)
-#define cfe_writeblk(a, b, c, d __cfe_writeblk(a, b, c, d)
+#define cfe_writeblk(a, b, c, d) __cfe_writeblk(a, b, c, d)
 #endif /* CFE_API_IMPL_NAMESPACE */
 int cfe_close(int handle);
-
Hi Andrew
My compile just failed with
drivers/scsi/gdth.c: In function ‘gdth_search_dev’:
drivers/scsi/gdth.c:646: warning: ‘pci_find_device’ is deprecated
(declared at include/linux/pci.h:482)
drivers/scsi/gdth.c: In function ‘gdth_init_isa’:
drivers/scsi/gdth.c:857: error: ‘gdth_irq_tab’ undeclared (first use in
this function)
drivers/scsi/gdth.c:857: error: (Each undeclared identifier is reported
only once
drivers/scsi/gdth.c:857: error: for each function it appears in.)
drivers/scsi/gdth.c: In function ‘gdth_copy_internal_data’:
drivers/scsi/gdth.c:2362: warning: unused variable ‘sg’
make[2]: *** [drivers/scsi/gdth.o] Error 1
make[1]: *** [drivers/scsi] Error 2
make: *** [drivers] Error 2
[dhaval@gondor linux-2.6.23]$
Looking into the code I notice that gdth_irq_tab is not declared with
CONFIG_ISA=y and !CONFIG_EISA.
The values seem to be same in 2.6.23 (I am not sure why it has been put
with #ifdefs in -mm) so I have just modified the #ifdef to take care of
CONFIG_ISA as well.
(Compile tested only)
Thanks,
--
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Index: linux-2.6.23/drivers/scsi/gdth.c
===================================================================
--- linux-2.6.23.orig/drivers/scsi/gdth.c 2007-10-12 14:07:28.000000000 +0530
+++ linux-2.6.23/drivers/scsi/gdth.c 2007-10-12 15:06:47.000000000 +0530
@@ -288,7 +288,7 @@ static struct timer_list gdth_timer;
#ifdef CONFIG_ISA
static unchar gdth_drq_tab[4] = {5,6,7,7}; /* DRQ table */
#endif
-#ifdef CONFIG_EISA
+#if defined(CONFIG_EISA) || defined(CONFIG_ISA)
static unchar gdth_irq_tab[6] = {0,10,11,12,14,0}; /* IRQ table */
#endif
static unchar gdth_polling; /* polling if TRUE */
--
regards,
Dhaval
-
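The effect of the changed guard can be seen in a stand-alone sketch. It assumes a configuration with CONFIG_ISA set and CONFIG_EISA unset, which is the failing case reported above; model_isa_irq() is a hypothetical stand-in for the lookup in gdth_init_isa() and is not part of the driver. The table contents are copied from the patch.

```c
/* Pretend configuration: ISA enabled, EISA disabled (the failing case). */
#define CONFIG_ISA 1

typedef unsigned char unchar;

/* With the old "#ifdef CONFIG_EISA" guard this table would not exist here. */
#if defined(CONFIG_EISA) || defined(CONFIG_ISA)
static unchar gdth_irq_tab[6] = {0, 10, 11, 12, 14, 0};	/* IRQ table */
#endif

#ifdef CONFIG_ISA
/* Hypothetical helper: look up an IRQ, or return -1 for a bad index. */
int model_isa_irq(unsigned int index)
{
	return index < 6 ? gdth_irq_tab[index] : -1;
}
#endif
```

Changing the guard back to a plain #ifdef CONFIG_EISA in this sketch reproduces the "undeclared" error from the build log, because the ISA-only code still refers to the table.
-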
On Thu, 11 Oct 2007 21:31:26 -0700

On RHEL5/x86_64 environment,
==
[kamezawa@hannibal ref-2.6.23-mm1]$ make menuconfig
Makefile:456: /home/kamezawa/ref-2.6.23-mm1/arch//Makefile: No such file or directory
make: *** No rule to make target `/home/kamezawa/ref-2.6.23-mm1/arch//Makefile'. Stop.
==
$(ARCH) cannot be detected automatically...
What information is useful for fixing this ?

Thanks,
-Kame
-
More serious breakage happened to UML - include/asm-um/arch went straight to hell; I'll look into fixing that tomorrow... -
I always forget to test uml. But a quick test build seems to work until it hits this:

arch/um/drivers/slip_kern.c: In function 'slip_init':
arch/um/drivers/slip_kern.c:34: error: 'struct net_device' has no member named 'header_cache_update'
arch/um/drivers/slip_kern.c:35: error: 'struct net_device' has no member named 'hard_header_cache'
arch/um/drivers/slip_kern.c:36: error: 'struct net_device' has no member named 'hard_header'

<looks at networking people>
-
Fix hard_header for net-2.6 (2.6.23-mm1).

Please test this patch, unfortunately the tree it came from won't build UML, so it isn't possible to give it a proper check.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>

--- a/arch/um/drivers/slip_kern.c 2007-10-11 13:16:07.000000000 -0700
+++ b/arch/um/drivers/slip_kern.c 2007-10-12 11:06:00.000000000 -0700
@@ -30,9 +30,7 @@ void slip_init(struct net_device *dev, v
 	slip_proto_init(&spri->slip);
 	dev->init = NULL;
-	dev->header_cache_update = NULL;
-	dev->hard_header_cache = NULL;
-	dev->hard_header = NULL;
+	dev->hard_header_ops = NULL;
 	dev->hard_header_len = 0;
 	dev->addr_len = 0;
 	dev->type = ARPHRD_SLIP;
--- a/arch/um/drivers/slirp_kern.c 2007-10-11 13:16:07.000000000 -0700
+++ b/arch/um/drivers/slirp_kern.c 2007-10-12 11:05:52.000000000 -0700
@@ -31,9 +31,7 @@ void slirp_init(struct net_device *dev,
 	dev->init = NULL;
 	dev->hard_header_len = 0;
-	dev->header_cache_update = NULL;
-	dev->hard_header_cache = NULL;
-	dev->hard_header = NULL;
+	dev->hard_header_ops = NULL;
 	dev->addr_len = 0;
 	dev->type = ARPHRD_SLIP;
 	dev->tx_queue_len = 256;
-
Umm... Dies much faster here:
include/asm-um/arch:
@echo ' SYMLINK $@'
ifneq ($(KBUILD_SRC),)
$(Q)mkdir -p $(objtree)/include/asm-um
$(Q)ln -fsn $(srctree)/include/asm-$(SUBARCH) include/asm-um/arch
else
$(Q)cd $(TOPDIR)/include/asm-um && ln -sf ../asm-$(SUBARCH) arch
endif
gives a symlink from include/asm-um/arch to include/asm-i386 or
include/asm-x86_64, so e.g.
#ifndef __UM_POSIX_TYPES_H
#define __UM_POSIX_TYPES_H
#include "asm/arch/posix_types.h"
#endif
in asm-um/posix_types.h blows instantly. Try to build on a tree without
stale symlinks...
-
Hi Andrew,
I noticed a regression between 2.6.23-rc8-mm2 and 2.6.23-mm1 (with your
hotfixes). User space threads seem to receive an ERESTART_RESTARTBLOCK
as soon as a thread does a pthread_join on them. The previous behavior
was to wait for them to exit by taking a futex.
I provide a toy program that shows the problem. On 2.6.23-rc8-mm2, it
loops forever (as it should). On 2.6.23-mm1, it exits after 10 seconds.
Any idea on what may cause this problem ?
(I also provide complete strace -f output of a correct and a buggy run,
and my kernel config. Tests were done on i386.)
Mathieu
/*
* Thread testing
*
* build with gcc -lpthread -o pthread pthread.c
*
* Mathieu Desnoyers
* License: GPL
*/
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <signal.h>

/* per-thread variable: only thread 1 sets it to 1 */
static __thread int test = 0;

/* signal handler */
void handler(int signo)
{
	/* casts keep the arguments matched to the %lu conversions */
	printf("Sig handler : TID %lu, pid : %lu\n",
		(unsigned long)pthread_self(), (unsigned long)getpid());
}

void *thr1(void *arg)
{
	test = 1;
	while (1) {
		printf("thread 1, thread id : %lu, pid %lu, test %d\n",
			(unsigned long)pthread_self(),
			(unsigned long)getpid(), test);
		sleep(2);
	}
	return ((void *)1);
}

void *thr2(void *arg)
{
	while (1) {
		printf("thread 2, thread id : %lu, pid %lu, test %d\n",
			(unsigned long)pthread_self(),
			(unsigned long)getpid(), test);
		sleep(2);
	}
	return ((void *)2);
}

int main()
{
	int err;
	pthread_t tid1, tid2;
	void *tret;
	static struct sigaction act;

	act.sa_handler = handler;
	sigemptyset(&(act.sa_mask));
	sigaddset(&(act.sa_mask), SIGUSR1);
	sigaction(SIGUSR1, &act, NULL);

	err = pthread_create(&tid1, NULL, thr1, NULL);
	if (err != 0)
		exit(1);
	err = pthread_create(&tid2, NULL, thr2, NULL);
	if (err != 0)
		exit(1);

	sleep(10);

	/* the threads never exit, so these joins should block forever */
	err = pthread_join(tid1, &tret);
	if (err != 0)
		exit(1);
	err = pthread_join(tid2, &tret);
	if (err != 0)
		exit(1);
	return 0;
}
---------------
strace -f ./pthread ...
-
On Fri, 12 Oct 2007 15:47:59 -0400

No idea. But I can reproduce it here so I'll bisect it now.

Thanks for the test case!
-
On Fri, 12 Oct 2007 15:47:59 -0400 Bisection shows that this problem is caused by these two patches: pid-namespaces-allow-cloning-of-new-namespace.patch -
No, the reason is that pthread_join() succeeds while it shouldn't. The main
thread does exit_group() and kills the sub-thread sleeping in nanosleep.
ERESTART_RESTARTBLOCK is not delivered to the user-space (sub-thread is dying),
I bet something like this
#include <pthread.h>
#include <unistd.h>

void *thread(void *arg)
{
	for (;;)
		pause();
	return NULL;
}

int main(void)
{
	pthread_t tid;

	pthread_create(&tid, NULL, thread, NULL);
	pthread_join(tid, NULL);
	return 0;
}
Because do_fork() doesn't use parent_tidptr. At all! So it is very clear
This? http://marc.info/?l=linux-mm-commits&m=118712242002039
Pavel, this patch has a subtle difference compared to what we discussed on
containers list. It moves put_user(parent_tidptr) from copy_process() to
do_fork(), so we don't report child's pid if copy_process() failed. I do
not think this is bad, but Eric seems to disagree with such a change.
But I can't understand why Andrew sees the same problem _after_ this patch!
And which patch removed the "put_user(nr, parent_tidptr)" chunk?
Andrew, could I get the kernel source after bisection somehow? (I am not
familiar with guilt, will try to study it later)
Mathieu, could you try the patch below?
Oleg.
--- kernel/fork.c~ 2007-10-13 15:41:35.000000000 +0400
+++ kernel/fork.c 2007-10-13 15:41:41.000000000 +0400
@@ -1443,6 +1443,9 @@ long do_fork(unsigned long clone_flags,
task_pid_nr_ns(p, current->nsproxy->pid_ns) :
task_pid_vnr(p);
+ if (clone_flags & CLONE_PARENT_SETTID)
+ put_user(nr, parent_tidptr);
+
if (clone_flags & CLONE_VFORK) {
p->vfork_done = &vfork;
init_completion(&vfork);
-
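Why the missing put_user() makes pthread_join() return at once can be shown with a small user-space model. It is a sketch under assumptions, not kernel or glibc code: model_do_fork(), model_join_would_block() and struct model_thread are hypothetical names, and the join rule (wait for as long as the thread's stored tid is non-zero) is a simplified description of how the threading library is understood to use the word that parent_tidptr points to.

```c
#include <stdbool.h>

/* Same numeric value as the CLONE_PARENT_SETTID clone flag. */
#define MODEL_CLONE_PARENT_SETTID 0x00100000UL

/* Hypothetical thread descriptor: tid is 0 until the kernel fills it in. */
struct model_thread {
	int tid;
};

/*
 * Model of the end of do_fork(). nr is the pid of the new task.
 * with_fix selects whether the chunk from the patch above is present.
 */
int model_do_fork(unsigned long clone_flags, int nr, int *parent_tidptr,
		  bool with_fix)
{
	if (with_fix && (clone_flags & MODEL_CLONE_PARENT_SETTID))
		*parent_tidptr = nr;	/* stands in for put_user(nr, parent_tidptr) */
	return nr;
}

/* Simplified join rule: block only while the stored tid is non-zero. */
bool model_join_would_block(const struct model_thread *t)
{
	return t->tid != 0;
}
```

Without the fix the descriptor keeps tid == 0, so the modelled join does not block. That matches the observation that pthread_join() succeeds although the thread is still running, after which main() returns and the process exits.
-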
Aha. I am looking at ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23/2.6.23-mm1/broken... Looks like the original patch was damaged somehow, it doesn't have the "put_user(nr, parent_tidptr)" code. Oleg. -
It does have it, except it moved somewhere else. That would have been me trying to fix yet another reject storm. I thought I had that one right. Could someone fix it please? -
Hi Oleg, Yes, it runs fine with this patch. Thanks, -- Mathieu Desnoyers Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 -
On Thu, 11 Oct 2007 23:42:02 -0700

Yes, I need to set it by hand.
This is a quick fix for me......
Maybe $(ARCH) should be undefined until following .kbuild check.

ifneq ($(wildcard .kbuild),)
...
else
ARCH ?= $(SUBARCH)
endif

-Kame
--
Index: ref-2.6.23-mm1/Makefile
===================================================================
--- ref-2.6.23-mm1.orig/Makefile
+++ ref-2.6.23-mm1/Makefile
@@ -191,7 +191,6 @@ SUBARCH := $(shell uname -m | sed -e s/i
 # The empty ARCH and CROSS_COMPILE statements exist so it is easy to
 # patch in hardcoded values for ARCH and CROSS_COMPILE
-ARCH ?=
 CROSS_COMPILE ?=
 # Kbuild save the ARCH and CROSS_COMPILE setting in .kbuild
-
That line came in on request from Andi/Novell. And I tested it rigorously with several of my cross compile setups. But never with native - silly me. But that patch has other issues too so I will withdraw it until I have fixed the other annoying issues. We are simply too gcc-happy in the top-level Makefile and we run it several times for no good reason when we, for example, do modules_install or headers_install. Sam -
After setting ARCH by hand, it built and booted OK for me.
But I did add the patch from http://lkml.org/lkml/2007/10/11/48 as my personal hotfix.

Two things I noted in my logs:

[ 16.040000] NET: Registered protocol family 1
[ 16.050000] NET: Registered protocol family 17
[ 16.060000] NET: Registered protocol family 15
[ 16.080000] sysctl table check failed: /sunrpc/transports .7249.14 Missing strategy
[ 16.100000] sysctl table check failed: /sunrpc/transports .7249.14 Unknown sysctl binary path
[ 16.130000] RPC: Registered udp transport module.
[ 16.140000] RPC: Registered tcp transport module.

... but NFSv4 still works.

Oct 12 10:23:03 treogen smartd[6091]: Device: /dev/sdc, not found in smartd database.
Oct 12 10:23:03 treogen [ 105.990000] WARNING: at drivers/ata/libata-core.c:5752 ata_qc_issue()
Oct 12 10:23:03 treogen [ 105.990000]
Oct 12 10:23:03 treogen [ 105.990000] Call Trace:
Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff804442ef>] ata_qc_issue+0x47f/0x540
Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff80432e60>] scsi_done+0x0/0x20
Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff80449c80>] ata_scsi_flush_xlat+0x0/0x30
Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff8044a6ea>] ata_scsi_translate+0xfa/0x180
Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff80432e60>] scsi_done+0x0/0x20
Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff8044d84d>] ata_scsi_queuecmd+0x12d/0x210
Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff804333d0>] scsi_dispatch_cmd+0x150/0x250
Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff804391f1>] scsi_request_fn+0x1f1/0x360
Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff8039b827>] elv_insert+0x167/0x250
Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff803a0ac2>] __make_request+0xe2/0x670
Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff8039d560>] generic_make_request+0x1d0/0x3c0
Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff802bc1b9>] bio_alloc_bioset+0xb9/0x140
Oct 12 10:23:03 treogen [ ...
I would more expect Jens, as the breakage in ata_sg_is_last comes through the sglist patches from the block git tree.

smartd always said that. Never thought that it would matter. And it also says this about the other two identical drives that are connected via the SiI 3132 instead of the MCP55.

And until now smartd worked with this drive, logging temperature changes into /var/log/messages.

hm: Even with the warnings below it does that:
Oct 12 10:53:25 treogen smartd[6095]: Device: /dev/sdc, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 57 to 58
Oct 12 11:23:26 treogen smartd[6095]: Device: /dev/sdc, SMART Usage Attribute: 190 Temperature_Celsius changed from 51 to 50
Oct 12 11:23:26 treogen smartd[6095]: Device: /dev/sdc, SMART Usage Attribute: 194 Temperature_Celsius changed from 49 to 50
Oct 12 13:23:25 treogen smartd[6095]: Device: /dev/sdc, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 58 to 57

But I have not seen any new WARNINGs...
-
On the next boot no WARNING show up. On the third boot with 2.6.23-mm1 the drive failed completely: First I got this WARNING: Oct 13 07:46:48 treogen smartd[6081]: Device: /dev/sdc, opened Oct 13 07:46:48 treogen [ 99.850000] WARNING: at drivers/ata/libata-core.c:5761 ata_qc_issue() Oct 13 07:46:48 treogen [ 99.850000] Oct 13 07:46:48 treogen [ 99.850000] Call Trace: Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8044431a>] ata_qc_issue+0x4aa/0x540 Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff80432e60>] scsi_done+0x0/0x20 Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8044ce30>] ata_scsi_pass_thru+0x0/0x2c0 Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8044a6ea>] ata_scsi_translate+0xfa/0x180 Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff80432e60>] scsi_done+0x0/0x20 Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8044d84d>] ata_scsi_queuecmd+0x12d/0x210 Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff804333d0>] scsi_dispatch_cmd+0x150/0x250 Oct 13 07:46:48 treogen smartd[6081]: Device: /dev/sdc, not found in smartd database. Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff804391f1>] scsi_request_fn+0x1f1/0x360 Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8039f362>] blk_execute_rq_nowait+0x62/0xb0 Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8039f446>] blk_execute_rq+0x96/0x110 Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8039f5b1>] get_request_wait+0x21/0x1a0 Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8022c8ea>] __wake_up_common+0x5a/0x90 Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff80438e14>] scsi_execute+0xe4/0x120 Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8044cb14>] ata_cmd_ioctl+0x124/0x270 Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8044cd67>] ata_scsi_ioctl+0x107/0x1d0 Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8043424c>] scsi_ioctl+0xbc/0x330 Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff803a14f3>] blkdev_driver_ioctl+0x93/0xa0 Oct 13 07:46:48 treogen [ 99.850000] ...
The WARNING indicates that there is a SWNCQ bug in sata_nv. Given that the problem appears when SYNCHRONIZE CACHE is being issued, I would guess that sata_nv is not properly handling non-queued commands. NVIDIA CC'd. This is a patch from libata-dev.git#nv-swncq (via #ALL). Jeff -
I can't follow you on SYNCHRONIZE CACHE. The only commands written to the syslog in the errors were 0x60==ATA_CMD_FPDMA_READ and 0xB0 (which is not in include/linux/ata.h, but ATA-6 says that this is SMART related. That But that still seems correct, as I would not expect that SMART commands get queued. (That's just a guess, as I did not try to find the Comparing sata_nv.c from 2.6.23-rc8-mm1 and 2.6.23-mm1 I see two changes that look suspicious: http://git.kernel.org/?p=linux/kernel/git/jgarzik/libata-dev.git;a=commitdiff;h=31cc23... The comment says: "ahci and sata_sil24 are converted to use ata_std_qc_defer()." But the patch also adds ".qc_defer = ata_std_qc_defer," to sata_nv.c. The second change is the removal of the 'lock' spinlock from sata_nv.c that was used in nv_swncq_qc_issue and nv_swncq_host_interrupt. Should I try to revert one or both of these changes? Torsten -
In the traceback you have "ata_scsi_flush_xlat", which is the function
that translates a SCSI sync-cache command into an ATA flush-cache command.
The "WARNING: at drivers/ata/libata-core.c:5752 ata_qc_issue()" also
guides us to the code comment
/* Make sure only one non-NCQ command is outstanding. The
* check is skipped for old EH because it reuses active qc to
* request ATAPI sense.
*/
which is a check related to NCQ->off and off->NCQ edge cases.
If you are git-capable, IMO the next steps in problem elimination should be
* download latest linux-2.6.git (currently
752097cec53eea111d087c545179b421e2bde98a)
* build and test linux-2.6.git, to establish a new baseline
* download latest libata-dev.git#nv-swncq (currently
3cb664c2d319a4fde5028c3c5dab6221fe70bd2d)
* build and test, with sata_nv module option swncq=0
* build and test, with sata_nv module option swncq=1
That will get -mm out of the picture, use the same baseline kernel for
all three tests (nv-swncq is based off of
752097cec53eea111d087c545179b421e2bde98a) and narrow things down to the
precise changes that went upstream (or are on the 'nv-swncq' branch,
waiting to go upstream).
My gut feeling is that there is a lingering bug in sata_nv SWNCQ somewhere.
Jeff
-
Aha. That makes sense. But on the second error, where the drive was kicked out completely all three traces did not have ata_scsi_flush_xlat. First WARNING: Oct 13 07:46:48 treogen [ 99.850000] Call Trace: Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8044431a>] ata_qc_issue+0x4aa/0x540 Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff80432e60>] scsi_done+0x0/0x20 Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8044ce30>] ata_scsi_pass_thru+0x0/0x2c0 Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8044a6ea>] ata_scsi_translate+0xfa/0x180 Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff80432e60>] scsi_done+0x0/0x20 ... Second+Third: Oct 13 07:46:49 treogen [ 100.510000] [<ffffffff804442ef>] ata_qc_issue+0x47f/0x540 Oct 13 07:46:49 treogen [ 100.510000] [<ffffffff80432e60>] scsi_done+0x0/0x20 Oct 13 07:46:49 treogen [ 100.510000] [<ffffffff80432e60>] scsi_done+0x0/0x20 Oct 13 07:46:49 treogen [ 100.510000] [<ffffffff8044a440>] ata_scsi_rw_xlat+0x0/0x1b0 Oct 13 07:46:49 treogen [ 100.510000] [<ffffffff8044a6ea>] ata_scsi_translate+0xfa/0x180 Oct 13 07:46:49 treogen [ 100.510000] [<ffffffff80432e60>] scsi_done+0x0/0x20 ... But I very much agree about this. But rather than 'normal' edges with the cache flushes, I would blame it on the SMART commands from smartd that trigger the switch. Looking more at this patch, I thing the code change is correct and only the comment is missing sata_nv. (Only ahci, sil24 and nv seem to ... I should really take the time install this, but I don't think git That commit (3cb664c2d319a4fde5028c3c5dab6221fe70bd2d) seems to be the only commit relevant to swncq, as it adds it completely without any I will try this. Currently I have sata_nv.swncq=1 in my kernel commandline so its trivial to change that. Older versions of SWNCQ already worked for me, so I don't think its a general problem. And as the symptoms would nicely fit into a race condition when manipulating the NCQ state, the removal of the lock ...
Wait! I think I found the bug: it's an evil interaction between the above patch and the swncq patch that is applied later. The qc_defer patch removes the old ata_scmd_need_defer that was always called for all drivers, substitutes ata_std_qc_defer for it, and adds that as ->qc_defer to the ops of all drivers that support NCQ *at that point*. Then the swncq patch adds a new NCQ-capable driver, but nobody added qc_defer to the ops structure that it introduces. So swncq will never defer any commands, and the first command that would need to be deferred (the SMART commands) blows up if there is still another command in flight. I will only add the qc_defer and try this... Torsten -
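The interaction described above can be modelled outside the kernel. This is a sketch under assumptions: every model_* name is hypothetical, model_std_qc_defer() is only a simplified stand-in for ata_std_qc_defer(), and the convention that an unset hook means "never defer" is the behaviour inferred in the message above rather than something quoted from libata.

```c
#include <stddef.h>

/* Hypothetical command: either queued (NCQ) or not, e.g. a SMART command. */
struct model_qc {
	int is_ncq;
};

/* Hypothetical port state: what is currently outstanding on the link. */
struct model_port {
	int ncq_in_flight;	/* number of queued commands outstanding */
	int non_ncq_in_flight;	/* 0 or 1 */
};

/* Hypothetical ops table with an optional defer hook. */
struct model_port_ops {
	int (*qc_defer)(const struct model_port *ap, const struct model_qc *qc);
};

/* Simplified stand-in for ata_std_qc_defer(): non-zero means "defer". */
int model_std_qc_defer(const struct model_port *ap, const struct model_qc *qc)
{
	if (qc->is_ncq)
		return ap->non_ncq_in_flight != 0;
	return ap->ncq_in_flight != 0 || ap->non_ncq_in_flight != 0;
}

/*
 * Returns 1 if the command is issued cleanly, 0 if it is deferred, and
 * -1 if it is issued while a conflicting command is still in flight
 * (the situation the WARNING in ata_qc_issue() complains about).
 */
int model_issue(const struct model_port_ops *ops, const struct model_port *ap,
		const struct model_qc *qc)
{
	if (ops->qc_defer && ops->qc_defer(ap, qc))
		return 0;
	if (model_std_qc_defer(ap, qc))
		return -1;	/* nobody deferred it, but it conflicts */
	return 1;
}
```

An ops table whose qc_defer is NULL issues a non-NCQ command straight into a port with NCQ commands outstanding and gets -1, while the same table with the hook set returns 0 and the command waits.
-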
3 boots, all worked. So I'm very sure that was the bug, but I will now do a little load testing...

The only strange thing about 2.6.23-mm1 is that it takes ~4 seconds more to boot.

2.6.23-rc8-mm1:
[ 3.720000] scsi0 : sata_sil24
[ 3.730000] scsi1 : sata_sil24
[ 3.740000] ata1: SATA max UDMA/100 irq 17
[ 3.750000] ata2: SATA max UDMA/100 irq 17
[ 4.110000] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 4.160000] ata1.00: ATA-7: MAXTOR STM3320820AS, 3.AAE, max UDMA/133
[ 4.180000] ata1.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 4.240000] ata1.00: configured for UDMA/100
[ 4.600000] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 4.660000] ata2.00: ATA-7: MAXTOR STM3320820AS, 3.AAE, max UDMA/133
[ 4.680000] ata2.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 4.730000] ata2.00: configured for UDMA/100

2.6.23-mm1:
[ 3.650000] scsi0 : sata_sil24
[ 3.660000] scsi1 : sata_sil24
[ 3.660000] ata1: SATA max UDMA/100 host m128@0xefeffc00 port 0xefef8000 irq 17
[ 3.690000] ata2: SATA max UDMA/100 host m128@0xefeffc00 port 0xefefa000 irq 17
[ 5.930000] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[ 5.980000] ata1.00: ATA-7: MAXTOR STM3320820AS, 3.AAE, max UDMA/133
[ 6.000000] ata1.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 6.060000] ata1.00: configured for UDMA/100
[ 8.290000] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[ 8.340000] ata2.00: ATA-7: MAXTOR STM3320820AS, 3.AAE, max UDMA/133
[ 8.360000] ata2.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 8.420000] ata2.00: configured for UDMA/100

Torsten
-
So, you basically applied the attached patch? Yeah, absence of qc_defer for an NCQ-capable chip would do it. Jeff
Yes. The system seems to work correctly now. The only thing I noted during load testing (updating Gentoo == compiling and installing) was that there seems to be a memory leak. After ~2h, 2.5 of my 4Gb were gone. But there were too many things -
Please send /proc/meminfo and /proc/slabinfo after the leak has been happening for a while. Sometimes `echo m > /proc/sysrq-trigger ; dmesg -s 1000000' will provide useful info. The page-owner code can pinpoint a leak source. See ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23/2.6.23-mm1/broken... Enable CONFIG_DEBUG_SLAB_LEAK, check out /proc/slab_allocators -
I don't have the meminfo or slabinfo, only the output from SysRq+M: SysRq : Show Memory Mem-info: Node 0 DMA per-cpu: CPU 0: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0 CPU 1: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0 CPU 2: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0 CPU 3: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0 Node 0 DMA32 per-cpu: CPU 0: Hot: hi: 186, btch: 31 usd: 173 Cold: hi: 62, btch: 15 usd: 29 CPU 1: Hot: hi: 186, btch: 31 usd: 69 Cold: hi: 62, btch: 15 usd: 4 CPU 2: Hot: hi: 186, btch: 31 usd: 82 Cold: hi: 62, btch: 15 usd: 13 CPU 3: Hot: hi: 186, btch: 31 usd: 71 Cold: hi: 62, btch: 15 usd: 3 Node 1 DMA32 per-cpu: CPU 0: Hot: hi: 186, btch: 31 usd: 171 Cold: hi: 62, btch: 15 usd: 0 CPU 1: Hot: hi: 186, btch: 31 usd: 57 Cold: hi: 62, btch: 15 usd: 0 CPU 2: Hot: hi: 186, btch: 31 usd: 171 Cold: hi: 62, btch: 15 usd: 6 CPU 3: Hot: hi: 186, btch: 31 usd: 158 Cold: hi: 62, btch: 15 usd: 7 Node 1 Normal per-cpu: CPU 0: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0 CPU 1: Hot: hi: 186, btch: 31 usd: 0 Cold: hi: 62, btch: 15 usd: 0 CPU 2: Hot: hi: 186, btch: 31 usd: 170 Cold: hi: 62, btch: 15 usd: 13 CPU 3: Hot: hi: 186, btch: 31 usd: 172 Cold: hi: 62, btch: 15 usd: 19 Active:236368 inactive:63289 dirty:365 writeback:0 unstable:0 free:28366 slab:43372 mapped:13718 pagetables:2356 bounce:0 Node 0 DMA free:8048kB min:16kB low:20kB high:24kB active:0kB inactive:0kB present:8876kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 2004 2004 2004 Node 0 DMA32 free:98364kB min:4040kB low:5048kB high:6060kB active:527764kB inactive:107636kB present:2052320kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 Node 1 DMA32 free:5824kB min:3040kB low:3800kB ...
As I'm using SLUB there is no /proc/slabinfo.
I have attached any files that looked remotely related.
After I have seen ~2Gb leaked again, I took the first set of outputs.
atop showed at that point:
MEM | tot 3.9G | free 648.7M | cache 814.5M | buff 0.0M | slab 131.5M |
SWP | tot 9.3G | free 9.3G | | vmcom 16.7M | vmlim 11.3G |
free showed:
total used free shared buffers cached
Mem: 4061808 3396852 664956 0 28 834012
-/+ buffers/cache: 2562812 1498996
Swap: 9775416 60 9775356
I then tried to build 2.6.23-mm1 with PAGE_OWNER enabled. During
modpost the system started swapping like mad and I needed to abort the
build, but I was still able to shut down the system normally.
At that point I took the second set of outputs (files called *.end).
free showed at that time:
total used free shared buffers cached
Mem: 4061808 3777084 284724 0 0 2032
-/+ buffers/cache: 3775052 286756
Swap: 9775416 4904 9770512
Did that. The output of /proc/page_owner is ~350Mb, gzipped still ~7Mb.
Taking only the first line from each stackdump it shows the following counts:
73 [0xffffffff8020a13e] __switch_to+430
6 [0xffffffff8020a236] __switch_to+678
8 [0xffffffff8020ac30] default_idle+0
1 [0xffffffff8020ba3f] sys_rt_sigreturn+879
3 [0xffffffff8020bbbe] system_call+126
115 [0xffffffff802132e1] dma_alloc_pages+177
1 [0xffffffff802191fb] __smp_call_function_mask+235
1 [0xffffffff8021970b] flush_tlb_page+75
8 [0xffffffff8021c1b7] mp_register_gsi+71
3 [0xffffffff8021c25b] mp_register_gsi+235
26 [0xffffffff8021ec6d] flush_gart+13
2 [0xffffffff8021ec7f] flush_gart+31
28 [0xffffffff8021ecda] gart_map_simple+58
1 [0xffffffff8021f048] gart_map_sg+680
15 [0xffffffff8021f2af] k8_flush_garts+191
4 ...
-
This one is suspicious. Can you find the whole record for it?

The other info shows a tremendous memory leak, not via slab. Looks like someone is running alloc_pages() directly and isn't giving them back.
-
I still have all 354042 records of it. ;)
The first column is the times I found this line in page_owner.
I divided the counts for the duplicate lines (mempool_alloc+83 and
kcryptd_do_crypt+0) by two, to normalize them. There are still some
false positive counts in there, so the numbers do not match the 354042
precisely.
354036 Page allocated via order 0, mask 0x11202
1 (PFN/Block always differ) PFN 3072 Block 6 type 0 Flags
354338 [0xffffffff80266373] mempool_alloc+83
354338 [0xffffffff80266373] mempool_alloc+83
354025 [0xffffffff802bb389] bio_alloc_bioset+185
354058 [0xffffffff804d2b40] kcryptd_do_crypt+0
354052 [0xffffffff804d2cc7] kcryptd_do_crypt+391
354058 [0xffffffff804d2b40] kcryptd_do_crypt+0
354052 [0xffffffff80245d3c] run_workqueue+204
354062 [0xffffffff802467b0] worker_thread+0
Blaming it on dm-crypt looks right, as the leak seems to happen when
there is (heavy) disk activity.
(updatedb just ate ~500 Mb)
Torsten
-
err, take another look at the changelog in page-owner-tracking-leak-detector.patch. It directs you to Documentation/page_owner.c which aggregates the contents of Yup, it does appear that dm-crypt is leaking. Let's add some cc's. Thanks for testing -mm and for reporting this. -
More precisely - the change below, from the git-block.patch update,
means that the pages are not deallocated at all.
(cc-ing Jens)
-static int crypt_endio(struct bio *clone, unsigned int done, int error)
+static void crypt_endio(struct bio *clone, int error)
...
- * free the processed pages, even if
- * it's only a partially completed write
+ * free the processed pages
*/
- if (!read_io)
- crypt_free_buffer_pages(cc, clone, done);
-
- /* keep going - not finished yet */
- if (unlikely(clone->bi_size))
- return 1;
-
- if (!read_io)
+ if (!read_io) {
+ crypt_free_buffer_pages(cc, clone, clone->bi_size);
goto out;
+ }
clone->bi_size is zero here now, so crypt_free_buffer_pages will not
work correctly (previously it was passed the count of processed bytes).
But because it seems that a bio cannot be processed partially now, we can
simplify crypt_free_buffer_pages to always remove all allocated pages.
Milan
--
mbroz@redhat.com
-
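A user-space model makes the leak described above easy to see. It is a sketch under assumptions: struct model_clone and both functions are hypothetical, and model_free_buffer_pages() only imitates a routine that releases as many pages as the byte count it is handed covers. It is not the dm-crypt implementation.

```c
#define MODEL_PAGE_SIZE 4096u
#define MODEL_MAX_PAGES 8

/* Hypothetical stand-in for the cloned bio and its buffer pages. */
struct model_clone {
	unsigned int nr_pages;
	unsigned int bi_size;	/* remaining bytes; 0 once the I/O completed */
	int page_allocated[MODEL_MAX_PAGES];
};

/* Free the pages covering the first `bytes` bytes; returns pages freed. */
unsigned int model_free_buffer_pages(struct model_clone *c, unsigned int bytes)
{
	unsigned int freed = 0;

	for (unsigned int i = 0; i < c->nr_pages && bytes > 0; i++) {
		if (c->page_allocated[i]) {
			c->page_allocated[i] = 0;
			freed++;
		}
		bytes -= bytes < MODEL_PAGE_SIZE ? bytes : MODEL_PAGE_SIZE;
	}
	return freed;
}

/* The simplification suggested above: always release every page. */
unsigned int model_free_all_pages(struct model_clone *c)
{
	unsigned int freed = 0;

	for (unsigned int i = 0; i < c->nr_pages; i++) {
		if (c->page_allocated[i]) {
			c->page_allocated[i] = 0;
			freed++;
		}
	}
	return freed;
}
```

Calling model_free_buffer_pages(&c, c.bi_size) on a completed clone passes 0 bytes and frees nothing, so in this model every write leaves all of its pages allocated. The unconditional variant does not depend on the byte count at all.
-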
Neil, this doesn't look very good. dm-crypt needs to know the clone io size, so ->bi_size was definitely used properly in this context before. Now it's gone. Suggestions on how to fix that up? I've been less than impressed with the bi_end_io() patchset so far, it's been full of typos and bad conversions. I'm tempted to revert the whole thing, clearly it wasn't ready for merge. -- Jens Axboe -
Noting the -v on the grep command, now I understand that this program does in fact what I need. Not reading this correctly I assumed it collects information only Top3 from the page_owner-util: 353978 times: Page allocated via order 0, mask 0x11202 [0xffffffff80266373] mempool_alloc+83 [0xffffffff80266373] mempool_alloc+83 [0xffffffff802bb389] bio_alloc_bioset+185 [0xffffffff804d2b40] kcryptd_do_crypt+0 [0xffffffff804d2cc7] kcryptd_do_crypt+391 [0xffffffff804d2b40] kcryptd_do_crypt+0 [0xffffffff80245d3c] run_workqueue+204 [0xffffffff802467b0] worker_thread+0 45065 times: Page allocated via order 0, mask 0x1201d2 [0xffffffff805ae2c2] __down_read+18 [0xffffffff8026c246] __do_page_cache_readahead+230 [0xffffffff8026c576] ondemand_readahead+278 [0xffffffff80264185] do_generic_mapping_read+629 [0xffffffff802635f0] file_read_actor+0 [0xffffffff80265bbe] generic_file_aio_read+254 [0xffffffff8037a98b] xfs_read+347 [0xffffffff8036b793] xfs_access+67 33008 times: Page allocated via order 0, mask 0x1201d2 [0xffffffff8026c246] __do_page_cache_readahead+230 [0xffffffff8026c576] ondemand_readahead+278 [0xffffffff8026404e] do_generic_mapping_read+318 [0xffffffff802635f0] file_read_actor+0 [0xffffffff80265bbe] generic_file_aio_read+254 [0xffffffff8037a98b] xfs_read+347 [0xffffffff8036b793] xfs_access+67 Torsten -
Cross compiling works, but native compiling doesn't anymore :(
Here's a tmp fix.
Thanks,
C.
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: 2.6.23-mm1/Makefile
===================================================================
--- 2.6.23-mm1.orig/Makefile
+++ 2.6.23-mm1/Makefile
@@ -191,7 +191,7 @@ SUBARCH := $(shell uname -m | sed -e s/i
# The empty ARCH and CROSS_COMPILE statements exist so it is easy to
# patch in hardcoded values for ARCH and CROSS_COMPILE
-ARCH ?=
+ARCH ?= $(SUBARCH)
CROSS_COMPILE ?=
-
I get many traces similar to the one below from it (w/ hotfixes):
WARNING: at /home/rafael/src/mm/linux-2.6.23-mm1/arch/x86_64/kernel/smp.c:397 smp_call_function_mask()
Call Trace:
[<ffffffff8021b290>] smp_call_function_mask+0x4b/0x82
[<ffffffff8021b2ea>] smp_call_function+0x23/0x25
[<ffffffff884a0b80>] :processor:acpi_processor_latency_notify+0x19/0x20
[<ffffffff80437ace>] notifier_call_chain+0x33/0x65
[<ffffffff8024f32f>] __srcu_notifier_call_chain+0x4b/0x69
[<ffffffff8024f07c>] pm_qos_add_requirement+0x24/0xd2
[<ffffffff8024f35c>] srcu_notifier_call_chain+0xf/0x11
[<ffffffff8024ee6d>] update_target+0x71/0x76
[<ffffffff8024f101>] pm_qos_add_requirement+0xa9/0xd2
[<ffffffff88160bf9>] :snd_pcm:snd_pcm_hw_params+0x349/0x382
[<ffffffff80291110>] kmem_cache_alloc+0x8a/0xbc
[<ffffffff88160d75>] :snd_pcm:snd_pcm_hw_params_user+0x50/0x87
[<ffffffff88160fe1>] :snd_pcm:snd_pcm_common_ioctl1+0x1ae/0xd4f
[<ffffffff8815f755>] :snd_pcm:snd_pcm_open+0xd6/0x1f2
[<ffffffff8028fc17>] cache_alloc_debugcheck_after+0x11a/0x199
[<ffffffff8024b514>] remove_wait_queue+0x40/0x45
[<ffffffff8815f7bd>] :snd_pcm:snd_pcm_open+0x13e/0x1f2
[<ffffffff8022f18e>] default_wake_function+0x0/0xf
[<ffffffff8030b24d>] prio_tree_insert+0x18c/0x231
[<ffffffff8027b5fb>] vma_prio_tree_insert+0x23/0x39
[<ffffffff80282e91>] vma_link+0xdd/0x10b
[<ffffffff8816206f>] :snd_pcm:snd_pcm_playback_ioctl1+0x24d/0x26a
[<ffffffff8816292c>] :snd_pcm:snd_pcm_playback_ioctl+0x2e/0x36
[<ffffffff802a0896>] do_ioctl+0x2a/0x77
[<ffffffff802a0b34>] vfs_ioctl+0x251/0x26e
[<ffffffff802a0ba8>] sys_ioctl+0x57/0x7b
[<ffffffff8020bfde>] system_call+0x7e/0x83
Full dmesg attached.
Greetings,
Rafael
This is from WARN_ON(irqs_disabled()) in smp_call_function_mask().
processor_idle.c registers acpi_processor_latency_notify as a notifier.
My code changed the notifier call from blocking_notifier_call_chain to
srcu_notifier_call_chain, because dynamic creation of notifier chains at
runtime was easier with srcu_notifier_call_chain than with
blocking_notifier_call_chain.
As dynamic creation of PM_QOS parameters is no longer needed, I can
change the notifiers back to match what was in latency.c.
However, looking at the call tree differences between
blocking_notifier_call_chain and srcu_notifier_call_chain, I cannot see a
difference in irq enabling / disabling. I'm not confident this will
address this yet.
I'll change the PM_QOS params patch to use blocking notifiers, test on a
64-bit boot and see what happens. I've been needing to set up my x86_64
dev box for a while now anyway.
thanks,
I think I'll have to send you a patch that changes the notifiers, but I
doubt it will fix it.
After a bit of messing around I have 2.6.23-mm1 running on my core-2 box.
Note: Ubuntu's make-kpkg on the mm1 tree resulted in a system that
wouldn't boot past the initrd. Looks like the pivot root failed or
something.
Anyway, I'm not reproducing your experience. snd_pcm is loaded. I don't
know why none of the WARNs are hitting on my box. Do you have some
configuration information that could help me reproduce it?
Well, it's booting, but I'm not reproducing the trace messages. I'll do
the patch for you to test.
--mgross
-
Well, I can send you the .config, but the box is AMD-based (Turion 64 X2), with an ATI chipset and an HP BIOS, so it seems to be much different from yours. Greetings, Rafael -
It may be worth a shot anyway.
BTW, while changing my code to use the blocking notifiers I found that
there is an initialization race between cpu-idle and pm_qos that I have
to fix. I need to refactor my startup code to handle cpuidle registering
itself as a notifier at core_initcall time.
I'll have a patch ready tomorrow.
thanks,
--mgross
-
I didn't see my patch show up on the list so I'm resending it.
The following is a patch to update the pm_qos code in the mm1 tree. It
removes the PM_QOS_CPUIDLE parameter (replacing it with
PM_QOS_CPU_DMA_LATENCY), changes the notifications from srcu to blocking
in hopes of fixing the WARNs reported by xxx, and changes the
initialization to be largely static to avoid an initialization race with
cpu-idle.
I think we will have to revisit the static vs. dynamic initialization
and this init race in a while to support pm_qos parameters per power
domain (i.e. per cpu-socket) based on platform information (ACPI), but
for now let's see if this fixes the warnings reported.
Thanks,
Signed-off-by: mark gross <mgross@linux.intel.com>
Binary files linux-2.6.23-mm1/arch/x86_64/ia32/vsyscall-syscall.so.dbg and linux-2.6.23-mm1-pmqos/arch/x86_64/ia32/vsyscall-syscall.so.dbg differ
Binary files linux-2.6.23-mm1/arch/x86_64/ia32/vsyscall-sysenter.so.dbg and linux-2.6.23-mm1-pmqos/arch/x86_64/ia32/vsyscall-sysenter.so.dbg differ
Binary files linux-2.6.23-mm1/arch/x86_64/vdso/vdso.so.dbg and linux-2.6.23-mm1-pmqos/arch/x86_64/vdso/vdso.so.dbg differ
diff -urN -X linux-2.6.23-mm1/Documentation/dontdiff linux-2.6.23-mm1/drivers/cpuidle/cpuidle.c linux-2.6.23-mm1-pmqos/drivers/cpuidle/cpuidle.c
--- linux-2.6.23-mm1/drivers/cpuidle/cpuidle.c 2007-10-16 15:03:30.000000000 -0700
+++ linux-2.6.23-mm1-pmqos/drivers/cpuidle/cpuidle.c 2007-10-17 09:26:21.000000000 -0700
@@ -268,7 +268,7 @@
static inline void latency_notifier_init(struct notifier_block *n)
{
- pm_qos_add_notifier(PM_QOS_CPUIDLE, n);
+ pm_qos_add_notifier(PM_QOS_CPU_DMA_LATENCY, n);
}
#else /* CONFIG_SMP */
diff -urN -X linux-2.6.23-mm1/Documentation/dontdiff linux-2.6.23-mm1/drivers/cpuidle/governors/ladder.c linux-2.6.23-mm1-pmqos/drivers/cpuidle/governors/ladder.c
--- linux-2.6.23-mm1/drivers/cpuidle/governors/ladder.c 2007-10-16 15:03:30.000000000 -0700
+++ ...
Subject: [PATCH] static initialization and blocking notification for pm_qos... was Re: 2.6.23-mm1
please try this patch and let me know if the warnings go away. (I have
not been able to reproduce your issue.)
diff -urN -X linux-2.6.23-mm1/Documentation/dontdiff linux-2.6.23-mm1/drivers/cpuidle/governors/ladder.c ...
Domen Puncer's change to support "MPC5200 low power mode" (in
powerpc-git, which is in Linus's tree now) adds new code calling
mpc52xx_pm_prepare and _finish with suspend_state_t as an argument,
while Rafael Wysocki's pm-rework-struct-platform_suspend_ops.patch
converts those to take no arguments. So the build fails:
arch/powerpc/platforms/52xx/mpc52xx_pm.c:61: error: conflicting types
for ‘mpc52xx_pm_prepare’
include/asm/mpc52xx.h:270: error: previous declaration of
‘mpc52xx_pm_prepare’ was here
arch/powerpc/platforms/52xx/mpc52xx_pm.c:167: error: conflicting types
for ‘mpc52xx_pm_finish’
include/asm/mpc52xx.h:272: error: previous declaration of
‘mpc52xx_pm_finish’ was here
Sorting this out is beyond my abilities; I don't know how to deal with
stuff like this (in arch/powerpc/platforms/52xx/lite5200_pm.c):
static int lite5200_pm_prepare(suspend_state_t state)
{
/* deep sleep? let mpc52xx code handle that */
if (state == PM_SUSPEND_STANDBY)
return mpc52xx_pm_prepare(state);
Patch authors CC'd.
--
Joseph Fannin
jfannin@gmail.com
-
Ouch.
I think that the appended patch is needed. Unfortunately, I can't test it here.
Greetings,
Rafael
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
arch/powerpc/platforms/52xx/lite5200_pm.c | 35 +++++++++++++++++++-----------
include/asm-powerpc/mpc52xx.h | 4 +--
2 files changed, 25 insertions(+), 14 deletions(-)
Index: linux-2.6.23-mm1/include/asm-powerpc/mpc52xx.h
===================================================================
--- linux-2.6.23-mm1.orig/include/asm-powerpc/mpc52xx.h
+++ linux-2.6.23-mm1/include/asm-powerpc/mpc52xx.h
@@ -267,9 +267,9 @@ extern int mpc52xx_set_wakeup_gpio(u8 pi
extern int __init lite5200_pm_init(void);
/* lite5200 calls mpc5200 suspend functions, so here they are */
-extern int mpc52xx_pm_prepare(suspend_state_t);
+extern int mpc52xx_pm_prepare(void);
extern int mpc52xx_pm_enter(suspend_state_t);
-extern int mpc52xx_pm_finish(suspend_state_t);
+extern void mpc52xx_pm_finish(void);
extern char saved_sram[0x4000]; /* reuse buffer from mpc52xx suspend */
#endif
#endif /* CONFIG_PM */
Index: linux-2.6.23-mm1/arch/powerpc/platforms/52xx/lite5200_pm.c
===================================================================
--- linux-2.6.23-mm1.orig/arch/powerpc/platforms/52xx/lite5200_pm.c
+++ linux-2.6.23-mm1/arch/powerpc/platforms/52xx/lite5200_pm.c
@@ -1,5 +1,5 @@
#include <linux/init.h>
-#include <linux/pm.h>
+#include <linux/suspend.h>
#include <asm/io.h>
#include <asm/time.h>
#include <asm/mpc52xx.h>
@@ -18,6 +18,8 @@ static void __iomem *sram;
static const int sram_size = 0x4000; /* 16 kBytes */
static void __iomem *mbar;
+static suspend_state_t lite5200_pm_target_state;
+
static int lite5200_pm_valid(suspend_state_t state)
{
switch (state) {
@@ -29,13 +31,22 @@ static int lite5200_pm_valid(suspend_sta
}
}
-static int lite5200_pm_prepare(suspend_state_t state)
+static int lite5200_pm_set_target(suspend_state_t state)
+{
+ if (lite5200_pm_valid(state)) ...
These declarations are extern, but
pm-rework-struct-platform_suspend_ops.patch makes the function
definitions static, which doesn't seem to be allowed.
After removing the static bits from those two functions in mpc52xx_pm.c
it builds, but there are lots of warnings, which seem to be related:
CC arch/powerpc/kernel/prom.o
In file included from arch/powerpc/platforms/52xx/mpc52xx_pic.c:34:
include/asm/mpc52xx.h:271: warning: parameter names (without types) in function declaration
CC arch/powerpc/platforms/52xx/mpc52xx_common.o
arch/powerpc/kernel/prom.c: In function ‘early_init_dt_scan_chosen’:
arch/powerpc/kernel/prom.c:784: warning: assignment from incompatible pointer type
arch/powerpc/kernel/prom.c:788: warning: assignment from incompatible pointer type
In file included from arch/powerpc/platforms/52xx/mpc52xx_common.c:20:
include/asm/mpc52xx.h:271: warning: parameter names (without types) in function declaration
CC arch/powerpc/platforms/52xx/mpc52xx_pci.o
CC arch/powerpc/kernel/traps.o
In file included from arch/powerpc/platforms/52xx/mpc52xx_pci.c:16:
include/asm/mpc52xx.h:271: warning: parameter names (without types) in function declaration
arch/powerpc/platforms/52xx/mpc52xx_pci.c: In function ‘mpc52xx_pci_setup’:
arch/powerpc/platforms/52xx/mpc52xx_pci.c:262: warning: format ‘%x’ expects type ‘unsigned int’, but argument 2 has type ‘resource_size_t’
arch/powerpc/platforms/52xx/mpc52xx_pci.c:262: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘resource_size_t’
arch/powerpc/platforms/52xx/mpc52xx_pci.c:276: warning: format ‘%x’ expects type ‘unsigned int’, but argument 2 has type ‘resource_size_t’
arch/powerpc/platforms/52xx/mpc52xx_pci.c:276: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘resource_size_t’
arch/powerpc/platforms/52xx/mpc52xx_pci.c:295: warning: cast to pointer from integer of different size
arch/powerpc/platforms/52xx/mpc52xx_pci.c:295: warning: format ‘%x’ expects type ...
Well, suspend_state_t is undefined in mpc52xx.h. I've added
#include <linux/suspend.h> to the corrected patch below, although I'm not
sure if that's the right thing to do here.
Greetings,
Rafael
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
arch/powerpc/platforms/52xx/lite5200_pm.c | 35 +++++++++++++++++++-----------
arch/powerpc/platforms/52xx/mpc52xx_pm.c | 4 +--
include/asm-powerpc/mpc52xx.h | 6 +++--
3 files changed, 29 insertions(+), 16 deletions(-)
Index: linux-2.6.23-mm1/include/asm-powerpc/mpc52xx.h
===================================================================
--- linux-2.6.23-mm1.orig/include/asm-powerpc/mpc52xx.h
+++ linux-2.6.23-mm1/include/asm-powerpc/mpc52xx.h
@@ -18,6 +18,8 @@
#include <asm/prom.h>
#endif /* __ASSEMBLY__ */
+#include <linux/suspend.h>
+
/* ======================================================================== */
/* Structures mapping of some unit register set */
@@ -267,9 +269,9 @@ extern int mpc52xx_set_wakeup_gpio(u8 pi
extern int __init lite5200_pm_init(void);
/* lite5200 calls mpc5200 suspend functions, so here they are */
-extern int mpc52xx_pm_prepare(suspend_state_t);
+extern int mpc52xx_pm_prepare(void);
extern int mpc52xx_pm_enter(suspend_state_t);
-extern int mpc52xx_pm_finish(suspend_state_t);
+extern void mpc52xx_pm_finish(void);
extern char saved_sram[0x4000]; /* reuse buffer from mpc52xx suspend */
#endif
#endif /* CONFIG_PM */
Index: linux-2.6.23-mm1/arch/powerpc/platforms/52xx/lite5200_pm.c
===================================================================
--- linux-2.6.23-mm1.orig/arch/powerpc/platforms/52xx/lite5200_pm.c
+++ linux-2.6.23-mm1/arch/powerpc/platforms/52xx/lite5200_pm.c
@@ -1,5 +1,5 @@
#include <linux/init.h>
-#include <linux/pm.h>
+#include <linux/suspend.h>
#include <asm/io.h>
#include <asm/time.h>
#include <asm/mpc52xx.h>
@@ -18,6 +18,8 @@ static void __iomem *sram;
static const int sram_size = ...
A bit more is needed due to the rename of lite5200_pm_init() to
lite5200_suspend_init(). An amended patch follows that builds and
boots on my powermac.
---
diff -aurN linux-2.6.23-mm1.orig/arch/powerpc/platforms/52xx/lite5200.c linux-2.6.23-mm1/arch/powerpc/platforms/52xx/lite5200.c
--- linux-2.6.23-mm1.orig/arch/powerpc/platforms/52xx/lite5200.c 2007-10-12 16:21:47.000000000 -0400
+++ linux-2.6.23-mm1/arch/powerpc/platforms/52xx/lite5200.c 2007-10-14 11:49:29.000000000 -0400
@@ -126,7 +126,7 @@
#ifdef CONFIG_PM
mpc52xx_suspend.board_suspend_prepare = lite5200_suspend_prepare;
mpc52xx_suspend.board_resume_finish = lite5200_resume_finish;
- lite5200_pm_init();
+ lite5200_suspend_init();
#endif
#ifdef CONFIG_PCI
diff -aurN linux-2.6.23-mm1.orig/arch/powerpc/platforms/52xx/lite5200_pm.c linux-2.6.23-mm1/arch/powerpc/platforms/52xx/lite5200_pm.c
--- linux-2.6.23-mm1.orig/arch/powerpc/platforms/52xx/lite5200_pm.c 2007-10-14 11:10:57.000000000 -0400
+++ linux-2.6.23-mm1/arch/powerpc/platforms/52xx/lite5200_pm.c 2007-10-14 09:06:36.000000000 -0400
@@ -1,5 +1,5 @@
#include <linux/init.h>
-#include <linux/pm.h>
+#include <linux/suspend.h>
#include <asm/io.h>
#include <asm/time.h>
#include <asm/mpc52xx.h>
@@ -18,6 +18,8 @@
static const int sram_size = 0x4000; /* 16 kBytes */
static void __iomem *mbar;
+static suspend_state_t lite5200_pm_target_state;
+
static int lite5200_pm_valid(suspend_state_t state)
{
switch (state) {
@@ -29,13 +31,22 @@
}
}
-static int lite5200_pm_prepare(suspend_state_t state)
+static int lite5200_pm_set_target(suspend_state_t state)
+{
+ if (lite5200_pm_valid(state)) {
+ lite5200_pm_target_state = state;
+ return 0;
+ }
+ return -EINVAL;
+}
+
+static int lite5200_pm_prepare(void)
{
/* deep sleep? let mpc52xx code handle that */
- if (state == PM_SUSPEND_STANDBY)
- return mpc52xx_pm_prepare(state);
+ if (lite5200_pm_target_state == PM_SUSPEND_STANDBY)
+ return mpc52xx_pm_prepare();
- if (state != ...
Thanks. Can you please try the alternative one below? I just removed the
renaming of lite5200_pm_init() from it.
Greetings,
Rafael
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
arch/powerpc/platforms/52xx/lite5200_pm.c | 33 ++++++++++++++++++++----------
arch/powerpc/platforms/52xx/mpc52xx_pm.c | 4 +--
include/asm-powerpc/mpc52xx.h | 6 +++--
3 files changed, 28 insertions(+), 15 deletions(-)
Index: linux-2.6.23-mm1/include/asm-powerpc/mpc52xx.h
===================================================================
--- linux-2.6.23-mm1.orig/include/asm-powerpc/mpc52xx.h
+++ linux-2.6.23-mm1/include/asm-powerpc/mpc52xx.h
@@ -18,6 +18,8 @@
#include <asm/prom.h>
#endif /* __ASSEMBLY__ */
+#include <linux/suspend.h>
+
/* ======================================================================== */
/* Structures mapping of some unit register set */
@@ -267,9 +269,9 @@ extern int mpc52xx_set_wakeup_gpio(u8 pi
extern int __init lite5200_pm_init(void);
/* lite5200 calls mpc5200 suspend functions, so here they are */
-extern int mpc52xx_pm_prepare(suspend_state_t);
+extern int mpc52xx_pm_prepare(void);
extern int mpc52xx_pm_enter(suspend_state_t);
-extern int mpc52xx_pm_finish(suspend_state_t);
+extern void mpc52xx_pm_finish(void);
extern char saved_sram[0x4000]; /* reuse buffer from mpc52xx suspend */
#endif
#endif /* CONFIG_PM */
Index: linux-2.6.23-mm1/arch/powerpc/platforms/52xx/lite5200_pm.c
===================================================================
--- linux-2.6.23-mm1.orig/arch/powerpc/platforms/52xx/lite5200_pm.c
+++ linux-2.6.23-mm1/arch/powerpc/platforms/52xx/lite5200_pm.c
@@ -1,5 +1,5 @@
#include <linux/init.h>
-#include <linux/pm.h>
+#include <linux/suspend.h>
#include <asm/io.h>
#include <asm/time.h>
#include <asm/mpc52xx.h>
@@ -18,6 +18,8 @@ static void __iomem *sram;
static const int sram_size = 0x4000; /* 16 kBytes */
static void __iomem *mbar;
+static ...
Well, from the lack of response I gather it works. :-) I'm going to send it in a separate thread with a changelog. Please object if it doesn't work. Greetings, Rafael -
This patch builds and boots on my powermac. Also, I checked, and the remaining warnings from gcc in this general area are also present Thanks! -- Joseph Fannin jfannin@gmail.com -
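The shape of the lite5200 change in the patches above can be sketched in user space as follows: since the prepare callback no longer receives the target state, an earlier callback records it and prepare reads the stored value. This is only an illustration. The sketch_* names and state values are hypothetical, and the -EINVAL branch for any other state is an assumption, because the corresponding hunk is truncated in the thread.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for suspend_state_t and its values. */
typedef int sketch_state_t;
enum { SKETCH_STANDBY = 1, SKETCH_MEM = 3 };

static sketch_state_t sketch_target_state; /* recorded by set_target */
static int sketch_board_prepares;          /* counts board-level prepares */

static int sketch_valid(sketch_state_t state)
{
    return state == SKETCH_STANDBY || state == SKETCH_MEM;
}

/* Called first with the requested state; remembers it for later. */
static int sketch_set_target(sketch_state_t state)
{
    if (!sketch_valid(state))
        return -EINVAL;
    sketch_target_state = state;
    return 0;
}

/* Stands in for the chip-level prepare, which now takes no argument. */
static int sketch_chip_prepare(void)
{
    assert(sketch_target_state == SKETCH_STANDBY);
    return 0;
}

/* The prepare callback takes no argument and uses the stored state. */
static int sketch_prepare(void)
{
    /* deep sleep? let the chip-level code handle that */
    if (sketch_target_state == SKETCH_STANDBY)
        return sketch_chip_prepare();
    if (sketch_target_state != SKETCH_MEM)
        return -EINVAL; /* assumption: any other state is rejected */
    sketch_board_prepares++;
    return 0;
}
```

Keeping the state in a file-scope variable is simple, but it relies on set_target always being called before prepare.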
Hi,
I just tried 2.6.23-mm1 and suspend is not working there. automount
refuses to go in the freezer. I've attached dmesg (three attempts to
suspend so it gets a bit big). Suspend works on 2.6.23 and sched-devel.
Another funny thing that I've noticed on -mm is that amarok refuses to
load a playlist. It works properly on sched-devel tree.
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.23-mm1
# Sat Oct 13 14:05:27 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
# CONFIG_USER_NS is not set
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_TREE=y
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=20
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_CGROUP_NS is not set
CONFIG_CGROUP_CPUACCT=y
CONFIG_CPUSETS=y
CONFIG_RESOURCE_COUNTERS=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_FAIR_USER_SCHED=y
# CONFIG_FAIR_CGROUP_SCHED is not set
CONFIG_SYSFS_DEPRECATED=y
CONFIG_CGROUP_MEM_CONT=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not ...
Hi, Could you please try to find the patch that introduces this issue (using bisection)? Greetings, Rafael -
The winner is freezer-use-wait-queue-instead-of-busy-looping.patch -- regards, Dhaval -
Thanks. Hm, interesting. This patch is not really essential, so it's better to drop if it causes problems. Andrew, can you drop it, please? Greetings, Rafael -
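The bisection search used above to find the offending patch can be sketched as a binary search over an ordered patch series. This is a user-space illustration under stated assumptions: the tree is good with no patches applied, bad with all of them applied, and stays bad once the culprit is in. The names sketch_bisect and sketch_bad_from are hypothetical; in practice each call to the predicate is a rebuild and reboot.

```c
#include <assert.h>
#include <stddef.h>

/* Predicate: is the tree bad with the first `applied` patches applied? */
typedef int (*sketch_is_bad_fn)(size_t applied, void *ctx);

/*
 * Returns the 1-based position of the first patch that makes the tree
 * bad, assuming 0 patches is good and all n patches is bad.
 * If `builds` is non-NULL, it is incremented once per test.
 */
static size_t sketch_bisect(size_t n, sketch_is_bad_fn is_bad, void *ctx,
                            unsigned int *builds)
{
    size_t good = 0, bad = n;

    assert(n > 0);
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        if (builds)
            (*builds)++;
        if (is_bad(mid, ctx))
            bad = mid;   /* culprit is at or before mid */
        else
            good = mid;  /* culprit is after mid */
    }
    return bad;
}

/* Example predicate: the tree is bad once the culprit patch is applied. */
static int sketch_bad_from(size_t applied, void *ctx)
{
    const size_t *culprit = ctx;

    return applied >= *culprit;
}
```

The number of tests grows with the base-2 logarithm of the series length, so a series of about a thousand patches needs at most ten builds.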
Hi Andrew,
While running regular cpu-offline tests on 2.6.23-mm1, I
hit the following lockdep warning.
It was triggered because some of the per-cpu counters and thus
their locks are accessed from IRQ context.
This can cause a deadlock if it interrupts a cpu-offline thread which
is transferring a dead-cpu's counts to the global counter.
Please find the patch for the same below. Tested on i386.
Thanks and Regards
gautham.
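To make the two code paths in the report concrete, here is a user-space sketch of a batched per-cpu counter, with comments marking where the kernel version would hold the counter's lock. It does not reproduce the patch or the kernel's percpu_counter code; the sketch_* names, the CPU count and the batch size are assumptions, and there is no real locking or interrupt context here.

```c
#include <assert.h>

#define SKETCH_NR_CPUS 4
#define SKETCH_BATCH   32

/* Hypothetical batched counter: a global part plus per-cpu deltas. */
struct sketch_percpu_counter {
    long count;                 /* global part, lock-protected in the kernel */
    long local[SKETCH_NR_CPUS]; /* per-cpu deltas not yet folded in */
};

/* Add to one CPU's delta; fold into the global part once it reaches the batch. */
static void sketch_add(struct sketch_percpu_counter *c, int cpu, long amount)
{
    long v;

    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    v = c->local[cpu] + amount;
    if (v >= SKETCH_BATCH || v <= -SKETCH_BATCH) {
        /* lock held here; in the report this path runs from softirq */
        c->count += v;
        c->local[cpu] = 0;
    } else {
        c->local[cpu] = v;
    }
}

/* CPU-offline path: move the dead CPU's delta into the global part. */
static void sketch_cpu_dead(struct sketch_percpu_counter *c, int cpu)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    /* same lock held here, in process context; an interrupt arriving
     * now that takes the fold path above would wait on it forever */
    c->count += c->local[cpu];
    c->local[cpu] = 0;
}

/* Exact value: global part plus all outstanding deltas. */
static long sketch_sum(const struct sketch_percpu_counter *c)
{
    long sum = c->count;
    int cpu;

    for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        sum += c->local[cpu];
    return sum;
}
```

Because both paths take the same lock, the process-context path has to keep interrupts out while it holds it, which is the inconsistency lockdep flags below.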
=====================Warning! ===========================================
[root@llm43]# ./all_hotplug_once
CPU 1 is now offline
=================================
[ INFO: inconsistent lock state ]
2.6.23-mm1 #3
---------------------------------
inconsistent {in-softirq-W} -> {softirq-on-W} usage.
sh/7103 [HC0[0]:SC0[0]:HE1:SE1] takes:
(&percpu_counter_irqsafe){-+..}, at: [<c028e296>] percpu_counter_hotcpu_callback+0x22/0x67
{in-softirq-W} state was registered at:
[<c014126f>] __lock_acquire+0x40d/0xb4a
[<c0141966>] __lock_acquire+0xb04/0xb4a
[<c0141a0b>] lock_acquire+0x5f/0x79
[<c028e4b5>] __percpu_counter_add+0x62/0xad
[<c04d5e81>] _spin_lock+0x21/0x2c
[<c028e4b5>] __percpu_counter_add+0x62/0xad
[<c028e4b5>] __percpu_counter_add+0x62/0xad
[<c01531af>] test_clear_page_writeback+0x88/0xc5
[<c014d35e>] end_page_writeback+0x20/0x3c
[<c0188757>] end_buffer_async_write+0x133/0x181
[<c0141966>] __lock_acquire+0xb04/0xb4a
[<c0187eb4>] end_bio_bh_io_sync+0x21/0x29
[<c0187e93>] end_bio_bh_io_sync+0x0/0x29
[<c0189345>] bio_endio+0x27/0x29
[<c04358f8>] dec_pending+0x17d/0x199
[<c0435a13>] clone_endio+0x73/0x9f
[<c04359a0>] clone_endio+0x0/0x9f
[<c0189345>] bio_endio+0x27/0x29
[<c027ba83>] __end_that_request_first+0x150/0x2c0
[<c034a161>] scsi_end_request+0x1d/0xab
[<c014f5ed>] mempool_free+0x63/0x67
[<c034ac22>] scsi_io_completion+0x108/0x2c7
[<c027e03b>] blk_done_softirq+0x51/0x5c
[<c012b291>] __do_softirq+0x68/0xdb
[<c012b33a>] do_softirq+0x36/0x51
[<c012b4bf>] irq_exit+0x43/0x4e
...
I noticed that the behavior of 32-bit binaries on x86_64 has changed in
2.6.23-mm1. This is the pmap output after the process returns -ENOMEM
(see attached program).
== on 2.6.23 ==
errno 12
3531: ./malloc
0000000000001000 6272K ----- [ anon ]
0000000000621000 100K r-x-- /lib/ld-2.5.so
000000000063a000 4K r---- /lib/ld-2.5.so
000000000063b000 4K rw--- /lib/ld-2.5.so
000000000063c000 8K ----- [ anon ]
000000000063e000 1244K r-x-- /lib/libc-2.5.so
0000000000775000 8K r---- /lib/libc-2.5.so
0000000000777000 4K rw--- /lib/libc-2.5.so
0000000000778000 12K rw--- [ anon ]
000000000077b000 123700K ----- [ anon ]
0000000008048000 4K r-x-- /home/kamezawa/malloc
0000000008049000 4K rw--- /home/kamezawa/malloc
000000000804a000 3929824K ----- [ anon ]
00000000f7f02000 8K rw--- [ anon ]
00000000f7f04000 100K ----- [ anon ]
00000000f7f1d000 4K rw--- [ anon ]
00000000f7f1e000 131812K ----- [ anon ]
00000000fffd7000 84K rw--- [ stack ]
00000000fffec000 72K ----- [ anon ]
00000000ffffe000 4K r-x-- [ anon ]
total 4193272K
==
== on 2.6.23-mm1 ==
errno 12
3504: ./malloc
0000000000621000 100K r-x-- /lib/ld-2.5.so
000000000063a000 4K r---- /lib/ld-2.5.so
000000000063b000 4K rw--- /lib/ld-2.5.so
000000000063e000 1244K r-x-- /lib/libc-2.5.so
0000000000775000 8K r---- /lib/libc-2.5.so
0000000000777000 4K rw--- /lib/libc-2.5.so
0000000000778000 12K rw--- [ anon ]
0000000008048000 4K r-x-- /home/kamezawa/malloc
0000000008049000 4K rw--- /home/kamezawa/malloc
0000000055555000 4K rw--- [ anon ]
0000000055556000 100K ----- [ anon ]
000000005556f000 8K rw--- [ anon ]
0000000055671000 2789016K ----- [ anon ]
00000000ffa17000 84K rw--- [ stack ]
00000000ffa2c000 5960K ----- [ anon ]
00000000ffffe000 4K r-x-- [ anon ]
total 2796560K
==
Maybe get_unmapped_area() had some ...
So it only managed to allocate half as much virtual memory? Lovely. It had better not be. It is due to pie-executable-randomization.patch. That patch has been an ongoing source of trouble. I'll drop it. Again. Guys, please don't resend it until it actually works. -
Hi,
hm, I guess this is probably due to the pie-randomization patch, right?
(Could you please try reverting it, to see whether things get back to
normal?)
Oh well, this causes more trouble than I have ever imagined ... I will
look into it, thanks a lot for the report.
Andrew, please drop this one again, I will fix it up.
Thanks,
--
Jiri Kosina
-
On Wed, 17 Oct 2007 11:10:23 +0200 (CEST)
Maybe this can fix it.
Thanks,
-Kame
==
ia32 on x86_64 seems to be handled as it is.
arch/x86_64/mm/mmap.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
Index: devel-2.6.23-mm1/arch/x86_64/mm/mmap.c
===================================================================
--- devel-2.6.23-mm1.orig/arch/x86_64/mm/mmap.c
+++ devel-2.6.23-mm1/arch/x86_64/mm/mmap.c
@@ -54,13 +54,17 @@ static inline unsigned long mmap_base(vo
return TASK_SIZE - (gap & PAGE_MASK);
}
-static inline int mmap_is_legacy(void)
+static inline int mmap_is_32(void)
{
#ifdef CONFIG_IA32_EMULATION
if (test_thread_flag(TIF_IA32))
return 1;
#endif
+ return 0;
+}
+static inline int mmap_is_legacy(void)
+{
if (current->personality & ADDR_COMPAT_LAYOUT)
return 1;
@@ -89,7 +93,12 @@ void arch_pick_mmap_layout(struct mm_str
* Fall back to the standard layout if the personality
* bit is set, or if the expected stack growth is unlimited:
*/
- if (mmap_is_legacy()) {
+ if (mmap_is_32()) {
+#ifdef CONFIG_IA32_EMULATION
+ /* ia32_pick_mmap_layout has its own. */
+ return ia32_pick_mmap_layout(mm);
+#endif
+ } else if(mmap_is_legacy()) {
mm->mmap_base = TASK_UNMAPPED_BASE;
mm->get_unmapped_area = arch_get_unmapped_area;
mm->unmap_area = arch_unmap_area;
-
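The decision order that the patch above introduces can be sketched in user space like this: 32-bit (ia32) tasks are routed to their own layout routine first, and only then is the legacy versus top-down choice made. The struct, enum and function names are hypothetical, and the flags are simplified stand-ins for TIF_IA32, the ADDR_COMPAT_LAYOUT personality bit and an unlimited stack limit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result of the layout decision. */
enum sketch_layout {
    SKETCH_LAYOUT_IA32,    /* handled by the ia32-specific routine */
    SKETCH_LAYOUT_LEGACY,  /* bottom-up from a fixed base */
    SKETCH_LAYOUT_TOPDOWN  /* top-down below the stack gap */
};

/* Simplified view of the task properties the decision depends on. */
struct sketch_task {
    int is_ia32;            /* stands in for TIF_IA32 */
    int addr_compat_layout; /* stands in for ADDR_COMPAT_LAYOUT */
    int stack_unlimited;    /* expected stack growth is unlimited */
};

static enum sketch_layout sketch_pick_layout(const struct sketch_task *t)
{
    assert(t != NULL);
    /* 32-bit tasks are checked first and never fall through */
    if (t->is_ia32)
        return SKETCH_LAYOUT_IA32;
    /* fall back to the standard layout on request or unlimited stack */
    if (t->addr_compat_layout || t->stack_unlimited)
        return SKETCH_LAYOUT_LEGACY;
    return SKETCH_LAYOUT_TOPDOWN;
}
```

Before the change, the ia32 check lived inside the legacy test, so a 32-bit task was treated as a legacy 64-bit one instead of getting its own layout.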
Hi Kame, yes, this looks correct to me. Did you verify that it makes the problem you are seeing go away? I will do some more testing. Unfortunately, I am afraid it is a bit late for 2.6.24. Thanks, -- Jiri Kosina -
Thanks a lot, it works flawlessly. I will rebase the patch after 2.6.24-rc1 is released and will send it to Andrew's queue, hopefully for 2.6.25. Thanks! -- Jiri Kosina -
Andrew,
below is a fixed version with the patch from Kamezawa Hiroyuki
incorporated. It fixes the small regression Kamezawa found just at the
time you sent the merge request for this patch to Linus -- that ia32 ELF
binaries on x86_64 were able to allocate only about 2/3 of the memory
they were able to allocate without this patch.
Apart from this fix, the patch is the same as it has been in the -mm tree
for quite some time. It'd be great if it could make it for 2.6.24, if
feasible. Thanks.
From: Jiri Kosina <jkosina@suse.cz>
Subject: PIE executable randomization
This patch is using mmap()'s randomization functionality in such a way
that it maps the main executable of (specially compiled/linked
-pie/-fpie) ET_DYN binaries onto a random address (in cases in which
mmap() is allowed to perform a randomization).
The code has been extracted from Ingo's exec-shield patch
http://people.redhat.com/mingo/exec-shield/
[akpm@linux-foundation.org: fix used-uninitialised warning]
[kamezawa.hiroyu@jp.fujitsu.com: fixed ia32 ELF on x86_64 handling]
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
diff --git a/arch/ia64/ia32/binfmt_elf32.c b/arch/ia64/ia32/binfmt_elf32.c
index f6ae3ec..3db699b 100644
--- a/arch/ia64/ia32/binfmt_elf32.c
+++ b/arch/ia64/ia32/binfmt_elf32.c
@@ -226,7 +226,7 @@ elf32_set_personality (void)
}
static unsigned long
-elf32_map (struct file *filep, unsigned long addr, struct elf_phdr *eppnt, int prot, int type)
+elf32_map (struct file *filep, unsigned long addr, struct elf_phdr *eppnt, int prot, int type, unsigned long unused)
{
unsigned long pgoff = (eppnt->p_vaddr) & ~IA32_PAGE_MASK;
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index 907942e..95485e6 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -12,6 +12,7 @@
#include <linux/file.h>
#include <linux/utsname.h>
#include <linux/personality.h>
+#include <linux/random.h>
#include <asm/uaccess.h>
#include <asm/ia32.h>
@@ -65,6 +66,7 ...
On Thu, 11 Oct 2007 21:31:26 -0700 Between rc8-mm2 and 2.6.23-mm1, autofs stopped working in the -mm kernel. Instead of mounting my home directory, I get these messages in /var/log/messages: Oct 20 00:38:52 kenny automount[2293]: cache_readlock: mapent cache rwlock lock failed Oct 20 00:38:52 kenny automount[2293]: unexpected pthreads error: 11 at 65 in cache.c I am not sure if this is due to autofs changes or changes in some other code that was merged. If you can think of any suspicious change that I should test, please let me know. -- "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." - Brian W. Kernighan -
I don't think anything changed in autofs in that period. I'd be suspecting the r-o-bind-mounts patches, but they didn't change much in that time either. Does current mainline work OK? If so, pretty much the only thing in that area left unmerged is r-o-bind-mounts and hch's exportfs stuff. -
On Fri, 19 Oct 2007 22:39:00 -0700 Yes, 2.6.23 mainline works fine. -
On Sat, 20 Oct 2007 01:54:04 -0400 Let me clarify: 2.6.23 vanilla works. I have not yet tried the latest 2.6.23+ git tree. -
On Sat, 20 Oct 2007 01:54:45 -0400 I just tried it. In the latest git tree, autofs still works. The regression is in -mm only. -
Andrew, Rik tracked it down to an interaction with futexes from the pid namespace code. I believe r/o bind mounts are innocent for now. -- Dave -
On Mon, 22 Oct 2007 11:45:19 +0800 Not that I know. If I reboot the system into 2.6.23 or 2.6.23-git, things work just fine though. That makes me think the server is not I do not know if this an autofs issue or the result of something Nope, the only two lines that I found in the log are above... Nothing in dmesg either. -
Hey there!! fails to boot here with this friendly oops: http://oioio.altervista.org/linux/dsc01702.jpg .config: http://oioio.altervista.org/linux/config-2.6.23-mm1-1 2.6.23-rc8-mm2 booted ok but had other problems I haven't reported yet (no s2ram with mysql running and some net WARNING). Let's see if .23-mm1 still has those first. I'm adding Cc: linux-scsi PS: I'll hardly be able to bisect in the next days... :P -- -
That looks like a Jens and Dave production to me. -
Yes, and it's been fixed: http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdif... See also: http://lkml.org/lkml/2007/10/13/174 Thanks, Shaggy -- David Kleikamp IBM Linux Technology Center -
Hi Andrew,
The build fails with the following error message.
CC arch/powerpc/sysdev/axonram.o
arch/powerpc/sysdev/axonram.c:120:34: error: macro "bio_io_error" passed 2 arguments, but takes just 1
arch/powerpc/sysdev/axonram.c: In function ‘axon_ram_make_request’:
arch/powerpc/sysdev/axonram.c:120: error: ‘bio_io_error’ undeclared (first use in this function)
arch/powerpc/sysdev/axonram.c:120: error: (Each undeclared identifier is reported only once
arch/powerpc/sysdev/axonram.c:120: error: for each function it appears in.)
arch/powerpc/sysdev/axonram.c:134: error: too many arguments to function ‘bio_endio’
make[1]: *** [arch/powerpc/sysdev/axonram.o] Error 1
make: *** [arch/powerpc/sysdev] Error 2
The patch fixes the build failure.
Signed-off-by : Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
---
--- linux-2.6.23/arch/powerpc/sysdev/axonram.c 2007-10-12 12:58:14.000000000 +0530
+++ linux-2.6.23/arch/powerpc/sysdev/~axonram.c 2007-10-12 12:51:43.000000000 +0530
@@ -117,7 +117,7 @@ axon_ram_make_request(struct request_que
transfered = 0;
bio_for_each_segment(vec, bio, idx) {
if (unlikely(phys_mem + vec->bv_len > phys_end)) {
- bio_io_error(bio, bio->bi_size);
+ bio_io_error(bio);
rc = -ERANGE;
break;
}
@@ -131,7 +131,7 @@ axon_ram_make_request(struct request_que
phys_mem += vec->bv_len;
transfered += vec->bv_len;
}
- bio_endio(bio, transfered, 0);
+ bio_endio(bio, 0);
return rc;
}
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
-
On Fri, 12 Oct 2007 22:38:25 +0200 ho hum. Maybe reiser4 needs updating for the git-block changes. I don't recall having seen a useful description of what's going on in git-block so some reverse-engineering might be needed. -
Hmm. I can add more data to this. My x86_64 mode laptop is running 2.6.23-mm1 with Reiser4 and does not experience problems. I am using 64-bit kernel, libata (I think, whatever the SCSI-like PATA is called), and Reiser4. Both libata and Reiser4 are built-in, not modules. -- Zan Lynx <zlynx@acm.org>
Reiser4: Drop 'size' argument from bio_endio and bi_end_io
This patch pushes into Reiser4 the changes introduced by
commit 6712ecf8f648118c3363c142196418f89a510b90:
As bi_end_io is only called once when the request is complete,
the 'size' argument is now redundant. Remove it.
Now there is no need for bio_endio to subtract the size completed
from bi_size. So don't do that either.
While we are at it, change bi_end_io to return void.
Please review.
Signed-Off-By: Laurent Riffard <laurent.riffard@free.fr>
---
fs/reiser4/flush_queue.c | 10 ++--------
fs/reiser4/page_cache.c | 24 ++++--------------------
fs/reiser4/status_flags.c | 7 +------
3 files changed, 7 insertions(+), 34 deletions(-)
Index: linux-2.6-mm/fs/reiser4/flush_queue.c
===================================================================
--- linux-2.6-mm.orig/fs/reiser4/flush_queue.c
+++ linux-2.6-mm/fs/reiser4/flush_queue.c
@@ -391,9 +391,8 @@ int atom_fq_parts_are_clean(txn_atom * a
}
#endif
/* Bio i/o completion routine for reiser4 write operations. */
-static int
-end_io_handler(struct bio *bio, unsigned int bytes_done UNUSED_ARG,
- int err)
+static void
+end_io_handler(struct bio *bio, int err)
{
int i;
int nr_errors = 0;
@@ -401,10 +400,6 @@ end_io_handler(struct bio *bio, unsigned
assert("zam-958", bio->bi_rw & WRITE);
- /* i/o op. is not fully completed */
- if (bio->bi_size != 0)
- return 1;
-
if (err == -EOPNOTSUPP)
set_bit(BIO_EOPNOTSUPP, &bio->bi_flags);
@@ -447,7 +442,6 @@ end_io_handler(struct bio *bio, unsigned
}
bio_put(bio);
- return 0;
}
/* Count I/O requests which will be submitted by @bio in given flush queues
Index: linux-2.6-mm/fs/reiser4/page_cache.c
===================================================================
--- linux-2.6-mm.orig/fs/reiser4/page_cache.c
+++ linux-2.6-mm/fs/reiser4/page_cache.c
@@ -320,18 +320,11 @@ reiser4_tree *reiser4_tree_by_page(const
mpage_end_io_read() would also ...
Looks correct to me. Acked-by: Jens Axboe <jens.axboe@oracle.com> -- Jens Axboe -
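For readers who have not followed the git-block change, the completion convention described in the commit message above can be sketched in user space. This is only an illustrative model, not kernel code: the struct bio here is a cut-down stand-in, and record_error() and complete_one_bio() are hypothetical helpers invented for the demonstration. The point it tries to show is that the handler runs exactly once, takes no byte count, and returns void, so the old "if (bio->bi_size != 0) return 1;" early exit has no equivalent.

```c
/* User-space model of the two-argument bio_endio() convention.
 * Everything here is a simplified stand-in for the kernel types. */

struct bio;
typedef void (bio_end_io_t)(struct bio *bio, int err);

struct bio {
	unsigned int bi_size;     /* request size; no longer decremented on completion */
	bio_end_io_t *bi_end_io;  /* completion callback, called once */
	void *bi_private;         /* owner's cookie */
};

/* New-style handler: no bytes_done argument, no return value. */
static void record_error(struct bio *bio, int err)
{
	int *result = bio->bi_private;

	*result = err;
}

/* Model of bio_endio(bio, error): invoke the callback a single time. */
static void bio_endio(struct bio *bio, int error)
{
	if (bio->bi_end_io)
		bio->bi_end_io(bio, error);
}

/* Hypothetical helper: complete one bio and report what the handler saw. */
static int complete_one_bio(int error)
{
	int seen = 1; /* sentinel, overwritten by the handler */
	struct bio b = {
		.bi_size = 4096,
		.bi_end_io = record_error,
		.bi_private = &seen,
	};

	bio_endio(&b, error);
	return seen;
}
```

Calling complete_one_bio(0) models a successful completion and complete_one_bio(-5) an I/O error; in both cases the handler is entered once with the final status.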
Hi Andrew,
The build fails with following message
CC drivers/net/ibm_newemac/zmii.o
CC drivers/net/ibm_newemac/rgmii.o
drivers/net/ibm_newemac/rgmii.c: In function ‘rgmii_probe’:
drivers/net/ibm_newemac/rgmii.c:254: error: implicit declaration of
function ‘device_is_compatible’
make[3]: *** [drivers/net/ibm_newemac/rgmii.o] Error 1
make[2]: *** [drivers/net/ibm_newemac] Error 2
make[1]: *** [drivers/net] Error 2
make: *** [drivers] Error 2
The function device_is_compatible does not exist, and seems to be called
instead of
of_device_is_compatible. This patch replaces the function.
Signed-off-by : Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
---
--- linux-2.6.23/drivers/net/ibm_newemac/rgmii.c 2007-10-12 12:10:48.000000000 +0530
+++ linux-2.6.23/drivers/net/ibm_newemac/~rgmii.c 2007-10-12 14:37:21.000000000 +0530
@@ -251,7 +251,7 @@ static int __devinit rgmii_probe(struct
}
/* Check for RGMII type */
- if (device_is_compatible(ofdev->node, "ibm,rgmii-axon"))
+ if (of_device_is_compatible(ofdev->node, "ibm,rgmii-axon"))
dev->type = RGMII_AXON;
else
dev->type = RGMII_STANDARD;
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
-
Hi Andrew,
Another build failure with the following message
CC drivers/scsi/advansys.o
drivers/scsi/advansys.c:71:2: warning: #warning this driver is still not properly converted to the DMA API
drivers/scsi/advansys.c: In function ‘AdvBuildCarrierFreelist’:
drivers/scsi/advansys.c:6486: error: implicit declaration of function ‘virt_to_bus’
drivers/scsi/advansys.c: In function ‘AdvInitAsc3550Driver’:
drivers/scsi/advansys.c:6974: error: implicit declaration of function ‘bus_to_virt’
drivers/scsi/advansys.c:6974: warning: cast to pointer from integer of different size
drivers/scsi/advansys.c:6994: warning: cast to pointer from integer of different size
drivers/scsi/advansys.c: In function ‘AdvInitAsc38C0800Driver’:
drivers/scsi/advansys.c:7450: warning: cast to pointer from integer of different size
drivers/scsi/advansys.c:7471: warning: cast to pointer from integer of different size
drivers/scsi/advansys.c: In function ‘AdvInitAsc38C1600Driver’:
drivers/scsi/advansys.c:7939: warning: cast to pointer from integer of different size
drivers/scsi/advansys.c:7963: warning: cast to pointer from integer of different size
drivers/scsi/advansys.c: In function ‘adv_isr_callback’:
drivers/scsi/advansys.c:8175: warning: cast to pointer from integer of different size
drivers/scsi/advansys.c: In function ‘AdvISR’:
drivers/scsi/advansys.c:8392: warning: cast to pointer from integer of different size
drivers/scsi/advansys.c:8412: warning: cast to pointer from integer of different size
drivers/scsi/advansys.c: In function ‘AdvExeScsiQueue’:
drivers/scsi/advansys.c:10845: warning: cast to pointer from integer of different size
make[2]: *** [drivers/scsi/advansys.o] Error 1
make[1]: *** [drivers/scsi] Error 2
make: *** [drivers] Error 2
The functions virt_to_bus and bus_to_virt are being defined between ifdef CONFIG_PPC32, but when I compile allyesconfig on a ppc64 box, I get this error. This patch removes the ifdef.
Signed-off-by : Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
---
--- ...
especially ones like that ;) Matthew has proposed that advansys should be dependent upon CONFIG_VIRT_TO_BUS. I don't think anyone's done a patch yet though. (Actually, the code which you've altered there should probably be using CONFIG_VIRT_TO_BUS, too). -
Which is totally bogus, because virt_to_bus/bus_to_virt only work on systems without an IOMMU. Most if not all ppc64 systems have one or more IOMMUs. This patch is nacked. The correct fix is to make advansys depend on CONFIG_VIRT_TO_BUS, or alternatively fix advansys.c properly by making it use the interfaces described in Documentation/DMA-mapping.txt (or the equivalent scsi Definitely. Paul. -
If you look at the git logs, you'll notice there's some progress towards this. It's already the case for the narrow boards. I have a patch to rip it all out for the wide boards, but there's clearly a bug because it crashes my parisc machine. Works fine on x86 though. I can't work on it this week because I'm travelling and the parisc machine with remote power died on me last week. I think I already suggested a temporary CONFIG_VIRT_TO_BUS dependency to akpm last week. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." -
Andrew Morton wrote:
Works a bit better, right :) At least it boots here, but I have a strange problem with it.
It seems 2.6.23-mm1 kills off java. Every program that needs java here no longer works, complaining that 'my java' installation is incorrect. I also noticed firefox is acting weird, as is thunderbird. Gtk apps just freeze at random and need to be killed with -11.
Running 'java -version' manually returns nothing, and 'java -jar some.jar' does nothing either (not even an error or anything else). (I've also tested sun's java 1.5 and 1.6 and openjre as well, all with the same result.)
I only have a WARNING in my dmesg, but I don't think it is related to this:
Oct 13 01:44:52 lara [10722.146448] WARNING: at fs/namespace.c:586 __mntput()
Oct 13 01:44:52 lara [10722.146478] [<c0167cb2>] mntput_no_expire+0x5d/0xab
Oct 13 01:44:52 lara [10722.146503] [<c01683d1>] sys_umount+0x1f8/0x202
Oct 13 01:44:52 lara [10722.146511] [<c010f368>] check_pgt_cache+0x13/0x15
Oct 13 01:44:52 lara [10722.146529] [<c0158cd0>] sys_stat64+0xf/0x23
Oct 13 01:44:52 lara [10722.146549] [<c0147a9c>] remove_vma+0x31/0x36
Oct 13 01:44:52 lara [10722.146574] [<c010fbf6>] do_page_fault+0x180/0x4ea
Oct 13 01:44:52 lara [10722.146600] [<c01683e6>] sys_oldumount+0xb/0xe
Oct 13 01:44:52 lara [10722.146614] [<c010258e>] sysenter_past_esp+0x5f/0x85
Oct 13 01:44:52 lara [10722.146639] [<c02e0000>] xfrm_tmpl_resolve+0x2bd/0x37b
Oct 13 01:44:52 lara [10722.146656] =======================
I also noticed some programs like vlc segfault:
vlc[20506]: segfault at 01950000 eip 01950000 esp b4876368 error 4
Booting 2.6.23 makes all these go away. I don't have anything else in my logs.
Any idea which patches could cause these problems?
Config can be found here -> http://194.231.229.228/2.6.23-mm1-config
Regards, Gabriel C -
Do you know any more about when this happened? Was it during a reboot, or after you unmounted some device or volume? Have you seen it again? Which filesystem(s) do you use? -- Dave -
Something seems to be amiss with CONFIG_LOCALVERSION handling.
I am routinely building with
CONFIG_LOCALVERSION="-testing"
CONFIG_LOCALVERSION_AUTO=y
My usual sequence of "make ; sudo make modules_install install"
has worked fine for all of 2.6.23{-rc?{,-mm?},}. For 2.6.23-mm1
it fails with:
ts@xenon:~/kernel/linux-2.6.23-mm1-work> sudo make modules_install install
root's password:
INSTALL arch/i386/crypto/aes-i586.ko
[...]
INSTALL sound/usb/usx2y/snd-usb-usx2y.ko
if [ -r System.map -a -x /sbin/depmod ]; then /sbin/depmod -ae -F System.map 2.6.23-mm1; fi
sh /home/ts/kernel/linux-2.6.23-mm1-work/arch/i386/boot/install.sh 2.6.23-mm1 arch/i386/boot/bzImage System.map "/boot"
Root device: /dev/system/root (mounted on / as ext3)
Module list: processor thermal ahci pata_marvell aic7xxx fan jbd ext3 dm_mod edd dm-mod dm-snapshot (xennet xenblk dm-mod dm-snapshot)
Kernel image: /boot/vmlinuz-2.6.23-mm1
Initrd image: /boot/initrd-2.6.23-mm1
No modules found for kernel 2.6.23-mm1-testing
ts@xenon:~/kernel/linux-2.6.23-mm1-work>
That is, both "make modules_install" and "make install" omit
the "-testing" suffix, "make modules_install" installing the
modules into /lib/modules/2.6.23-mm1 instead of
/lib/modules/2.6.23-mm1-testing, and "make install" passing
"2.6.23-mm1" without the "-testing" suffix to the install.sh
script, but mkinitrd suddenly rediscovers the real kernel
version string and consequently looks for modules in
/lib/modules/2.6.23-mm1-testing, so initrd creation fails.
Ideas?
-- 
Tilman Schmidt E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits.
Unopened, best before: (see reverse side)
Nope... I have just tried it out with the latest -linus tree and I see no bugs. Note that all kbuild fixes are in the latest -linus except for a few things that are postponed. I will keep it in mind but not pursue it further for now. Sam -
I have investigated a bit more, and stumbled on this:
ts@xenon:~/kernel/linux-2.6.23-mm1-work> make include/config/kernel.release
ts@xenon:~/kernel/linux-2.6.23-mm1-work> cat include/config/kernel.release
2.6.23-mm1-testing
ts@xenon:~/kernel/linux-2.6.23-mm1-work> make
Using ARCH=i386 CROSS_COMPILE=
CHK include/linux/version.h
CHK include/linux/utsrelease.h
[...]
Kernel: arch/i386/boot/bzImage is ready (#1)
Building modules, stage 2.
MODPOST 1085 modules
ts@xenon:~/kernel/linux-2.6.23-mm1-work> cat include/config/kernel.release
2.6.23-mm1
ts@xenon:~/kernel/linux-2.6.23-mm1-work>
Hmmm. "Curiouser and curiouser", said Alice.
So the content of the file include/config/kernel.release generated by "make" varies depending on whether I ask "make" to create just that file, or an entire kernel!? That runs against everything I ever learned about "make"!
My ability to comprehend the inner workings of Kbuild ends here. I'll just skip this -mm release and wait for 2.6.24-rc1, hoping it won't have the same problem.
-- 
Tilman Schmidt
2.6.24-rc1 is fine, so the issue can be closed. T. -
/home is mounted with the following options:
/dev/mapper/vglinux1-lvhome on /home type reiserfs (rw,noatime,nodiratime,user_xattr)
I guess that beagled (the Beagle desktop search daemon) has populated user xattrs on almost all files.
Now, when I delete a file, two BUGs occur and the system hangs. Here is the stack for the first BUG (the second one is very similar):
[partially hand copied stack]
_fput
fput
reiserfs_delete_xattrs
reiserfs_delete_inode
generic_delete_inode
generic_drop_inode
iput
do_unlinkat
sys_unlink
sys_enter_past_esp
I reported a similar BUG in 2.6.22-rc8-mm2 (see http://lkml.org/lkml/2007/9/27/235). Dave Hansen sent a patch for it, I tested it and it was OK for 2.6.22-rc8-mm2.
I tried this patch on 2.6.23-mm1, and it fixed the BUGs here too.
----
From: Dave Hansen <haveblue@us.ibm.com>
The bug is caused by reiserfs creating a special 'struct file' with a NULL vfsmount.
/* Opens a file pointer to the attribute associated with inode */
static struct file *open_xa_file(const struct inode *inode, const char *name,
int flags)
{
...
fp = dentry_open(xafile, NULL, O_RDWR);
/* dentry_open dputs the dentry if it fails */
As Christoph just said, this is somewhat of a bandaid. But, it shouldn't hurt anything.
---
lxc-dave/fs/file_table.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN fs/open.c~fix-reiserfs-oops fs/open.c
diff -puN fs/file_table.c~fix-reiserfs-oops fs/file_table.c
--- lxc/fs/file_table.c~fix-reiserfs-oops 2007-09-27 13:32:20.000000000 -0700
+++ lxc-dave/fs/file_table.c 2007-09-27 13:33:11.000000000 -0700
@@ -236,7 +236,7 @@ void fastcall __fput(struct file *file)
fops_put(file->f_op);
if (file->f_mode & FMODE_WRITE) {
put_write_access(inode);
- if (!special_file(inode->i_mode))
+ if (!special_file(inode->i_mode) && mnt)
mnt_drop_write(mnt);
}
put_pid(file->f_owner.pid);
diff -puN include/linux/mount.h~fix-reiserfs-oops ...
The delete path is a similar case to the one Dave fixed, also caused by
a NULL vfsmount passed to dentry_open, but through a different code-path.
Untested fix for this problem below:
Index: linux-2.6.23-rc8/fs/reiserfs/xattr.c
===================================================================
--- linux-2.6.23-rc8.orig/fs/reiserfs/xattr.c 2007-09-30 14:13:46.000000000 +0200
+++ linux-2.6.23-rc8/fs/reiserfs/xattr.c 2007-09-30 14:18:30.000000000 +0200
@@ -207,9 +207,8 @@ static struct dentry *get_xa_file_dentry
* we're called with i_mutex held, so there are no worries about the directory
* changing underneath us.
*/
-static int __xattr_readdir(struct file *filp, void *dirent, filldir_t filldir)
+static int __xattr_readdir(struct inode *inode, void *dirent, filldir_t filldir)
{
- struct inode *inode = filp->f_path.dentry->d_inode;
struct cpu_key pos_key; /* key of current position in the directory (key of directory entry) */
INITIALIZE_PATH(path_to_entry);
struct buffer_head *bh;
@@ -352,24 +351,19 @@ static int __xattr_readdir(struct file *
* this is stolen from vfs_readdir
*
*/
-static
-int xattr_readdir(struct file *file, filldir_t filler, void *buf)
+static int xattr_readdir(struct inode *inode, filldir_t filler, void *buf)
{
- struct inode *inode = file->f_path.dentry->d_inode;
int res = -ENOTDIR;
- if (!file->f_op || !file->f_op->readdir)
- goto out;
+
mutex_lock_nested(&inode->i_mutex, I_MUTEX_XATTR);
-// down(&inode->i_zombie);
res = -ENOENT;
if (!IS_DEADDIR(inode)) {
lock_kernel();
- res = __xattr_readdir(file, buf, filler);
+ res = __xattr_readdir(inode, buf, filler);
unlock_kernel();
}
-// up(&inode->i_zombie);
mutex_unlock(&inode->i_mutex);
- out:
+
return res;
}
@@ -721,7 +715,6 @@ reiserfs_delete_xattrs_filler(void *buf,
/* This is called w/ inode->i_mutex downed */
int reiserfs_delete_xattrs(struct inode *inode)
{
- struct file *fp;
struct dentry *dir, *root;
int err = ...
Here's a patch I worked up the other night that kills off struct file
completely from the xattr code. I've tested it locally.
After several posts and bug reports regarding interaction with the NULL
nameidata, here's a patch to clean up the mess with struct file in the
reiserfs xattr code.
As observed in several of the posts, there's really no need for struct file
to exist in the xattr code. It was really only passed around due to the
f_op->readdir() and a_ops->{prepare,commit}_write prototypes requiring it.
reiserfs_prepare_write() and reiserfs_commit_write() don't actually use
the struct file passed to it, and the xattr code uses a private version of
reiserfs_readdir() to enumerate the xattr directories.
I do have patches in my queue to convert the xattrs to use reiserfs_readdir(),
but I guess I'll just have to rework those.
This is pretty close to the patch by Dave Hansen for -mm, but I didn't
notice it until after I wrote this up.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
---
fs/reiserfs/xattr.c | 111 ++++++++++++++--------------------------------------
1 file changed, 31 insertions(+), 80 deletions(-)
--- a/fs/reiserfs/xattr.c 2007-08-27 14:03:39.000000000 -0400
+++ b/fs/reiserfs/xattr.c 2007-10-14 22:11:05.000000000 -0400
@@ -191,28 +191,11 @@ static struct dentry *get_xa_file_dentry
dput(xadir);
if (err)
xafile = ERR_PTR(err);
- return xafile;
-}
-
-/* Opens a file pointer to the attribute associated with inode */
-static struct file *open_xa_file(const struct inode *inode, const char *name,
- int flags)
-{
- struct dentry *xafile;
- struct file *fp;
-
- xafile = get_xa_file_dentry(inode, name, flags);
- if (IS_ERR(xafile))
- return ERR_PTR(PTR_ERR(xafile));
else if (!xafile->d_inode) {
dput(xafile);
- return ERR_PTR(-ENODATA);
+ xafile = ERR_PTR(-ENODATA);
}
-
- fp = dentry_open(xafile, NULL, O_RDWR);
- /* dentry_open dputs the dentry if it fails */
-
- return fp;
+ return xafile;
}
/*
@@ ...
Looks like a merge of Dave's and my patch :) ACK from me, I don't care whether it's one or two patches. -
Yeah, it probably is. I did it from scratch since it was my mess, and the patches I saw were against -mm. *shrug* Likewise, I don't care if it's one or two. -Jeff -- Jeff Mahoney SUSE Labs -
Sorry Jeff, your patch does not apply on 2.6.23-mm1. The 'struct file'
removal from the reiserfs_xattr_ functions is already in -mm
(make-reiserfs-stop-using-struct-file-for-internal.patch).
The patch from Dave I was referring to is this one:
==== BEGIN =====
The bug is caused by reiserfs creating a special 'struct file' with a
NULL vfsmount.
/* Opens a file pointer to the attribute associated with inode */
static struct file *open_xa_file(const struct inode *inode, const char *name,
int flags)
{
...
fp = dentry_open(xafile, NULL, O_RDWR);
/* dentry_open dputs the dentry if it fails */
As Christoph just said, this is somewhat of a bandaid. But, it
shouldn't hurt anything.
---
lxc-dave/fs/file_table.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN fs/open.c~fix-reiserfs-oops fs/open.c
diff -puN fs/file_table.c~fix-reiserfs-oops fs/file_table.c
--- lxc/fs/file_table.c~fix-reiserfs-oops 2007-09-27 13:32:20.000000000 -0700
+++ lxc-dave/fs/file_table.c 2007-09-27 13:33:11.000000000 -0700
@@ -236,7 +236,7 @@ void fastcall __fput(struct file *file)
fops_put(file->f_op);
if (file->f_mode & FMODE_WRITE) {
put_write_access(inode);
- if (!special_file(inode->i_mode))
+ if (!special_file(inode->i_mode) && mnt)
mnt_drop_write(mnt);
}
put_pid(file->f_owner.pid);
diff -puN include/linux/mount.h~fix-reiserfs-oops include/linux/mount.h
==== END ====
Dave sent it privately to me... I guess this "bandaid" is no longer
needed now, is it?
~~
laurent
-
We'll need to drop Dave's patch first. Andrew, can you drop it and put this one in instead? -
I'd guess not. This patch was actually against mainline. I should've specified. I can work up one against -mm later today if it's needed. -Jeff -- Jeff Mahoney SUSE Labs -
Hi Andrew,
The link failure while compiling the kernel with allyesconfig over the lpar, which was seen in 2.6.23-rc8-mm2 (http://lkml.org/lkml/2007/9/30/2), is still seen in 2.6.23-mm1. The link failure is
ld: arch/powerpc/kernel/head_64.o(.text+0x80c8): sibling call optimization to `.text.init.refok' does not allow automatic multiple TOCs; recompile with -mminimal-toc or -fno-optimize-sibling-calls, or make `.text.init.refok' extern
ld: arch/powerpc/kernel/head_64.o(.text+0x8160): sibling call optimization to `.text.init.refok' does not allow automatic multiple TOCs; recompile with -mminimal-toc or -fno-optimize-sibling-calls, or make `.text.init.refok' extern
ld: arch/powerpc/kernel/head_64.o(.text+0x81c4): sibling call optimization to `.text.init.refok' does not allow automatic multiple TOCs; recompile with -mminimal-toc or -fno-optimize-sibling-calls, or make `.text.init.refok' extern
ld: final link failed: Bad value
make: *** [.tmp_vmlinux1] Error 1
# gcc -v
Using built-in specs.
Target: powerpc64-suse-linux
Configured with: ../configure --enable-threads=posix --prefix=/usr --with-local-prefix=/usr/local --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib --libexecdir=/usr/lib --enable-languages=c,c++,objc,fortran,obj-c++,java,ada --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.1.2 --enable-ssp --disable-libssp --disable-libgcj --with-slibdir=/lib --with-system-zlib --enable-shared --enable-__cxa_atexit --enable-libstdcxx-allocator=new --program-suffix=-4.1 --enable-version-specific-runtime-libs --without-system-libunwind --with-cpu=default32 --enable-secureplt --with-long-double-128 --host=powerpc64-suse-linux
Thread model: posix
gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)
ld -v
GNU ld version 2.17.50.0.5 20060927 (SUSE Linux)
Anything I can provide to help diagnose this?
-- Thanks & Regards, Kamalesh Babulal, Linux Technology Center, IBM, ISTL. -
Did we work out which patch is causing this? -
Hi Andrew, No, we have not worked out which patch is causing this. I will try a bisect to find the patch causing this issue. -- Thanks & Regards, Kamalesh Babulal, Linux Technology Center, IBM, ISTL. -
Hi Andrew, After bisecting, I found that the patch git-net.patch is the cause of the link failure. -- Thanks & Regards, Kamalesh Babulal, Linux Technology Center, IBM, ISTL. -
On Sun, 21 Oct 2007 12:12:38 +0530 Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote: [...] for the link failure.
The actual cause is my patch to mark some things in head_64.S as init_refok. I have a test patch which I will tidy up and post soon. However, even with that fixed, I am running into a linker bug which Alan Modra is looking into.
-- Cheers, Stephen Rothwell sfr@canb.auug.org.au http://www.canb.auug.org.au/~sfr/
Hello !
While polling the contents of a cgroup task file, I caught the
following corruption. Is there a known race (and a fix) or should
I start digging ?
the program running in the cgroup is fork/exec intensive:
while (1) {
	int i, s;
	for (i = 0; i < count; i++)
		if (fork() == 0) {
			execlp("/bin/true", "true", (char *)NULL);
			_exit(127); /* only reached if execlp() fails */
		}
	for (i = 0; i < count; i++)
		wait(&s);
}
Thanks for any insights,
C.
list_add corruption. next->prev should be prev (ffffffff80a3f338), but was 0000000000200200. (next=ffff810103dcbe90).
------------[ cut here ]------------
kernel BUG at /home/legoater/linux/2.6.23-mm1/lib/list_debug.c:27!
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:1e.0/0000:01:01.0/local_cpus
CPU 3
Modules linked in: ipt_REJECT iptable_filter autofs4 nfs lockd sunrpc tg3 sg joydev ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 2441, comm: bash Not tainted 2.6.23-mm1 #4
RIP: 0010:[<ffffffff80308cda>] [<ffffffff80308cda>] __list_add+0x27/0x5b
RSP: 0018:ffff810103d87dd8 EFLAGS: 00010296
RAX: 0000000000000079 RBX: ffff810105033040 RCX: 0000000000000079
RDX: ffff810103d960c0 RSI: 0000000000000001 RDI: 0000000000000096
RBP: ffff810103d87dd8 R08: 0000000000000002 R09: ffff810008123780
R10: 0000000000000000 R11: ffff810103d87a98 R12: 0000000000000000
R13: ffff810105033040 R14: ffff810104c11ac0 R15: 0000000000000000
FS: 00007f4e273556f0(0000) GS:ffff81010011a840(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000006ca2f8 CR3: 0000000103d82000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process bash (pid: 2441, threadinfo ffff810103d86000, task ffff810103d960c0)
last branch before last exception/interrupt
from [<ffffffff80235885>] printk+0x68/0x69
to [<ffffffff80308cda>] __list_add+0x27/0x5b
Stack: ffff810103d87de8 ...
Not a known race, no. Sorry, didn't have time to look at this yesterday since I was out of the office all day; I'll try to get a chance today. -
This is a crash on
list_add(&child->cg_list, &child->cgroups->tasks);
in cgroup_post_fork(). So it looks like child->cgroups->tasks.next is
a deleted list element. But there are no places that modify that list
outside of write_lock(&css_set_lock) as far as I can see, so I'm a bit
confused as to what the problem could be. I'll try to reproduce this.
Paul
-
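To make the oops message easier to read, here is a small user-space sketch of the check that fired. It is a model, not the kernel's lib/list_debug.c: checked_list_add(), list_del_poison() and stale_neighbour_demo() are names made up for this illustration, and the check returns -1 where the kernel would BUG(). The poison constants are the ones the debug list code writes into a deleted entry, and 0x200200 is the value reported as next->prev in the trace above, which is why the neighbour looks like a deleted element.

```c
#include <stdint.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Values stored into an entry's pointers when it is deleted. */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0x00100100)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0x00200200)

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert new_entry between prev and next, but refuse (return -1) if the
 * two neighbours do not point at each other, as the debug check does. */
static int checked_list_add(struct list_head *new_entry,
			    struct list_head *prev, struct list_head *next)
{
	if (next->prev != prev || prev->next != next)
		return -1;
	next->prev = new_entry;
	new_entry->next = next;
	new_entry->prev = prev;
	prev->next = new_entry;
	return 0;
}

/* Unlink an entry and poison its pointers. */
static void list_del_poison(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}

/* Hypothetical scenario: the head still points at an entry that has
 * already been deleted, then something is added at the head. */
static int stale_neighbour_demo(void)
{
	struct list_head head, a, b;

	init_list_head(&head);
	if (checked_list_add(&a, &head, head.next) != 0)
		return 1;
	list_del_poison(&a);
	head.next = &a; /* simulate the stale pointer left behind by a race */
	return checked_list_add(&b, &head, head.next);
}
```

In this model the second add is rejected because a.prev holds LIST_POISON2 instead of the head's address. How the real list ended up in that state is exactly the open question in the thread; the sketch only illustrates what the message means.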
Hi Andrew, The kernel build fails on the power box INSTALL vdso64.so INSTALL vdso32.so BOOTCC arch/powerpc/boot/inflate.o arch/powerpc/boot/inflate.c:920:19: error: errno.h: No such file or directory arch/powerpc/boot/inflate.c:921:18: error: slab.h: No such file or directory arch/powerpc/boot/inflate.c:922:21: error: vmalloc.h: No such file or directory arch/powerpc/boot/inflate.c: In function
This problem is fixed by d4faaecbcc6d9ea4f7c05f6de6af98e2336a4afb in Linus' tree. Paul. -
Hi Paul, Thanks, we tried it out over the 2.6.23-mm1 and the patch fixes the build failure. -- Thanks & Regards, Kamalesh Babulal, -
Ok, now that it boots let's go for more.
I cannot suspend if mysqld is running. mysql isn't actually doing anything useful anyway. This is the failed suspend tasks dump of mysql:
[ 0.000000] Linux version 2.6.23-mm1-1 (mattia@tadamune) (gcc version 4.2.1 (Debian 4.2.1-3)) #5 SMP PREEMPT Sun Oct 21 13:50:54 JST 2007
...
[ 271.736214] PM: Preparing system for mem sleep
[ 271.738185] Freezing user space processes ...
[ 291.918090] Freezing of tasks failed after 20.19 seconds (1 tasks refusing to freeze):
[ 291.918156] task PC stack pid father
...
[ 292.043105] =======================
[ 292.043175] mysqld_safe D c03d40c0 0 2393 1
[ 292.043343] c26b3eac 00000082 c03d0eb0 c03d40c0 c011a850 c011a843 c2626aa0 c2626bd4
[ 292.043803] c17fd0c0 00000000 c26b3e88 c26cc380 c26b3ea8 c011b83a c26b3ea0 00000000
[ 292.044322] 08104d08 00000000 00000000 08104d08 00000000 c26b3eb8 c0141de0 c26b3fb8
[ 292.044843] Call Trace:
[ 292.044969] [<c0141de0>] refrigerator+0xcf/0xdb
[ 292.045091] [<c012b4d2>] get_signal_to_deliver+0x33/0x414
[ 292.045214] [<c01034e8>] do_notify_resume+0x81/0x61e
[ 292.045335] [<c0103f06>] work_notifysig+0x13/0x19
[ 292.045456] =======================
[ 292.045524] mysqld D c03d40c0 0 2430 2393
[ 292.045692] c25d0eac 00000086 c03d0eb0 c03d40c0 c0119eb5 00000000 c1c98550 c1c98684
[ 292.046184] c18060c0 00000001 c25d0e88 c2603000 c25d0ea8 c011b83a c25d0ea0 00000000
[ 292.046705] 00000000 00000000 00000000 00000000 00000000 c25d0eb8 c0141de0 c25d0fb8
[ 292.047272] Call Trace:
[ 292.049112] [<c0141de0>] refrigerator+0xcf/0xdb
[ 292.049234] [<c012b4d2>] get_signal_to_deliver+0x33/0x414
[ 292.049357] [<c01034e8>] do_notify_resume+0x81/0x61e
[ 292.049477] [<c0103f06>] work_notifysig+0x13/0x19
[ 292.049598] =======================
[ 292.049666] mysqld D c03d40c0 0 2433 2393
[ 292.049834] c3000eac 00000086 c03d0eb0 c03d40c0 ...
great, that was the guilty patch in fact. -- -
I believe this is known and rafael already has a fix somewhere. The "guilty" patch already hit mainline, not sure about the "fix" patch. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
The fix has not been merged yet, but
freezer-use-wait-queue-instead-of-busy-looping.patch has been dropped for
another reason.
The mysqld problem seems to have been caused by another patch, though, and the
fix is appended.
Greetings,
Rafael
---
From: Rafael J. Wysocki <rjw@sisk.pl>
Do not allow processes to clear their TIF_SIGPENDING if TIF_FREEZE is set,
so that they will not race with the freezer (like mysqld, for example).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
kernel/signal.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: linux-2.6.23-mm1/kernel/signal.c
===================================================================
--- linux-2.6.23-mm1.orig/kernel/signal.c
+++ linux-2.6.23-mm1/kernel/signal.c
@@ -124,7 +124,7 @@ void recalc_sigpending_and_wake(struct t
void recalc_sigpending(void)
{
- if (!recalc_sigpending_tsk(current))
+ if (!recalc_sigpending_tsk(current) && !freezing(current))
clear_thread_flag(TIF_SIGPENDING);
}
-
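The effect of the one-line change can be modelled in user space. The sketch below is an assumption-laden toy, not the kernel implementation: struct task_model and its fields are invented stand-ins for the thread flags, and it assumes (as I understand the freezer of that era) that the freezer sets TIF_FREEZE and raises TIF_SIGPENDING without queuing a real signal, so a task that recalculates its pending state would otherwise drop the flag and never reach the refrigerator.

```c
#include <stdbool.h>

/* Toy model of the per-task state involved; all names are illustrative. */
struct task_model {
	bool real_signal_queued; /* an actual signal is waiting */
	bool tif_sigpending;     /* models TIF_SIGPENDING */
	bool tif_freeze;         /* models TIF_FREEZE, i.e. freezing(task) */
};

/* Models recalc_sigpending_tsk(): set the flag if a real signal is queued. */
static bool recalc_sigpending_tsk(struct task_model *t)
{
	if (t->real_signal_queued) {
		t->tif_sigpending = true;
		return true;
	}
	return false;
}

/* Behaviour before the patch: clear the flag whenever nothing is queued. */
static void recalc_sigpending_old(struct task_model *t)
{
	if (!recalc_sigpending_tsk(t))
		t->tif_sigpending = false;
}

/* Behaviour after the patch: leave the flag alone while freezing. */
static void recalc_sigpending_new(struct task_model *t)
{
	if (!recalc_sigpending_tsk(t) && !t->tif_freeze)
		t->tif_sigpending = false;
}

/* Hypothetical scenario: the freezer has marked the task, no real signal
 * is queued, and the task recalculates its pending state. */
static bool still_pending_after_recalc(bool use_fix)
{
	struct task_model t = {
		.real_signal_queued = false,
		.tif_sigpending = true,
		.tif_freeze = true,
	};

	if (use_fix)
		recalc_sigpending_new(&t);
	else
		recalc_sigpending_old(&t);
	return t.tif_sigpending;
}
```

With the old logic the flag is lost, which in this model corresponds to the task slipping past the freezer; with the added freezing() test it survives until the task has been frozen.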
Hello,
I'm seeing reproducible oops on 2.6.23-mm1 when trying to run tcpdump
over ppp0 interface. To reproduce I type simply:
# tcpdump -i ppp0
and wait a few seconds. I captured two oopses with a bit different stack
trace but EIP always points to packet_rcv():
(gdb) l* 0xc02d7d49
0xc02d7d49 is in packet_rcv (include/linux/netdevice.h:830).
825 static inline int dev_parse_header(const struct sk_buff *skb,
826 unsigned char *haddr)
827 {
828 const struct net_device *dev = skb->dev;
829
830 if (!dev->header_ops->parse)
831 return 0;
832 return dev->header_ops->parse(skb, haddr);
833 }
834
Please find pics attached (sorry for the poor quality - I can provide you with better ones
tomorrow if needed):
http://tuxland.pl/misc/2.6.23-mm1/DSC00136.JPG
http://tuxland.pl/misc/2.6.23-mm1/DSC00142.JPG
Regards,
Mariusz
Can you please test the latest Linus kernel from ftp://ftp.kernel.org/pub/linux/kernel/v2.6/snapshots/? All the networking things which were in 2.6.23-mm1 are now in mainline, so if mainline is OK then that bug presumably got fixed. Thanks. -
You're right. 2.6.23-git17 runs fine so the bug must have been fixed. Regards, Mariusz -
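The thread never states the root cause, so the following is a guess offered only as illustration: if the ppp0 device has no header_ops at all, the line "if (!dev->header_ops->parse)" quoted from gdb dereferences a NULL pointer. A guarded variant of the helper can be sketched in user space as below. The structures are minimal stand-ins for the kernel ones, and dev_parse_header_guarded(), fake_parse() and the two demo functions are hypothetical names used only here.

```c
#include <stddef.h>

struct sk_buff;

/* Minimal stand-ins for the kernel structures involved. */
struct header_ops {
	int (*parse)(const struct sk_buff *skb, unsigned char *haddr);
};

struct net_device {
	const struct header_ops *header_ops; /* may be NULL on header-less links */
};

struct sk_buff {
	const struct net_device *dev;
};

/* Guarded variant: check the ops table before looking at its members. */
static int dev_parse_header_guarded(const struct sk_buff *skb,
				    unsigned char *haddr)
{
	const struct net_device *dev = skb->dev;

	if (!dev->header_ops || !dev->header_ops->parse)
		return 0;
	return dev->header_ops->parse(skb, haddr);
}

/* Pretend link-layer parser that reports a 6-byte hardware address. */
static int fake_parse(const struct sk_buff *skb, unsigned char *haddr)
{
	(void)skb;
	haddr[0] = 0xaa;
	return 6;
}

/* Device without header_ops, as assumed for ppp0. */
static int parse_on_headerless_device(void)
{
	unsigned char haddr[8] = { 0 };
	struct net_device dev = { .header_ops = NULL };
	struct sk_buff skb = { .dev = &dev };

	return dev_parse_header_guarded(&skb, haddr);
}

/* Device with a parse callback, for comparison. */
static int parse_on_ethernet_like_device(void)
{
	unsigned char haddr[8] = { 0 };
	struct header_ops ops = { .parse = fake_parse };
	struct net_device dev = { .header_ops = &ops };
	struct sk_buff skb = { .dev = &dev };

	return dev_parse_header_guarded(&skb, haddr);
}
```

In this model the header-less device yields 0 instead of faulting, while a device that provides a parse callback still gets its address length back.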
