Re: PROBLEM: System Freeze on Particular workload with kernel 2.6.22.6

Previous thread: [PATCH -mm -v3 2/2] i386/x86_64 boot: document for 32 bit boot protocol by Huang, Ying on Wednesday, September 19, 2007 - 4:58 am. (1 message)

Next thread: [RFC PATCH] 2.6.22.6 netfilter: sk_setup_caps in ip_make_route_harder by lepton on Wednesday, September 19, 2007 - 5:36 am. (2 messages)
To: <linux-kernel@...>
Date: Wednesday, September 19, 2007 - 4:45 am

[1.] Summary
System Freeze on Particular workload with kernel 2.6.22.6

[2.] Description
System freezes on repeated application of the following command:
for f in *png ; do convert -quality 100 $f `basename $f png`jpg; done

Problem is consistent and repeatable.
Problem persists when running on a different drive, and also in pure console (no X).

One time, the following error was logged in syslog:
Sep 19 04:22:11 mossnew kernel: [ 301.883919] VM: killing process convert
Sep 19 04:22:11 mossnew kernel: [ 301.884382] swap_free: Unused swap offset entry 0000ff00
Sep 19 04:22:11 mossnew kernel: [ 301.884421] swap_free: Unused swap offset entry 00000300
Sep 19 04:22:11 mossnew kernel: [ 301.884456] swap_free: Unused swap offset entry 00000200
Sep 19 04:22:11 mossnew kernel: [ 301.884491] swap_free: Unused swap offset entry 0000ff00
Sep 19 04:22:11 mossnew kernel: [ 301.884527] swap_free: Unused swap offset entry 0000ff00
Sep 19 04:22:11 mossnew kernel: [ 301.884562] swap_free: Unused swap offset entry 00000100
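
When a syslog excerpt like the one above grows longer, it can help to tally the reported entries rather than read them by eye. The following is a minimal sketch, not part of the original report: the function name `bad_swap_entries` is made up for illustration, and the log text is copied from the excerpt above.

```python
import re
from collections import Counter

# Lines copied verbatim from the syslog excerpt in the report.
LOG = """\
Sep 19 04:22:11 mossnew kernel: [ 301.883919] VM: killing process convert
Sep 19 04:22:11 mossnew kernel: [ 301.884382] swap_free: Unused swap offset entry 0000ff00
Sep 19 04:22:11 mossnew kernel: [ 301.884421] swap_free: Unused swap offset entry 00000300
Sep 19 04:22:11 mossnew kernel: [ 301.884456] swap_free: Unused swap offset entry 00000200
Sep 19 04:22:11 mossnew kernel: [ 301.884491] swap_free: Unused swap offset entry 0000ff00
Sep 19 04:22:11 mossnew kernel: [ 301.884527] swap_free: Unused swap offset entry 0000ff00
Sep 19 04:22:11 mossnew kernel: [ 301.884562] swap_free: Unused swap offset entry 00000100
"""

PATTERN = re.compile(r"swap_free: Unused swap offset entry ([0-9a-fA-F]{8})")


def bad_swap_entries(log_text):
    """Return a Counter mapping each reported entry (as an int) to its count."""
    return Counter(int(m.group(1), 16) for m in PATTERN.finditer(log_text))


if __name__ == "__main__":
    for entry, count in sorted(bad_swap_entries(LOG).items()):
        print(f"{entry:#010x}: {count}")
```

The tally only summarizes what the kernel printed; it does not by itself say whether the cause is hardware or software.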

Should not be a RAM problem. RAM has survived 12 hrs of Memtest with no errors.
Should not be a CPU problem either. I have been running CPU intensive tasks for days.

[3.] Keywords
freeze, swap_free, VM

[4.] /proc/version
Linux version 2.6.22.6intelcore2 (root@mossnew) (gcc version 4.1.2 (Ubuntu 4.1.2-0ubuntu4)) #1 SMP Sat Sep 15 00:29:00 EDT 2007

[5.] No Oops

[6.] Trigger
- Create a large number of png images. (a few hundred)

- Repeatedly run:
for f in *png ; do convert -quality 100 $f `basename $f png`jpg; done

- This might be subjective, but the freeze seems to show up sooner if there is a CPU-heavy
process running in the background.

[7.] Environment
[7.1] Software /script/ver_linux

Linux mossnew 2.6.22.6intelcore2 #1 SMP Sat Sep 15 00:29:00 EDT 2007 x86_64 GNU/Linux

Gnu C 4.1.2
Gnu make 3.81
binutils 2.17.50
util-linux 2.12r
mount 2.12r
module-init-tools 3.3-pre2
e2fsprogs ...

To: Low Yucheng <ylow@...>
Cc: <linux-kernel@...>
Date: Thursday, September 20, 2007 - 12:14 pm

The "Unused swap offset entry" is almost always a sign of bad memory,
if Google can be trusted. Your workload is *extremely* CPU and memory
intensive (and even hits the disk!), so this looks like bad RAM, bad
cooling, or a marginal power supply that is failing under load.

memtest86+ doesn't stress the CPU nearly as much, so it often doesn't
show all the problems.

Take your RAM down to one stick and try again (looks like you have 2G
installed?). If that still fails, try different RAM. If that still
fails, then swap out the power supply for another if you can, and try
again.
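
The point about memtest86+ not loading the CPU can be illustrated with a rough user-space check that keeps several cores busy hashing large buffers and looks for digests that change between passes. This is only a sketch under assumptions: the names `check_buffer` and `run`, the buffer size, and the pass count are all made up here, a user-space process cannot choose which physical pages it gets, and corruption that happens before the first digest is taken would go unnoticed. A clean run therefore proves nothing; a mismatch would be a strong hint of a hardware fault.

```python
import hashlib
import multiprocessing
import os


def check_buffer(args):
    """Fill a buffer with random bytes, then re-hash it repeatedly.

    Returns the number of passes whose digest differed from the first one.
    On healthy hardware this should be 0.
    """
    size_bytes, passes = args
    buf = bytearray(os.urandom(size_bytes))
    reference = hashlib.sha256(buf).digest()
    mismatches = 0
    for _ in range(passes):
        if hashlib.sha256(buf).digest() != reference:
            mismatches += 1
    return mismatches


def run(workers, size_bytes, passes):
    """Run check_buffer in `workers` processes at once; return total mismatches."""
    with multiprocessing.Pool(workers) as pool:
        return sum(pool.map(check_buffer, [(size_bytes, passes)] * workers))


if __name__ == "__main__":
    # Hypothetical, deliberately small defaults; raise them for a real soak test.
    total = run(workers=2, size_bytes=32 * 1024 * 1024, passes=5)
    print("digest mismatches:", total)
```

Swapping modules one at a time, as suggested above, remains the more direct way to localize a bad stick.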

Ray

To: Ray Lee <ray-lk@...>
Cc: <linux-kernel@...>
Date: Thursday, September 20, 2007 - 9:04 pm

Hi all,

Thanks all. After lots of testing, I isolated the problem to one of the
memory modules.

Thought it might have been a kernel problem as I thought memtest should
be exhaustive enough considering I ran it for so long, but apparently not...
Even now, the bad module still does not show any errors in memtest...

Thanks,
Yucheng


To: Low Yucheng <ylow@...>
Cc: <linux-kernel@...>, <linux-mm@...>, Andrew Morton <akpm@...>
Date: Wednesday, September 19, 2007 - 11:47 am

Nice bug report, it seems to follow linux-source/REPORTING-BUGS.
But still:

* no relevant Cc (memory management added)
+ no output of `mount` (because if swap is on some file system, that
*can* be another problem)
+ no information about amount of memory and its BIOS configuration
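
Whether swap lives on a partition or in a file on some file system can also be read from /proc/swaps, whose columns are Filename, Type, Size, Used and Priority. The sketch below is an illustration only: `parse_swaps` is a made-up helper, and the `SAMPLE` text is hypothetical, not taken from the reporter's machine.

```python
# Hypothetical /proc/swaps contents, used when the real file is unavailable.
SAMPLE = """\
Filename                                Type            Size    Used    Priority
/dev/sda5                               partition       1951888 0       -1
/swapfile                               file            524284  0       -2
"""


def parse_swaps(text):
    """Parse the contents of /proc/swaps into a list of dicts (sizes in KiB)."""
    entries = []
    for line in text.strip().splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 5:
            continue
        entries.append({
            "filename": fields[0],
            "type": fields[1],
            "size_kib": int(fields[2]),
            "used_kib": int(fields[3]),
            "priority": int(fields[4]),
        })
    return entries


if __name__ == "__main__":
    try:
        with open("/proc/swaps") as handle:
            text = handle.read()
    except OSError:
        text = SAMPLE  # fall back to the hypothetical sample
    for entry in parse_swaps(text):
        print(entry["filename"], entry["type"], entry["used_kib"], "/", entry["size_kib"], "KiB")
```

An entry of type `file` is the case where the output of `mount` becomes relevant, since the swap file then depends on the file system underneath it.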

FYI, the latter two (and much more) are in the `dmesg` output. This output,
together with any other kernel information, can be gathered by serial or
net consoles:

linux-source/Documentation/serial-console.txt
linux-source/Documentation/networking/netconsole.txt

If console messages after freeze can be seen in text mode VGA/CRT

To: Oleg Verych <olecom@...>
Cc: <linux-kernel@...>, <linux-mm@...>, Andrew Morton <akpm@...>
Date: Wednesday, September 19, 2007 - 12:16 pm

There are no additional console messages.
Not sure what this is: * no relevant Cc (memory management added)

But output of dmesg as requested:

[ 0.000000] Linux version 2.6.22.6intelcore2 (root@mossnew) (gcc version 4.1.2 (Ubuntu 4.1.2-0ubuntu4)) #1 SMP Sat Sep 15 00:29:00 EDT 2007
[ 0.000000] Command line: root=UUID=07b82da5-efcc-4d75-a31b-c01ccc3b2c14 ro quiet splash
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
[ 0.000000] BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 000000007ff80000 (usable)
[ 0.000000] BIOS-e820: 000000007ff80000 - 000000007ff8e000 (ACPI data)
[ 0.000000] BIOS-e820: 000000007ff8e000 - 000000007ffe0000 (ACPI NVS)
[ 0.000000] BIOS-e820: 000000007ffe0000 - 0000000080000000 (reserved)
[ 0.000000] BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
[ 0.000000] BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
[ 0.000000] Entering add_active_range(0, 0, 159) 0 entries of 3200 used
[ 0.000000] Entering add_active_range(0, 256, 524160) 1 entries of 3200 used
[ 0.000000] end_pfn_map = 1048576
[ 0.000000] DMI 2.4 present.
[ 0.000000] ACPI: RSDP 000FBCD0, 0014 (r0 ACPIAM)
[ 0.000000] ACPI: RSDT 7FF80000, 003C (r1 A_M_I_ OEMRSDT 7000703 MSFT 97)
[ 0.000000] ACPI: FACP 7FF80200, 0084 (r2 A_M_I_ OEMFACP 7000703 MSFT 97)
[ 0.000000] ACPI: DSDT 7FF805C0, 8C00 (r1 A0751 A0751055 55 INTL 20060113)
[ 0.000000] ACPI: FACS 7FF8E000, 0040
[ 0.000000] ACPI: APIC 7FF80390, 006C (r1 A_M_I_ OEMAPIC 7000703 MSFT 97)
[ 0.000000] ACPI: MCFG 7FF80400, 003C (r1 A_M_I_ OEMMCFG 7000703 MSFT 97)
[ 0.000000] ACPI: OEMB 7FF8E040, 0081 (r1 A_M_I_ AMI_OEM 7000703 MSFT 97)
[ 0.000000] ACPI: HPET 7FF891C0, 0038 (r1 A_M_I_ OEMHPET 7000703 MSFT 97)
[ 0.000000] A...
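
The amount of installed memory asked about earlier can be estimated from the BIOS-e820 lines in this output by summing the ranges marked usable. A minimal sketch, assuming the dmesg line format shown above; `usable_bytes` is a made-up helper name, and the input text is copied from the output above.

```python
import re

# BIOS-e820 lines copied verbatim from the dmesg output in this message.
E820 = """\
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
[ 0.000000] BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 000000007ff80000 (usable)
[ 0.000000] BIOS-e820: 000000007ff80000 - 000000007ff8e000 (ACPI data)
[ 0.000000] BIOS-e820: 000000007ff8e000 - 000000007ffe0000 (ACPI NVS)
[ 0.000000] BIOS-e820: 000000007ffe0000 - 0000000080000000 (reserved)
[ 0.000000] BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
[ 0.000000] BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
"""

LINE = re.compile(r"BIOS-e820:\s+([0-9a-fA-F]+)\s+-\s+([0-9a-fA-F]+)\s+\(([^)]+)\)")


def usable_bytes(dmesg_text):
    """Sum the sizes of all e820 ranges whose type is 'usable'."""
    total = 0
    for start, end, kind in LINE.findall(dmesg_text):
        if kind == "usable":
            total += int(end, 16) - int(start, 16)
    return total


if __name__ == "__main__":
    print(f"{usable_bytes(E820) / 2**20:.1f} MiB usable")
```

For the map above this comes to roughly 2047 MiB, which matches a machine with 2 GB installed.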

To: Low Yucheng <ylow@...>
Cc: Oleg Verych <olecom@...>, <linux-kernel@...>, <linux-mm@...>, Andrew Morton <akpm@...>
Date: Saturday, December 1, 2007 - 6:39 pm

Was the system still pingable?

Regards,

Daniel
--

To: Low Yucheng <ylow@...>
Cc: Oleg Verych <olecom@...>, <linux-kernel@...>, <linux-mm@...>, Andrew Morton <akpm@...>
Date: Wednesday, September 19, 2007 - 3:25 pm

Hi Low,

Relevant CCs means CCing the maintainers or subsystem mailing lists related to
your bug report. E.g., if it's a networking bug, you need to CC the Linux kernel
networking mailing list. If it's a kobject bug, you need to CC its maintainer
(Greg), and so on.

Regards,

--
Ahmed S. Darwish
HomePage: http://darwish.07.googlepages.com
Blog: http://darwish-07.blogspot.com

To: Ahmed S. Darwish <darwish.07@...>
Cc: Low Yucheng <ylow@...>, Oleg Verych <olecom@...>, <linux-kernel@...>, <linux-mm@...>, Andrew Morton <akpm@...>
Date: Thursday, September 20, 2007 - 6:00 am

So, which one do you recommend here?

Regards,
Jarek P.

PS#1: I don't think we should require so much expertise in bug
reporting from users: after a few questions, cc-ing should be no
problem here.

PS#2: Low Yucheng: maybe it's something else, but it seems your swap
could be bigger for this amount of memory. (You could try to monitor
this e.g. with "top" running in another console window.)

To: Jarek Poplawski <jarkao2@...>
Cc: Low Yucheng <ylow@...>, Oleg Verych <olecom@...>, <linux-kernel@...>, <linux-mm@...>, Andrew Morton <akpm@...>
Date: Thursday, September 20, 2007 - 11:24 am

I'm not really sure, just wanted to solve Jarek's confusion :).

Regards,

--
Ahmed S. Darwish
HomePage: http://darwish.07.googlepages.com
Blog: http://darwish-07.blogspot.com
