Re: 2.6.31 regression: programs crashing after a couple of days of uptime with hibernation

Previous thread: [PATCH 9/9] Staging: rtl8187se: fixed checkpatch.pl warnings and errors in r8180_rtl8225z2.c by Olimpiu Pascariu on Thursday, April 1, 2010 - 1:48 pm. (1 message)

Next thread: 2.6.33.1 pwc (quickcam pro 4000 is broken) by Justin Piszcz on Thursday, April 1, 2010 - 2:45 pm. (1 message)
From: Ondrej Zary
Date: Thursday, April 1, 2010 - 2:32 pm

Hello,
with kernel 2.6.30, I can have uptime of more than a month on my desktop PC
(with hibernation). It's impossible with 2.6.31. After 1 to 3 days, processes
that were running during hibernation (e.g. konsole, kwin, kicker, xorg) start
to crash randomly in very weird ways. The kernel itself does not seem to
crash. When I run the crashed program again, it seems to work. Looks like
some memory corruption.

This bug is also present in 2.6.32 and 2.6.33.

This is very hard to debug as the test case takes 3 days. I've been trying to
bisect it anyway, and after a long time, I failed. I got this result, which is
obviously wrong:
187f81b3d8d315c35c73ac0d05b15a04a0ac3ce7 is first bad commit

git bisect log
git-bisect start
# good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
git-bisect good 07a2039b8eb0af4ff464efd3dfd95de5c02648c6
# bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
git-bisect bad 74fca6a42863ffacaf7ba6f1936a9f228950f657
# good: [925d74ae717c9a12d3618eb4b36b9fb632e2cef3] V4L/DVB (11736): videobuf: modify return value of VIDIOC_REQBUFS ioctl
git-bisect good 925d74ae717c9a12d3618eb4b36b9fb632e2cef3
# bad: [a380137900fca5c79e6daa9500bdb6ea5649188e] ixgbe: Fix device capabilities of 82599 single speed fiber NICs.
git-bisect bad a380137900fca5c79e6daa9500bdb6ea5649188e
# good: [1dbb5765acc7a6fe4bc1957c001037cc9d02ae03] Staging: android: lowmemorykiller: fix up remaining checkpatch warnings
git-bisect good 1dbb5765acc7a6fe4bc1957c001037cc9d02ae03
# good: [df36b439c5fedefe013d4449cb6a50d15e2f4d70] Merge branch 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
git-bisect good df36b439c5fedefe013d4449cb6a50d15e2f4d70
# bad: [a800faec1b21d7133b5f0c8c6dac593b7c4e118d] Merge branch 'for-linus' of git://www.jni.nu/cris
git-bisect bad a800faec1b21d7133b5f0c8c6dac593b7c4e118d
# bad: [ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2] Merge git://git.infradead.org/mtd-2.6
git-bisect bad ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2
# bad: ...
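
[Editor's sketch, not part of the original message.] Since each test takes days, a "good" verdict recorded too early is one plausible way for a bisection like this to end at a wrong commit. That is an assumption, not something the thread confirms. The Python snippet below only parses the verdict lines visible in the log above (the log is truncated, so the list is incomplete) and lists the "good" commits that might deserve a longer re-test. The names `parse_verdicts` and `recheck` are hypothetical.

```python
import re

# git-bisect verdict lines copied from the log above; the log is truncated
# after the last entry, so this list is incomplete by construction.
BISECT_LOG = """\
git-bisect start
git-bisect good 07a2039b8eb0af4ff464efd3dfd95de5c02648c6
git-bisect bad 74fca6a42863ffacaf7ba6f1936a9f228950f657
git-bisect good 925d74ae717c9a12d3618eb4b36b9fb632e2cef3
git-bisect bad a380137900fca5c79e6daa9500bdb6ea5649188e
git-bisect good 1dbb5765acc7a6fe4bc1957c001037cc9d02ae03
git-bisect good df36b439c5fedefe013d4449cb6a50d15e2f4d70
git-bisect bad a800faec1b21d7133b5f0c8c6dac593b7c4e118d
git-bisect bad ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2
"""

# Matches "git-bisect good <sha>" as well as the newer "git bisect good <sha>".
VERDICT_RE = re.compile(r"^git[- ]bisect (good|bad|skip) ([0-9a-f]{7,40})$")


def parse_verdicts(text):
    """Return (verdict, sha) pairs in the order they were recorded."""
    pairs = []
    for line in text.splitlines():
        match = VERDICT_RE.match(line.strip())
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


verdicts = parse_verdicts(BISECT_LOG)
good = [sha for verdict, sha in verdicts if verdict == "good"]
bad = [sha for verdict, sha in verdicts if verdict == "bad"]

# The first "good" is the 2.6.30 baseline; the later ones are candidates for
# a longer re-test (assumption: a false "good" is the likely source of error).
recheck = good[1:]

print(f"{len(good)} good, {len(bad)} bad")
for sha in recheck:
    print("re-test candidate:", sha[:12])
```

A saved log can be fed back with `git bisect replay <file>` after editing out a verdict that turned out to be wrong, which avoids repeating the steps that are still trusted.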
From: Rafael J. Wysocki
Date: Thursday, April 1, 2010 - 2:45 pm

Is the kernel 32-bit or 64-bit?  What kind of CPU is there in the box?

Rafael
--

From: Ondrej Zary
Date: Thursday, April 1, 2010 - 3:03 pm

It's an old 32-bit i686 CPU - Cyrix MII.

Linux version 2.6.31-pentium (rainbow@pentium) (gcc version 4.3.3 (GCC) ) #5 PREEMPT Fri Oct 2 19:10:10 CEST 2009
KERNEL supported cpus:
  NSC Geode by NSC
  Cyrix CyrixInstead
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000000fff0000 (usable)
 BIOS-e820: 000000000fff0000 - 000000000fff3000 (ACPI NVS)
 BIOS-e820: 000000000fff3000 - 0000000010000000 (ACPI data)
 BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
DMI 2.0 present.
last_pfn = 0xfff0 max_arch_pfn = 0x100000
initial memory mapped : 0 - 00800000
init_memory_mapping: 0000000000000000-000000000fff0000
 0000000000 - 000fff0000 page 4k
kernel direct mapping tables up to fff0000 @ 7000-4a000
ACPI: RSDP 000f6c40 00014 (v00 123456)
ACPI: RSDT 0fff3000 00028 (v01 123456 AWRDACPI 00000000      00000000)
ACPI: FACP 0fff3040 00074 (v01 123456 AWRDACPI 00000000      00000000)
ACPI: DSDT 0fff30c0 02313 (v01 123456 AWRDACPI 00001000 MSFT 01000007)
ACPI: FACS 0fff0000 00040
255MB LOWMEM available.
  mapped low ram: 0 - 0fff0000
  low ram: 0 - 0fff0000
  node 0 low ram: 00000000 - 0fff0000
  node 0 bootmap 00001000 - 00003000
(6 early reservations) ==> bootmem [0000000000 - 000fff0000]
  #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
  #1 [0000100000 - 00004573a0]    TEXT DATA BSS ==> [0000100000 - 00004573a0]
  #2 [000009f800 - 0000100000]    BIOS reserved ==> [000009f800 - 0000100000]
  #3 [0000458000 - 000045a0a2]              BRK ==> [0000458000 - 000045a0a2]
  #4 [0000007000 - 0000045000]          PGTABLE ==> [0000007000 - 0000045000]
  #5 [0000001000 - 0000003000]          BOOTMAP ==> [0000001000 - 0000003000]
Zone PFN ranges:
  DMA      0x00000000 -> 0x00001000
  Normal   0x00001000 -> 0x0000fff0
Movable zone start PFN for each ...
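
[Editor's sketch, not part of the original message.] The figures in the boot log above are consistent with each other: the end of the highest "usable" e820 range gives `last_pfn`, and that in turn gives the "255MB LOWMEM" figure. The short Python check below reproduces the arithmetic, assuming 4 KiB pages (the log says "page 4k") and that the megabyte figure is rounded down. The variable names are illustrative only.

```python
PAGE_SIZE = 4096  # 4 KiB pages, as reported by "page 4k" in the log

# (start, end, type) entries copied from the BIOS-e820 lines above.
E820 = [
    (0x00000000, 0x0009FC00, "usable"),
    (0x0009FC00, 0x000A0000, "reserved"),
    (0x000F0000, 0x00100000, "reserved"),
    (0x00100000, 0x0FFF0000, "usable"),
    (0x0FFF0000, 0x0FFF3000, "ACPI NVS"),
    (0x0FFF3000, 0x10000000, "ACPI data"),
    (0xFFFF0000, 0x100000000, "reserved"),
]

# The highest usable address determines the last page frame number.
usable_end = max(end for _start, end, kind in E820 if kind == "usable")
last_pfn = usable_end // PAGE_SIZE

# Low memory in whole megabytes (rounded down, which is an assumption here).
lowmem_mb = (last_pfn * PAGE_SIZE) >> 20

print(f"last_pfn = {last_pfn:#x}")
print(f"{lowmem_mb}MB LOWMEM")
```

Both values match the log (`last_pfn = 0xfff0`, `255MB LOWMEM available`), so the machine has roughly 256 MB of RAM, all of it low memory.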
From: Ondrej Zary
Date: Friday, April 2, 2010 - 3:53 am

Thanks, I'll try it. But I doubt that it will fix the problem. There were no 
changes in hibernate_asm_32.S between 2.6.30 and 2.6.31.

-- 
Ondrej Zary
--

From: Rafael J. Wysocki
Date: Friday, April 2, 2010 - 9:58 am

The change that exposed this issue was elsewhere in the x86 arch code.  I don't
know where exactly, but we surely didn't need the above patch before.

Rafael
--

From: Ondrej Zary
Date: Saturday, April 10, 2010 - 2:03 am

It did not help. With 2.6.32 and this patch applied, it started crashing after
two days.

-- 
Ondrej Zary
--

From: Rafael J. Wysocki
Date: Saturday, April 10, 2010 - 12:58 pm

Well, no idea, then.

Rafael
--

From: Ondrej Zary
Date: Saturday, April 10, 2010 - 2:03 pm

So I'll try to bisect it again.

Here's a bug report so it won't get lost: 
https://bugzilla.kernel.org/show_bug.cgi?id=15753

-- 
Ondrej Zary
--
