Re: x86 git tree broken (bisected)

To: Ingo Molnar <mingo@...>
Cc: Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>
Date: Thursday, April 10, 2008 - 3:59 pm

Hi,

The x86 git tree, as of HEAD commit

commit a9efd1225e6e0e78ceeaecc04cec1d428eb8173f
Author: Mike Travis <travis@sgi.com>
Date: Fri Apr 4 18:30:16 2008 -0700

x86: modify Kconfig to allow up to 4096 cpus

doesn't want to work on one of my testboxes (x86-64 desktop,
AMD-based).

First, the X server doesn't want to start (it says it couldn't mmap the
framebuffer).

Second, if I try to suspend the box to RAM, it enters a state it cannot
leave until power is physically cut from it (using the power button to power
off / power on the box doesn't help).

At the same time, 2.6.25-rc8-mm1 works just fine on this box.

Any ideas what to revert?

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>
Date: Thursday, April 10, 2008 - 4:51 pm

i used your config on an AMD system here and s2ram works just fine, both
using CONFIG_PM_TEST_SUSPEND=y bootup suspend self-test [which x86.git
QA uses all the time], and using a manual pm-suspend command at the
console.

you can also try your luck and remove the last 20% of x86.git [which is
always the newest stuff], by picking a commit 200 patches down the line,
via:

git-rev-list x86/base..x86/latest | head -200 | tail -1

and testing that. If that tree works, it's the last 200 commits that
break stuff.

exactly what kind of system are you using? If you revert the trampoline
changes, does it get any better - but i guess it might be better to do a
bisection.

Ingo
--
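
The bisection Ingo suggests can be pictured as a binary search over the commit list. The following is only a minimal sketch of that idea, not what git itself runs: it assumes a linear history ordered oldest to newest, a tip that is known to be bad, and a hypothetical `is_bad` callback standing in for "build, boot and test this kernel". The commit names are made up for illustration.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in an oldest-to-newest list.

    Assumes the last commit is bad and that every commit after the
    first bad one is bad as well (a single regression point).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid          # regression is at mid or earlier
        else:
            lo = mid + 1      # regression is after mid
    return commits[lo]


# Hypothetical history of 1000 commits where commit 812 introduced the bug.
history = [f"c{i}" for i in range(1000)]
print(first_bad(history, lambda c: int(c[1:]) >= 812))  # prints "c812"
```

With about 1000 commits this needs at most 10 test boots, which is why a bisection is usually cheaper than guessing which change to revert.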

To: Ingo Molnar <mingo@...>
Cc: Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>
Date: Thursday, April 10, 2008 - 6:27 pm

It's an Athlon 64 X2 on an ULi-based AsRock motherboard with a Radeon X300SE.

I'll try to figure out what is the last good commit.

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>
Date: Friday, April 11, 2008 - 2:43 am

hm, that's very close to the system i tried: Athlon64 X2 with Radeon
X300SE (PCIe), 1GB RAM. (Asus A8N-E mobo)

Ingo
--

To: Rafael J. Wysocki <rjw@...>
Cc: Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>
Date: Thursday, April 10, 2008 - 4:13 pm

could you send your .config?

Ingo
--

To: Ingo Molnar <mingo@...>
Cc: Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>
Date: Thursday, April 10, 2008 - 4:25 pm

Attached.

Thanks,
Rafael

To: Rafael J. Wysocki <rjw@...>
Cc: Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Thursday, April 10, 2008 - 4:29 pm

could you disable this option:

CONFIG_NONPROMISC_DEVMEM=y

does it help with the X problem?

Ingo
--

To: Rafael J. Wysocki <rjw@...>
Cc: Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Thursday, April 10, 2008 - 4:38 pm

btw., Xorg works fine here on a comparable AMD system - but i use a
rather new distro (Fedora 8) which has Xorg 7.2.

Ingo
--

To: Ingo Molnar <mingo@...>
Cc: Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Thursday, April 10, 2008 - 6:28 pm

My system is an OpenSUSE 10.3 and it has Xorg 7.2 as well.

I think the problem is somehow related to the Radeon.

Thanks,
Rafael
--

To: Ingo Molnar <mingo@...>, Yinghai Lu <yinghai.lu@...>
Cc: Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Friday, April 11, 2008 - 3:26 pm

The bisection turned up commit ea1441bdf53692c3dc1fd2658addcf1205629661
"x86: use bus conf in NB conf fun1 to get bus range on, on 64-bit" as the one
causing problems.

Unfortunately, I can't revert it cleanly, because there are two more commits
depending on it in a highly nontrivial fashion, so I have reverted all three
commits

a365998cd2cecfb827469dbd57c29602c106cb83
44f7f90fbe7a3a99aab082f765346514b7b5c705
ea1441bdf53692c3dc1fd2658addcf1205629661

and X starts again. Also, suspend to RAM works from under X.

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: Yinghai Lu <yinghai.lu@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>, Greg Kroah-Hartman <gregkh@...>
Date: Sunday, April 13, 2008 - 3:58 am

thanks Rafael for bisecting this!

This was a rather nasty problem - and i'm wondering what else we could
do to harden our hw resource management code. I'm wondering, is there
any particular reason why clearly broken resource setup is not detected
somewhere, automatically, and WARN_ON()-ed about?

for example, in the scheduler code we used to have similar bug patterns
again and again: architecture code set up scheduler domains incorrectly
and broke the system in subtle ways. So we added sched_domain_debug()
which is active under CONFIG_SCHED_DEBUG=y and does a few sanity checks
and complains if something is wrong. This caught quite a few bugs
whenever the sched-domains code was modified.

Ingo
--

To: Ingo Molnar <mingo@...>
Cc: Rafael J. Wysocki <rjw@...>, Yinghai Lu <yinghai.lu@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Greg Kroah-Hartman <gregkh@...>
Date: Sunday, April 13, 2008 - 11:48 am

On Sun, 13 Apr 2008 09:58:45 +0200

that would be very welcome, esp if kerneloops.org can pick them up.

One thing we also need to do as Linux is get more conservative;
(this isn't per se about this specific thing)

With MCFG for example we learned over time "if it smells funny don't use it".
That concept should be carried much further imo; for example on K8 you
can compare the acpi table to the chipset for numa support, and if they don't match,
we SHOULD ignore both entirely.
The same is true all over; Linux tends to behave as "oh but we think we can make it work anyway",
in general imo that's a mistake in the long term, at least for default configs. Because there
will be cases where that will break, be it special bioses or next gens of chipsets.

--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
--

To: Ingo Molnar <mingo@...>
Cc: Rafael J. Wysocki <rjw@...>, Yinghai Lu <yinghai.lu@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>, Greg Kroah-Hartman <gregkh@...>
Date: Sunday, April 13, 2008 - 5:39 am

Ingo Molnar <mingo@elte.hu> writes:

This whole problem just shows that it was a mistake in the first place
to try to redo the BIOS work in Linux. If BIOS doesn't supply MCFG
Linux trying to create one (or in general having generalized resource
allocation) is just a big mess and will cause endless problems. The
standard resource code is just not up to the task and it needs very
intimate knowledge of the hardware that the kernel shouldn't have.

Again the real fix I think is to just drop all that code in git-x86
again and finally fix LinuxBIOS to do its job properly and pass a
proper MCFG (or just forget about using mmconfig with LinuxBIOS - it
is not that Type1 suddenly doesn't work anymore). Then this code
wouldn't be needed at all.

-Andi
--

To: Andi Kleen <andi@...>
Cc: Ingo Molnar <mingo@...>, Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>, Greg Kroah-Hartman <gregkh@...>
Date: Sunday, April 13, 2008 - 2:19 pm

It has nothing to do with LinuxBIOS.

we would rather trust HW pci conf/msr than BIOS. even if I could talk to
BIOS engineers every day and tell them how to fix the problem in BIOS, some
problems still cannot be fixed because of the legacy BIOS framework or
big mess.

the patchset from me in x86.git falls into two groups:
1. MCFG fix up for AMD cpu.
2. BUS numa support for AMD cpu with several sockets with multi ht
links aka. multi peer root buses.
it will try to split the root resources (iomem_resource, io_resource)
across the different ht links, so when the kernel tries to assign resources
to some unassigned devices, it can use correct values.

these two sets of patches will not hurt intel platforms either.

YH
--

To: Yinghai Lu <yhlu.kernel@...>
Cc: Andi Kleen <andi@...>, Ingo Molnar <mingo@...>, Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>, Greg Kroah-Hartman <gregkh@...>
Date: Sunday, April 13, 2008 - 2:29 pm

... so you opt to create the big mess in the kernel. Great.

And it does not even fix a real problem, but getting
mmconfig or the numa bus discovery to work is not really a too serious
issue anyways. At best it is the icing on the cake to enable
some relatively obscure functionality and be a little more
efficient, but nothing really fundamental.

But for those things just expecting a working modern BIOS is quite
reasonable.

-Andi
--

To: Andi Kleen <andi@...>
Cc: Ingo Molnar <mingo@...>, Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>, Greg Kroah-Hartman <gregkh@...>
Date: Sunday, April 13, 2008 - 3:29 pm

it does fix a real problem. take a big system with several HT links,
where every link has some pcie slots.
when you fully load it with pci-e cards (with pci bridges), the BIOS will
stop assigning io/mmio resources to the remaining devices if it runs out
of io port range.
(though it is supposed to go on and allocate mmio to the remaining devices)
(modern pcie devices only need mmio with their drivers)

With the ranges preset in the NB pci conf, the kernel can allocate
resources within every peer root bus range.
(the code that assigns resources to devices left unassigned by the BIOS
is already in the kernel)

YH
--

To: Yinghai Lu <yhlu.kernel@...>
Cc: Andi Kleen <andi@...>, Ingo Molnar <mingo@...>, Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Greg Kroah-Hartman <gregkh@...>
Date: Sunday, April 13, 2008 - 11:52 pm

On Sun, 13 Apr 2008 12:29:30 -0700

there is a really big difference between assigning PCI device resources
and doing a whole thing like MMCFG from scratch.

--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
--

To: Arjan van de Ven <arjan@...>
Cc: Andi Kleen <andi@...>, Ingo Molnar <mingo@...>, Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Greg Kroah-Hartman <gregkh@...>
Date: Monday, April 14, 2008 - 1:01 am

that MMCONF patchset for AMD fam10h includes
1. get mmconfig from the MSR (MCFG is using that too, if that is right),
so we will get MMCONF support when acpi support is off, and MCFG is
broken.
2. or assign 0xfc00000000 to that MSR, that is safe too.

YH
--

To: Yinghai Lu <yhlu.kernel@...>
Cc: Andi Kleen <andi@...>, Ingo Molnar <mingo@...>, Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Greg Kroah-Hartman <gregkh@...>
Date: Monday, April 14, 2008 - 10:12 am

On Sun, 13 Apr 2008 22:01:23 -0700

using MCONF when the ACPI support isn't there is just a deathtrap.
To be honest, if you want to break the AMD machines out there, who am
I to care about that, I work for Intel. But I'm worried someone thinks
this can be done for Intel based systems too, and then carry over all
the bad bugs to those as well ;(

--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
--

To: Arjan van de Ven <arjan@...>
Cc: Andi Kleen <andi@...>, Ingo Molnar <mingo@...>, Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Greg Kroah-Hartman <gregkh@...>
Date: Monday, April 14, 2008 - 2:11 pm

I don't want to break any machine. I just want to work around some
bios bugs, and use MMCONF when acpi is disabled...

YH
--

To: Andi Kleen <andi@...>
Cc: Ingo Molnar <mingo@...>, Rafael J. Wysocki <rjw@...>, Yinghai Lu <yinghai.lu@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Greg Kroah-Hartman <gregkh@...>
Date: Sunday, April 13, 2008 - 1:53 pm

On Sun, 13 Apr 2008 11:39:10 +0200

I totally agree with this. MCFG has been EXTREMELY fragile for the last few years,
and I don't see that changing anytime soon.
The only thing that works for Linux so far is "if it even smells funny, don't use it".
Smelling funny is things like
1a) Bios table and e820 not matching up, or
1b) Bios table and hardware data not matching up
2) The content not matching content gotten via the traditional method
3) ... (bunch of other sanity checks)

I guess we really need to have
0) If it's not present in the BIOS do not touch
as a rule as well.

--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
--
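
Arjan's "if it even smells funny, don't use it" rules can be read as a chain of checks where any single failure disables MMCONFIG. The sketch below only illustrates that policy and is not kernel code: the function and parameter names are hypothetical, "matching up with e820" is interpreted here as "fully inside one reserved e820 range", and the comparison against the traditional (Type 1) access method is reduced to a boolean supplied by the caller.

```python
from typing import Optional, Sequence, Tuple

Range = Tuple[int, int]  # inclusive (start, end) physical addresses


def covered(region: Range, reserved: Sequence[Range]) -> bool:
    """True if region lies entirely inside one reserved range."""
    return any(s <= region[0] and region[1] <= e for s, e in reserved)


def mcfg_usable(bios_mcfg: Optional[Range],
                e820_reserved: Sequence[Range],
                hw_mmconf: Optional[Range],
                matches_type1: bool) -> bool:
    """Apply the conservative rules in order; any failure means 'do not use'."""
    if bios_mcfg is None:                      # rule 0: BIOS gave us nothing
        return False
    if not covered(bios_mcfg, e820_reserved):  # rule 1a: table vs e820
        return False
    if hw_mmconf is not None and hw_mmconf != bios_mcfg:  # rule 1b: table vs hardware
        return False
    return matches_type1                       # rule 2: same data as Type 1 reads


# Hypothetical 256 MB MMCONFIG window.
window = (0xE0000000, 0xEFFFFFFF)
print(mcfg_usable(window, [window], window, True))   # all checks pass
print(mcfg_usable(None, [window], window, True))     # rule 0 rejects it
```

Ordering the checks this way means a missing BIOS table short-circuits everything else, which is the extra rule 0 proposed above.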

To: Ingo Molnar <mingo@...>
Cc: Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>, Greg Kroah-Hartman <gregkh@...>
Date: Sunday, April 13, 2008 - 4:18 am

there is a silicon bug about agp bridge aperture order reading...
=====> just sent out one patch to work around that

also the BIOS is sick enough to allocate overlapping MMIO to the same link..

node 0 link 0: io port [1000, ffffff]
TOM: 0000000080000000 aka 2048M
node 0 link 0: mmio [e0000000, efffffff]
node 0 link 0: mmio [a0000, bffff]
node 0 link 0: mmio [80000000, ffffffff]
bus: [00,ff] on node 0 link 0

never thought that BIOS could be so sick.
===> already have one work around, needs more testing next week.

YH
--
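
The overlap in the ranges Yinghai quotes can be found mechanically, which is the kind of automatic sanity check Ingo asks for earlier in the thread. A minimal sketch, assuming the three MMIO windows for node 0 link 0 are inclusive ranges exactly as printed; the helper names are made up for illustration and nothing here is taken from the actual workaround patch.

```python
from itertools import combinations

# MMIO windows for node 0 link 0, copied from the log above (inclusive bounds).
MMIO = [
    (0xE0000000, 0xEFFFFFFF),
    (0x000A0000, 0x000BFFFF),
    (0x80000000, 0xFFFFFFFF),
]


def overlaps(a, b):
    """True if two inclusive [start, end] ranges share at least one address."""
    return a[0] <= b[1] and b[0] <= a[1]


def find_overlaps(ranges):
    """Return every pair of ranges that overlap."""
    return [(a, b) for a, b in combinations(ranges, 2) if overlaps(a, b)]


for a, b in find_overlaps(MMIO):
    print(f"overlap: [{a[0]:x}, {a[1]:x}] and [{b[0]:x}, {b[1]:x}]")
# prints: overlap: [e0000000, efffffff] and [80000000, ffffffff]
```

The [e0000000, efffffff] window sits entirely inside [80000000, ffffffff], so splitting the root resource by these windows hands out the same addresses twice.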

To: Yinghai Lu <yhlu.kernel@...>
Cc: Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>, Greg Kroah-Hartman <gregkh@...>
Date: Sunday, April 13, 2008 - 5:19 am

great!

basically any and all sickness should be assumed both by the hardware
and by the BIOS, _and_ by Linux architecture code as well as it passes
stuff to the generic driver layers. So as resources get set up we should
have resilience all the way and should be on the lookout for signs of
bugs - because breakages are so hard to track down in this area if they
go unnoticed during setup.

Ingo
--

To: Rafael J. Wysocki <rjw@...>
Cc: Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Friday, April 11, 2008 - 4:26 pm

please keep the three patches and apply the two attached debug patches.

i wonder if there is some io allocation overlapping on your system.

YH

To: Yinghai Lu <yhlu.kernel@...>
Cc: Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>, Greg Kroah-Hartman <gregkh@...>
Date: Sunday, April 13, 2008 - 3:51 am

hm, would be nice to have these two debugging patches upstream. Perhaps
the printouts should be dependent on some boot parameter?

Ingo
--

To: Ingo Molnar <mingo@...>
Cc: Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>, Greg Kroah-Hartman <gregkh@...>
Date: Sunday, April 13, 2008 - 3:59 am

I am using them to print out the io/mmio allocation (from BIOS) before
the kernel modifies it.

YH
--

To: Yinghai Lu <yhlu.kernel@...>
Cc: Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Friday, April 11, 2008 - 4:51 pm

Attached is a boot dmesg output from the current x86 git tree with your two
patches applied.

Thanks,
Rafael

To: Rafael J. Wysocki <rjw@...>
Cc: Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Sunday, April 13, 2008 - 4:24 am

can you try to apply the patch i sent to you about agp bridge order
reading for buggy silicon?

Please boot kernel with "debug"...

I want to verify if you can get

"
Aperture conflicts with PCI mapping.
"

in your boot log...

YH
--

To: Yinghai Lu <yhlu.kernel@...>
Cc: Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Sunday, April 13, 2008 - 12:12 pm

It's not present in there:

rafael@albercik:~> grep Aperture failing-with-patch-dmesg.log
Aperture too small (32 MB)
Aperture from AGP @ de000000 size 4096 MB (APSIZE 0)
Aperture too small (0 MB)
agpgart: Aperture pointing to RAM
agpgart: Aperture from AGP @ de000000 size 4096 MB
agpgart: Aperture too small (0 MB)

Full dmesg output attached.

Thanks,
Rafael

To: Rafael J. Wysocki <rjw@...>
Cc: Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Sunday, April 13, 2008 - 6:00 pm

On Sun, Apr 13, 2008 at 9:12 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:

please check attached debug patch. and check if you can change GART
size in your BIOS setup to 64M instead of 32M

Thanks

YH

To: Yinghai Lu <yhlu.kernel@...>
Cc: Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Sunday, April 13, 2008 - 6:10 pm

Hm, what tree am I supposed to apply it to:
(1) current x86 git
(2) current x86 git w/ some of your previous patches (which ones in this case)
(3) failing (old) x86 git
(4) failing (old) x86 git w/ some of your previous patches (which ones in this
case)?

Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Sunday, April 13, 2008 - 6:32 pm

(1) current x86.git

Thanks

Yinghai Lu
--

To: Yinghai Lu <yhlu.kernel@...>
Cc: Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Sunday, April 13, 2008 - 8:19 pm

Attached is dmesg output from current x86.git with debug_gart_checking.patch
applied.

Thanks,
Rafael

To: Rafael J. Wysocki <rjw@...>
Cc: Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Sunday, April 13, 2008 - 9:42 pm

please test the final one ... ...

You should get 64M of memory back.

Thanks

Yinghai Lu

To: Yinghai Lu <yhlu.kernel@...>
Cc: Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Monday, April 14, 2008 - 4:21 pm

Tested (current x86.git), dmesg output attached.

Thanks,
Rafael

To: Rafael J. Wysocki <rjw@...>
Cc: Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Monday, April 14, 2008 - 5:06 pm

thanks.

looks good. as expected...

Checking aperture...
AGP bridge at 00:04:00
Aperture from AGP @ de000000 old size 32 MB
Aperture size 4096 MB (APSIZE 0) is not right, using settings from NB
Aperture from AGP @ de000000 size 32 MB (APSIZE 0)
Node 0: aperture @ de000000 size 32 MB
Aperture too small (32 MB) than (64 MB)
...
agpgart: Detected AGP bridge 20
Setting up ULi AGP.
agpgart: AGP aperture is 32M @ 0xde000000

YH
--
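
The "size 4096 MB (APSIZE 0)" line in the log is what one gets when an all-zero APSIZE value is decoded as an aperture order. The sketch below is an assumption-laden illustration, not the patch from this thread: the decode follows the AGP 3.0 APSIZE encoding as I understand it (each set bit halves the size, 32 MB is the smallest supported order), and the plausibility test "the aperture must end below 4 GB" is a guess at why the 4096 MB value is rejected in favour of the northbridge setting. All function names are hypothetical.

```python
MB = 1 << 20
FOUR_GB = 1 << 32


def agp_order(apsize_reg: int) -> int:
    """Decode an APSIZE register value into an order (size = 32 MB << order)."""
    apsize = apsize_reg & 0xFFF
    if apsize & 0xFF:
        apsize |= 0xF00           # tolerate encodings that leave the high bits clear
    nbits = bin(apsize).count("1")
    return max(7 - nbits, 0)      # anything below 32 MB is clamped to order 0


def aperture_bytes(order: int) -> int:
    return (32 * MB) << order


def pick_order(base: int, agp_ord: int, nb_ord: int) -> int:
    """Fall back to the northbridge order if the AGP value is implausible."""
    if base + aperture_bytes(agp_ord) > FOUR_GB:
        return nb_ord
    return agp_ord


agp = agp_order(0)                                   # APSIZE 0 from the log
chosen = pick_order(0xDE000000, agp, nb_ord=0)       # NB reports 32 MB (order 0)
print(aperture_bytes(agp) // MB, aperture_bytes(chosen) // MB)  # prints "4096 32"
```

Under these assumptions an aperture at de000000 cannot be 4096 MB long, so the 32 MB northbridge setting wins, matching the "using settings from NB" line above.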

To: Yinghai Lu <yhlu.kernel@...>
Cc: Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Monday, April 14, 2008 - 5:09 pm

BTW, what exactly would be the benefit of increasing the aperture size, given
that I use a PCI Express graphics adapter?

Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Monday, April 14, 2008 - 5:36 pm

you don't need to increase that as long as you have less than 4G RAM.

if you have more than 4G RAM, you may need to increase it for the GART
iommu, so other devices that support only dma32 could use the extra
32M.

YH
--

To: Yinghai Lu <yhlu.kernel@...>
Cc: Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Monday, April 14, 2008 - 5:16 am

so basically with all the right patches applied, and GART set to 32MB in
the BIOS, Rafael should have more free RAM on his system than ever
before :-)

i've put all the patches into x86.git/latest (it's all uploaded already
as well), so that should give Rafael a one-stop shop to test it out. [i
have not applied the debug patch that changes the aperture test from
32MB to 64MB, and it should be unnecessary as well]

btw., Yinghai, should we perhaps add a WARN_ON() to those places where
we waste RAM (such as the "This costs you 64 MB of RAM" message) - so
that kerneloops.org can pick those warnings up? Maybe there are other
situations where we waste RAM, and people don't realize it.

Ingo
--

To: Ingo Molnar <mingo@...>
Cc: Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Monday, April 14, 2008 - 2:08 pm

in Rafael's case, we just need to ask the user to increase the GART size
in the BIOS if more than 4G RAM is installed (or 4G installed with
hardware memhole remapping enabled).

if less than 4G is installed, just take the BIOS setting of 32M

YH
--

To: Yinghai Lu <yhlu.kernel@...>, Ingo Molnar <mingo@...>
Cc: Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Sunday, April 13, 2008 - 7:41 pm

Well, unfortunately current x86.git doesn't even boot on the affected box.
It 'cannot open root device "md1" or unknown-block (0,0)' (Ingo, any ideas?).

Today I have to take some sleep, so I'll try to debug it tomorrow, unless
someone else does it earlier.

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: Yinghai Lu <yhlu.kernel@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, Arjan van de Ven <arjan@...>
Date: Sunday, April 13, 2008 - 7:45 pm

Sounds like you didn't compile in the appropriate RAID support...

-hpa

--

To: H. Peter Anvin <hpa@...>
Cc: Yinghai Lu <yhlu.kernel@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, Arjan van de Ven <arjan@...>
Date: Sunday, April 13, 2008 - 8:09 pm

In fact I did, but I didn't notice that the initrd image was not built
correctly due to a local error.

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: Yinghai Lu <yhlu.kernel@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, Arjan van de Ven <arjan@...>
Date: Sunday, April 13, 2008 - 8:12 pm

Happens :)

-hpa

--

To: Rafael J. Wysocki <rjw@...>
Cc: Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Sunday, April 13, 2008 - 2:07 pm

did you apply the patch (like the one attached) that i sent you in another mail?

YH

To: Yinghai Lu <yhlu.kernel@...>
Cc: Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Sunday, April 13, 2008 - 2:47 pm

This dmesg is from a kernel without the patch.

The dmesg with the patch applied was sent in a separate message:
http://lkml.org/lkml/2008/4/13/122

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Sunday, April 13, 2008 - 2:54 pm

or you can re pull from x86.git#latest.

YH
--

To: Rafael J. Wysocki <rjw@...>
Cc: Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Sunday, April 13, 2008 - 2:53 pm

thanks. let me double check that patch...

YH
--

To: Rafael J. Wysocki <rjw@...>
Cc: Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Sunday, April 13, 2008 - 4:41 am

then with this patch for io allocation overlapping...

YH

To: Yinghai Lu <yhlu.kernel@...>
Cc: Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Sunday, April 13, 2008 - 12:14 pm

The kernel works correctly with this patch applied.

dmesg output attached.

Thanks,
Rafael

To: Rafael J. Wysocki <rjw@...>
Cc: Yinghai Lu <yhlu.kernel@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Monday, April 14, 2008 - 5:02 am

thanks guys - i've applied the fix/workaround.

Ingo
--

To: Rafael J. Wysocki <rjw@...>
Cc: Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Friday, April 11, 2008 - 5:11 pm

can you put boot in your command line?

Thanks

YH
--

To: Yinghai Lu <yhlu.kernel@...>
Cc: Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Friday, April 11, 2008 - 5:21 pm

I'm not quite sure what you mean.

Can you please tell me what exactly you want me to do?

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Friday, April 11, 2008 - 5:31 pm

i got some hint. Will send you one patch to work around the overlapping.

YH
--

To: Ingo Molnar <mingo@...>
Cc: Yinghai Lu <yinghai.lu@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Friday, April 11, 2008 - 4:23 pm

Update:

With the above three commits reverted both X itself and suspend to RAM from
X also work with the current x86-git (as of HEAD equal to
1192aeb957402b45f311895f124e4ca41206843c).

Thanks,
Rafael
--

To: Ingo Molnar <mingo@...>
Cc: Yinghai Lu <yinghai.lu@...>, Andrew Morton <akpm@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Thomas Gleixner <tglx@...>, H. Anvin <hpa@...>, Arjan van de Ven <arjan@...>
Date: Friday, April 11, 2008 - 4:29 pm

That also works from under a framebuffer console, so one of these commits
(presumably ea1441bdf53692c3dc1fd2658addcf1205629661) also breaks suspend on
this box.

Thanks,
Rafael
--
