Hi,
The x86 git tree, as of HEAD commit
commit a9efd1225e6e0e78ceeaecc04cec1d428eb8173f
Author: Mike Travis <travis@sgi.com>
Date: Fri Apr 4 18:30:16 2008 -0700
x86: modify Kconfig to allow up to 4096 cpus
doesn't want to work on one of my testboxes (x86-64 desktop,
AMD-based).
First, the X server doesn't want to start (it says it couldn't mmap the
framebuffer).
Second, if I try to suspend the box to RAM, it enters a state it cannot
leave until power is physically cut from it (using the power button to power
off / power on the box doesn't help).
At the same time, 2.6.25-rc8-mm1 works just fine on this box.
Any ideas what to revert?
Thanks,
Rafael
--
btw., Xorg works fine here on a comparable AMD system - but i use a rather new distro (Fedora 8) which has Xorg 7.2. Ingo --
My system is an OpenSUSE 10.3 and it has Xorg 7.2 as well. I think the problem is somehow related to the Radeon. Thanks, Rafael --
The bisection turned up commit ea1441bdf53692c3dc1fd2658addcf1205629661 "x86: use bus conf in NB conf fun1 to get bus range on, on 64-bit" as the one causing problems. Unfortunately, I can't revert it cleanly, because there are two more commits depending on it in a highly nontrivial fashion, so I have reverted all three commits
a365998cd2cecfb827469dbd57c29602c106cb83
44f7f90fbe7a3a99aab082f765346514b7b5c705
ea1441bdf53692c3dc1fd2658addcf1205629661
and X starts again. Also, suspend to RAM works from under X. Thanks, Rafael --
Update: With the above three commits reverted both X itself and suspend to RAM from X also work with the current x86-git (as of HEAD equal to 1192aeb957402b45f311895f124e4ca41206843c). Thanks, Rafael --
That also works from under a framebuffer console, so one of these commits (presumably ea1441bdf53692c3dc1fd2658addcf1205629661) also breaks suspend on this box. Thanks, Rafael --
please keep the three patches and apply the two attached debug patches. i wonder if there is some io allocation overlapping on your system. YH
Attached is a boot dmesg output from the current x86 git tree with your two patches applied. Thanks, Rafael
I'm not quite sure what you mean. Can you please tell me what exactly you want me to do? Thanks, Rafael --
can you try to apply the patch i sent to you about agp bridge order reading for buggy silicon? Please boot kernel with "debug"... I want to verify if you can get " Aperture conflicts with PCI mapping. " in your boot log... YH --
The kernel works correctly with this patch applied. dmesg output attached. Thanks, Rafael
thanks guys - i've applied the fix/workaround. Ingo --
It's not present in there:
rafael@albercik:~> grep Aperture failing-with-patch-dmesg.log
Aperture too small (32 MB)
Aperture from AGP @ de000000 size 4096 MB (APSIZE 0)
Aperture too small (0 MB)
agpgart: Aperture pointing to RAM
agpgart: Aperture from AGP @ de000000 size 4096 MB
agpgart: Aperture too small (0 MB)
Full dmesg output attached. Thanks, Rafael
did you apply the patch like the attached that i sent you in another mail? YH
This dmesg is from a kernel without the patch. The dmesg with the patch applied was sent in a separate message: http://lkml.org/lkml/2008/4/13/122 Thanks, Rafael --
thanks. let me double check that patch... YH --
or you can re pull from x86.git#latest. YH --
On Sun, Apr 13, 2008 at 9:12 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote: please check attached debug patch. and check if you can change GART size in your BIOS setup to 64M instead of 32M Thanks YH
Hm, what tree am I supposed to apply it to:
(1) current x86 git
(2) current x86 git w/ some of your previous patches (which ones in this case)
(3) failing (old) x86 git
(4) failing (old) x86 git w/ some of your previous patches (which ones in this
case)?
Rafael
--
Well, unfortunately current x86.git doesn't even boot on the affected box. It 'cannot open root device "md1" or unknown-block (0,0)' (Ingo, any ideas?). Today I have to take some sleep, so I'll try to debug it tomorrow, unless someone else does it earlier. Thanks, Rafael --
Sounds like you didn't compile in the appropriate RAID support... -hpa --
In fact I did, but I didn't notice that the initrd image was not built correctly due to a local error. Thanks, Rafael --
Happens :) -hpa --
Attached is dmesg output from current x86.git with debug_gart_checking.patch applied. Thanks, Rafael
so basically with all the right patches applied, and GART set to 32MB in the BIOS, Rafael should have more free RAM on his system than ever before :-) i've put all the patches into x86.git/latest (it's all uploaded already as well), so that should give Rafael a one-stop shop to test it out. [i have not applied the debug patch that changes the aperture test from 32MB to 64MB, and it should be unnecessary as well] btw., Yinghai, should we perhaps add a WARN_ON() to those places where we waste RAM (such as the "This costs you 64 MB of RAM" message) - so that kerneloops.org can pick those warnings up? Maybe there are other situations where we waste RAM, and people dont realize it. Ingo --
in Rafael's case, we just need to ask the user to increase the GART size in BIOS if more than 4G RAM is installed (or 4G installed with hardware memhole remapping enabled). if less than 4G is installed, just take the BIOS setting of 32M. YH --
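The rule of thumb above can be written down as a small decision function. This is only a sketch for illustration: the function name and its parameters are made up here and are not kernel code. It encodes nothing beyond what the message states.

```python
GIB = 1 << 30  # bytes in one gibibyte


def should_increase_gart(ram_bytes, memhole_remapped):
    """Return True if the user should be asked to raise the GART size in BIOS.

    Hypothetical helper: the rule is the one stated in the thread, i.e.
    more than 4G RAM, or exactly 4G with hardware memhole remapping.
    """
    if ram_bytes > 4 * GIB:
        return True
    if ram_bytes == 4 * GIB and memhole_remapped:
        return True
    # Less RAM than that: just take the BIOS setting as it is.
    return False
```

For example, a box with 2G RAM keeps the 32M BIOS setting, while one with 8G RAM would get the "please increase GART size" advice.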
Tested (current x86.git), dmesg output attached. Thanks, Rafael
thanks. looks good. as expected...
Checking aperture...
AGP bridge at 00:04:00
Aperture from AGP @ de000000 old size 32 MB
Aperture size 4096 MB (APSIZE 0) is not right, using settings from NB
Aperture from AGP @ de000000 size 32 MB (APSIZE 0)
Node 0: aperture @ de000000 size 32 MB
Aperture too small (32 MB) than (64 MB)
...
agpgart: Detected AGP bridge 20
Setting up ULi AGP.
agpgart: AGP aperture is 32M @ 0xde000000
YH --
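The fallback visible in this log can be sketched roughly as follows. This is a sketch under assumptions, not the patch itself: it assumes the aperture size is 32 MB shifted left by an "order", that the AGP order is 7 minus the number of set bits in the low 12 bits of APSIZE (which reproduces the 4096 MB figure for APSIZE 0 shown in the log), and that an aperture reaching past 4G is what counts as "not right". All function names are hypothetical.

```python
MB = 1 << 20
FOUR_GB = 1 << 32


def agp_order_from_apsize(apsize):
    """Assumed decoding: order = 7 - popcount(low 12 bits), floored at 0."""
    nbits = bin(apsize & 0xFFF).count("1")
    return max(7 - nbits, 0)


def aperture_size_mb(order):
    """Aperture size in MB for a given order (32 MB << order)."""
    return 32 << order


def pick_aperture_order(aper_base, apsize, nb_order):
    """Use the AGP bridge's order unless the resulting aperture would end
    above 4G; in that case fall back to the northbridge (NB) setting."""
    order = agp_order_from_apsize(apsize)
    if aper_base + ((32 * MB) << order) > FOUR_GB:
        return nb_order
    return order
```

With the values from the log (base de000000, APSIZE 0, NB order 0), the AGP reading would mean 4096 MB, which does not fit below 4G, so the NB setting of 32 MB is used instead.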
BTW, what exactly would be the benefit of increasing the aperture size, given that I use a PCI Express graphics adapter? Rafael --
you don't need to increase that as long as you have less than 4G RAM. if you have more than 4G RAM, you may need to increase it so the GART can be used for iommu, so other devices that support only dma32 could use the extra 32M. YH --
hm, would be nice to have these two debugging patches upstream. Perhaps the printouts should be dependent on some boot parameter? Ingo --
I am using them to print out the io/mmio allocation (from BIOS) before kernel modifying them. YH --
thanks Rafael for bisecting this! This was a rather nasty problem - and i'm wondering what else we could do to harden our hw resource management code. I'm wondering, is there any particular reason why clearly broken resource setup is not detected somewhere, automatically, and WARN_ON()-ed about? for example, in the scheduler code we used to have similar bug patterns again and again: architecture code set up scheduler domains incorrectly and broke the system in subtle ways. So we added sched_domain_debug() which is active under CONFIG_SCHED_DEBUG=y and does a few sanity checks and complains if something is wrong. This caught quite a few bugs whenever the sched-domains code was modified. Ingo --
there is a silicon bug about agp bridge aperture order reading...
=====> just sent out one patch to work around that
also BIOS is sick to allocate overlapping MMIO to the same link..
node 0 link 0: io port [1000, ffffff]
TOM: 0000000080000000 aka 2048M
node 0 link 0: mmio [e0000000, efffffff]
node 0 link 0: mmio [a0000, bffff]
node 0 link 0: mmio [80000000, ffffffff]
bus: [00,ff] on node 0 link 0
never thought that BIOS could be so sick.
===> already have one work around, need more test next week. YH --
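To make the overlap in the ranges above concrete, here is a minimal sketch of an overlap check over inclusive [start, end] ranges. The function name is hypothetical and this is not the kernel's resource code; only the three mmio ranges are taken from the log.

```python
def find_overlaps(ranges):
    """Return pairs of inclusive (start, end) ranges that overlap."""
    ordered = sorted(ranges)
    overlaps = []
    for i, (s1, e1) in enumerate(ordered):
        for s2, e2 in ordered[i + 1:]:
            if s2 > e1:
                # Sorted by start, so no later range can overlap this one.
                break
            overlaps.append(((s1, e1), (s2, e2)))
    return overlaps


# mmio ranges reported for node 0 link 0 in the log above
MMIO = [
    (0xE0000000, 0xEFFFFFFF),
    (0x000A0000, 0x000BFFFF),
    (0x80000000, 0xFFFFFFFF),
]

for a, b in find_overlaps(MMIO):
    print("overlap: [%x, %x] and [%x, %x]" % (a + b))
```

On these ranges the check reports that [e0000000, efffffff] lies entirely inside [80000000, ffffffff], which is the overlapping allocation being complained about.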
great! basically any and all sickness should be assumed both by the hardware and by the BIOS, _and_ by Linux architecture code as well as it passes stuff to the generic driver layers. So as resources get set up we should have resilience all the way and should be on the lookout for signs of bugs - because breakages are so hard to track down in this area if they go unnoticed during setup. Ingo --
Ingo Molnar <mingo@elte.hu> writes: This whole problem just shows that it was a mistake in the first place to try to redo the BIOS work in Linux. If BIOS doesn't supply MCFG Linux trying to create one (or in general having generalized resource allocation) is just a big mess and will cause endless problems. The standard resource code is just not up to the task and it needs very intimate knowledge of the hardware that the kernel shouldn't have. Again the real fix I think is to just drop all that code in git-x86 again and finally fix LinuxBIOS to do its job properly and pass a proper MCFG (or just forget about using mmconfig with LinuxBIOS - it is not that Type1 suddenly doesn't work anymore). Then this code wouldn't be needed at all -Andi --
On Sun, 13 Apr 2008 11:39:10 +0200
I totally agree with this. MCFG has been EXTREMELY fragile for the last years, and I don't see that changing anytime soon. The only thing that works for Linux so far is "if it even smells funny, don't use it". Smelling funny is things like
1a) Bios table and e820 not matching up, or
1b) Bios table and hardware data not matching up
2) The content not matching content gotten via the traditional method
3) ... (bunch of other sanity checks)
I guess we really need to have
0) If it's not present in the BIOS do not touch
as rule as well.
-- If you want to reach me at my work email, use arjan@linux.intel.com For development, discussion and tips for power savings, visit http://www.lesswatts.org --
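As an illustration of the "if it smells funny, don't use it" checks listed above, here is a rough sketch. Everything in it is hypothetical: the McfgEntry type, the parameter names, and the reading of "matching up with e820" as "fully inside an e820 reserved region" are assumptions made for the example, not what the kernel does.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class McfgEntry:
    base: int  # start address of the MMCONFIG window
    size: int  # length of the window in bytes


def mcfg_usable(
    bios_entry: Optional[McfgEntry],
    e820_reserved: List[Tuple[int, int]],  # (start, end) with end exclusive
    hw_base: Optional[int],                # base as read from the hardware
    matches_traditional: bool,             # same content as the traditional method
) -> bool:
    # 0) not present in the BIOS: do not touch
    if bios_entry is None:
        return False
    # 1a) BIOS table and e820 must match up (assumed: window inside a reserved region)
    end = bios_entry.base + bios_entry.size
    if not any(s <= bios_entry.base and end <= e for s, e in e820_reserved):
        return False
    # 1b) BIOS table and hardware data must match up
    if hw_base is not None and hw_base != bios_entry.base:
        return False
    # 2) content must match what the traditional method returns
    if not matches_traditional:
        return False
    return True
```

The point of the shape is that every check can only veto; nothing in it tries to repair or invent a table.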
It has nothing to do with LinuxBIOS.
we would rather trust HW pci conf/msr than BIOS. even if I could talk to BIOS
engineers every day and tell them how to fix the problem in BIOS, some
still can not be fixed because of the legacy BIOS framework or big
mess.
the patchset from me in x86.git is in two folders
1. MCFG fix up for AMD cpu.
2. BUS numa support for AMD cpu with several sockets with multi ht
links aka. multi peer root buses.
it will try to split root resource (iomem_resource, io_resource)
to different ht links. so when the kernel tries to assign resource to some
unassigned devices, it can use correct values.
these two patches will not hurt intel platform either.
YH
--
... so you opt to create the big mess in the kernel. Great. And it does not even fix a real problem; getting mmconfig or the numa bus discovery to work is not really a too serious issue anyways. At best it is the icing on the cake to enable some relatively obscure functionality and be a little more efficient, but nothing really fundamental. But for those things just expecting a working modern BIOS is quite reasonable. -Andi --
it does fix a real problem. take a big system with several HT links, where every link has some pcie slots. when you fully load it with pci-e cards (with pci bridge), BIOS will stop assigning io/mmio resources to the remaining devices if it runs out of io port range. (though it is supposed to go on to allocate mmio to the remaining devices) (modern pcie devices only need mmio with drivers) With pre-set range allocation in NB pci conf, the kernel could allocate the resources within every peer root bus range. (the code for assigning resources to devices that were not assigned resources by BIOS is already in the kernel) YH --
On Sun, 13 Apr 2008 12:29:30 -0700 there is a really big difference between assigning PCI device resources and doing a whole thing like MMCFG from scratch. --
that MCONF patchset for AMD fam10h includes
1. get mmconfig from MSR. MCFG is using that too, if that is right, and we will get MCONF support when acpi support is off, and MCFG is broken.
2. or assign 0xfc00000000 to that MSR, that is safe too.
YH --
On Sun, 13 Apr 2008 22:01:23 -0700 using MCONF when the ACPI support isn't there is just a deathtrap. To be honest, if you want to break the AMD machines out there, who am I to care about that, I work for Intel. But I'm worried someone thinks this can be done for Intel based systems too, and then carry over all the bad bugs to those as well ;( --
I don't want to break any machine. and just want to workaround some bios bug, and use MMCONF when acpi is disabled... YH --
On Sun, 13 Apr 2008 09:58:45 +0200 that would be very welcome, esp if kerneloops.org can pick them up. One thing we also need to do as Linux is get more conservative; (this isn't per se about this specific thing) With MCFG for example we learned over time "if it smells funny don't use it". That concept should be carried much further imo; for example on K8 you can compare the acpi table to the chipset for numa support, and if they don't match, we SHOULD ignore both entirely. The same is true all over; Linux tends to behave as "oh but we think we can make it work anyway", in general imo that's a mistake in the long term, at least for default configs. Because there will be cases where that will break, be it special bioses or next gens of chipsets. --
i used your config on an AMD system here and s2ram works just fine, both using CONFIG_PM_TEST_SUSPEND=y bootup suspend self-test [which x86.git QA uses all the time], and using a manual pm-suspend command at the console.
you can also try your luck and remove the last 20% of x86.git [which is always the newest stuff], by picking a commit 200 patches down the line, via:
git-rev-list x86/base..x86/latest | head -200 | tail -1
and testing that. If that tree works, it's the last 200 commits that break stuff.
exactly what kind of system are you using? If you revert the trampoline changes, does it get any better - but i guess it might be better to do a bisection. Ingo --
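For readers unsure which commit the head/tail pipeline above selects, here is a small sketch that mimics it on a plain list. The function name is made up for illustration; it assumes the input is ordered newest first, the way git-rev-list prints commits.

```python
def pick_test_commit(commits_newest_first, n=200):
    """Mimic `head -n | tail -1`: the n-th newest commit, or the oldest
    one available if the range holds fewer than n commits."""
    window = commits_newest_first[:n]  # head -n
    return window[-1] if window else None  # tail -1
```

So the commit being tested is the 200th newest one in x86/base..x86/latest; if that one works, the breakage should be somewhere among the commits newer than it.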
It's an Athlon 64 X2 on an ULi-based AsRock motherboard with Radeon X300SE. I'll try to figure out what is the last good commit. Thanks, Rafael --
hm, that's very close to the system i tried: Athlon64 X2 with Radeon X300SE (PCIe), 1GB RAM. (Asus A8N-E mobo) Ingo --
