Some incremental changes and bug fixes for the PAT patchset. The changes are based on the feedback we received earlier. There are a few more pending changes that will follow soon. Thanks, Venki --
thanks, applied them to x86.git. Note that PAT is still hardcoded to disabled in arch/x86/mm/pat.c: int __read_mostly pat_disabled = 1; because one of my test systems failed during bootup. I'll re-check whether these fixes resolve that, and if it passes then we could enable PAT. Ingo --
Hi, I just want to report that the PAT support in x86/mm causes crashes on two of my test machines. On both boxes the SATA detection does not work when the PAT support is patched into the kernel. Symptoms are as follows -- best described by a diff between the two boot.logs:
# diff boot-failing.log boot-working.log
-Linux version 2.6.24-rc8-ga9f7faa5 (root@hunter) (gcc version ...
+Linux version 2.6.24-rc8-g2ea3cf43 (root@hunter) (gcc version ...
...
 early_iounmap(ffffffff82a0b000, 00001000)
-early_ioremap(000000000000c000, 00001000) => -000002103394304
-early_iounmap(ffffffff82a0c000, 00001000)
 early_iounmap(ffffffff82808000, 00001000)
...
-ACPI: PCI interrupt for device 0000:00:12.0 disabled
-sata_sil: probe of 0000:00:12.0 failed with error -12
+scsi0 : sata_sil
+scsi1 : sata_sil
+ata1: SATA max UDMA/100 mmio m512@0xc0403000 tf 0xc0403080 irq 22
...
-AC'97 space ioremap problem
-ACPI: PCI interrupt for device 0000:00:14.5 disabled
-ATI IXP AC97 controller: probe of 0000:00:14.5 failed with error -5
 ALSA device list:
- No soundcards found.
+ #0: ATI IXP rev 80 with ALC655 at 0xc0403800, irq 17
...
-VFS: Cannot open root device "sda1" or unknown-block(0,0)
-Please append a correct "root=" boot option; here are the available partitions:
-1600 4194302 hdc driver: ide-cdrom
-Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
+kjournald starting. Commit interval 5 seconds
+EXT3-fs: mounted filesystem with ordered data mode.
+VFS: Mounted root (ext3 filesystem) readonly.
...
<snip>
The second test machine uses ahci, but the symptoms are similar. I performed a git-bisect on x86/mm. The last commit that worked for me was 2ea3cf43fddecbfd66353caafdf73ec21ea3760b (x86: fix early_ioremap() ISA window). The subsequent commits for PAT support introduced the problem. I noticed that PAT should be disabled by default, but obviously the patches still have some side effect. (Maybe the ioremap changes lead to the problem?) Boot-logs ...
Can you attach the e820 map from the top of your dmesg? Thanks, --
hm, so the early_ioremap() stuff isn't working well enough ... that's the main effect of the PAT patches at the moment: no kernel code will access the low linear mappings (BIOS tables, ACPI data, etc.) directly; it's all done via early_ioremap(). But it's apparently buggy somewhere ... Ingo --
This does not look to be the problem here. We just mapped some new low ...
This ioremap failure seems to be the real problem. It can be due to the
new tracking of ioremaps introduced by the PAT patches. We do not allow
conflicting ioremaps to the same region. Probably that is happening
in both the sound and SATA initialization, which results in driver init failing.
Can you please try the debug patch below on top of the latest x86/mm, boot the kernel with
the debug boot option, and send us the dmesg from the failure? That will give us
better info about ioremaps.
Thanks,
Venki
Index: linux-2.6.git/arch/x86/mm/ioremap_64.c
===================================================================
--- linux-2.6.git.orig/arch/x86/mm/ioremap_64.c 2008-01-16 03:38:32.000000000 -0800
+++ linux-2.6.git/arch/x86/mm/ioremap_64.c 2008-01-16 05:16:28.000000000 -0800
@@ -150,6 +150,8 @@
void __iomem *ioremap_nocache (unsigned long phys_addr, unsigned long size)
{
+ printk(KERN_DEBUG "ioremap_nocache: addr %lx, size %lx\n",
+ phys_addr, size);
return __ioremap(phys_addr, size, _PAGE_UC);
}
EXPORT_SYMBOL(ioremap_nocache);
Index: linux-2.6.git/include/asm-x86/io_64.h
===================================================================
--- linux-2.6.git.orig/include/asm-x86/io_64.h 2008-01-16 03:38:32.000000000 -0800
+++ linux-2.6.git/include/asm-x86/io_64.h 2008-01-16 05:16:57.000000000 -0800
@@ -154,6 +154,8 @@
static inline void __iomem * ioremap (unsigned long offset, unsigned long size)
{
+ printk(KERN_DEBUG "ioremap: addr %lx, size %lx\n",
+ offset, size);
return __ioremap(offset, size, 0);
}
--
Normally, if there is a conflict, there should be a printk (or at least it was so in the original mattr code, if you haven't changed it). -Andi --
Yes, the printks are there, but they are at KERN_DEBUG now. We should change them to KERN_WARNING at least. Thanks, Venki --
I'm pretty sure they were without KERN_* originally. Another reason why the checkpatch.pl KERN_* warnings suck -- the original state would have been better and I bet you changed it just to shut up the dumb scripts. -Andi --
Attached is the boot.log with x86/mm as of today (v2.6.24-rc8-720-gd294e9e).
For the failed devices I get:
sata_sil 0000:00:12.0: version 2.3
ACPI: PCI Interrupt 0000:00:12.0[A] -> GSI 22 (level, low) -> IRQ 22
ioremap_nocache: addr c0403000, size 200
swapper:1 conflicting cache attribute c0403000-c0404000 uncached<->default
ACPI: PCI interrupt for device 0000:00:12.0 disabled
and
Advanced Linux Sound Architecture Driver Version 1.0.15 (Tue Nov 20 19:16:42 2007 UTC).
ACPI: PCI Interrupt 0000:00:14.5[B] -> GSI 17 (level, low) -> IRQ 17
ioremap_nocache: addr c0403800, size 100
swapper:1 conflicting cache attribute c0403000-c0404000 uncached<->default
AC'97 space ioremap problem
ACPI: PCI interrupt for device 0000:00:14.5 disabled
ATI IXP AC97 controller: probe of 0000:00:14.5 failed with error -5
Grepping for ioremap/iounmap gives:
<snip>
ioremap: addr 77e72d10, size 6ad8
ioremap: addr 77e79982, size 544
ioremap: addr 77e7afc0, size 40
ioremap: addr c0403104, size fc
ioremap: addr 77e7ade1, size 3
ioremap: addr 77e7af04, size 1
ioremap: addr 77e7985c, size f4
ioremap: addr 77e79950, size 32
ioremap: addr 77e79ec6, size c0
ioremap: addr 77e79f86, size 7a
ioremap: addr 77e7af74, size 48
ioremap_nocache: addr c0400000, size 1000
ioremap_nocache: addr c0401000, size 1000
ioremap_nocache: addr c0402000, size 1000
ioremap_nocache: addr c0100000, size 80
ioremap_nocache: addr c0403000, size 200
ioremap_nocache: addr c0402000, size 1000
ioremap_nocache: addr c0400000, size 1000
ioremap_nocache: addr c0401000, size 1000
ioremap_nocache: addr c0403800, size 100
I guess the conflict for sata is
ioremap: addr c0403104, size fc
ioremap_nocache: addr c0403000, size 200
But where does the conflict for the sound card
(ioremap_nocache: addr c0403800, size 100)
come from?
And what can I do about conflicting regions?
Regards,
Andreas
Sorry, forget this dumb question. It's the same page as above. Andreas --
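To make the shared-page point concrete, here is a minimal C sketch of how page-granular tracking turns three distinct requests into one conflict. A cache attribute lives in a page-table entry, so each ioremap() request is effectively rounded out to whole pages before it is compared. The helper names `page_range()` and `ranges_overlap()` are hypothetical, and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86 without large pages. */
#define SKETCH_PAGE_SIZE 0x1000ULL

/* Hypothetical helper type: a half-open physical range [start, end). */
struct phys_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Round an ioremap() request (addr, size) out to whole pages: the cache
 * attribute is stored per page-table entry, so it covers the whole page.
 */
static struct phys_range page_range(uint64_t addr, uint64_t size)
{
	struct phys_range r;

	assert(size > 0);
	r.start = addr & ~(SKETCH_PAGE_SIZE - 1);
	r.end = (addr + size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
	return r;
}

/* Half-open ranges overlap iff each one starts before the other ends. */
static bool ranges_overlap(struct phys_range a, struct phys_range b)
{
	return a.start < b.end && b.start < a.end;
}
```

Fed the addresses from the log above, page_range(0xc0403104, 0xfc), page_range(0xc0403000, 0x200) and page_range(0xc0403800, 0x100) all round to c0403000-c0404000, the range printed in the "conflicting cache attribute" message, while the ACPI request at 0x77e72d10 does not overlap it.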
hm, is the problem that the two devices share the same physical page,
and thus get an overlapping area?
as an intermediate fix, how about following the attribute of the already
existing mapping, instead of rejecting the ioremap due to the conflict?
I.e. something like below?
Ingo
---
arch/x86/mm/pat.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
Index: linux-x86.q/arch/x86/mm/pat.c
===================================================================
--- linux-x86.q.orig/arch/x86/mm/pat.c
+++ linux-x86.q/arch/x86/mm/pat.c
@@ -174,7 +174,12 @@ int reserve_mattr(u64 start, u64 end, un
current->comm, current->pid,
start, end,
cattr_name(attr), cattr_name(ml->attr));
- err = -EBUSY;
+ /*
+ * Force the already existing attribute:
+ */
+ ma->attr = ml->attr;
+ if (*fattr)
+ *fatt = ml->attr;
break;
}
} else if (ml->start >= end) {
--
The correct behaviour probably would be to go with the most restrictive caching behaviour, i.e. uncached in this case. -hpa --
yeah. Or, to be on the safest side, forcing UC in this case. We'll have
a warning message anyway, so it won't go unnoticed - but we won't break
drivers.
Ingo
--------->
Subject: x86: patches/pat-conflict-fixup.patch
From: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/mm/pat.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
Index: linux-x86.q/arch/x86/mm/pat.c
===================================================================
--- linux-x86.q.orig/arch/x86/mm/pat.c
+++ linux-x86.q/arch/x86/mm/pat.c
@@ -174,7 +174,12 @@ int reserve_mattr(u64 start, u64 end, un
current->comm, current->pid,
start, end,
cattr_name(attr), cattr_name(ml->attr));
- err = -EBUSY;
+ /*
+ * Force UC on a conflict:
+ */
+ ma->attr = _PAGE_UC;
+ if (*fattr)
+ *fattr = _PAGE_UC;
break;
}
} else if (ml->start >= end) {
--
or the one below. (it even builds)
Ingo
---
arch/x86/mm/pat.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
Index: linux-x86.q/arch/x86/mm/pat.c
===================================================================
--- linux-x86.q.orig/arch/x86/mm/pat.c
+++ linux-x86.q/arch/x86/mm/pat.c
@@ -174,7 +174,12 @@ int reserve_mattr(u64 start, u64 end, un
current->comm, current->pid,
start, end,
cattr_name(attr), cattr_name(ml->attr));
- err = -EBUSY;
+ /*
+ * Force the already existing attribute:
+ */
+ ma->attr = ml->attr;
+ if (*fattr)
+ *fattr = ml->attr;
break;
}
} else if (ml->start >= end) {
--
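The three patches above differ only in how `reserve_mattr()` resolves a conflict: return -EBUSY, inherit the existing attribute, or force UC. The sketch below models those three policies over a toy array of reservations; `reserve_range()`, `enum conflict_policy` and the other names are hypothetical and are not the kernel's PAT code. One detail worth noting: the patches test `*fattr`, which reads through the pointer, so a caller passing a NULL `fattr` would fault there; the sketch tests the pointer itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the kernel's PAT code. */
enum cattr { CATTR_WB, CATTR_WT, CATTR_WC, CATTR_UC };
enum conflict_policy { POLICY_REJECT, POLICY_FOLLOW_EXISTING, POLICY_FORCE_UC };

struct memtype {
	uint64_t start, end;	/* half-open range [start, end) */
	enum cattr attr;
};

#define MAX_MEMTYPES 16
static struct memtype memtypes[MAX_MEMTYPES];
static size_t nr_memtypes;

/*
 * Record [start, end) with attribute 'attr'.  The attribute actually
 * granted is returned through 'fattr' when the caller passes one.
 * As in the patches, only the new request is adjusted: an existing
 * overlapping entry keeps its attribute, so an alias can remain.
 */
static int reserve_range(uint64_t start, uint64_t end, enum cattr attr,
			 enum cattr *fattr, enum conflict_policy policy)
{
	enum cattr granted = attr;
	size_t i;

	assert(start < end);
	for (i = 0; i < nr_memtypes; i++) {
		const struct memtype *m = &memtypes[i];

		if (m->start >= end || start >= m->end)
			continue;		/* no overlap */
		if (m->attr == granted)
			continue;		/* same attribute: no conflict */
		if (policy == POLICY_REJECT)
			return -EBUSY;
		granted = (policy == POLICY_FOLLOW_EXISTING) ? m->attr : CATTR_UC;
	}

	if (nr_memtypes == MAX_MEMTYPES)
		return -ENOMEM;
	memtypes[nr_memtypes].start = start;
	memtypes[nr_memtypes].end = end;
	memtypes[nr_memtypes].attr = granted;
	nr_memtypes++;

	if (fattr)			/* test the pointer, not *fattr */
		*fattr = granted;
	return 0;
}
```

Rejecting keeps every alias consistent at the cost of failing the driver probe; following the existing attribute can hand a cached mapping to a non-prefetchable device; forcing UC is the conservative choice for the new mapping but, as written, leaves the older mapping untouched.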
Yes.
Meanwhile I have figured out that it is some ACPI stuff that maps the page cached.
I've changed the ioremap() calls in drivers/acpi/osl.c to ioremap_nocache().
See attached patch.
Now the machine boots without conflicts.
ACPI: EC: Look up EC in DSDT
ioremap_nocache: addr c0403104, size fc
ioremap_nocache: addr 77e7ade1, size 3
ioremap_nocache: addr 77e7af04, size 1
...
sata_sil 0000:00:12.0: version 2.3
ACPI: PCI Interrupt 0000:00:12.0[A] -> GSI 22 (level, low) -> IRQ 22
ioremap_nocache: addr c0403000, size 200
scsi0 : sata_sil
scsi1 : sata_sil
ata1: SATA max UDMA/100 mmio m512@0xc0403000 tf 0xc0403080 irq 22
ata2: SATA max UDMA/100 mmio m512@0xc0403000 tf 0xc04030c0 irq 22
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
...
I guess it is not a good idea to use an existing cacheable attribute if the
I/O region is non-prefetchable. And in this example there are 3 devices
which are potentially affected:
00:12.0 IDE interface: ATI Technologies Inc 4379 Serial ATA Controller (rev 80) (
...
Memory at c0403000 (32-bit, non-prefetchable) [size=512]
...
00:14.0 SMBus: ATI Technologies Inc IXP SB400 SMBus Controller (rev 82)
...
Memory at c0403400 (32-bit, non-prefetchable) [size=1K]
...
00:14.5 Multimedia audio controller: ATI Technologies Inc IXP SB400 AC'97 Audio Controller (rev 80)
...
Memory at c0403800 (32-bit, non-prefetchable) [size=256]
...
BTW, is there a need for osl.c to map all regions as cached?
Regards,
Andreas
---
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 1f1ec4a..175e6a4 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -221,7 +221,7 @@ void __iomem *acpi_os_map_memory(acpi_physical_address phys, acpi_size size)
/*
* ioremap checks to ensure this is in reserved space
*/
- return ioremap((unsigned long)phys, size);
+ return ioremap_nocache((unsigned long)phys, size);
else
return ...
--
btw., there's a change I did in today's x86.git: _all_ the old BIOS data accesses now go through early_ioremap(). This cleaned up the boot code quite significantly, as it's much more apparent now when we access a BIOS data table. (It also solves the problem when BIOS data pages are in reserved areas that we map via UC or don't map at all.)
the same happens with all ISA ioremaps as well - no more "low 1MB is treated special" exceptions. [ This also solves the 'EFI puts data pages into really high memory we don't have mapped yet' category of problems that BIOS writers are apparently busy creating right now ;-) ]
the downside is that old linear-mapped assumptions might now result in an early fault - boot with earlyprintk=vga or earlyprintk=serial,ttyS0,115200. I fixed most such assumptions already and booted an allyesconfig kernel on both 32-bit and 64-bit x86, but a few more remain. I've enhanced the early fault printout code as well to make it easier to debug such things, so it should be relatively easy to find the rest. Ingo --
But then, this will cause an attribute conflict. The old one was specifying WB in PAT (ioremap with no flags) and the new ioremap specifies UC. As Linus mentioned, the main problem is to figure out the correct attribute for an ioremap() that doesn't specify the actual attribute to be used. One mechanism to fix the issue generically (somewhat, at least) is to use the MTRRs and figure out the default MTRR attribute for that physical address. In this scenario, ACPI is using ioremap() and leaving some dangling references. Venki is looking at fixing this code. Getting the MTRR attribute for ioremap with no flags might solve some of these issues as well. Will look into this. thanks, suresh --
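Deriving the attribute for a plain ioremap() from the MTRRs, as suggested above, needs a lookup of the MTRR type covering an address. A minimal sketch, assuming a hypothetical in-memory copy of the variable-range MTRRs (`struct var_mtrr`, `mtrr_type_of()`) rather than reading the MSRs: a range matches when `(addr & mask) == (base & mask)`, and overlapping matches follow the Intel SDM precedence (UC wins; WT beats WB). Fixed-range MTRRs below 1 MiB are ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MTRR memory types by name; the numeric values here are arbitrary. */
enum mtype { MT_UC, MT_WC, MT_WT, MT_WP, MT_WB };

/* Hypothetical in-memory copy of one variable-range MTRR. */
struct var_mtrr {
	uint64_t base;		/* range base, 4 KiB aligned */
	uint64_t mask;		/* range mask, 4 KiB aligned */
	enum mtype type;
	int valid;
};

/*
 * MTRR type for 'addr': a variable range matches when
 * (addr & mask) == (base & mask); with no match, the default type
 * applies.  Overlapping matches follow the Intel SDM rules: UC wins,
 * WT and WB give WT, identical types give that type.  Any other
 * overlap is architecturally undefined; the sketch keeps the earlier
 * match for those.  Fixed-range MTRRs (below 1 MiB) are ignored.
 */
static enum mtype mtrr_type_of(const struct var_mtrr *v, size_t n,
			       enum mtype def_type, uint64_t addr)
{
	enum mtype t = def_type;
	int matched = 0;
	size_t i;

	assert(v != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (!v[i].valid ||
		    (addr & v[i].mask) != (v[i].base & v[i].mask))
			continue;
		if (!matched) {
			t = v[i].type;
			matched = 1;
		} else if (t == MT_UC || v[i].type == MT_UC) {
			t = MT_UC;
		} else if ((t == MT_WT && v[i].type == MT_WB) ||
			   (t == MT_WB && v[i].type == MT_WT)) {
			t = MT_WT;
		}
	}
	return t;
}
```

If the BIOS marks the mmio hole UC, a flag-less ioremap() of a page there would inherit UC and agree with the drivers' ioremap_nocache(); if the MTRRs leave it WB, the lookup cannot help, which is the case raised later in the thread.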
I think the problem is the proximity of some ACPI tables to actual
device mmio areas - they share the same physical page. The ACPI tables ...
How would this solve the problem at hand? I don't think it's possible to
guarantee that all the BIOS data pages and mmio areas will have
compatible attributes. BIOS data pages might be in plain RAM that we
intend to map WB. Or they might be in reserved areas near the mmio
addresses.
But if we fixed up aliases (only for that single conflicting page), so
that all mappings are degraded to UC, we'd have uniform behavior all ...
OK. Resolving that would be nice anyway because the ACPI table might be
in plain RAM which might be reused by the kernel later on, etc. FYI,
there's also the patch from Yinghai Lu on lkml for one such dangling
reference problem in the SRAT table.
Ingo
---------------->
From: Yinghai Lu <Yinghai.Lu@Sun.COM>
Subject: [PATCH] x86: copy srat table and unmap in acpi_parse_table
The old acpi_numa_slit_init() saved the early-stage table address in acpi_slit,
so acpi_parse_table() could not unmap it.
This patch copies the SLIT in the callback,
so the table can be unmapped in acpi_parse_table() instead of being tracked outside it.
You need to revert
"
commit d8d28f25f33c6a035cdfb1d421c79293d16e5c58
Author: Ingo Molnar <mingo@elte.hu>
Date: Thu Jan 17 15:26:42 2008 +0100
x86: ACPI: fix mapping leaks
ioremap_early() is stateful, hence we cannot tolerate mapping leaks.
"
before applying this patch.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Index: linux-2.6/arch/x86/mm/srat_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/srat_64.c
+++ linux-2.6/arch/x86/mm/srat_64.c
@@ -23,7 +23,9 @@
int acpi_numa __initdata;
-static struct acpi_table_slit *acpi_slit;
+static int slit_copied;
+static u64 slit_locality_count;
+static u8 slit_entry[MAX_NUMNODES * MAX_NUMNODES];
static ...
--
Yes, we must fix all aliases or reject the conflicting mapping. But fixing all aliases might not be that easy. (I've just seen a panic when using your patch ;-( ) Andreas --
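Yinghai Lu's SLIT patch replaces the saved `acpi_slit` pointer with a private copy (`slit_copied`, `slit_locality_count`, `slit_entry[]`), so the table can be unmapped as soon as the parse callback returns. The rest of that patch is elided in the thread; the sketch below only illustrates the copy-then-unmap idea. `struct slit_view`, `slit_copy()`, `node_distance()` and `SKETCH_MAX_NODES` are hypothetical stand-ins (the latter for MAX_NUMNODES), and the 10/20 fallback distances are an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed node limit for the sketch; the patch uses MAX_NUMNODES. */
#define SKETCH_MAX_NODES 8

/* Hypothetical view of the mapped table: a node count followed by a
 * count x count matrix of relative distances. */
struct slit_view {
	uint64_t locality_count;
	const uint8_t *entry;
};

static int slit_copied;
static uint64_t slit_locality_count;
static uint8_t slit_entry[SKETCH_MAX_NODES * SKETCH_MAX_NODES];

/* Runs while the table is still mapped: copy what is needed so the
 * caller can unmap the table right after this returns. */
static int slit_copy(const struct slit_view *slit)
{
	uint64_t n = slit->locality_count;

	if (n > SKETCH_MAX_NODES)
		return -1;	/* table larger than the static copy */
	memcpy(slit_entry, slit->entry, n * n);
	slit_locality_count = n;
	slit_copied = 1;
	return 0;
}

/* Later lookups read only the private copy, never the mapping. */
static int node_distance(int a, int b)
{
	if (!slit_copied)
		return a == b ? 10 : 20;	/* assumed local/remote fallback */
	assert(a >= 0 && b >= 0);
	assert((uint64_t)a < slit_locality_count &&
	       (uint64_t)b < slit_locality_count);
	return slit_entry[a * slit_locality_count + b];
}
```

Copying a bounded table into static storage costs a little memory but removes any need to keep early_ioremap() state alive, which is what makes the "stateful" early mapping leak-free.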
yes, indeed my patch is bad if you have PAT enabled: conflicting cache attributes might be present. I'll go with your patch for now. should we perhaps do UC by default for early_ioremap() as well? Normally those mappings are only temporary - but in case of a leak they might hang around in the pagetables and the CPU might stumble upon them. Also, should early_iounmap() do a wbinvd() [/clflush()] call as well, to be safe? Ingo --
I think the best is to just reject conflicting mappings. (Because now I am too tired to think about a safe way how to change the aliases to the most restrictive memory type. ;-) But then of course such boot-time problems like I've seen on my test machines should be avoided somehow. Andreas --
Below is another potential fix for the problem here. Going through the ACPI
ioremap usages, we found one place where the mapping is cached, possibly as an
optimization, and never unmapped. The patch below always unmaps the
ioremap at this place in ACPICA.
Thanks,
Venki
Index: linux-2.6.git/drivers/acpi/executer/exregion.c
===================================================================
--- linux-2.6.git.orig/drivers/acpi/executer/exregion.c 2008-01-17 03:18:39.000000000 -0800
+++ linux-2.6.git/drivers/acpi/executer/exregion.c 2008-01-17 07:34:33.000000000 -0800
@@ -48,6 +48,8 @@
#define _COMPONENT ACPI_EXECUTER
ACPI_MODULE_NAME("exregion")
+static int ioremap_cache;
+
/*******************************************************************************
*
* FUNCTION: acpi_ex_system_memory_space_handler
@@ -249,6 +251,13 @@
break;
}
+ if (!ioremap_cache) {
+ acpi_os_unmap_memory(mem_info->mapped_logical_address,
+ window_size);
+ mem_info->mapped_logical_address = 0;
+ mem_info->mapped_physical_address = 0;
+ mem_info->mapped_length = 0;
+ }
return_ACPI_STATUS(status);
}
--
Applying and compiling your patch I see:
CC drivers/acpi/executer/exregion.o
drivers/acpi/executer/exregion.c: In function 'acpi_ex_system_memory_space_handler':
drivers/acpi/executer/exregion.c:81: warning: 'window_size' may be used uninitialized in this function
After glancing through this file it seems that ioremap_cache is always 0, so acpi_os_unmap_memory() will unconditionally be executed at the end of this function. I am not familiar with that code, but I just want to make sure that this is what you want. And if so, why is that variable needed? But maybe I missed something ... (I'll test it tomorrow, or I better should say later today.) Andreas --
I missed that warning. But it should not matter for testing this patch, as we always initialize window_size with the patch. Yes, the variable is not needed. With the patch I always map at the beginning of this function and unmap at the end. I just kept the variable because I was initially planning to add a boot option to control this, but later decided to keep the test patch simple without any boot option. We can come up with a better patch once we know that the test patch helps. Thanks, Venki --
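The test patch trades ACPICA's cached mapping window for a map/access/unmap sequence on every call, so no mapping (and no tracked cache attribute) outlives the handler. A toy sketch of the difference, using hypothetical stand-ins `sketch_map()`/`sketch_unmap()` for acpi_os_map_memory()/acpi_os_unmap_memory() that only count live mappings over a fake buffer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for acpi_os_map_memory()/acpi_os_unmap_memory():
 * a fake register window plus a counter of live mappings. */
static uint8_t fake_region[256];
static int live_mappings;

static uint8_t *sketch_map(size_t offset, size_t len)
{
	assert(offset + len <= sizeof(fake_region));
	live_mappings++;
	return fake_region + offset;
}

static void sketch_unmap(uint8_t *virt, size_t len)
{
	(void)virt;
	(void)len;
	live_mappings--;
}

/* Test-patch style: map, access, unmap.  Nothing stays mapped after
 * the call returns. */
static uint8_t read_map_each_time(size_t offset)
{
	uint8_t *p = sketch_map(offset, 1);
	uint8_t val = *p;

	sketch_unmap(p, 1);
	return val;
}

/* Cached-window style (simplified: maps the whole fake region once and
 * reuses it), so one mapping stays live after the call returns. */
static uint8_t *window;

static uint8_t read_keep_window(size_t offset)
{
	if (!window)
		window = sketch_map(0, sizeof(fake_region));
	return window[offset];
}
```

The per-call variant pays a map/unmap on every access; the cached variant saves that work but leaves exactly the kind of lingering mapping that collided with the SATA and AC'97 ioremaps.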
Andreas, could you also try the patch Suresh Siddha sent out yesterday? That covers the case where the attribute was not getting removed even after unmap was called. Thanks, Venki --
An easy way for you to figure out if our patch will solve your problem is this: look for any quirks for your device in drivers/pci/quirks.c and/or the architecture-specific quirks file. If you see your device in there, then our patch is likely to solve your problem. -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL --
This is the matrix the CPU uses when combining MTRR and PAT behaviour.
It probably makes sense to mimic:
| WB WT WC UC
---+---------------
WB | WB WT WC UC
WT | WT WT UC UC
WC | WC UC WC UC
UC | UC UC UC UC
With the current PAT encoding:
WB = 00
WT = 01
WC = 10
UC = 11
... this is simply a bitwise OR. This makes sense, since one of the
bits denies delaying writes (WT, UC), and the other denies delaying
reads (WC, UC).
-hpa
--
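The bitwise-OR observation can be checked mechanically. A minimal sketch with hypothetical names (`enum pat_type`, `combine_types()`), using the two-bit encoding listed in hpa's message; the single OR reproduces every cell of the 4x4 matrix:

```c
#include <assert.h>

/* Two-bit encoding from the list above.  Bit 0 denies delaying writes
 * (set in WT and UC); bit 1 denies delaying reads (set in WC and UC). */
enum pat_type { PT_WB = 0, PT_WT = 1, PT_WC = 2, PT_UC = 3 };

/* Combine two types: every restriction that either side imposes is
 * kept, which is exactly a bitwise OR of the encodings. */
static enum pat_type combine_types(enum pat_type a, enum pat_type b)
{
	assert((unsigned)a <= PT_UC && (unsigned)b <= PT_UC);
	return (enum pat_type)(a | b);
}
```

For example, WT and WC combine to 01 | 10 = 11, i.e. UC, matching the off-diagonal cells of the matrix; anything combined with WB is unchanged and anything combined with UC is UC.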
Almost. There is a specific and important case where MTRR UC + page table WC == WC. But yes. For ioremap, where WB + MTRR == MTRR, we need to request the same attributes as the e820 map to get the attribute checking correct. Eric --
True; however, that shouldn't be followed for the case of conflicting attempts at mapping. Now, I *believe* it is safe to have some mappings UC and some WC. This is also something to keep in mind (there are legitimate applications for that particular form of aliasing, too.) If so, we may not want to thump at those. -hpa --
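The MTRR UC + PAT WC exception makes the combination ordered: the MTRR type and the page-table type play different roles. A minimal sketch with hypothetical names, modelling only the cases named in this thread (the OR matrix plus MTRR UC + PAT WC = WC), not the CPU's full table; per hpa's reply, this ordered rule is for the MTRR/PAT combination, not for resolving two conflicting mappings.

```c
#include <assert.h>

/* Same two-bit encoding as hpa's table: WB=0, WT=1, WC=2, UC=3. */
enum mem_type { MEM_WB = 0, MEM_WT = 1, MEM_WC = 2, MEM_UC = 3 };

/*
 * Effective type from an MTRR type and a page-table (PAT) type.  The
 * arguments are not interchangeable: besides the OR rule, the one
 * exception named in this thread is MTRR UC + PAT WC -> WC.  Any other
 * exceptions in the CPU's real table are not modelled here.
 */
static enum mem_type effective_type(enum mem_type mtrr, enum mem_type pat)
{
	assert((unsigned)mtrr <= MEM_UC && (unsigned)pat <= MEM_UC);
	if (mtrr == MEM_UC && pat == MEM_WC)
		return MEM_WC;
	return (enum mem_type)(mtrr | pat);
}
```

With `pat == MEM_WB` the result is always the MTRR type, which is the "WB + MTRR == MTRR" case for a plain ioremap().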
In this case the correct attribute is the one of the underlying MTRR. And if it conflicts with some other mapping that overrides an MTRR the driver was always broken and it should probably error out and be reevaluated/fixed. -Andi --
Hmm, early_ioremap_debug exists only in ioremap_32.c. I have to adapt the 64-bit version first. But wait, the 64-bit code already contains debug output for this. See the boot-logs that I have attached to my previous mails. (Interestingly, the code for the 64-bit early_io(re/un)map resides not in ioremap_64.c but in init_64.c.) Andreas --
Ok, here is the result:
sata_sil 0000:00:12.0: version 2.3
ACPI: PCI Interrupt 0000:00:12.0[A] -> GSI 22 (level, low) -> IRQ 22
ioremap_nocache: addr c0403000, size 200
swapper:1 conflicting cache attribute c0403000-c0404000 uncached<->default
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
[<ffffffff8102905d>] ? reserve_mattr+0x1a5/0x221
PGD 0
Oops: 0000 [1] SMP
CPU 3
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-rc8-gd294e9ed-dirty #1
RIP: 0010:[<ffffffff8102905d>] [<ffffffff8102905d>] ? reserve_mattr+0x1a5/0x221
RSP: 0018:ffff810077581c60 EFLAGS: 00010282
RAX: 000000000000004e RBX: ffff8100775a7a00 RCX: 0000000000004c12
RDX: 000000000000a9a9 RSI: 0000000000000018 RDI: ffffffff8153bed4
RBP: 0000000000000000 R08: ffffffff81540fe7 R09: ffffffff81329d70
R10: 0000000000000000 R11: 0000000000000000 R12: 00000000c0404000
R13: 0000000000000018 R14: 00000000c0403000 R15: 00000000c0403000
FS: 0000000000000000(0000) GS:ffff8100775d6bc0(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001001000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff810077580000, task ffff810077564790)
Stack: ffffffff81411900 0000000000001000 0000000000001000 00000000c0404000
ffffc200008ac000 00000000c0403000 ffff8100775a7a40 ffffffff810281e9
0000000000000018 0000000000000005 ffff810077631680 ffff8100777b7800
Call Trace:
[<ffffffff810281e9>] __ioremap+0xc2/0x11a
[<ffffffff8114a6b0>] pcim_iomap+0x43/0x53
[<ffffffff8114a74f>] pcim_iomap_regions+0x8f/0x104
[<ffffffff811fba72>] sil_init_one+0xb0/0x1eb
[<ffffffff81150f98>] pci_device_probe+0xd1/0x138
[<ffffffff811a4d9c>] driver_probe_device+0xe1/0x16a
[<ffffffff811a4f6d>] __driver_attach+0x90/0xcd
[<ffffffff811a4edd>] __driver_attach+0x0/0xcd
[<ffffffff811a4edd>] __driver_attach+0x0/0xcd
...
for now I applied your ioremap_uncached() patch and removed my patch. My patch might work if the MTRR marks that area UC. Does it on your system? If the MTRRs (as set up by the BIOS) keep it at WB, then the ACPI ioremap() is already unsafe: the mmio area that happens to be there might be prefetched by the CPU. Ingo --
