I will post the existing patches in batches for closer review. They can
already all be viewed at
ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/

Highlights:
- Using %fs instead of %gs for PDA register (Jeremy F.)
- Removed the old static rom probing on x86-64. We really trust e820 now.
  This is slightly experimental still. We'll see how it does.
- uncached copy user
  Not a clear win, but also not a loss. Follow i386.
- Fix 32bit EFI with regparms.
- Fix boot slowdown as VT guest (Zach Amsden)
- VMI for paravirtualized VMware
  * first paravirt ops client
  * still some changes missing which need more work
- Various changes in APIC routing setup (Ingo Molnar)
- Various changes in mmconfig handling (Olivier Galibert, OGAWA Hirofumi, me)
  * Share code between i386/x86-64
  * Be more aggressive at ignoring bogus MCFG tables
  * Now supports white lists for some chipsets (currently only Intel 945/915)
- Some patches for the upcoming AMD Family10 CPUs.
- More init section reference fixes from Vivek
- Some preparation patches for Perfmon
- New NUMA hash function for x86-64 and related changes (Amul Shah)
- Support a trigger on machine check events on x86-64
- NMI watchdog fixes for Core2 from Venkatesh
- Fix a HPET timer calibration issue on systems with long SMM events at boot
  (Jack Steiner)
- Fix compat a.out signals on x86-64
- Lots of small stuff

Not included yet. Might or might not make .21:
- Solution for Nvidia IOMMU corruptions. We could default to iommu=soft
  for nvidia, but I was still hoping for a workaround from the hardware
  vendors.
- vDSO support (still trouble with newer toolkits)
- Xen paravirt ops support from Jeremy/Chris
- Dynamic command line from Bernhard Walle
- Fast getcpu from Dean (requires vDSO)
- Rewritten RAID XOR functions
- Fixes for empty nodes from mm
- Fake node improvements for x86-64
- New dynamic IRQ 0 probing to work around all chipset issues
- lguest
  * still seems heavily in development. Not sure it will ...
Andi, please make sure to also include this Calgary patch I sent on
Feb 6th.
Thanks,
Muli
Subject: x86-64 Calgary: robustify bad_dma_address handling
From: Muli Ben-Yehuda <muli@il.ibm.com>
- set bad_dma_address explicitly to 0x0
- reserve 32 pages from bad_dma_address and up
- WARN_ON() a driver feeding us bad_dma_address
Thanks to Leo Duran <leo.duran@amd.com> for the suggestion.
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Leo Duran <leo.duran@amd.com>
Cc: Jon Mason <jdmason@kudzu.us>
---
arch/x86_64/kernel/pci-calgary.c | 17 +++++++++++++++--
1 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/arch/x86_64/kernel/pci-calgary.c b/arch/x86_64/kernel/pci-calgary.c
index 3d65b1d..04480c3 100644
--- a/arch/x86_64/kernel/pci-calgary.c
+++ b/arch/x86_64/kernel/pci-calgary.c
@@ -138,6 +138,8 @@ static const unsigned long phb_debug_off
#define PHB_DEBUG_STUFF_OFFSET 0x0020
+#define EMERGENCY_PAGES 32 /* = 128KB */
+
unsigned int specified_table_size = TCE_TABLE_SIZE_UNSPECIFIED;
static int translate_empty_slots __read_mostly = 0;
static int calgary_detected __read_mostly = 0;
@@ -296,6 +298,16 @@ static void __iommu_free(struct iommu_ta
{
unsigned long entry;
unsigned long badbit;
+ unsigned long badend;
+
+ /* were we called with bad_dma_address? */
+ badend = bad_dma_address + (EMERGENCY_PAGES * PAGE_SIZE);
+ if (unlikely((dma_addr >= bad_dma_address) && (dma_addr < badend))) {
+ printk(KERN_ERR "Calgary: driver tried unmapping bad DMA "
+ "address 0x%Lx\n", dma_addr);
+ WARN_ON(1);
+ return;
+ }
entry = dma_addr >> PAGE_SHIFT;
@@ -656,8 +668,8 @@ static void __init calgary_reserve_regio
u64 start;
struct iommu_table *tbl = dev->sysdata;
- /* reserve bad_dma_address in case it's a legal address */
- iommu_range_reserve(tbl, bad_dma_address, 1);
+ /* reserve EMERGENCY_PAGES from bad_dma_address and up */
+ iommu_range_reserve(tbl, bad_dma_address, EMERGENCY_PAGES);
/* avoid the ...

It's already included:
ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/robustify-bad_dma_address-handling

-Andi
-
Thanks, I scanned the list for 'calgary' and missed it. I'm guessing the
Calgary got dropped because the Subject had 'x86-64 Calgary:' and your
scripts trimmed the prefix?

Cheers,
Muli
-
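The range check that the Calgary patch earlier in this thread adds to __iommu_free() can be illustrated outside the kernel. The following is only a user-space sketch: the helper name in_emergency_window() and the PAGE_SIZE_BYTES constant (4 KiB pages) are assumptions made for illustration, while EMERGENCY_PAGES and the half-open interval are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u   /* assumption: 4 KiB pages */
#define EMERGENCY_PAGES 32u     /* from the patch */

/* The patch comments this value as "= 128KB"; check it at compile time. */
static_assert(EMERGENCY_PAGES * PAGE_SIZE_BYTES == 128u * 1024u,
              "emergency window should be 128KB");

/*
 * Hypothetical helper: true if dma_addr falls into the reserved window
 * [bad_dma_address, bad_dma_address + EMERGENCY_PAGES * PAGE_SIZE).
 */
static bool in_emergency_window(uint64_t dma_addr, uint64_t bad_dma_address)
{
    uint64_t badend = bad_dma_address +
                      (uint64_t)EMERGENCY_PAGES * PAGE_SIZE_BYTES;

    return dma_addr >= bad_dma_address && dma_addr < badend;
}
```

With bad_dma_address set to 0x0 as in the patch, every address from 0x0 up to and including 0x1FFFF is treated as bad, and 0x20000 is the first address that is not.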
Andi Kleen wrote at LKML:
...

Was it seen in canonical patch format on a mailinglist before? Is it
Bernhard Kaindl's ohci1394_early?
http://www.suse.de/~bk/firewire/

Would be good to put this on the usual patch-submission road in order to
prep it for 2.6.22. Could be handled via linux1394-2.6.git, although a
different channel where the actual users of this facility watch would
IMO be more appropriate.

I also have suggestions, at least WRT Bernhard's code:
- The Kconfig option could go into the "Kernel hacking" submenu rather
  than the IEEE 1394 submenu. (The driver source should stay in
  drivers/ieee1394.)
- Leave a note in the Kconfig help how it is typically used, i.e. what
  is required on the remote terminal side, where to find firescope,
  fireproxy etc. and assorted HOWTOs.
- Indicate in the Kconfig help that only a 4GB address range is made
  visible this way.

A mostly unrelated note: A simple to set up remote-dmesg utility would
be nice to have on the terminal side. Maybe a small ieee1394 high-level
driver which gives hints on the location of the dmesg buffer via
configuration ROM would be warranted. Or is it feasible to find the
dmesg buffer by plain memory analysis?
-- 
Stefan Richter
-=====-=-=== --=- -=-=-
http://arcgraph.de/sr/
-
It's more related to arch code than firewire, so I thought I would
handle it. But you can too if you want.

It definitely needs much

ftp://ftp.firstfloor.org/pub/ak/firescope/

-Andi
-
It's better you do it. I don't know anything about the specifics of
early initialization. Just Cc linux1394-devel on submission so that we
can have a look at the OHCI-1394 related bits.

Using linux1394-2.6.git could only be helpful if more code sharing
between ohci1394.c and ohci1394_early.c would be desired. That's

Thanks, I wasn't aware that firescope actually does fill this gap.
-- 
Stefan Richter
-=====-=-=== --=- -=-=-
http://arcgraph.de/sr/
-
Ok. I'm sure you know more about 1394 than me so I guess it will be shared review. -Andi -
[ohci1394_early]

Some remarks to the September 2006 version at
http://www.suse.de/~bk/firewire/ :

- Seems its .remove won't work properly if more than one OHCI-1394
  controller is installed. And its .probe isn't reentrant, but that
  might be less of a problem.
- Its functionality will be lost if there is a FireWire bus reset, e.g.
  when something is plugged in or out. To keep physical DMA alive, an
  interrupt handler would have to be installed which writes ~0 to
  OHCI1394_PhyReqFilter{Hi,Lo}Set. Can interrupt handlers be registered
  in an early setup stage?
- There might be some register accesses in the setup which could be
  omitted; I'd have to look this up.
- Could be optimized to not use ohci1394.h::struct ti_ohci.
- PCI_CLASS_FIREWIRE_OHCI can be replaced by
  include/linux/pci_ids.h::PCI_CLASS_SERIAL_FIREWIRE_OHCI which was
  newly added in 2.6.20-git#.
- I suppose .probe should check for PCI_CLASS_SERIAL_FIREWIRE_OHCI
  instead of PCI_CLASS_SERIAL_FIREWIRE.
- How about dropping support for configuring this as module, to simplify
  the code? Unless this would interfere with ohci1394; and it probably
  would if there was an interrupt handler...
- "depends on X86_64" is missing in Kconfig.
- Maybe put it into arch/x86_64/drivers/ instead of drivers/ieee1394?
- Plus what I mentioned earlier in the thread.

I could send code to address some of this next weekend or later.
-- 
Stefan Richter
-=====-=-=== --=- -=-=-
http://arcgraph.de/sr/
-
I'd like to have that on ppc as well, so I'd rather keep it in drivers/

I agree that it doesn't need to be a module. If you can load modules,
then you can load the full ohci driver. Thus, if it's an early thingy
initialized by arch, it can export a special "takeover" hook that the
proper ohci module can then call to override it (important if we start
having an irq handler).

Andi, also, how do you deal with iommu ? Not at all ? :-)

Ben.
-
This will need some abstraction at least -- there are some early mapping hacks

Yes -- it's really early debugging hack mostly. It's reasonable to let
the iommu be disabled (or later a special bypass can be added for this)

-Andi
-
Either abstraction or ifdef's .. we have ioremap working very early on

Ok.

Ben.
-
Hi,
I just wanted to let you know that I have picked up the early
firewire patch again and cleaned it up considerably, so that it should
be ready to be submitted and put on the patch-submission road.
What's left to do is to write a HOWTO like the one Stefan describes
below, but I'll try to get that done soon.
I've also started working on the userspace tools and got firescope
to work across 32/64-bit machines (both directions). There is
one hack in that patch (which I should do in a clean way instead)
of which I do not know whether it works on ppc/ppc64, but I could look
at it if needed, or send the patch to Benjamin for adding support
for ppc64. To do it properly, we'll probably need a target architecture
option in firescope, and as I do not know if it's needed by Benjamin,
I left out ppc64 for now.
I have just had the guts to explore __fast__ memory dumping over
firewire for full-system dumps (reading quadlets is __painfully__
slow if you want to read 2GB of memory over the bus; you only get
some kilobytes each second), using raw1394_start_read()
to also allow block reads instead of just quadlet reads.
The biggest block size that worked here was 2048 bytes, which was
enough to get nearly 10MB/s of data transfer rate from the remote
memory to disk. Dumping 2GB of remote memory was just a matter of
about 3 short minutes which quickly ran by.
Afterwards, the victim was dead (I excluded the low MB of memory,
so something else must have caused this). At least the start of
the dump looked good. I haven't tested the error handling yet,
but I'll send you the tool (I called it firedump) soon.
Bernhard
--
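As a quick plausibility check of the numbers above, the dump time follows directly from size divided by rate. The sketch below is not part of firedump; the function name dump_seconds() is hypothetical, and "2GB" and "10MB/s" are read as binary units (2 GiB, 10 MiB/s), which is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: seconds needed to transfer `bytes` at a sustained
 * rate of `bytes_per_sec`, rounded up to whole seconds.
 */
static uint64_t dump_seconds(uint64_t bytes, uint64_t bytes_per_sec)
{
    assert(bytes_per_sec > 0);
    return (bytes + bytes_per_sec - 1) / bytes_per_sec;
}
```

For 2 GiB at 10 MiB/s this gives 205 seconds, i.e. a little under three and a half minutes, which fits the "about 3 minutes" reported for block reads.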
Did you run into the PCI memory hole below 4GB?

I suppose the best way would be to require a System.map and then read
e820.nr_map/e820.map[] and only dump real memory.

-Andi
--
Thanks, that was it most likely. I checked and saw that I did

Yes, good idea!

Bernhard
--
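The idea of dumping only real memory can be sketched as a walk over an e820-style map that keeps RAM entries and clamps them to the range below 4GB. This is a sketch under assumptions, not firedump code: struct e820_entry_sketch is a simplified stand-in for the kernel's entry type, and dumpable_bytes() is a hypothetical helper. A real tool would first have to read e820.nr_map and e820.map[] from the remote machine using the addresses in System.map.

```c
#include <stddef.h>
#include <stdint.h>

#define E820_RAM  1u               /* e820 type code for usable RAM */
#define LIMIT_4GB (1ULL << 32)     /* physical DMA only reaches below 4GB */

/* Simplified stand-in for one e820 map entry (layout is an assumption). */
struct e820_entry_sketch {
    uint64_t addr;                 /* start of the region */
    uint64_t size;                 /* length of the region in bytes */
    uint32_t type;                 /* E820_RAM, reserved, ACPI, ... */
};

/* Sum the bytes of RAM below 4GB that a dump tool would have to read. */
static uint64_t dumpable_bytes(const struct e820_entry_sketch *map,
                               size_t nr_map)
{
    uint64_t total = 0;

    for (size_t i = 0; i < nr_map; i++) {
        uint64_t start = map[i].addr;
        uint64_t end = start + map[i].size;

        if (map[i].type != E820_RAM || start >= LIMIT_4GB)
            continue;              /* skip holes and non-RAM ranges */
        if (end > LIMIT_4GB)
            end = LIMIT_4GB;       /* clamp to the reachable window */
        total += end - start;
    }
    return total;
}
```

A dump loop built the same way would issue block reads only inside the RAM ranges and so never touch the PCI memory hole.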
The maximum payload size of block requests depends on three things:
1. speed of the connection between the two nodes (debugged machine and
   debugging machine),
2. link layer controllers of the two nodes,
3. software on the debugging machine.

1.) S100: 512 bytes, S200: 1024 bytes, S400: 2048 bytes, S800 and more:
    4096 bytes.
2.) Controllers on CardBus cards are limited to 1024 bytes payload of
    asynchronous packets, for reasons I don't know. The other available
    controllers only have the above mentioned speed-dependent limit.
3.) The ohci1394 driver has an implementation limitation which requires
    that all packets including headers don't exceed PAGE_SIZE. This does
    not affect the packets which go through the physical response unit
    (which they do on the debugged machine) but it affects the debugging
    machine.

A quick note to this text from

Bus resets are also caused by bus managing software, which Linux' old
and new FireWire stacks and the stacks of all other FireWire capable
desktop OSs are to varying degrees. I wonder if the following could
happen: The two PCs are directly connected, only the PHY of the
debugging PC is active, then the PHY of the debugged PC is activated,
becomes root node, debugging PC examines the bus, then resets the bus to
force its own PHY to become root node in order to get a working
isochronous resource manager. This bus reset would switch remote DMA on
the debugged PC off.
-- 
Stefan Richter
-=====-=-=== ==-- --=--
http://arcgraph.de/sr/
--
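The first two limits above (connection speed and CardBus controllers) can be combined into a small lookup. This is only a sketch: the enum fw_speed and both function names are hypothetical and not taken from any FireWire stack, and the third limit (the ohci1394 PAGE_SIZE restriction on the debugging machine) is left out because the header sizes involved are not given here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical speed codes for the link between the two nodes. */
enum fw_speed { FW_S100, FW_S200, FW_S400, FW_S800 };

/* Speed-dependent payload limit in bytes, as listed in the message above. */
static uint32_t speed_payload_limit(enum fw_speed speed)
{
    switch (speed) {
    case FW_S100: return 512;
    case FW_S200: return 1024;
    case FW_S400: return 2048;
    default:      return 4096;    /* S800 and more */
    }
}

/*
 * Largest block payload to try: the speed limit, lowered to 1024 bytes
 * if either node uses a CardBus controller.
 */
static uint32_t max_block_payload(enum fw_speed speed, bool any_cardbus)
{
    uint32_t limit = speed_payload_limit(speed);

    if (any_cardbus && limit > 1024)
        limit = 1024;
    return limit;
}
```

For an S400 link without CardBus cards this yields 2048 bytes, which matches the biggest block size Bernhard reported as working.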
PS:

If there are only 1394a nodes, you can read the connection speed from
the speed map registers on the debugging machine. It becomes difficult
for 1394b nodes and some mixtures of 1394a and b nodes.

But consumer 1394b hardware is always S800 capable. Consumer 1394a
hardware is always S400 capable. Except consumer camcorders which are
AFAIK typically limited to S100, but they have only one port and will
therefore never sit between debugging and debugged PC.

So it's not as difficult in 99.9% of the cases: You can expect S400,
some people might get S800. But you can't use the bigger S800 block size
if the debugging

This is relevant if the debugging or the debugged PC have a CardBus
card. Actually this is the limit of all 1394a CardBus cards I have seen;
I don't know about 1394b CardBus cards, or Express cards. (I suppose
Express cards have no limits of this kind, as they are just PCIe cards
in a different formfactor.)

This payload limitation of the link can be read from the bus info blocks
of the debugged and the debugging machine. Though I am not entirely sure
right now if the ohci1394_earlyinit driven machine will have its bus
info block properly set up.
-- 
Stefan Richter
-=====-=-=== ==-- --=--
http://arcgraph.de/sr/
--
PPS:

Today I had a brief look through your current sources. They look good so
far, except for a curious hunk which patches drivers/Makefile. And I
can't say anything about the x86 platform related and PCI related
aspects of the driver.
-- 
Stefan Richter
-=====-=-=== ==-- --=--
http://arcgraph.de/sr/
--
From what I gather from e.g. drivers/usb/Makefile, you could do it also
this way:

 obj-$(CONFIG_IEEE1394) += ieee1394/
+obj-$(CONFIG_PROVIDE_OHCI1394_DMA_INIT) += ieee1394/
-- 
Stefan Richter
-=====-=-=== ==-- --==-
http://arcgraph.de/sr/
--
Hi,
after summing up the discussion on previous patches, I'm now submitting the
patch below for formal review and adoption on branches for mainline inclusion.
As this patch is X86-only for now (it might be extended to more architectures
in incremental patches, though), Andi Kleen initially (that was in February
2007) offered to handle the patch submission.
Subject of this patch:
This patch adds a new configuration option, which adds support for a new
early_param which gets checked in arch/x86/kernel/setup_{32,64}.c:setup_arch()
to decide whether OHCI-1394 FireWire controllers should be initialized and
enabled for physical DMA access to allow remote debugging of early problems
like issues in ACPI or other subsystems which are executed very early.
If the config option is not enabled, no code is changed, and if the boot
parameter is not given, no new code is executed. Independent of that,
all new code is freed after boot, so the config option can even be enabled
in standard, non-debug kernels.
With specialized tools, it is then possible to get debugging information
from machines which have no serial ports (notebooks), such as the printk
buffer contents or any data which can be referenced from global pointers,
if it is stored below the 4GB limit. Even memory dumps of the physical
RAM region below the 4GB limit can be taken without any cooperation from the
CPU of the host, so it does not matter if the machine crashes early.
In the extreme, even kernel debuggers can be accessed in this way. I wrote
a small kgdb module and an accompanying gdb stub for FireWire which allows
gdb to talk to kgdb using remote memory reads and writes over FireWire.
A version of the gdb stub for FireWire is able to read all global data
from a system which is running a normal kernel without any kernel debugger,
without any interruption or support of the system's CPU. That way, e.g. the
task struct and so on can be read and even manipulated when the physical DMA
access is ...

I will look a bit more into the details later; for now I have just a

There are some really long sentences in it.

In other words, the FireWire controller can be configured to work as a
bus bridge between FireWire bus and local bus (PCI or PCIe). And yes,
there are security implications.

This is correct for ohci1394, and it's a bug.
http://bugzilla.kernel.org/show_bug.cgi?id=7794

firewire-ohci however implements filtered physical DMA. The only user
of that is the firewire-sbp2 driver which grants SBP-2 targets access
through the physical response unit. firewire-ohci has at the moment
neither an option for unfiltered physical DMA (as needed by firescope
et al, tracked at http://wiki.linux1394.org/ToDo) nor one to completely
disable physical DMA.
-- 
Stefan Richter
-=====-=-=== ==-- --==-
http://arcgraph.de/sr/
--
Yes, that's the case. The PHY of the debugging PC must be active before
booting a kernel with the early OHCI-1394 initialisation on the debugged
machine.

I tried to instruct users to do exactly that in that part of the help
text, but have now changed the wording in the hope of making it clearer:

+ [...] be sure to have the cable plugged and FireWire enabled on
+ the debugging host before booting the debug target for debugging.

and I added a new file, installed as
Documentation/debugging-via-ohci1394.txt, to give a more detailed HOWTO
with links to all available tools and a step by step guide on how to
create a setup for using firescope.

ftp://ftp.suse.de/private/bk/firewire/kernel/ohci1394_dma_early-v2.diff

I'll submit that patch for full review in my next mail.

Bernhard
--
How would you define ready? It's currently useful and stable, and features a lack of enterprise-class complexity. - James -- James Morris <jmorris@namei.org> -
I've been working on it for some weeks. At this stage, it's also useful for some simple kernel hacking. - James -- James Morris <jmorris@namei.org> -
Well, I only have bug reports from around half a dozen people, so I'm
not sure what that says about my userbase (for most people it should
simply work). lguest.ozlabs.org got 3000 hits in the last 12 hours, and
they can't all be bots 8) Mind you, in that time only 26 unique IP
addresses visited the patches/ repository, so maybe they are...

As to "insufficient review", the reviews so far have cleaned up some
code (great!) and found 3 actual bugs, none real showstoppers and all
now fixed:
(1) race of initialization code vs. cpu hotplug.
(2) block driver being suboptimal
(3) network driver sending crap for inter-guest sendfile.

In addition, you counted not handling TSC change as a bug, so I tore
that code out, instead of leaving a FIXME. During the cleanup patches I
did introduce (and then fix) another bug, it is true.

I would not describe it as "heavily in development"; I really think it's

Please don't harass my users. That's my job!

Cheers!
Rusty.
-
