Hi Linus, This is a new tree, I had a conflict with your latest tree due to some trivial cleanups you merged. I've added the fix for CVE-2008-3831 which is unembargoed. Please pull the 'drm-next' branch from ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6.git drm-next This contains two patches outside the DRM git tree to add exports for GEM functionality while we await Nick Piggins vmap and shmem changes. This update contains the following. 1. CVE-2008-3831 - fixes for memset on arbitrary memory via i915 on G33 hw and above. 2. Intel GEM memory manager . 3. Opregion support for Intel Integrated chipsets. 4. Updated radeon driver with support for new chipsets + bugfixes 5. vblank reworking to save power and interrupts on intel/radeon GPUs. Dave. arch/x86/mm/highmem_32.c | 1 + drivers/gpu/drm/Kconfig | 3 +- drivers/gpu/drm/Makefile | 5 +- drivers/gpu/drm/drm_agpsupport.c | 52 +- drivers/gpu/drm/drm_cache.c | 69 + drivers/gpu/drm/drm_drv.c | 6 + drivers/gpu/drm/drm_fops.c | 8 +- drivers/gpu/drm/drm_gem.c | 421 ++++++ drivers/gpu/drm/drm_irq.c | 464 +++++- drivers/gpu/drm/drm_memory.c | 2 + drivers/gpu/drm/drm_mm.c | 5 +- drivers/gpu/drm/drm_proc.c | 135 ++- drivers/gpu/drm/drm_stub.c | 11 +- drivers/gpu/drm/drm_sysfs.c | 2 +- drivers/gpu/drm/i915/Makefile | 7 +- drivers/gpu/drm/i915/i915_dma.c | 332 +++-- drivers/gpu/drm/i915/i915_drv.c | 476 +------ drivers/gpu/drm/i915/i915_drv.h | 1180 +++++----------- drivers/gpu/drm/i915/i915_gem.c | 2558 ++++++++++++++++++++++++++++++++ drivers/gpu/drm/i915/i915_gem_debug.c | 201 +++ drivers/gpu/drm/i915/i915_gem_proc.c | 292 ++++ drivers/gpu/drm/i915/i915_gem_tiling.c | 257 ++++ drivers/gpu/drm/i915/i915_irq.c | 514 +++++-- ...
Grr.
This whole merge series has been full of people sending me UNTESTED CRAP.
So what's the excuse _this_ time for adding all these stupid warnings to
my build log? Did nobody test this?
drivers/gpu/drm/drm_proc.c: In function ‘drm_gem_one_name_info’:
drivers/gpu/drm/drm_proc.c:525: warning: format ‘%d’ expects type ‘int’, but argument 3 has type ‘size_t’
drivers/gpu/drm/drm_proc.c:533: warning: format ‘%9d’ expects type ‘int’, but argument 4 has type ‘size_t’
drivers/gpu/drm/i915/i915_gem.c: In function ‘i915_gem_gtt_pwrite’:
drivers/gpu/drm/i915/i915_gem.c:184: warning: unused variable ‘vaddr_atomic’
and I wonder how many other warnings got added that I never even noticed
because I don't build them?
And yes, it's not just warnings. One of thse is horribly bad code:
nid->len += sprintf(&nid->buf[nid->len],
"%6d%9d%8d%9d\n",
obj->name, obj->size,
atomic_read(&obj->handlecount.refcount),
atomic_read(&obj->refcount.refcount));
where it's just wrong to use the field width as a separator. Because if
the counts are big enough, the separator suddenly goes away!
So that format string should be
"%6d %8zd %7d %8d\n"
instead. Which gives the same output when you don't overflow, and doesn't
have the overflow bug when you do.
As to that "vaddr_atomic" thing, the warning would have been avoided if
you had just cleanly split up the optimistic fast case.
IOW, write cleaner code, and the warning just goes away on its own.
Something like the appended. UNTESTED!
Hmm?
I really wish people were more careful, and took more pride in trying to
write readable code, with small modular functions instead. And move those
variables down to the block they are needed in.
Anyway, I pulled the thing, but _please_ test this cleanup and send it
back to me if it passes your testing. Ok?
Linus
---
drivers/gpu/drm/drm_proc.c | ...pects type =E2=80=98int=E2=80=99, but argument 3 has type =E2=80=98size_t= xpects type =E2=80=98int=E2=80=99, but argument 4 has type =E2=80=98size_t= Yeah, none of us are on x86-64, so we missed those warnings in testing. The change looks pretty good except for s/return 1/return 0/. We wanted to pull the i_wish_it_was_ioremap_atomic() hack out so that we could use it for relocation updates as well (which aren't copy_from_user), and I've started writing that patch. Expect something pushed at you soon. --=20 Eric Anholt eric@anholt.net eric.anholt@intel.com
We're lazy, perhaps even lazier than yourself. Given that the whole goal is to essentially ignore the CPU and get our code running on the GPU, it's hard to get excited about the kind of kernel we're running on the CPU. We've got a bunch of test boxes that run 64-bits, unfortunately, the people doing builds there appear not to care about warnings. That will get fixed. I've also got this new G45 box here; perhaps it's time to enter the 21st century and run it in native 64-bit mode. It's scary though; I've been writing window systems for 32-bit machines for thirty years now. --=20 keith.packard@intel.com
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Indeed. I have been running 64 bit builds for quite a while now, and merely ignored the warnings as "not my problem." In the future, I shall make it my problem. ~ C. - -- ~ Corbin Simpson <MostAwesomeDude@gmail.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkj5hecACgkQeCCY8PC5utAMAQCggtkBoolO58rW5PnPkYTosyZ5 DkgAnAqRSZzoFhgkdFcL92qV6Zyc7usJ =vpMP -----END PGP SIGNATURE----- --
Writing 3D drivers means running 3D games. Running 3D games unfortunately means running a lot of 32-bit userland as the fun stuff is binary-only. So I stick to a 32-bit system, becuase past experience trying to run both on the same system has been misery. --=20 Eric Anholt eric@anholt.net eric.anholt@intel.com
You can run 32bit user-space on a 64bit kernel just fine. Any breakage, if anything, you find will need to get fixed anyway. Also, one would hope Intel is poking these game writers to migrate to 64bit Linux ;-) --
I'd like to second this. 32bit on 64bit works fine. I've been maintaining separate multilib versions of the Gentoo libdrm/Mesa ebuilds for 32bit/64bit. It is important that the 32bit and 64bit graphics stacks are synchronised, rather than relying on external compatibility libraries as Gentoo does currently. Failing to do so does lead to Indeed! :) --
That's not a very good reason, though. We're supposed to be perfectly binary compatible, so it would actually be *better* if you did the testing using a 64-bit kernel. Sure, don't do it on _all_ machines, but running 32-bit user land is definitely not a valid reason to avoid a 64-bit kernel. Quite the reverse. We'd like to see more coverage. Linus --
Actually, I'm on x86_64 pretty much exclusively and saw these warnings last week. But I didn't send a fix (yet); sorry. That said, this code was far from untested, even though it did contain a few compile warnings, so I think Linus's complaint about UNTESTED CRAP was at least half wrong. -- Jesse Barnes, Intel Open Source Technology Center --
The code has been tested on 32 and 64 bit in lots of places, however I don't generally trawl the Fedora kernel build logs, and the amount of warnings we have there, new ones just get lost in the mists.. So its not untested on 64-bit, its just nobody who cared enough saw the compiler warnings. If linux-next had been going for the past few weeks someone would have spotted them, I just got the report yesterday from linux-next after I fed the push request. Perhaps we all need access to the Ingo compile farm. Dave. --
OK. I was hoping that kmap_atomic_pfn thing *never* see the light of mainline ;) Because hopefully the vmap improvements should be OK for 2.6.28 as well... But it's already there. OK, if vmap patches go upstream, then it can be switched over and the export removed before the release... Thanks, Nick --
Eric and I chatted with Venki Pallipadi this week and decided what we
really need is an official API for getting at pfns in our device BAR,
which is what we're (ab)using kmap_atomic_pfn for. This ties in with
some PAT issues as well, where we want to assert that all mappings of
our BAR are in WC mode. The basic plan is to have four new functions
(yes, I'm making up names here):
struct io_mapping *io_reserve_pci_resource(struct pci_dev *dev,=20
int bar,=20
int prot);
void io_mapping_free(struct io_mapping *mapping);
void *io_map_atomic(struct io_mapping *mapping, unsigned long pfn);
void io_unmap_atomic(struct io_mapping *mapping, unsigned long pfn);
This would have the additional effect of making all other mappings of
this BAR be required to match the specific prot, both providing
protection against broken drivers, and providing support for old X
I'd say we should leave things alone for .28 and work on making an
official IO mapping API available.
--=20
keith.packard@intel.com
The important thing is that mappings need to be per-CPU, so the above may work, but only if it's designed so that "io_reserve_pci_resource()" will actually reserve space for 'nr_possible_cpu' page mappings, and then the "io_[un]map_atomic()" functions do per-CPU mappings. Anything else is a disaster, because anything else implies TLB shootdown. And quite frankly, even so, we'd possibly still be _better_ off with just exposing the "kmap_atomic_pfn()" functionality even so. Because quite frankly, your "io_reserve_pci_resource()" infrastructure is going to inevitably be more complex and slower than the rather efficient kmap_atomic_pfn() thing we have. [ The *non-atomic* kmap() functions are fairly high-overhead, in that they want to keep track of cached mappings and remember page addresses etc. So those are the ones we don't want to support for non-HIGHMEM setups. But the atomic kmaps are pretty simple, and really only need some trivial FIXMAP support. We could easily extend it for x86-64, methinks, and do it for x86-32 even when we don't do HIGHMEM. Ingo? ] One small detail: our we currently have "kmap_atomic_pfn()" and "kmap_atomic_prot()", and we really should maek the fundamental core operation be "kmap_atomic_pfn_prot()", and have everything be done in terms of that. Looking at it, it also looks like kmap_atomic_prot() is actually incorrect right now, and doesn't do a "prot" thing for non-highmem pages, but just returns "page_address(page);" Linus --
Actually, a "kmap_atomic_prot_pfn()" has been lurking in the drm repos for some time now, but hasn't been suggested for upstream. It was intended for drivers that require quick in-kernel patching of write-combined io and highmem pages. The latter is a common situation for PCIE graphics devices with their own MMU, so IMHO an exported kmap_atomic_pfn_prot() would be a big help in such cases. --
Yes, of course. The goal is to make it look exactly like kmap_atomic_pfn, but work on non-memory pages even on non-HIGHMEM configs for x86 (we have to use ioremap on those systems today, which is I want it to work just like kmap_atomic_pfn; Venki wanted some way to "know" that no-one else was mapping the pages in non-WC mode. This seemed like a reasonable compromise; the 'mapping' object exists solely to hold the desired prot value to keep all PTEs consistent across the Fortunately, we use kmap_atomic_pfn only for our BAR, which is not a non-highmem page. Right now, we're counting on having our BAR covered by an MTRR register so that we get WC access here. I'd like to have the API expose this so that the driver will work on systems which don't have a spare MTRR for the graphics aperture. --=20 keith.packard@intel.com
agreed, and there's certainly no resistance from the x86 architecture side to extend our mapping APIs in such directions. But i think the direction of the new GEM code is subtly wrong here, because it tries to manage memory even on 64-bit systems. IMO it should just map the _whole_ graphics aperture (non-cached) and be done with it. There's no faster method at managing pages than the CPU doing a TLB fill from pagetables. The only real API need i see is on 32-bit: with a 1GB or 2GB graphics aperture we just cannot map that permanently, so kmap_atomic() is a necessity. We can certainly extend that to non-highmem as well. But this should be an ad-hoc transitionary thing for 32-bit, and on 64-bit we really should not be using any form of kmap. Especially with large vertex buffers or textures, mapping a lot of pages via kmap is not going to be trivial overhead - even if INVLPG is faster than a full TLB flush, it's still on the order of 100-200 cycles - and with a lot of pages that mounts up quickly. And if i understood your workload correctly you want to do tens of thousand of map/unmap/remap events per frame generated - depending on the type of the 3D app/engine. Or am i missing something subtle? Why do you want the overhead of kmap on 64-bit? Ingo --
Yeah, we're stuck thinking that we "can't" map the aperture because it's too large, but with a 64-bit kernel, we should be able to keep it mapped permanently. Of course, the io_reserve_pci_resource and io_map_atomic functions could Yes, this is where exposing an io-specific atomic mapping function will Yeah, data transfer from CPU to GPU is through a pwrite interface, and we perform the transfer within the kernel using map/unmap operations on We don't, but I think it would be nice to have a common API that works across all 32-bit configurations as well as 64-bit systems. --=20 keith.packard@intel.com
okay, so basically what we need is a shared API that does per page
kmap_atomic on 32-bit, and just an ioremap() on 64-bit. I had the
impression that you were suggesting to extend kmap_atomic() to 64-bit -
which would be wrong.
So, in terms of the 4 APIs you suggest:
struct io_mapping *io_reserve_pci_resource(struct pci_dev *dev,
int bar,
int prot);
void io_mapping_free(struct io_mapping *mapping);
void *io_map_atomic(struct io_mapping *mapping, unsigned long pfn);
void io_unmap_atomic(struct io_mapping *mapping, unsigned long pfn);
here is what we'd do on 64-bit:
- io_reserve_pci_resource() would just do an ioremap(), and would save
the ioremap-ed memory into struct io_mapping.
- io_mapping_free() does the iounmap()
- io_map_atomic(): just arithmetics, returns mapping->base + pfn - no
TLB activities at all.
- io_unmap_atomic(): NOP.
it's as fast as it gets: zero overhead in essence. Note that it's also
shared between all CPUs and there's no aliasing trouble.
And we could make it even faster: if you think we could even use 2MB
TLBs for the _linear_ ioremap()s here, hm? There's plenty of address
space on 64-bit so we can align to 2MB just fine - and aperture sizes
are 2MB sized anyway.
Or we could go one step further and install these aperture mappings into
the _kernel linear_ address space. That would be even faster, because
we'd have a constant offset. We have the (2MB mappings aware) mechanism
for that already. (Yinghai Cc:-ed - he did a lot of great work to
generalize this area.)
(In fact if we installed it into the linear kernel address space, and if
the aperture is 1GB aligned, we will automatically use gbpages for it.
Were Intel to support gbpages in the future ;-)
the _real_ remapping in a graphics aperture happens on the GPU level
anyway, you manage an in-RAM GPU pagetable that just works like an
IOMMU, ...Is it possible to use a segment register to map the whole aperture on 32b? A segment register might allow common code on 64b/32b by -- Jon Smirl jonsmirl@gmail.com --
No. Segment registers don't extend the virtual address space, they can only limit visibility into the one single 32-bit one. IOW, segment registers don't actually extend addressing in any way, they only limit it. There's a reason why people don't use them (except as a strange base register for things like per-cpu or per-thread variables, and that is not to extend the address space, but to avoid wasting precious _useful_ registers on that). Linus --
Yes, a one-level linear MMU which uses BIOS-reserved memory. So, at least for a prototype, on 64-bit we can just use ioremap_wc and hold the mapping while the driver is open? Is there any huge benefit to I've got Venki lined up to do this work for me; once we're happy enough with the API. In particular, the non-highmem 32-bit case seems a bit tricky. Also, does anyone have a better set of names for this stuff? io_reserve_pci_mapping seems fairly ugly to me. --=20 keith.packard@intel.com
On Sat, 18 Oct 2008 17:38:11 -0700 how about pci_resource_force_caching(pci_dev, barnr, cachetype)? -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org --
and then 'pci_map_atomic'? Not sure this makes sense in the pci namespace. --=20 keith.packard@intel.com
correct. There's no benefit to using the kernel mapping - and we can add 2MB pages support to ioremap just fine and then you'll benefit from it automatically. Ingo --
Here's what these functions would look like as wrappers on top of the
existing APIs:
/* x86_64 style */
static inline struct io_reserve *
iomap_reserve(unsigned long base, unsigned long size)
{
return (struct io_reserve *) ioremap_wc(base, size);
}
static inline void *
iomap_atomic(struct io_reserve *reserve, unsigned long offset)
{
return ((char *) reserve) + offset;
}
static inline void
iounmap_atomic(void *vaddr)
{
}
static inline void
iomap_unreserve(struct io_reserve *reserve)
{
iounmap(reserve);
}
/* HIGHMEM style */
#if 0
static inline struct io_reserve *
iomap_reserve(unsigned long base, unsigned long size)
{
return (struct io_reserve *) base;
}
static inline void *
iomap_atomic(struct io_reserve *reserve, unsigned long offset)
{
offset +=3D (unsigned long) reserve;
return kmap_atomic(offset >> PAGE_SHIFT, KM_USER0);
}
static inline void
iounmap_atomic(void *vaddr)
{
kunmap_atomic(vaddr, KM_USER0);
}
static inline void
iomap_unreserve(struct io_reserve *reserve)
{
}
#endif
/* 32-bit non-HIGHMEM style */
#if 0
static inline struct io_reserve *
iomap_reserve(unsigned long base, unsigned long size)
{
return NULL;
}
static inline void *
iomap_atomic(struct io_reserve *reserve, unsigned long offset)
{
return NULL;
}
static inline void
iounmap_atomic(void *vaddr)
{
}
static inline void
iomap_unreserve(struct io_reserve *reserve)
{
}
--=20
keith.packard@intel.com
Here's a patch for the i915 driver that includes the new API. Tested on x86_32+HIGHMEM and x86_64. I stuck a new 'io_reserve.h' header in the i915 directory for this patch, but it should go elsewhere. The new APIs are: io_reserve_create_wc io_reserve_free io_reserve_map_atomic_wc io_reserve_unmap_atomic io_reserve_map_wc io_reserve_unmap I added the non-atomic variants at Eric's suggestion so that we can use the direct map on x86_64, avoiding any use of ioremap at run-time. I think the resulting code looks quite a bit cleaner now. Also, one benchmark I tried ran about 18 times faster in 64-bit mode than the original code. =46rom 2f6b125cb586a0671a2b9c22aadb03fcafdf99ab Mon Sep 17 00:00:00 2001 From: Keith Packard <keithp@taka.keithp.com> Date: Sat, 18 Oct 2008 22:59:58 -0700 Subject: [PATCH] [drm/i915] Create new 'io_reserve' API for mapping GTT pag= es The io_reserve API abstracts away the operations necessary to construct mappings for our GTT aperture, providing atomic and non-atomic mappings for GTT pages that work efficiently on x86_64 and x86_32+HIGHMEM configurations= . This eliminates the in-drive abuse of the kmap_atomic_pfn function as well as improving GEM performance on x86_64 architecture. Signed-off-by: Keith Packard <keithp@taka.keithp.com> --- drivers/gpu/drm/i915/i915_drv.h | 3 + drivers/gpu/drm/i915/i915_gem.c | 71 ++++++++++++---------- drivers/gpu/drm/i915/i915_irq.c | 4 +- drivers/gpu/drm/i915/io_reserve.h | 122 +++++++++++++++++++++++++++++++++= ++++ 4 files changed, 165 insertions(+), 35 deletions(-) create mode 100644 drivers/gpu/drm/i915/io_reserve.h diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_dr= v.h index f20ffe1..c99b9ea 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -31,6 +31,7 @@ #define _I915_DRV_H_ =20 #include "i915_reg.h" +#include "io_reserve.h" =20 /* General customization: */ @@ -246,6 +247,8 @@ typedef struct ...
very nice!
I think we need a somewhat different abstraction though.
Firstly, regarding drivers/gpu/drm/i915/io_reserve.h, that needs to move
to generic code.
Secondly, wouldnt the right abstraction be to attach this functionality
to 'struct resource' ? [or at least create a second struct that embedds
struct resource]
this abstraction is definitely not a PCI thing and not a
detached-from-everything thing, it's an IO resource thing. We could make
it a property of struct resource:
struct resource {
resource_size_t start;
resource_size_t end;
const char *name;
unsigned long flags;
struct resource *parent, *sibling, *child;
+ void *mapping;
};
The APIs would be:
int io_resource_init_mapping(struct resource *res);
void io_resource_free_mapping(struct resource *res);
void * io_resource_map(struct resource *res, pfn_t pfn, unsigned long offset);
void io_resource_unmap(struct resource *res, void *kaddr);
Note how simple and consistent it all gets: IO resources already know
their physical location and their size limits. Being able to cache an
ioremap in a mapping [and being able to use atomic kmaps on 32-bit] is a
relatively simple and natural extension to the concept.
i think that would be quite acceptable - and the APIs could just
transparently work on it. This would also allow the PCI code to
automatically unmap any cached mappings from resources, when the driver
deinitializes.
Linus, Jesse, what do you think?
i think we need to finalize the API names and their abstraction level,
and then could even merge those APIs into v2.6.28 on a fast path, to
enable you to use it. It does not interact with anything else so it
should be safe to do.
(i'd not suggest to merge the i915 bits just yet - but that's obviously
not my call.)
Ingo
--
On Sun, 19 Oct 2008 19:53:20 +0200 and making a simple wrapper around this that turns "struct pci_dev, barnr" into a resource would make sense too, but yes. We need one more io_resource_force_cachability(struct resource, cachetype) or maybe only io_resource_force_writecombine(struct resource) -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org --
This API needs the cacheability control, which I don't see in it currently. In the past we've been relying on an MTRR over the GTT resulting in all of our UC- mappings getting us the correct WC behavior, but now there aren't enough MTRRs to go around and graphics loses out (at about a 20% CPU cost as a conservative estimate). The primary goal of the new API is to let us eat PAT costs up front so we don't have to at runtime. Second, we need to know when we're doing a mapping whether we're affected by atomic scheduling restrictions. Right now our plan has been to try doing page-by-page io_map_atomic_wc()/copy_from_user_inatomic()/io_unmap_atomic(), and if we fail at that at some point (map returns NULL or we get a partial completion from copy_from_user_inatomic) then fall back to io_map_wc() and copy_from_user() the whole thing at once. That gets us good performance on both x86 with highmem and x86-64, and not too shabby performance on x86 non-highmem. Also, while it's rare, there have been graphics cards (looking at you, S3) where BARs were expensive for some reason and they stuffed both the framebuffer and registers into one PCI BAR, where you want the FB to be WC and the registers to be UC. Not sure if they would be supportable with this API or not. And if it's not, I'm not sure how much we care to design for them, but it's something to potentially consider. Finally, I'm confused by the pfn and offset args to io_resource_map, when I expected something parallel to ioremap but with our resource arg added. --=20 Eric Anholt eric@anholt.net eric.anholt@intel.com
yes, these two should do the trick: int io_resource_init_mapping_wc(struct resource *res); that gets ugly very fast. I think we should not use atomic kmaps but NR_CPUS _fixmaps_ with a per CPU array of mutexes (this is basically atomic kmaps but without the preemption restrictions). We could take/drop the mutex and statistically you'll stay on the same CPU and yes, this is a weakness of this API - you cannot mix multiple cachability domains within the same BAR. and that can happen on non-graphics as well: some storage controller that has regular control registers in one portion of the BAR, which all need to be consistently accessed via UC and properly POST-ed - while it could also have some large mailbox structure at the end of the BAR, which could be mapped both cacheable or perhaps WC. So ... i guess we can go back to the io_mapping API proposed by Keith, but not make it atomic kmap based but fixmap + mutex based - for good ok. Ingo --
peterz pointed it out that we need to be careful with the TLBs in that case. I think it's solvable: a small non-default special-case in switch_to() would invlpg any pending such page. (and no two such mappings could be held at the same time) So as long as we dont leave the CPU we wont ever do TLB flushes, and the flush will be local even if we do. Ingo --
I'm not sure I see any value in caching mappings here; we're mostly interested in copying lots of data onto the card and so we use a lot of different mappings; atomic mappings are easy to use, and efficient enough. Also, if we're using a resource, I'd like to just use byte offsets and Let's figure out the API we want, then figure out whether it makes sense to stick it into 2.6.28 or whether we just want to wait for .29. I don't mind rewriting the driver for the next release. I will probably use something like the io_reserve stuff for .28 though; it makes the code easier to read and gives us good performance on 64 bit kernels. --=20 keith.packard@intel.com
yes but note that by caching the whole mapping on 64-bit we get everything we want: trivially lockless, works from any CPU, can be preempted at will, and there are no ugly INVLPG flushes anywhere. you'll even get 2MB mappings automatically, if the BAR is aligned and sized correctly. 32-bit we should handle as well but not design for it. Ingo --
I was assuming that on 64-bit, the map would be created at driver init time and be left in place until the driver closed; if that's what you As long as we get kmap_atomic-like performance, and we get to simplify our code, I'm up for it. --=20 keith.packard@intel.com
okay. So ... mind sending your io_mapping patch as a generic facility? It looks all good to me in its present form, except that it should live also, please send at least two patches, so that we can look at (and possibly merge) the generic facility in isolation. Ingo --
The first patch in this series (assuming I'm driving git-send-email correctly) adds the io_mapping API. I ended up creating a new linux/io_mapping.h file as the kernel init code uses io.h and got very angry when I tried to include linux/highmem.h from that. I'm afraid I gave up at that point and just moved the code to a new file. The second patch switches the drm/i915 driver to the new API. Performance improvements on 64-bit kernels are impressive as we were using the slow path before and now get to take advantage of 64-bit wonderfulness. --
Graphics devices have large PCI apertures which would consume a significant fraction of a 32-bit address space if mapped during driver initialization. Using ioremap at runtime is impractical as it is too slow. This new set of interfaces uses atomic mappings on 32-bit processors and a large static mapping on 64-bit processors to provide reasonable 32-bit performance and optimal 64-bit performance. The current implementation sits atop the existing CONFIG_HIGHMEM kmap_atomic mechanism for 32-bit processors when present. When absent, it just uses ioremap, which remains horribly inefficient. Fixing non-HIGHMEM 32-bit kernels to provide per-CPU mappings ala HIGHMEM would resolve that performance issue. Signed-off-by: Keith Packard <keithp@keithp.com> --- Documentation/io-mapping.txt | 84 ++++++++++++++++++++++++++++ include/linux/io-mapping.h | 125 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 209 insertions(+), 0 deletions(-) create mode 100644 Documentation/io-mapping.txt create mode 100644 include/linux/io-mapping.h diff --git a/Documentation/io-mapping.txt b/Documentation/io-mapping.txt new file mode 100644 index 0000000..ebf6dc5 --- /dev/null +++ b/Documentation/io-mapping.txt @@ -0,0 +1,84 @@ +The io_mapping functions in linux/io.h provide an abstraction for +efficiently mapping small regions of an io device to the CPU. The initial +usage is to support the large graphics aperture on 32-bit processors where +ioremap_wc cannot be used to statically map the entire aperture to the CPU +as it would consume too much of the kernel address space. + +A mapping object is created during driver initialization using + + struct io_mapping * + io_mapping_create_wc(unsigned long base, unsigned long size) + + 'base' is the bus address of the region to be made + mappable, while 'size' indicates how large a mapping region to + enable. Both are in bytes. + + This _wc variant provides a mapping which may only be used + with the io_mapping_map_atomic_wc or ...
Switch the i915 device aperture mapping to the io-mapping interface, taking
advantage of the cleaner API to extend it across all of the mapping uses,
including both pwrite and relocation updates.
This dramatically improves performance on 64-bit kernels which were using
the same slow path as 32-bit non-HIGHMEM kernels prior to this patch.
Signed-off-by: Keith Packard <keithp@keithp.com>
---
drivers/gpu/drm/i915/i915_drv.h | 3 +
drivers/gpu/drm/i915/i915_gem.c | 81 ++++++++++++++++-----------------------
2 files changed, 36 insertions(+), 48 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f20ffe1..8ca5fbc 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -31,6 +31,7 @@
#define _I915_DRV_H_
#include "i915_reg.h"
+#include <linux/io-mapping.h>
/* General customization:
*/
@@ -246,6 +247,8 @@ typedef struct drm_i915_private {
struct {
struct drm_mm gtt_space;
+ struct io_mapping *io_mapping;
+
/**
* List of objects currently involved in rendering from the
* ringbuffer.
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9255088..d38b052 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -177,14 +177,14 @@ i915_gem_gtt_pwrite(struct drm_device *dev, struct drm_gem_object *obj,
struct drm_file *file_priv)
{
struct drm_i915_gem_object *obj_priv = obj->driver_private;
+ drm_i915_private_t *dev_priv = dev->dev_private;
ssize_t remain;
- loff_t offset;
+ loff_t offset, base;
char __user *user_data;
char __iomem *vaddr;
char *vaddr_atomic;
- int i, o, l;
+ int o, l;
int ret = 0;
- unsigned long pfn;
unsigned long unwritten;
user_data = (char __user *) (uintptr_t) args->data_ptr;
@@ -211,42 +211,38 @@ i915_gem_gtt_pwrite(struct drm_device *dev, struct drm_gem_object *obj,
while (remain > 0) {
/* Operation in this page
*
- * i = page ...And I wish you could lose that horrible (non-Linux kernel) style of function return type on a separate line. --- ~Randy --
On Thu, 2008-10-23 at 21:49 -0700, Randy Dunlap wrote: (A bunch of helpful edits for the io-mapping.txt file) Thanks! They're in my tree and will get included in the next version of this patch. --=20 keith.packard@intel.com
ah ... good call, i missed that mess. linux/io.h is indeed dependency heh, cool - the wonders of 64-bit x86 :-) Any ballpark-figure numbers you can share with us? Ingo --
For one quake-3 based game we use for performance regression checking, 64-bit kernels run about 18 times faster now. That's the difference between using a zero-cost dynamic mapping and using ioremap_wc for each page. --=20 keith.packard@intel.com
So I've put these patches into Fedora rawhide kernel to test, and glxgears on my 945G hw went from 85fps to 380fps, clearly we would want these patches upstream sooner rather than later. Or we at least need something in the i915 driver to get the speed under GEM back up. I'm happy to ship these patches in F10 GA if that makes any difference, so I'd really like them upstream asap. Dave. --
yep, it's all lined up already in tip/core/resources, and got massively tested over the past few days. Will send a pull request to Linus tomorrow-ish - we need one final cleanup patch and then it's green to go. Ingo --
I'm inclined to agree. Not that I think 380fps sounds very impressive (I get 850+ fps with _software_ rendering, for chissake), but because 85 fps is a joke, and clearly without this setup there's not even any point to try to do any other optimizations. And if we're looking at the patches being in Fedora anyway, there really is no reason not to merge them. Even if it's out of the merge window. Ingo? Linus --
there was one pending item: i was waiting for the x86 #ifdef to be
cleaned up out of include/linux/io-mapping.h, but that is trivial with
no impact to any object code and can thus perhaps wait - would be nice
to get this in early in this -rc.
Please pull the latest io-mappings-for-linus git tree from:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git io-mappings-for-linus
Thanks,
Ingo
------------------>
Keith Packard (3):
x86: add iomap_atomic*()/iounmap_atomic() on 32-bit using fixmaps
resources: add io-mapping functions to dynamically map large device apertures
i915: use io-mapping interfaces instead of a variety of mapping kludges
Documentation/io-mapping.txt | 76 +++++++++++++++++
arch/x86/include/asm/fixmap.h | 4 +
arch/x86/include/asm/fixmap_32.h | 4 -
arch/x86/include/asm/highmem.h | 5 +-
arch/x86/mm/Makefile | 2 +-
arch/x86/mm/init_32.c | 3 +-
arch/x86/mm/iomap_32.c | 59 +++++++++++++
drivers/gpu/drm/i915/i915_drv.h | 3 +
drivers/gpu/drm/i915/i915_gem.c | 174 +++++++++++++++++--------------------
include/asm-x86/iomap.h | 30 +++++++
include/linux/io-mapping.h | 118 ++++++++++++++++++++++++++
11 files changed, 373 insertions(+), 105 deletions(-)
create mode 100644 Documentation/io-mapping.txt
create mode 100644 arch/x86/mm/iomap_32.c
create mode 100644 include/asm-x86/iomap.h
create mode 100644 include/linux/io-mapping.h
diff --git a/Documentation/io-mapping.txt b/Documentation/io-mapping.txt
new file mode 100644
index 0000000..cd2f726
--- /dev/null
+++ b/Documentation/io-mapping.txt
@@ -0,0 +1,76 @@
+The io_mapping functions in linux/io-mapping.h provide an abstraction for
+efficiently mapping small regions of an I/O device to the CPU. The initial
+usage is to support the large graphics aperture on 32-bit processors where
+ioremap_wc cannot be used to statically map the entire aperture to the CPU
+as ...These cleanups can now be found in the second tree below.
They can be cleanly pulled over the other tree i just posted, but it
has slightly less testing. (the cleanup commits are fresh - but there
should be no difference in functionality)
Please pull the latest io-mappings-for-linus-2 git tree from:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git io-mappings-for-linus-2
Thanks,
Ingo
------------------>
Keith Packard (5):
x86: add iomap_atomic*()/iounmap_atomic() on 32-bit using fixmaps
resources: add io-mapping functions to dynamically map large device apertures
i915: use io-mapping interfaces instead of a variety of mapping kludges
io mapping: improve documentation
io mapping: clean up #ifdefs
Documentation/io-mapping.txt | 82 ++++++++++++++++++
arch/x86/Kconfig | 4 +
arch/x86/include/asm/fixmap.h | 4 +
arch/x86/include/asm/fixmap_32.h | 4 -
arch/x86/include/asm/highmem.h | 5 +-
arch/x86/mm/Makefile | 2 +-
arch/x86/mm/init_32.c | 3 +-
arch/x86/mm/iomap_32.c | 59 +++++++++++++
drivers/gpu/drm/i915/i915_drv.h | 3 +
drivers/gpu/drm/i915/i915_gem.c | 174 +++++++++++++++++--------------------
include/asm-x86/iomap.h | 30 +++++++
include/linux/io-mapping.h | 125 +++++++++++++++++++++++++++
12 files changed, 390 insertions(+), 105 deletions(-)
create mode 100644 Documentation/io-mapping.txt
create mode 100644 arch/x86/mm/iomap_32.c
create mode 100644 include/asm-x86/iomap.h
create mode 100644 include/linux/io-mapping.h
diff --git a/Documentation/io-mapping.txt b/Documentation/io-mapping.txt
new file mode 100644
index 0000000..473e43b
--- /dev/null
+++ b/Documentation/io-mapping.txt
@@ -0,0 +1,82 @@
+The io_mapping functions in linux/io-mapping.h provide an abstraction for
+efficiently mapping small regions of an I/O device to the CPU. The initial
+usage is to support the large ...Should the documentation for this function (perhaps the certainly-forthcoming kerneldoc comments :) mention loudly that this function uses KM_USER0? This isn't kmap(), and doesn't look much like it; someday some developer might get an ugly surprise when they try to use that slot simultaneously for something else. jon --
definitely worth a comment. Ingo --
We just ran some numbers on a box with PAT enabled and broken MTRRs.
Finally we have a test platform for the difference between kmap_atomic
and kmap_atomic_prot. Using regular kmap_atomic on this platform, we get
UC access to the graphics device; sending data from the CPU is a bit
slow. Adding kmap_atomic_prot_pfn and specifying WC access raises that
by a fairly significant factor, taking our CPU utilization for
copy_from_user from 40% to 2%.
Here's a patch which adds kmap_atomic_prot_pfn to the kernel for this
usage. When we add native io-mapping support instead of sitting on top
of kmap, we can remove this function.
I've reworked the io_mapping patches to use this function as well, along
with cleaning up the i915 code along the lines of Linus' current
version. I'll post those if this patch looks acceptable.
=46rom e3f633dcb36889fa85ea06cca339072df3c44ae0 Mon Sep 17 00:00:00 2001
From: Keith Packard <keithp@keithp.com>
Date: Thu, 23 Oct 2008 11:53:45 -0700
Subject: [PATCH] Add kmap_atomic_prot_pfn
kmap_atomic_prot_pfn is a mixture of kmap_atomic_prot and kmap_atomic_pfn,
accepting both a pfn instead of a page and an explicit protection value.
Signed-off-by: Keith Packard <keithp@keithp.com>
---
arch/x86/mm/highmem_32.c | 19 +++++++++++++++++++
include/asm-x86/highmem.h | 1 +
include/linux/highmem.h | 1 +
3 files changed, 21 insertions(+), 0 deletions(-)
diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c
index bcc079c..91ae5e8 100644
--- a/arch/x86/mm/highmem_32.c
+++ b/arch/x86/mm/highmem_32.c
@@ -139,6 +139,25 @@ void *kmap_atomic_pfn(unsigned long pfn, enum km_type =
type)
}
EXPORT_SYMBOL_GPL(kmap_atomic_pfn); /* temporarily in use by i915 GEM unti=
l vmap */
=20
+/* This is the same as kmap_atomic_prot() but can map memory that doesn't
+ * have a struct page associated with it.
+ */
+void *kmap_atomic_prot_pfn(unsigned long pfn, enum km_type type, pgprot_t =
prot)
+{
+ enum fixed_addresses idx;
+ unsigned long ...On Thu, 23 Oct 2008 13:22:12 -0700 Given that all highmem-implementing archtiectures must use the same declaration here, we might as well put it into include/linux/highmem.h. Although that goes against current mistakes^Wcode. Does powerpc32 still implement highmem? It seems that way. You broke it, no? --
The goal is to stop needing this function fairly soon and replace it Powerpc32 doesn't have kmap_atomic_pfn either. Seems like the set of HIGHMEM functions is not uniform across architectures. --=20 keith.packard@intel.com
This really shouldn't be in highmem.h AT ALL. The whole point of that function has absolutely nothing to do with highmem, and it *must* be useful on non-highmem configurations to be appropriate. So I'd much rather create a new <linux/kmap.h> or something. Or just expose this from to <asm/fixmap.h> or something. Let's not confuse this with highmem, even if the implementation _historically_ was due to that. Linus --
Right, I'd just like my io_mapping_map_atomic_wc to be able to rapidly map random pages from my PCI BAR in WC mode. The code in your tree uses kmap_atomic_pfn, which does the mapping, but use the wrong prot bits. On a system with MTRR failure, this all ends badly, with factors of 20 Sure, we readily admit that we're abusing the highmem API. So we wrapped that abusive code in a pretty package that can be re-implemented however you desire. How (and when) would you like to see this fixed? --=20 keith.packard@intel.com
I'm not entirely sure who wants to own up to owning that particular part of code, and is willing to make kmap_atomic_prot_pfn() also work in the absense of CONFIG_HIGHMEM. I suspect it's Ingo, but I also suspect that he'd like to see a tested patch by somebody who actually _uses_ this code. So I would suspect that if you guys actually write a patch, and make sure that it works on x86-32 even _without_ CONFIG_HIGHMEM, and send it to Ingo, good things will happen. As to the "without CONFIG_HIGHMEM" part: making all of this work even without HIGHMEM should literally be a matter of: - remove the '#ifdef CONFIG_HIGHMEM' in <asm/fixmap_32.h>, so that we have fixmap entries for the temporary kernel mappings regardless (ie FIX_KMAP_BEGIN and FIX_KMAP_END). - move the code for the atomic accesses that use those fixmap entries into a file that gets compiled regardless of CONFIG_HIGHMEM. Or at least just kmap_atomic_prot_pfn() and kunmap_atomic(). and it probably should all work automatically. The kmap_atomic() stuff really should be almost totally independent of CONFIG_HIGHMEM, since it's really much more closely related to fixmaps than HIGHMEM. I guess there may be some debugging code that depends on HIGHMEM (eg that whole testing for whether a page is a highmem page or not), so it might be a _bit_ more than just moving code around, but I really didn't look closer. Then, there's the issue of 64-bit, and just mapping everything there, and the interface to that. I liked the trivial extension to "struct resource" to have a "cached mapping" pointer. So if we can just make it pass resources around and get a page that way (and not even need kmap() on 64-bit architections), that would be good. It's too late for -rc1 (which I'm planning on cutting within the hour), and if it ends up being complex, I guess that it means this will eb a 2.6.29 issue. But if it really is just a matter of mostly just some trivial code movement and both ...
I'm not that fan of carrying a mapping with a struct resource because if we do that we should probably also refcount the mapping, and then there is the whole question of mappings with different attributes, etc etc... Cheers, Ben. --
I'm fine with sticking the mapping in a separate structure; it's just the return from ioremap_wc on 64-bit systems, and nothing at all on 32-bit systems. I don't really see the advantage of moving it into the resource, especially as we may need different access modes (UC vs WC) for different portions of a single BAR. We may want to add some bounds and access mode information to this structure to check for broken drivers. --=20 keith.packard@intel.com
Actually, on 32-bit, the 'prot' should be there, as should the starting physical page. Otherwise the two interfaces would be very odd, and you'd have to repeat those arguments in all callers (ie both in "prepare" and in the actual "access"). Linus --
I suppose. What I did instead was create _wc versions of both the prepare and access functions to eliminate the need for additional data. Either way is fine with me; I took the route which didn't require an additional allocation. --=20 keith.packard@intel.com
All of the kmap_atomic functions *do* work without CONFIG_HIGHMEM, they just don't do what we want in this case. Without knowing the history, it seems fairly clear that the kmap functions are designed to map physical memory pages into the kernel virtual address space. On small 32-bit systems, that's trivial, you just use the direct map (as one does on 64-bit systems). The magic fixmap entries make this work with CONFIG_HIGHMEM as well. So, I fear touching the kmap API as it appears to have a specific and useful purpose, irrespective of the memory size the kernel is configured for. What I've proposed is that we create a new io-space specific set of fixmap APIs. On CONFIG_HIGHMEM, they'd just use the existing kmap_atomic mechanism, but on small 32-bit systems, we'd enable the fixmaps and have As above, I think kmap_atomic should be left alone as a way of quickly mapping memory pages. There are a users of both kmap_atomic_prot (xen) The io_mapping API I proposed does precisely this. On 32-bit systems, it uses kmap_atomic for each page access while on 64-bit systems it uses ioremap_wc at device init time and then just uses an offset for each page access. Hiding this detail behind an API leaves the driver code independent of If we do end up pushing this out to 2.6.29, I'd like to see kmap_atomic_prot_pfn in place as a stop-gap so that PAT performance on 32-bit systems is reasonable. I don't think too many people are running desktop systems without CONFIG_HIGHMEM at this point, and if so, we can I'll try to get something working in the next day or so and see how it looks. My plan at this point is to create new API for 32-bit systems: void *io_map_atomic_wc(unsigned long pfn) void io_unmap_atomic(void *addr); With this, I can switch my existing io_mapping API over to an io-specific interface and implement those using the fixmap code. --=20 keith.packard@intel.com
Something like the following (yes, I know, this is missing the include/asm-x86 rename)? =46rom e7921809c72f940295311cfa6c300d5234ac96c1 Mon Sep 17 00:00:00 2001 From: Keith Packard <keithp@keithp.com> Date: Thu, 23 Oct 2008 23:17:40 -0700 Subject: [PATCH] [x86_32] Add io_map_atomic using fixmaps This steals the code used for CONFIG_HIGHMEM memory mappings except that it's designed for dynamic io resource mapping. These fixmaps are available even with CONFIG_HIGHMEM turned off. Signed-off-by: Keith Packard <keithp@keithp.com> --- arch/x86/mm/Makefile | 2 +- arch/x86/mm/init_32.c | 3 +- arch/x86/mm/iomap_32.c | 59 +++++++++++++++++++++++++++++++++++++++= ++++ include/asm-x86/fixmap.h | 4 +++ include/asm-x86/fixmap_32.h | 4 --- include/asm-x86/highmem.h | 8 +++--- include/asm-x86/iomap.h | 30 ++++++++++++++++++++++ include/linux/io-mapping.h | 15 +++------- 8 files changed, 104 insertions(+), 21 deletions(-) create mode 100644 arch/x86/mm/iomap_32.c create mode 100644 include/asm-x86/iomap.h diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile index 59f89b4..fea4565 100644 --- a/arch/x86/mm/Makefile +++ b/arch/x86/mm/Makefile @@ -1,7 +1,7 @@ obj-y :=3D init_$(BITS).o fault.o ioremap.o extable.o pageattr.o mmap.o \ pat.o pgtable.o gup.o =20 -obj-$(CONFIG_X86_32) +=3D pgtable_32.o +obj-$(CONFIG_X86_32) +=3D pgtable_32.o iomap_32.o =20 obj-$(CONFIG_HUGETLB_PAGE) +=3D hugetlbpage.o obj-$(CONFIG_X86_PTDUMP) +=3D dump_pagetables.o diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c index 8396868..c483f42 100644 --- a/arch/x86/mm/init_32.c +++ b/arch/x86/mm/init_32.c @@ -334,7 +334,6 @@ int devmem_is_allowed(unsigned long pagenr) return 0; } =20 -#ifdef CONFIG_HIGHMEM pte_t *kmap_pte; pgprot_t kmap_prot; =20 @@ -357,6 +356,7 @@ static void __init kmap_init(void) kmap_prot =3D PAGE_KERNEL; } =20 +#ifdef CONFIG_HIGHMEM static void __init permanent_kmaps_init(pgd_t ...
Keith, What you actually are doing here is claiming copyright on code that other people have written, and tighten the export restrictions. kmap_atomic_prot_pfn() appeared long ago in drm git with identical code and purpose, but with different authors, and iounmap_atomic is identical to kunmap_atomic. Pls fix. /Thomas --
At this point I have no such use for it, no. The original user was a MIT style licenced driver. /Thomas --
okay, then just to put this question to rest: i wrote the original 32-bit highmem code ~10 years ago. I wrote the first version of fixmap support - in fact i coined the term. I wrote the first version of the atomic-kmap facility as well. All of that code is licensed under the GPLv2. So if anyone wants to make any copyright claims about highmem/kmap/fixmap derivative works, consider it in that light. Regarding this new API variant that Keith wrote: it would be silly and dangerous to export it anywhere but to in-kernel drivers. The API disables preemption on 32-bit and rummages deep in the guts of the kernel as well, uses up a precious resource (fixmap slots), etc. It's internal and we eventually might want to deprecate forms of it and concentrate on the good 64-bit performance side. Ingo --
I fully acknowledge that. The fact that others wrote almost all of the code is the reason why there is no additional copyright claims in the file of the original kmap_atomic_prot_pfn() implementation. What I'm considering very bad form is someone taking other people's contributions, be it code or ideas, no matter how small they are, claim copyright on them and restrict the original usage. That's really the point here. Frankly, from my own usage point of view I don't really care about the specific case (whether they are exported GPL or not), but with the current form of this patch, I'm basically no longer allowed to use These are all good arguments to reserve this api for in-kernel drivers. There are other reasons why it should be available to out of tree drivers: * The api enables a fast facility that will be extremely important by graphics drivers in the future, probably not only for in-kernel drivers. Particularly as future graphics drivers will want to get fast write-combined kernel mappings also on highmem pages, not only io. This latter actually suggests keeping the form kmap_atomic_prot_pfn instead of iomap_atomic_prot_pfn, and make it permanent, unless the same goals can be achieved by the VMAP work Nick Piggin is suggesting. * The use of k[un]map_atomic() would, following your argumentation, be equally dangerous to export non-GPL? Now, considering these pros and cons one might still come to the conclusion that for stability reasons it is best to keep this API for in kernel drivers. I don't really know enough about the affected kernel internals to be able to judge. I just think it's important that all facts are considered. /Thomas --
Yeah, I just stuck my usual license header on it and didn't think about authorship. I'll fix that, once we figure out what the appropriate name is. But, as this code is clearly a trivial adaptation of the existing kernel code, it should carry a GPLv2 license. I'm also not particular as to the EXPORT restriction, I was just following the EXPORT advice given for the other newly exposed kernel symbols we're using. --=20 keith.packard@intel.com
Thanks, Keith. If there is a chance that people who do want the EXPORT_SYMBOL_GPL restriction are OK to go with just EXPORT_SYMBOL(), I guess that would fit best considering what's already exported and doable in drivers today. /Thomas --
yeah. I already asked Keith yesterday to send one coherent bundle of patches and i think we've got all the code sent already, you also pulled the latest DRM tree - so it all just needs sorting out and testing. [ I can possibly test the end result with a bleeding-edge Xorg which supports the new DRI APIs. (Whether X comes up fine is a regular, necessary and 'fun' component of the x86 regression testing we do hm, the thing that i find the weakest aspect of that (despite having suggested this approach) is that the structure and the granularity of the resource tree is really controlled by the hardware environment. We try to map the hardware circumstances accurately at bootstrap / device discovery time, and keep it rather static from that point on. (modulo hotplug and dynamic BAR sizing) And this static property of the resource tree is _good_ IMO, because we can think about it as a hardware property - not something tweaked by the kernel. (the kernel does tweak it when need to be or when the hardware asks for it, but it's more of an exception) That means that if a driver wants to map just a portion of a BAR, (because the hardware itself compresses various different pieces of functionality into the same BAR), this abstraction becomes a limitation on usage. And it might even be _impossible_ to use the simplest form of resource->mem_cache facility with certain types of hardware: say there's a cacheable and an uncacheable window within the same BAR - and we'd map the whole thing as cacheable. The CPU is then entitled to (and will most likely) prefetch into the uncacheable region as well, causing hw lockups, etc. [In this thread it was claimed that S3 chips have such properties.] And tweaking this abstraction to cover those cases, for the ioremap to not be a full mapping of the resource looks a bit hacky/limiting as well: - we'd either have to add 'size', 'offset' properties to the window we cache in the struct resource. (and possibly ...
Well, on powerpc, we just went (or rather, Kumar just went) through the oops of implementing fixmap and then kmap on top of it... just because we wanted kmap_atomic functionality on non-highmem platforms :-) (and the fixmap approach has some other interesting features). So yes, I agree. Typically very useful for any 32-bit processor with a memory mapped PCI Express config space since it's something like 256M of virtual space to map it all iirc. Cheers, Ben. --
the downsize would be that we'd attach a runtime property to the IORESOURCE_MEM resource tree - which is a fairly static thing right now, after the point where we finalize the resource tree. (modulo device/bridge hotplug variances) Another downside is that we might not want to map the whole thing. I.e. the structure of the IO memory space we want to map by drivers might be different from how it looks like in the resource tree. the concept of introducing resource->mapping does not feel _that_ wrong though and has a couple of upsides: it could act as a natural mapping type serializer for example and drivers wouldnt have to explicitly manage ioremap results - they could just use the resource descriptor directly and "read" and "write" to/from it. readl/writel could be extended to operate on the resource descriptor transparently, getting rid of a source of resource mismatches and overmapping, etc. etc. We could even safety check IO space accesses this way. and we'd get rid of the complication that your APIs introduced, the need to introduce a separate io_mapping type, etc. Dunno, i might be missing some obvious downside why this wasnt done like that until now. Ingo --
we could expand init_memory_mapping to support direct map for them... with needed prot. or use set init_memory_mapping() + set_memory_wb() directly. but that is only for 64bit x86 also someone is talking about to have 6 pcie display adapters on 64bit system. and every card will have 4g ram. 32bit could use fix map for 1G or 2G mapping. YH --
TLB shootdown need only be implied if the behaviour required is to unmap the virtual address *and* cause any other CPU that subsequently touches it to fault. For kva, that would be a bug anyway (use after free). The only thing it implies is that a TLB shootdown happens at some point before the address get reused. Still, it's always going to be faster than a global mapping, if done properly. I was thinking about doing a vmap_atomic thing generically in the vmap layer... why exactly do we need the FIXMAP stuff for it? --
