This patch series brings a long-awaited kernel memory manager to the i915
driver. This will allow us to do correct composited OpenGL, speed up
OpenGL-based compositing, and enable framebuffer objects and other "new"
OpenGL extensions. This patchset is also being built to enable kernel
modesetting for a non-root, flicker-free X Server.

This is a re-submit of the changes for DRM-GEM. It relies on patches
submitted by airlied which are currently queued in linux-next. The tree
still has all the changes required, based off of 2.6.27-rc4.

git://people.freedesktop.org/~anholt/linux-2.6 on the drm-gem-merge branch
http://cgit.freedesktop.org/~anholt/linux-2.6/log/?h=drm-gem-merge

New in this edition since the original submission:
- Exporting kmap_atomic_pfn. (The previous submission was slow because it
  was checking for the drm_compat.c version of this function from the
  external tree)
- shmem_getpage usage replaced with read_mapping_page.
- fixes for software fallbacks on tiled buffers.
- speedups for software fallbacks.
- replaced pci_read_base usage with using the MCHBAR mirror aperture
- fixed some issues on X server exit

What's not new in this edition:

Still using shmem_file_setup. We need to be able to allocate objects from
the kernel, and didn't get any clear agreement that doing a VFS dance would
be preferable to letting us behave like other kernel subsystems and use the
function.

Still using small integers to identify our objects rather than fds. We need
more than just basic syscalls on the objects -- the alternate mmap issue is
more serious than before, for X pixmap usage. There are also the ioctls for
cache management for software fallbacks. And the issue of getting high fds
and large numbers of them still remained.

Still have an issue with PAT on x86_64 -- initialization fails because
ioremap() apparently has different semantics there than on x86 or non-PAT
x86_64 (if somebody has mapped the space and there's a WC MTRR, the ioremap
that defaults to ...
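As a rough illustration of the "small integers rather than fds" point in
the cover letter above, here is a minimal userland sketch of a handle table
that hands out small integer names for objects. Every name in it
(handle_table, handle_create, handle_lookup, handle_delete, the fixed table
size) is hypothetical and chosen for the example; it is not the drm_gem.c
implementation, which is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the drm_gem.c code. */
#define HANDLE_TABLE_SIZE 64

struct handle_table {
	void *objects[HANDLE_TABLE_SIZE]; /* slot i backs handle i + 1 */
};

/* Hand out the lowest free handle for obj; returns 0 if the table is full. */
static uint32_t handle_create(struct handle_table *t, void *obj)
{
	assert(obj != NULL);
	for (uint32_t i = 0; i < HANDLE_TABLE_SIZE; i++) {
		if (t->objects[i] == NULL) {
			t->objects[i] = obj;
			return i + 1; /* 0 is reserved to mean "no handle" */
		}
	}
	return 0;
}

/* Translate a handle back to its object, or NULL if it is not in use. */
static void *handle_lookup(const struct handle_table *t, uint32_t handle)
{
	if (handle == 0 || handle > HANDLE_TABLE_SIZE)
		return NULL;
	return t->objects[handle - 1];
}

/* Release a handle so the slot can be reused; -1 if it was not in use. */
static int handle_delete(struct handle_table *t, uint32_t handle)
{
	if (handle_lookup(t, handle) == NULL)
		return -1;
	t->objects[handle - 1] = NULL;
	return 0;
}
```

The point of the sketch is only that such names are cheap to allocate in
large numbers and do not consume the process's file descriptor space; any
operation on the object is then an ioctl that takes the handle as an
argument.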
From: Keith Packard <keithp@keithp.com>

GEM needs to create shmem files to back buffer objects. Though currently
creation of files for objects could have been driven from userland, the
modesetting work will require allocation of buffer objects before userland
is running, for boot-time message display.

Signed-off-by: Eric Anholt <eric@anholt.net>
---
 mm/shmem.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 04fb4f1..515909d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2582,6 +2582,7 @@ put_memory:
 	shmem_unacct_size(flags, size);
 	return ERR_PTR(error);
 }
+EXPORT_SYMBOL(shmem_file_setup);
 
 /**
  * shmem_zero_setup - setup a shared anonymous mapping
--
1.5.6.3
--
The driver would like to map IO space directly for copying data in when
appropriate, to avoid CPU cache flushing for streaming writes.
kmap_atomic_pfn lets us avoid IPIs associated with ioremap for this process.
Signed-off-by: Eric Anholt <eric@anholt.net>
---
arch/x86/mm/highmem_32.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c
index 165c871..d52e91d 100644
--- a/arch/x86/mm/highmem_32.c
+++ b/arch/x86/mm/highmem_32.c
@@ -137,6 +137,7 @@ void *kmap_atomic_pfn(unsigned long pfn, enum km_type type)
 	return (void*) vaddr;
 }
+EXPORT_SYMBOL(kmap_atomic_pfn);
 
 struct page *kmap_atomic_to_page(void *ptr)
 {
--
1.5.6.3
--
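To make the motivation in the commit message above concrete, here is a
minimal userland sketch of copying a buffer into a mapped region one page
at a time, so that each step needs only a short-lived mapping of a single
page frame. It is an assumption-laden stand-in, not the driver's code: the
"aperture" is a plain byte array, and map_page, fake_aperture,
aperture_pwrite and SKETCH_PAGE_SIZE are hypothetical names. In the kernel,
the mapping step is where a kmap_atomic_pfn-style call would sit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u

/* Stand-in for the aperture: a flat byte array instead of real IO space. */
struct fake_aperture {
	unsigned char *base;
	size_t size;
	unsigned long maps; /* counts how many page mappings were set up */
};

/* Hypothetical stand-in for mapping a single page frame. */
static unsigned char *map_page(struct fake_aperture *ap, size_t pfn)
{
	ap->maps++;
	return ap->base + pfn * SKETCH_PAGE_SIZE;
}

/*
 * Copy len bytes from src to the given aperture offset, page by page.
 * Returns 0 on success, -1 if the range does not fit in the aperture.
 */
static int aperture_pwrite(struct fake_aperture *ap, size_t offset,
			   const void *src, size_t len)
{
	const unsigned char *p = src;

	assert(ap != NULL && src != NULL);
	if (offset > ap->size || len > ap->size - offset)
		return -1;

	while (len > 0) {
		size_t pfn = offset / SKETCH_PAGE_SIZE;
		size_t in_page = offset % SKETCH_PAGE_SIZE;
		size_t chunk = SKETCH_PAGE_SIZE - in_page;
		unsigned char *vaddr;

		if (chunk > len)
			chunk = len; /* last, partial page */

		vaddr = map_page(ap, pfn);
		memcpy(vaddr + in_page, p, chunk);
		/* A real atomic mapping would be torn down here. */

		offset += chunk;
		p += chunk;
		len -= chunk;
	}
	return 0;
}
```

A write of 5000 bytes starting at offset 4000 touches three pages in this
sketch, so it sets up three mappings; the cost of each map/unmap pair is
what the IPI discussion in this thread is about.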
GEM allows the creation of persistent buffer objects accessible by the
graphics device through new ioctls for managing execution of commands on
the device. The userland API is almost entirely driver-specific to ensure
that any driver building on this model can easily map the interface to
individual driver requirements.

GEM is used by the 2d driver for managing its internal state allocations
and will be used for pixmap storage to reduce memory consumption and enable
zero-copy GLX_EXT_texture_from_pixmap, and in the 3d driver is used to
enable GL_EXT_framebuffer_object and GL_ARB_pixel_buffer_object.

Signed-off-by: Eric Anholt <eric@anholt.net>
---
 drivers/gpu/drm/Makefile               |    5 +-
 drivers/gpu/drm/drm_agpsupport.c       |   51 +-
 drivers/gpu/drm/drm_cache.c            |   76 +
 drivers/gpu/drm/drm_drv.c              |    4 +
 drivers/gpu/drm/drm_fops.c             |    6 +
 drivers/gpu/drm/drm_gem.c              |  420 ++++++
 drivers/gpu/drm/drm_memory.c           |    2 +
 drivers/gpu/drm/drm_mm.c               |    5 +-
 drivers/gpu/drm/drm_proc.c             |  135 ++-
 drivers/gpu/drm/drm_stub.c             |   10 +
 drivers/gpu/drm/i915/Makefile          |    6 +-
 drivers/gpu/drm/i915/i915_dma.c        |   94 +-
 drivers/gpu/drm/i915/i915_drv.c        |    8 +-
 drivers/gpu/drm/i915/i915_drv.h        |  253 ++++-
 drivers/gpu/drm/i915/i915_gem.c        | 2509 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_debug.c  |  201 +++
 drivers/gpu/drm/i915/i915_gem_proc.c   |  292 ++++
 drivers/gpu/drm/i915/i915_gem_tiling.c |  256 ++++
 drivers/gpu/drm/i915/i915_irq.c        |    8 +-
 drivers/gpu/drm/i915/i915_reg.h        |   37 +-
 include/drm/drm.h                      |   31 +
 include/drm/drmP.h                     |  151 ++
 include/drm/i915_drm.h                 |  332 +++++
 23 files changed, 4835 insertions(+), 57 deletions(-)
 create mode 100644 drivers/gpu/drm/drm_cache.c
 create mode 100644 drivers/gpu/drm/drm_gem.c
 create ...
I wonder if you ever tested my vmap rework patches with this issue?

It seems somewhat x86 specific and also not conceptually so clean to use
kmap_atomic_pfn for this. vmap may not be used by all architectures but I
think it might be able to cover some of them.

As I said, there are some other possible improvements that can be made to
my vmap rewrite if performance isn't good enough, but I simply have not
seen numbers...

Thanks,
Nick
--
The consumer of this is a driver for Intel platforms, so being x86-specific
is not a worry for this patch series. However, when other DRM drivers get
around to doing memory management, I'm sure they'll also be interested in
an ioremap_wc that doesn't eat IPI costs.

For us, the IPIs for flushing were eating over 10% of CPU time. If your
patch series cuts that cost, we could drop this piece at that point.

--
Eric Anholt
eric@anholt.net eric.anholt@intel.com
It would help verify and improve the new vmap code, and it would be "doing
the right thing" to begin with. It would avoid some nasty ifdefery in your
driver too. And what about 64 bit x86 that doesn't

It can cut the cost quite significantly on normal vmap/vunmap loads I
tested. Whether it will work as well on your workload, I don't know but I
would have liked to find out. I raised this issue quite a while back, so
I'm disappointed it had not been tried...
--
I think Eric has code with the vmap changes now? Given our discussions at
KS/Plumbers would you be ok with acking these patches? Or do you want a
repost so you can check out the vmap stuff?

After talking a bit more about it, I think we agreed that the ioctl
interface is actually a better approach than trying to shoehorn this stuff
into system calls, so aside from the vmap code (which could be done in
2.6.29 or whenever the vmap stuff lands) I think this patchset is pretty
close to what we want in drm-next now...

Thanks,
Jesse
--
I've been trying to get a good test of the vmap changes. Unfortunately, it
looks like things have changed in ioremap in the intervening time between
when I last tested and now. I was taking 2.6.27-rc5-mm1 (since that was the
only git tree I found with Nick's changes in it, unfortunately) and merging
our stuff into there then reverting the kmap_atomic_prot_pfn bit.

However, we've moved from 10% cost in unmapping due to IPIing to a >30%
cost in mapping due to change_page_attr. This is regardless of whether I do
ioremap or ioremap_wc. The physical range is covered by a WC MTRR, and the
X Server's mapping the memory using the _wc resource. In the DRM, I just
tried moving every ioremap to _wc, and it's the same. The call chain looks
like

    i915_gem_pwrite_ioctl (60%)
    ioremap_wc (39%)
    ioremap_nocache (39%)
    ioremap_caller (39%)
    ioremap_change_attr (36%)
    _set_memory_uc (36%)
    change_page_attr_set_clr (36%)
    vm_unmap_aliases (31%)

Maybe if I go back and generate a tree of just Nick's changes and try to
merge them forward I can get a good test. However, it looks to me like
we've got some serious brokenness in ioremap and attribute handling coming
up.

--
Eric Anholt
eric@anholt.net eric.anholt@intel.com
