Re: [git pull] drm patches for 2.6.27-rc1

Previous thread: [GIT PATCHES for 2.6.28] V4L/DVB updates by Mauro Carvalho Chehab on Friday, October 17, 2008 - 2:20 pm. (1 message)

Next thread: [Slightly off topic] A question about R/B trees. by Maxim Levitsky on Friday, October 17, 2008 - 2:34 pm. (7 messages)
From: Dave Airlie
Date: Friday, October 17, 2008 - 2:29 pm

Hi Linus,

This is a new tree, I had a conflict with your latest tree due to some
trivial cleanups you merged. I've added the fix for CVE-2008-3831 which is
unembargoed.

Please pull the 'drm-next' branch from
ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6.git drm-next

This contains two patches outside the DRM git tree to add exports for GEM
functionality while we await Nick Piggins vmap and shmem changes.

This update contains the following.

1. CVE-2008-3831 - fixes for memset on arbitrary memory via i915 on G33 hw and
above.
2. Intel GEM memory manager .
3. Opregion support for Intel Integrated chipsets.
4. Updated radeon driver with support for new chipsets + bugfixes
5. vblank reworking to save power and interrupts on intel/radeon GPUs.

Dave.

 arch/x86/mm/highmem_32.c               |    1 +
 drivers/gpu/drm/Kconfig                |    3 +-
 drivers/gpu/drm/Makefile               |    5 +-
 drivers/gpu/drm/drm_agpsupport.c       |   52 +-
 drivers/gpu/drm/drm_cache.c            |   69 +
 drivers/gpu/drm/drm_drv.c              |    6 +
 drivers/gpu/drm/drm_fops.c             |    8 +-
 drivers/gpu/drm/drm_gem.c              |  421 ++++++
 drivers/gpu/drm/drm_irq.c              |  464 +++++-
 drivers/gpu/drm/drm_memory.c           |    2 +
 drivers/gpu/drm/drm_mm.c               |    5 +-
 drivers/gpu/drm/drm_proc.c             |  135 ++-
 drivers/gpu/drm/drm_stub.c             |   11 +-
 drivers/gpu/drm/drm_sysfs.c            |    2 +-
 drivers/gpu/drm/i915/Makefile          |    7 +-
 drivers/gpu/drm/i915/i915_dma.c        |  332 +++--
 drivers/gpu/drm/i915/i915_drv.c        |  476 +------
 drivers/gpu/drm/i915/i915_drv.h        | 1180 +++++-----------
 drivers/gpu/drm/i915/i915_gem.c        | 2558 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_debug.c  |  201 +++
 drivers/gpu/drm/i915/i915_gem_proc.c   |  292 ++++
 drivers/gpu/drm/i915/i915_gem_tiling.c |  257 ++++
 drivers/gpu/drm/i915/i915_irq.c        |  514 +++++--
 ...
From: Linus Torvalds
Date: Friday, October 17, 2008 - 3:43 pm

Grr.

This whole merge series has been full of people sending me UNTESTED CRAP.

So what's the excuse _this_ time for adding all these stupid warnings to 
my build log? Did nobody test this?

  drivers/gpu/drm/drm_proc.c: In function ‘drm_gem_one_name_info’:
  drivers/gpu/drm/drm_proc.c:525: warning: format ‘%d’ expects type ‘int’, but argument 3 has type ‘size_t’
  drivers/gpu/drm/drm_proc.c:533: warning: format ‘%9d’ expects type ‘int’, but argument 4 has type ‘size_t’
  drivers/gpu/drm/i915/i915_gem.c: In function ‘i915_gem_gtt_pwrite’:
  drivers/gpu/drm/i915/i915_gem.c:184: warning: unused variable ‘vaddr_atomic’

and I wonder how many other warnings got added that I never even noticed 
because I don't build them?

And yes, it's not just warnings. One of thse is horribly bad code:

	nid->len += sprintf(&nid->buf[nid->len],
                            "%6d%9d%8d%9d\n",
                            obj->name, obj->size,
                            atomic_read(&obj->handlecount.refcount),
                            atomic_read(&obj->refcount.refcount));

where it's just wrong to use the field width as a separator. Because if 
the counts are big enough, the separator suddenly goes away!

So that format string should be

	"%6d %8zd %7d %8d\n"

instead. Which  gives the same output when you don't overflow, and doesn't 
have the overflow bug when you do.

As to that "vaddr_atomic" thing, the warning would have been avoided if 
you had just cleanly split up the optimistic fast case.

IOW, write cleaner code, and the warning just goes away on its own. 
Something like the appended. UNTESTED!

Hmm?

I really wish people were more careful, and took more pride in trying to 
write readable code, with small modular functions instead. And move those 
variables down to the block they are needed in.

Anyway, I pulled the thing, but _please_ test this cleanup and send it 
back to me if it passes your testing. Ok? 

			Linus

---
 drivers/gpu/drm/drm_proc.c      |    ...
From: Eric Anholt
Date: Friday, October 17, 2008 - 7:10 pm

pects type =E2=80=98int=E2=80=99, but argument 3 has type =E2=80=98size_t=
xpects type =E2=80=98int=E2=80=99, but argument 4 has type =E2=80=98size_t=

Yeah, none of us are on x86-64, so we missed those warnings in testing.
The change looks pretty good except for s/return 1/return 0/.  We wanted
to pull the i_wish_it_was_ioremap_atomic() hack out so that we could use
it for relocation updates as well (which aren't copy_from_user), and
I've started writing that patch.  Expect something pushed at you soon.

--=20
Eric Anholt
eric@anholt.net                         eric.anholt@intel.com


From: Linus Torvalds
Date: Friday, October 17, 2008 - 7:47 pm

[Empty message]
From: Keith Packard
Date: Friday, October 17, 2008 - 8:49 pm

We're lazy, perhaps even lazier than yourself. Given that the whole goal
is to essentially ignore the CPU and get our code running on the GPU,
it's hard to get excited about the kind of kernel we're running on the
CPU.

We've got a bunch of test boxes that run 64-bits, unfortunately, the
people doing builds there appear not to care about warnings. That will
get fixed.

I've also got this new G45 box here; perhaps it's time to enter the 21st
century and run it in native 64-bit mode. It's scary though; I've been
writing window systems for 32-bit machines for thirty years now.

--=20
keith.packard@intel.com
From: Corbin Simpson
Date: Friday, October 17, 2008 - 11:44 pm

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Indeed. I have been running 64 bit builds for quite a while now, and
merely ignored the warnings as "not my problem." In the future, I shall
make it my problem.

~ C.

- --
~ Corbin Simpson
<MostAwesomeDude@gmail.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkj5hecACgkQeCCY8PC5utAMAQCggtkBoolO58rW5PnPkYTosyZ5
DkgAnAqRSZzoFhgkdFcL92qV6Zyc7usJ
=vpMP
-----END PGP SIGNATURE-----
--

From: Eric Anholt
Date: Saturday, October 18, 2008 - 12:49 am

Writing 3D drivers means running 3D games.  Running 3D games
unfortunately means running a lot of 32-bit userland as the fun stuff is
binary-only.  So I stick to a 32-bit system, becuase past experience
trying to run both on the same system has been misery.

--=20
Eric Anholt
eric@anholt.net                         eric.anholt@intel.com


From: Peter Zijlstra
Date: Sunday, October 19, 2008 - 10:52 am

You can run 32bit user-space on a 64bit kernel just fine.
Any breakage, if anything, you find will need to get fixed anyway.

Also, one would hope Intel is poking these game writers to migrate to
64bit Linux ;-)

--

From: Steven J Newbury
Date: Sunday, October 19, 2008 - 9:17 pm

I'd like to second this.  32bit on 64bit works fine.  I've been
maintaining separate multilib versions of the Gentoo libdrm/Mesa ebuilds
for 32bit/64bit.  It is important that the 32bit and 64bit graphics
stacks are synchronised, rather than relying on external compatibility
libraries as Gentoo does currently.  Failing to do so does lead to
Indeed! :)

--

From: Linus Torvalds
Date: Monday, October 20, 2008 - 9:31 am

That's not a very good reason, though. We're supposed to be perfectly 
binary compatible, so it would actually be *better* if you did the testing 
using a 64-bit kernel.

Sure, don't do it on _all_ machines, but running 32-bit user land is 
definitely not a valid reason to avoid a 64-bit kernel. Quite the reverse. 
We'd like to see more coverage.

			Linus
--

From: Jesse Barnes
Date: Monday, October 20, 2008 - 1:04 pm

Actually, I'm on x86_64 pretty much exclusively and saw these warnings last 
week.  But I didn't send a fix (yet); sorry.

That said, this code was far from untested, even though it did contain a few 
compile warnings, so I think Linus's complaint about UNTESTED CRAP was at 
least half wrong.

-- 
Jesse Barnes, Intel Open Source Technology Center

--

From: Dave Airlie
Date: Saturday, October 18, 2008 - 2:11 am

The code has been tested on 32 and 64 bit in lots of places, however I
don't generally trawl the Fedora
kernel build logs, and the amount of warnings we have there, new ones
just get lost in the mists..

So its not untested on 64-bit, its just nobody who cared enough saw
the compiler warnings. If linux-next had been
going for the past few weeks someone would have spotted them, I just
got the report yesterday from linux-next after I fed the push request.

Perhaps we all need access to the Ingo compile farm.

Dave.
--

From: Nick Piggin
Date: Friday, October 17, 2008 - 6:37 pm

OK. I was hoping that kmap_atomic_pfn thing *never* see the light of
mainline ;) Because hopefully the vmap improvements should be OK for
2.6.28 as well...

But it's already there. OK, if vmap patches go upstream, then it can
be switched over and the export removed before the release...

Thanks,
Nick
--

From: Keith Packard
Date: Saturday, October 18, 2008 - 12:11 pm

Eric and I chatted with Venki Pallipadi this week and decided what we
really need is an official API for getting at pfns in our device BAR,
which is what we're (ab)using kmap_atomic_pfn for. This ties in with
some PAT issues as well, where we want to assert that all mappings of
our BAR are in WC mode. The basic plan is to have four new functions
(yes, I'm making up names here):

struct io_mapping *io_reserve_pci_resource(struct pci_dev *dev,=20
                                           int bar,=20
                                           int prot);
void io_mapping_free(struct io_mapping *mapping);

void *io_map_atomic(struct io_mapping *mapping, unsigned long pfn);
void io_unmap_atomic(struct io_mapping *mapping, unsigned long pfn);

This would have the additional effect of making all other mappings of
this BAR be required to match the specific prot, both providing
protection against broken drivers, and providing support for old X

I'd say we should leave things alone for .28 and work on making an
official IO mapping API available.

--=20
keith.packard@intel.com
From: Linus Torvalds
Date: Saturday, October 18, 2008 - 12:31 pm

The important thing is that mappings need to be per-CPU, so the above may 
work, but only if it's designed so that "io_reserve_pci_resource()" will 
actually reserve space for 'nr_possible_cpu' page mappings, and then the 
"io_[un]map_atomic()" functions do per-CPU mappings.

Anything else is a disaster, because anything else implies TLB shootdown.

And quite frankly, even so, we'd possibly still be _better_ off with just 
exposing the "kmap_atomic_pfn()" functionality even so. Because quite 
frankly, your "io_reserve_pci_resource()" infrastructure is going to 
inevitably be more complex and slower than the rather efficient 
kmap_atomic_pfn() thing we have.

[ The *non-atomic* kmap() functions are fairly high-overhead, in that they 
  want to keep track of cached mappings and remember page addresses etc. 
  So those are the ones we don't want to support for non-HIGHMEM setups. 

  But the atomic kmaps are pretty simple, and really only need some 
  trivial FIXMAP support. We could easily extend it for x86-64, methinks, 
  and do it for x86-32 even when we don't do HIGHMEM.

  Ingo? ]

One small detail: our we currently have "kmap_atomic_pfn()" and 
"kmap_atomic_prot()", and we really should maek the fundamental core 
operation be "kmap_atomic_pfn_prot()", and have everything be done in 
terms of that. Looking at it, it also looks like kmap_atomic_prot() is 
actually incorrect right now, and doesn't do a "prot" thing for 
non-highmem pages, but just returns "page_address(page);"

		Linus
--

From: Thomas Hellström
Date: Saturday, October 18, 2008 - 1:07 pm

Actually, a "kmap_atomic_prot_pfn()" has been lurking in the drm repos 
for some time now, but hasn't been suggested for upstream. It was 
intended for drivers that require quick in-kernel patching of 
write-combined io and highmem pages. The latter is a common situation 
for PCIE graphics devices with their own MMU, so IMHO an exported 
kmap_atomic_pfn_prot() would be a big help in such cases.




--

From: Keith Packard
Date: Saturday, October 18, 2008 - 1:20 pm

Yes, of course. The goal is to make it look exactly like
kmap_atomic_pfn, but work on non-memory pages even on non-HIGHMEM
configs for x86 (we have to use ioremap on those systems today, which is

I want it to work just like kmap_atomic_pfn; Venki wanted some way to
"know" that no-one else was mapping the pages in non-WC mode. This
seemed like a reasonable compromise; the 'mapping' object exists solely
to hold the desired prot value to keep all PTEs consistent across the

Fortunately, we use kmap_atomic_pfn only for our BAR, which is not a
non-highmem page.

Right now, we're counting on having our BAR covered by an MTRR register
so that we get WC access here. I'd like to have the API expose this so
that the driver will work on systems which don't have a spare MTRR for
the graphics aperture.

--=20
keith.packard@intel.com
From: Ingo Molnar
Date: Saturday, October 18, 2008 - 1:37 pm

agreed, and there's certainly no resistance from the x86 architecture 
side to extend our mapping APIs in such directions.

But i think the direction of the new GEM code is subtly wrong here, 
because it tries to manage memory even on 64-bit systems. IMO it should 
just map the _whole_ graphics aperture (non-cached) and be done with it. 
There's no faster method at managing pages than the CPU doing a TLB fill 
from pagetables.

The only real API need i see is on 32-bit: with a 1GB or 2GB graphics 
aperture we just cannot map that permanently, so kmap_atomic() is a 
necessity. We can certainly extend that to non-highmem as well.

But this should be an ad-hoc transitionary thing for 32-bit, and on 
64-bit we really should not be using any form of kmap.

Especially with large vertex buffers or textures, mapping a lot of pages 
via kmap is not going to be trivial overhead - even if INVLPG is faster 
than a full TLB flush, it's still on the order of 100-200 cycles - and 
with a lot of pages that mounts up quickly. And if i understood your 
workload correctly you want to do tens of thousand of map/unmap/remap 
events per frame generated - depending on the type of the 3D app/engine.

Or am i missing something subtle? Why do you want the overhead of kmap 
on 64-bit?

	Ingo
--

From: Keith Packard
Date: Saturday, October 18, 2008 - 2:51 pm

Yeah, we're stuck thinking that we "can't" map the aperture because it's
too large, but with a 64-bit kernel, we should be able to keep it mapped
permanently.

Of course, the io_reserve_pci_resource and io_map_atomic functions could

Yes, this is where exposing an io-specific atomic mapping function will

Yeah, data transfer from CPU to GPU is through a pwrite interface, and
we perform the transfer within the kernel using map/unmap operations on

We don't, but I think it would be nice to have a common API that works
across all 32-bit configurations as well as 64-bit systems.

--=20
keith.packard@intel.com
From: Ingo Molnar
Date: Saturday, October 18, 2008 - 3:32 pm

okay, so basically what we need is a shared API that does per page 
kmap_atomic on 32-bit, and just an ioremap() on 64-bit. I had the 
impression that you were suggesting to extend kmap_atomic() to 64-bit - 
which would be wrong.

So, in terms of the 4 APIs you suggest:

  struct io_mapping *io_reserve_pci_resource(struct pci_dev *dev,
                                             int bar,
                                             int prot);
  void io_mapping_free(struct io_mapping *mapping);

  void *io_map_atomic(struct io_mapping *mapping, unsigned long pfn);
  void io_unmap_atomic(struct io_mapping *mapping, unsigned long pfn);

here is what we'd do on 64-bit:

  - io_reserve_pci_resource() would just do an ioremap(), and would save 
    the ioremap-ed memory into struct io_mapping.

  - io_mapping_free() does the iounmap()

  - io_map_atomic(): just arithmetics, returns mapping->base + pfn - no 
    TLB activities at all.

  - io_unmap_atomic(): NOP.

it's as fast as it gets: zero overhead in essence. Note that it's also 
shared between all CPUs and there's no aliasing trouble.

And we could make it even faster: if you think we could even use 2MB 
TLBs for the _linear_ ioremap()s here, hm? There's plenty of address 
space on 64-bit so we can align to 2MB just fine - and aperture sizes 
are 2MB sized anyway.

Or we could go one step further and install these aperture mappings into 
the _kernel linear_ address space. That would be even faster, because 
we'd have a constant offset. We have the (2MB mappings aware) mechanism 
for that already. (Yinghai Cc:-ed - he did a lot of great work to 
generalize this area.)

(In fact if we installed it into the linear kernel address space, and if 
the aperture is 1GB aligned, we will automatically use gbpages for it. 
Were Intel to support gbpages in the future ;-)

the _real_ remapping in a graphics aperture happens on the GPU level 
anyway, you manage an in-RAM GPU pagetable that just works like an 
IOMMU, ...
From: Jon Smirl
Date: Saturday, October 18, 2008 - 3:47 pm

Is it possible to use a segment register to map the whole aperture on
32b? A segment register might allow common code on 64b/32b by



-- 
Jon Smirl
jonsmirl@gmail.com
--

From: Linus Torvalds
Date: Saturday, October 18, 2008 - 3:53 pm

No. Segment registers don't extend the virtual address space, they can 
only limit visibility into the one single 32-bit one.

IOW, segment registers don't actually extend addressing in any way, they 
only limit it. There's a reason why people don't use them (except as a 
strange base register for things like per-cpu or per-thread variables, and 
that is not to extend the address space, but to avoid wasting precious 
_useful_ registers on that).

			Linus
--

From: Keith Packard
Date: Saturday, October 18, 2008 - 5:38 pm

Yes, a one-level linear MMU which uses BIOS-reserved memory.

So, at least for a prototype, on 64-bit we can just use ioremap_wc and
hold the mapping while the driver is open? Is there any huge benefit to


I've got Venki lined up to do this work for me; once we're happy enough
with the API. In particular, the non-highmem 32-bit case seems a bit
tricky.

Also, does anyone have a better set of names for this stuff?
io_reserve_pci_mapping seems fairly ugly to me.

--=20
keith.packard@intel.com
From: Arjan van de Ven
Date: Saturday, October 18, 2008 - 6:06 pm

On Sat, 18 Oct 2008 17:38:11 -0700

how about pci_resource_force_caching(pci_dev, barnr, cachetype)?

-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--

From: Keith Packard
Date: Saturday, October 18, 2008 - 6:15 pm

and then 'pci_map_atomic'? Not sure this makes sense in the pci
namespace.

--=20
keith.packard@intel.com
From: Ingo Molnar
Date: Monday, October 20, 2008 - 3:01 am

correct. There's no benefit to using the kernel mapping - and we can add 
2MB pages support to ioremap just fine and then you'll benefit from it 
automatically.

	Ingo
--

From: Keith Packard
Date: Saturday, October 18, 2008 - 9:14 pm

Here's what these functions would look like as wrappers on top of the
existing APIs:

/* x86_64 style */
static inline struct io_reserve *
iomap_reserve(unsigned long base, unsigned long size)
{
        return (struct io_reserve *) ioremap_wc(base, size);
}

static inline void *
iomap_atomic(struct io_reserve *reserve, unsigned long offset)
{
        return ((char *) reserve) + offset;
}

static inline void
iounmap_atomic(void *vaddr)
{
}

static inline void
iomap_unreserve(struct io_reserve *reserve)
{
        iounmap(reserve);
}

/* HIGHMEM style */
#if 0
static inline struct io_reserve *
iomap_reserve(unsigned long base, unsigned long size)
{
        return (struct io_reserve *) base;
}

static inline void *
iomap_atomic(struct io_reserve *reserve, unsigned long offset)
{
        offset +=3D (unsigned long) reserve;
        return kmap_atomic(offset >> PAGE_SHIFT, KM_USER0);
}

static inline void
iounmap_atomic(void *vaddr)
{
        kunmap_atomic(vaddr, KM_USER0);
}

static inline void
iomap_unreserve(struct io_reserve *reserve)
{
}
#endif

/* 32-bit non-HIGHMEM style */
#if 0
static inline struct io_reserve *
iomap_reserve(unsigned long base, unsigned long size)
{
        return NULL;
}

static inline void *
iomap_atomic(struct io_reserve *reserve, unsigned long offset)
{
        return NULL;
}

static inline void
iounmap_atomic(void *vaddr)
{
}

static inline void
iomap_unreserve(struct io_reserve *reserve)
{
}
--=20
keith.packard@intel.com
From: Keith Packard
Date: Saturday, October 18, 2008 - 11:41 pm

Here's a patch for the i915 driver that includes the new API. Tested on
x86_32+HIGHMEM and x86_64. I stuck a new 'io_reserve.h' header in the
i915 directory for this patch, but it should go elsewhere.

The new APIs are:

io_reserve_create_wc
io_reserve_free
io_reserve_map_atomic_wc
io_reserve_unmap_atomic
io_reserve_map_wc
io_reserve_unmap

I added the non-atomic variants at Eric's suggestion so that we can use
the direct map on x86_64, avoiding any use of ioremap at run-time. I
think the resulting code looks quite a bit cleaner now. Also, one
benchmark I tried ran about 18 times faster in 64-bit mode than the
original code.

=46rom 2f6b125cb586a0671a2b9c22aadb03fcafdf99ab Mon Sep 17 00:00:00 2001
From: Keith Packard <keithp@taka.keithp.com>
Date: Sat, 18 Oct 2008 22:59:58 -0700
Subject: [PATCH] [drm/i915] Create new 'io_reserve' API for mapping GTT pag=
es

The io_reserve API abstracts away the operations necessary to construct
mappings for our GTT aperture, providing atomic and non-atomic mappings for
GTT pages that work efficiently on x86_64 and x86_32+HIGHMEM configurations=
.

This eliminates the in-drive abuse of the kmap_atomic_pfn function as well
as improving GEM performance on x86_64 architecture.

Signed-off-by: Keith Packard <keithp@taka.keithp.com>
---
 drivers/gpu/drm/i915/i915_drv.h   |    3 +
 drivers/gpu/drm/i915/i915_gem.c   |   71 ++++++++++++----------
 drivers/gpu/drm/i915/i915_irq.c   |    4 +-
 drivers/gpu/drm/i915/io_reserve.h |  122 +++++++++++++++++++++++++++++++++=
++++
 4 files changed, 165 insertions(+), 35 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/io_reserve.h

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_dr=
v.h
index f20ffe1..c99b9ea 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -31,6 +31,7 @@
 #define _I915_DRV_H_
=20
 #include "i915_reg.h"
+#include "io_reserve.h"
=20
 /* General customization:
  */
@@ -246,6 +247,8 @@ typedef struct ...
From: Ingo Molnar
Date: Sunday, October 19, 2008 - 10:53 am

very nice!

I think we need a somewhat different abstraction though.

Firstly, regarding drivers/gpu/drm/i915/io_reserve.h, that needs to move 
to generic code.

Secondly, wouldnt the right abstraction be to attach this functionality 
to 'struct resource' ? [or at least create a second struct that embedds 
struct resource]

this abstraction is definitely not a PCI thing and not a 
detached-from-everything thing, it's an IO resource thing. We could make 
it a property of struct resource:

struct resource {
        resource_size_t start;
        resource_size_t end;
        const char *name;
        unsigned long flags;
        struct resource *parent, *sibling, *child;
+       void *mapping;
};

The APIs would be:

  int   io_resource_init_mapping(struct resource *res);
 void   io_resource_free_mapping(struct resource *res);
 void * io_resource_map(struct resource *res, pfn_t pfn, unsigned long offset);
 void   io_resource_unmap(struct resource *res, void *kaddr);

Note how simple and consistent it all gets: IO resources already know 
their physical location and their size limits. Being able to cache an 
ioremap in a mapping [and being able to use atomic kmaps on 32-bit] is a 
relatively simple and natural extension to the concept.

i think that would be quite acceptable - and the APIs could just 
transparently work on it. This would also allow the PCI code to 
automatically unmap any cached mappings from resources, when the driver 
deinitializes.

Linus, Jesse, what do you think?

i think we need to finalize the API names and their abstraction level, 
and then could even merge those APIs into v2.6.28 on a fast path, to 
enable you to use it. It does not interact with anything else so it 
should be safe to do.

(i'd not suggest to merge the i915 bits just yet - but that's obviously 
not my call.)

	Ingo
--

From: Arjan van de Ven
Date: Sunday, October 19, 2008 - 11:00 am

On Sun, 19 Oct 2008 19:53:20 +0200

and making a simple wrapper around this that turns "struct pci_dev,
barnr" into a resource would make sense too, but yes.

We need one more

io_resource_force_cachability(struct resource, cachetype)

or maybe only

io_resource_force_writecombine(struct resource)




-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--

From: Eric Anholt
Date: Sunday, October 19, 2008 - 12:07 pm

This API needs the cacheability control, which I don't see in it
currently.  In the past we've been relying on an MTRR over the GTT
resulting in all of our UC- mappings getting us the correct WC behavior,
but now there aren't enough MTRRs to go around and graphics loses out
(at about a 20% CPU cost as a conservative estimate).  The primary goal
of the new API is to let us eat PAT costs up front so we don't have to
at runtime.

Second, we need to know when we're doing a mapping whether we're
affected by atomic scheduling restrictions.  Right now our plan has been
to try doing page-by-page
io_map_atomic_wc()/copy_from_user_inatomic()/io_unmap_atomic(), and if
we fail at that at some point (map returns NULL or we get a partial
completion from copy_from_user_inatomic) then fall back to io_map_wc()
and copy_from_user() the whole thing at once.  That gets us good
performance on both x86 with highmem and x86-64, and not too shabby
performance on x86 non-highmem.

Also, while it's rare, there have been graphics cards (looking at you,
S3) where BARs were expensive for some reason and they stuffed both the
framebuffer and registers into one PCI BAR, where you want the FB to be
WC and the registers to be UC.  Not sure if they would be supportable
with this API or not.  And if it's not, I'm not sure how much we care to
design for them, but it's something to potentially consider.

Finally, I'm confused by the pfn and offset args to io_resource_map,
when I expected something parallel to ioremap but with our resource arg
added.


--=20
Eric Anholt
eric@anholt.net                         eric.anholt@intel.com


From: Ingo Molnar
Date: Monday, October 20, 2008 - 4:55 am

yes, these two should do the trick:

 int   io_resource_init_mapping_wc(struct resource *res);

that gets ugly very fast. I think we should not use atomic kmaps but 
NR_CPUS _fixmaps_ with a per CPU array of mutexes (this is basically 
atomic kmaps but without the preemption restrictions). We could 
take/drop the mutex and statistically you'll stay on the same CPU and 

yes, this is a weakness of this API - you cannot mix multiple 
cachability domains within the same BAR.

and that can happen on non-graphics as well: some storage controller 
that has regular control registers in one portion of the BAR, which all 
need to be consistently accessed via UC and properly POST-ed - while it 
could also have some large mailbox structure at the end of the BAR, 
which could be mapped both cacheable or perhaps WC.

So ... i guess we can go back to the io_mapping API proposed by Keith, 
but not make it atomic kmap based but fixmap + mutex based - for good 

ok.

	Ingo
--

From: Ingo Molnar
Date: Monday, October 20, 2008 - 5:20 am

peterz pointed it out that we need to be careful with the TLBs in that 
case.

I think it's solvable: a small non-default special-case in switch_to() 
would invlpg any pending such page. (and no two such mappings could be 
held at the same time)

So as long as we dont leave the CPU we wont ever do TLB flushes, and the 
flush will be local even if we do.

	Ingo
--

From: Keith Packard
Date: Sunday, October 19, 2008 - 2:04 pm

I'm not sure I see any value in caching mappings here; we're mostly
interested in copying lots of data onto the card and so we use a lot of
different mappings; atomic mappings are easy to use, and efficient
enough.

Also, if we're using a resource, I'd like to just use byte offsets and

Let's figure out the API we want, then figure out whether it makes sense
to stick it into 2.6.28 or whether we just want to wait for .29. I don't
mind rewriting the driver for the next release.

I will probably use something like the io_reserve stuff for .28 though;
it makes the code easier to read and gives us good performance on 64 bit
kernels.

--=20
keith.packard@intel.com
From: Ingo Molnar
Date: Monday, October 20, 2008 - 4:58 am

yes but note that by caching the whole mapping on 64-bit we get 
everything we want: trivially lockless, works from any CPU, can be 
preempted at will, and there are no ugly INVLPG flushes anywhere.

you'll even get 2MB mappings automatically, if the BAR is aligned and 
sized correctly.

32-bit we should handle as well but not design for it.

	Ingo
--

From: Keith Packard
Date: Monday, October 20, 2008 - 8:49 am

I was assuming that on 64-bit, the map would be created at driver init
time and be left in place until the driver closed; if that's what you

As long as we get kmap_atomic-like performance, and we get to simplify
our code, I'm up for it.

--=20
keith.packard@intel.com
From: Ingo Molnar
Date: Wednesday, October 22, 2008 - 2:36 am

okay. So ... mind sending your io_mapping patch as a generic facility? 
It looks all good to me in its present form, except that it should live 

also, please send at least two patches, so that we can look at (and 
possibly merge) the generic facility in isolation.

	Ingo
--

From: Keith Packard
Date: Thursday, October 23, 2008 - 12:14 am

The first patch in this series (assuming I'm driving git-send-email
correctly) adds the io_mapping API. I ended up creating a new
linux/io_mapping.h file as the kernel init code uses io.h and got very angry
when I tried to include linux/highmem.h from that. I'm afraid I gave up at
that point and just moved the code to a new file. 

The second patch switches the drm/i915 driver to the new API. Performance
improvements on 64-bit kernels are impressive as we were using the slow path
before and now get to take advantage of 64-bit wonderfulness.
--

From: Keith Packard
Date: Thursday, October 23, 2008 - 12:14 am

Graphics devices have large PCI apertures which would consume a significant
fraction of a 32-bit address space if mapped during driver initialization.
Using ioremap at runtime is impractical as it is too slow. This new set of
interfaces uses atomic mappings on 32-bit processors and a large static
mapping on 64-bit processors to provide reasonable 32-bit performance and
optimal 64-bit performance.

The current implementation sits atop the existing CONFIG_HIGHMEM kmap_atomic
mechanism for 32-bit processors when present. When absent, it just uses
ioremap, which remains horribly inefficient. Fixing non-HIGHMEM 32-bit
kernels to provide per-CPU mappings ala HIGHMEM would resolve that
performance issue.

Signed-off-by: Keith Packard <keithp@keithp.com>
---
 Documentation/io-mapping.txt |   84 ++++++++++++++++++++++++++++
 include/linux/io-mapping.h   |  125 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 209 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/io-mapping.txt
 create mode 100644 include/linux/io-mapping.h

diff --git a/Documentation/io-mapping.txt b/Documentation/io-mapping.txt
new file mode 100644
index 0000000..ebf6dc5
--- /dev/null
+++ b/Documentation/io-mapping.txt
@@ -0,0 +1,84 @@
+The io_mapping functions in linux/io.h provide an abstraction for
+efficiently mapping small regions of an io device to the CPU. The initial
+usage is to support the large graphics aperture on 32-bit processors where
+ioremap_wc cannot be used to statically map the entire aperture to the CPU
+as it would consume too much of the kernel address space.
+
+A mapping object is created during driver initialization using
+
+	struct io_mapping *
+	io_mapping_create_wc(unsigned long base, unsigned long size)
+
+		'base' is the bus address of the region to be made
+		mappable, while 'size' indicates how large a mapping region to
+		enable. Both are in bytes.
+
+		This _wc variant provides a mapping which may only be used
+		with the io_mapping_map_atomic_wc or ...
From: Keith Packard
Date: Thursday, October 23, 2008 - 12:14 am

Switch the i915 device aperture mapping to the io-mapping interface, taking
advantage of the cleaner API to extend it across all of the mapping uses,
including both pwrite and relocation updates.

This dramatically improves performance on 64-bit kernels which were using
the same slow path as 32-bit non-HIGHMEM kernels prior to this patch.

Signed-off-by: Keith Packard <keithp@keithp.com>
---
 drivers/gpu/drm/i915/i915_drv.h |    3 +
 drivers/gpu/drm/i915/i915_gem.c |   81 ++++++++++++++++-----------------------
 2 files changed, 36 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f20ffe1..8ca5fbc 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -31,6 +31,7 @@
 #define _I915_DRV_H_
 
 #include "i915_reg.h"
+#include <linux/io-mapping.h>
 
 /* General customization:
  */
@@ -246,6 +247,8 @@ typedef struct drm_i915_private {
 	struct {
 		struct drm_mm gtt_space;
 
+		struct io_mapping *io_mapping;
+
 		/**
 		 * List of objects currently involved in rendering from the
 		 * ringbuffer.
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9255088..d38b052 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -177,14 +177,14 @@ i915_gem_gtt_pwrite(struct drm_device *dev, struct drm_gem_object *obj,
 		    struct drm_file *file_priv)
 {
 	struct drm_i915_gem_object *obj_priv = obj->driver_private;
+	drm_i915_private_t *dev_priv = dev->dev_private;
 	ssize_t remain;
-	loff_t offset;
+	loff_t offset, base;
 	char __user *user_data;
 	char __iomem *vaddr;
 	char *vaddr_atomic;
-	int i, o, l;
+	int o, l;
 	int ret = 0;
-	unsigned long pfn;
 	unsigned long unwritten;
 
 	user_data = (char __user *) (uintptr_t) args->data_ptr;
@@ -211,42 +211,38 @@ i915_gem_gtt_pwrite(struct drm_device *dev, struct drm_gem_object *obj,
 	while (remain > 0) {
 		/* Operation in this page
 		 *
-		 * i = page ...
From: Randy Dunlap
Date: Thursday, October 23, 2008 - 9:49 pm

And I wish you could lose that horrible (non-Linux kernel) style of function
return type on a separate line.

---
~Randy
--

From: Keith Packard
Date: Thursday, October 23, 2008 - 11:26 pm

On Thu, 2008-10-23 at 21:49 -0700, Randy Dunlap wrote:

(A bunch of helpful edits for the io-mapping.txt file)

Thanks! They're in my tree and will get included in the next version of
this patch.

--=20
keith.packard@intel.com
From: Ingo Molnar
Date: Thursday, October 23, 2008 - 1:05 am

ah ... good call, i missed that mess. linux/io.h is indeed dependency 

heh, cool - the wonders of 64-bit x86 :-)

Any ballpark-figure numbers you can share with us?

	Ingo
--

From: Keith Packard
Date: Thursday, October 23, 2008 - 8:39 am

For one quake-3 based game we use for performance regression checking,
64-bit kernels run about 18 times faster now. That's the difference
between using a zero-cost dynamic mapping and using ioremap_wc for each
page.

--=20
keith.packard@intel.com
From: Dave Airlie
Date: Monday, November 3, 2008 - 12:00 am

So I've put these patches into Fedora rawhide kernel to test, and
glxgears on my
945G hw went from 85fps to 380fps, clearly we would want these patches
upstream sooner
rather than later.

Or we at least need something in the i915 driver to get the speed
under GEM back up.

I'm happy to ship these patches in F10 GA if that makes any
difference, so I'd really like them upstream asap.

Dave.
--

From: Ingo Molnar
Date: Monday, November 3, 2008 - 3:48 am

yep, it's all lined up already in tip/core/resources, and got 
massively tested over the past few days. Will send a pull request to 
Linus tomorrow-ish - we need one final cleanup patch and then it's 
green to go.

	Ingo
--

From: Linus Torvalds
Date: Monday, November 3, 2008 - 9:36 am

I'm inclined to agree. Not that I think 380fps sounds very impressive (I 
get 850+ fps with _software_ rendering, for chissake), but because 85 fps 
is a joke, and clearly without this setup there's not even any point to 
try to do any other optimizations.

And if we're looking at the patches being in Fedora anyway, there really 
is no reason not to merge them. Even if it's out of the merge window.

Ingo?

			Linus
--

From: Ingo Molnar
Date: Monday, November 3, 2008 - 9:53 am

there was one pending item: i was waiting for the x86 #ifdef to be 
cleaned up out of include/linux/io-mapping.h, but that is trivial with 
no impact to any object code and can thus perhaps wait - would be nice 
to get this in early in this -rc.

Please pull the latest io-mappings-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git io-mappings-for-linus

 Thanks,

	Ingo

------------------>
Keith Packard (3):
      x86: add iomap_atomic*()/iounmap_atomic() on 32-bit using fixmaps
      resources: add io-mapping functions to dynamically map large device apertures
      i915: use io-mapping interfaces instead of a variety of mapping kludges


 Documentation/io-mapping.txt     |   76 +++++++++++++++++
 arch/x86/include/asm/fixmap.h    |    4 +
 arch/x86/include/asm/fixmap_32.h |    4 -
 arch/x86/include/asm/highmem.h   |    5 +-
 arch/x86/mm/Makefile             |    2 +-
 arch/x86/mm/init_32.c            |    3 +-
 arch/x86/mm/iomap_32.c           |   59 +++++++++++++
 drivers/gpu/drm/i915/i915_drv.h  |    3 +
 drivers/gpu/drm/i915/i915_gem.c  |  174 +++++++++++++++++--------------------
 include/asm-x86/iomap.h          |   30 +++++++
 include/linux/io-mapping.h       |  118 ++++++++++++++++++++++++++
 11 files changed, 373 insertions(+), 105 deletions(-)
 create mode 100644 Documentation/io-mapping.txt
 create mode 100644 arch/x86/mm/iomap_32.c
 create mode 100644 include/asm-x86/iomap.h
 create mode 100644 include/linux/io-mapping.h

diff --git a/Documentation/io-mapping.txt b/Documentation/io-mapping.txt
new file mode 100644
index 0000000..cd2f726
--- /dev/null
+++ b/Documentation/io-mapping.txt
@@ -0,0 +1,76 @@
+The io_mapping functions in linux/io-mapping.h provide an abstraction for
+efficiently mapping small regions of an I/O device to the CPU. The initial
+usage is to support the large graphics aperture on 32-bit processors where
+ioremap_wc cannot be used to statically map the entire aperture to the CPU
+as ...
From: Ingo Molnar
Date: Monday, November 3, 2008 - 10:29 am

These cleanups can now be found in the second tree below.

They can be cleanly pulled over the other tree i just posted, but it 
has slightly less testing. (the cleanup commits are fresh - but there 
should be no difference in functionality)

Please pull the latest io-mappings-for-linus-2 git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git io-mappings-for-linus-2

 Thanks,

	Ingo

------------------>
Keith Packard (5):
      x86: add iomap_atomic*()/iounmap_atomic() on 32-bit using fixmaps
      resources: add io-mapping functions to dynamically map large device apertures
      i915: use io-mapping interfaces instead of a variety of mapping kludges
      io mapping: improve documentation
      io mapping: clean up #ifdefs


 Documentation/io-mapping.txt     |   82 ++++++++++++++++++
 arch/x86/Kconfig                 |    4 +
 arch/x86/include/asm/fixmap.h    |    4 +
 arch/x86/include/asm/fixmap_32.h |    4 -
 arch/x86/include/asm/highmem.h   |    5 +-
 arch/x86/mm/Makefile             |    2 +-
 arch/x86/mm/init_32.c            |    3 +-
 arch/x86/mm/iomap_32.c           |   59 +++++++++++++
 drivers/gpu/drm/i915/i915_drv.h  |    3 +
 drivers/gpu/drm/i915/i915_gem.c  |  174 +++++++++++++++++--------------------
 include/asm-x86/iomap.h          |   30 +++++++
 include/linux/io-mapping.h       |  125 +++++++++++++++++++++++++++
 12 files changed, 390 insertions(+), 105 deletions(-)
 create mode 100644 Documentation/io-mapping.txt
 create mode 100644 arch/x86/mm/iomap_32.c
 create mode 100644 include/asm-x86/iomap.h
 create mode 100644 include/linux/io-mapping.h

diff --git a/Documentation/io-mapping.txt b/Documentation/io-mapping.txt
new file mode 100644
index 0000000..473e43b
--- /dev/null
+++ b/Documentation/io-mapping.txt
@@ -0,0 +1,82 @@
+The io_mapping functions in linux/io-mapping.h provide an abstraction for
+efficiently mapping small regions of an I/O device to the CPU. The initial
+usage is to support the large ...
From: Jonathan Corbet
Date: Tuesday, November 4, 2008 - 3:36 pm

Should the documentation for this function (perhaps the
certainly-forthcoming kerneldoc comments :) mention loudly that this
function uses KM_USER0?  This isn't kmap(), and doesn't look much like
it; someday some developer might get an ugly surprise when they try to
use that slot simultaneously for something else.

jon
--

From: Ingo Molnar
Date: Wednesday, November 5, 2008 - 2:01 am

definitely worth a comment.

	Ingo
--

From: Keith Packard
Date: Thursday, October 23, 2008 - 1:22 pm

We just ran some numbers on a box with PAT enabled and broken MTRRs.
Finally we have a test platform for the difference between kmap_atomic
and kmap_atomic_prot. Using regular kmap_atomic on this platform, we get
UC access to the graphics device; sending data from the CPU is a bit
slow. Adding kmap_atomic_prot_pfn and specifying WC access raises that
by a fairly significant factor, taking our CPU utilization for
copy_from_user from 40% to 2%.

Here's a patch which adds kmap_atomic_prot_pfn to the kernel for this
usage. When we add native io-mapping support instead of sitting on top
of kmap, we can remove this function.

I've reworked the io_mapping patches to use this function as well, along
with cleaning up the i915 code along the lines of Linus' current
version. I'll post those if this patch looks acceptable.

=46rom e3f633dcb36889fa85ea06cca339072df3c44ae0 Mon Sep 17 00:00:00 2001
From: Keith Packard <keithp@keithp.com>
Date: Thu, 23 Oct 2008 11:53:45 -0700
Subject: [PATCH] Add kmap_atomic_prot_pfn

kmap_atomic_prot_pfn is a mixture of kmap_atomic_prot and kmap_atomic_pfn,
accepting both a pfn instead of a page and an explicit protection value.

Signed-off-by: Keith Packard <keithp@keithp.com>
---
 arch/x86/mm/highmem_32.c  |   19 +++++++++++++++++++
 include/asm-x86/highmem.h |    1 +
 include/linux/highmem.h   |    1 +
 3 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c
index bcc079c..91ae5e8 100644
--- a/arch/x86/mm/highmem_32.c
+++ b/arch/x86/mm/highmem_32.c
@@ -139,6 +139,25 @@ void *kmap_atomic_pfn(unsigned long pfn, enum km_type =
type)
 }
 EXPORT_SYMBOL_GPL(kmap_atomic_pfn); /* temporarily in use by i915 GEM unti=
l vmap */
=20
+/* This is the same as kmap_atomic_prot() but can map memory that doesn't
+ * have a struct page associated with it.
+ */
+void *kmap_atomic_prot_pfn(unsigned long pfn, enum km_type type, pgprot_t =
prot)
+{
+	enum fixed_addresses idx;
+	unsigned long ...
From: Andrew Morton
Date: Thursday, October 23, 2008 - 1:38 pm

On Thu, 23 Oct 2008 13:22:12 -0700


Given that all highmem-implementing archtiectures must use the same
declaration here, we might as well put it into include/linux/highmem.h.
Although that goes against current mistakes^Wcode.

Does powerpc32 still implement highmem?  It seems that way.  You broke
it, no?
--

From: Keith Packard
Date: Thursday, October 23, 2008 - 2:03 pm

The goal is to stop needing this function fairly soon and replace it


Powerpc32 doesn't have kmap_atomic_pfn either. Seems like the set of
HIGHMEM functions is not uniform across architectures.

--=20
keith.packard@intel.com
From: Linus Torvalds
Date: Thursday, October 23, 2008 - 2:24 pm

This really shouldn't be in highmem.h AT ALL.

The whole point of that function has absolutely nothing to do with 
highmem, and it *must* be useful on non-highmem configurations to be 
appropriate.

So I'd much rather create a new <linux/kmap.h> or something. Or just 
expose this from to <asm/fixmap.h> or something. Let's not confuse this 
with highmem, even if the implementation _historically_ was due to that.

			Linus
--

From: Keith Packard
Date: Thursday, October 23, 2008 - 6:50 pm

Right, I'd just like my io_mapping_map_atomic_wc to be able to rapidly
map random pages from my PCI BAR in WC mode. The code in your tree uses
kmap_atomic_pfn, which does the mapping, but use the wrong prot bits. On
a system with MTRR failure, this all ends badly, with factors of 20

Sure, we readily admit that we're abusing the highmem API. So we wrapped
that abusive code in a pretty package that can be re-implemented however
you desire.

How (and when) would you like to see this fixed?

--=20
keith.packard@intel.com
From: Linus Torvalds
Date: Thursday, October 23, 2008 - 7:48 pm

I'm not entirely sure who wants to own up to owning that particular part 
of code, and is willing to make kmap_atomic_prot_pfn() also work in the 
absense of CONFIG_HIGHMEM.

I suspect it's Ingo, but I also suspect that he'd like to see a tested 
patch by somebody who actually _uses_ this code.

So I would suspect that if you guys actually write a patch, and make sure 
that it works on x86-32 even _without_ CONFIG_HIGHMEM, and send it to 
Ingo, good things will happen.

As to the "without CONFIG_HIGHMEM" part: making all of this work even 
without HIGHMEM should literally be a matter of:

 - remove the '#ifdef CONFIG_HIGHMEM' in <asm/fixmap_32.h>, so that we 
   have fixmap entries for the temporary kernel mappings regardless (ie 
   FIX_KMAP_BEGIN and FIX_KMAP_END).

 - move the code for the atomic accesses that use those fixmap entries 
   into a file that gets compiled regardless of CONFIG_HIGHMEM. Or at 
   least just kmap_atomic_prot_pfn() and kunmap_atomic().

and it probably should all work automatically. The kmap_atomic() stuff 
really should be almost totally independent of CONFIG_HIGHMEM, since it's 
really much more closely related to fixmaps than HIGHMEM. 

I guess there may be some debugging code that depends on HIGHMEM (eg that 
whole testing for whether a page is a highmem page or not), so it might be 
a _bit_ more than just moving code around, but I really didn't look 
closer.

Then, there's the issue of 64-bit, and just mapping everything there, and 
the interface to that. I liked the trivial extension to "struct resource" 
to have a "cached mapping" pointer. So if we can just make it pass 
resources around and get a page that way (and not even need kmap() on 
64-bit architections), that would be good.

It's too late for -rc1 (which I'm planning on cutting within the hour), 
and if it ends up being complex, I guess that it means this will eb a 
2.6.29 issue.

But if it really is just a matter of mostly just some trivial code 
movement and both ...
From: Benjamin Herrenschmidt
Date: Thursday, October 23, 2008 - 8:24 pm

I'm not that fan of carrying a mapping with a struct resource because if
we do that we should probably also refcount the mapping, and then there
is the whole question of mappings with different attributes, etc etc...

Cheers,
Ben.


--

From: Keith Packard
Date: Thursday, October 23, 2008 - 10:37 pm

I'm fine with sticking the mapping in a separate structure; it's just
the return from ioremap_wc on 64-bit systems, and nothing at all on
32-bit systems. I don't really see the advantage of moving it into the
resource, especially as we may need different access modes (UC vs WC)
for different portions of a single BAR.

We may want to add some bounds and access mode information to this
structure to check for broken drivers.

--=20
keith.packard@intel.com
From: Linus Torvalds
Date: Friday, October 24, 2008 - 7:53 am

Actually, on 32-bit, the 'prot' should be there, as should the starting 
physical page. Otherwise the two interfaces would be very odd, and you'd 
have to repeat those arguments in all callers (ie both in "prepare" and 
in the actual "access").

			Linus
--

From: Keith Packard
Date: Friday, October 24, 2008 - 8:45 am

I suppose. What I did instead was create _wc versions of both the
prepare and access functions to eliminate the need for additional data.
Either way is fine with me; I took the route which didn't require an
additional allocation.

--=20
keith.packard@intel.com
From: Keith Packard
Date: Thursday, October 23, 2008 - 9:29 pm

All of the kmap_atomic functions *do* work without CONFIG_HIGHMEM, they
just don't do what we want in this case. Without knowing the history, it
seems fairly clear that the kmap functions are designed to map physical
memory pages into the kernel virtual address space. On small 32-bit
systems, that's trivial, you just use the direct map (as one does on
64-bit systems). The magic fixmap entries make this work with
CONFIG_HIGHMEM as well.

So, I fear touching the kmap API as it appears to have a specific and
useful purpose, irrespective of the memory size the kernel is configured
for.

What I've proposed is that we create a new io-space specific set of
fixmap APIs. On CONFIG_HIGHMEM, they'd just use the existing kmap_atomic
mechanism, but on small 32-bit systems, we'd enable the fixmaps and have


As above, I think kmap_atomic should be left alone as a way of quickly
mapping memory pages. There are a users of both kmap_atomic_prot (xen)

The io_mapping API I proposed does precisely this. On 32-bit systems, it
uses kmap_atomic for each page access while on 64-bit systems it uses
ioremap_wc at device init time and then just uses an offset for each
page access.

Hiding this detail behind an API leaves the driver code independent of

If we do end up pushing this out to 2.6.29, I'd like to see
kmap_atomic_prot_pfn in place as a stop-gap so that PAT performance on
32-bit systems is reasonable. I don't think too many people are running
desktop systems without CONFIG_HIGHMEM at this point, and if so, we can

I'll try to get something working in the next day or so and see how it
looks. My plan at this point is to create new API for 32-bit systems:

void *io_map_atomic_wc(unsigned long pfn)
void io_unmap_atomic(void *addr);

With this, I can switch my existing io_mapping API over to an
io-specific interface and implement those using the fixmap code.

--=20
keith.packard@intel.com
From: Keith Packard
Date: Thursday, October 23, 2008 - 11:22 pm

Something like the following (yes, I know, this is missing the
include/asm-x86 rename)?

=46rom e7921809c72f940295311cfa6c300d5234ac96c1 Mon Sep 17 00:00:00 2001
From: Keith Packard <keithp@keithp.com>
Date: Thu, 23 Oct 2008 23:17:40 -0700
Subject: [PATCH] [x86_32] Add io_map_atomic using fixmaps

This steals the code used for CONFIG_HIGHMEM memory mappings except that
it's designed for dynamic io resource mapping. These fixmaps are available
even with CONFIG_HIGHMEM turned off.

Signed-off-by: Keith Packard <keithp@keithp.com>
---
 arch/x86/mm/Makefile        |    2 +-
 arch/x86/mm/init_32.c       |    3 +-
 arch/x86/mm/iomap_32.c      |   59 +++++++++++++++++++++++++++++++++++++++=
++++
 include/asm-x86/fixmap.h    |    4 +++
 include/asm-x86/fixmap_32.h |    4 ---
 include/asm-x86/highmem.h   |    8 +++---
 include/asm-x86/iomap.h     |   30 ++++++++++++++++++++++
 include/linux/io-mapping.h  |   15 +++-------
 8 files changed, 104 insertions(+), 21 deletions(-)
 create mode 100644 arch/x86/mm/iomap_32.c
 create mode 100644 include/asm-x86/iomap.h

diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 59f89b4..fea4565 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -1,7 +1,7 @@
 obj-y	:=3D  init_$(BITS).o fault.o ioremap.o extable.o pageattr.o mmap.o \
 	    pat.o pgtable.o gup.o
=20
-obj-$(CONFIG_X86_32)		+=3D pgtable_32.o
+obj-$(CONFIG_X86_32)		+=3D pgtable_32.o iomap_32.o
=20
 obj-$(CONFIG_HUGETLB_PAGE)	+=3D hugetlbpage.o
 obj-$(CONFIG_X86_PTDUMP)	+=3D dump_pagetables.o
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 8396868..c483f42 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -334,7 +334,6 @@ int devmem_is_allowed(unsigned long pagenr)
 	return 0;
 }
=20
-#ifdef CONFIG_HIGHMEM
 pte_t *kmap_pte;
 pgprot_t kmap_prot;
=20
@@ -357,6 +356,7 @@ static void __init kmap_init(void)
 	kmap_prot =3D PAGE_KERNEL;
 }
=20
+#ifdef CONFIG_HIGHMEM
 static void __init permanent_kmaps_init(pgd_t ...
From: Thomas Hellström
Date: Friday, October 24, 2008 - 12:33 am

Keith,

What you actually are doing here is claiming copyright on code that 
other people have written, and tighten the export restrictions. 
kmap_atomic_prot_pfn() appeared long ago in drm git with identical code 
and purpose, but with different authors, and iounmap_atomic is identical 
to kunmap_atomic.

Pls fix.

/Thomas




--

From: Ingo Molnar
Date: Friday, October 24, 2008 - 1:38 am

you want to use this facility in a binary-only driver?

	Ingo
--

From: Thomas Hellström
Date: Friday, October 24, 2008 - 2:19 am

At this point I have no such use for it, no.
The original user was a MIT style licenced driver.

/Thomas



--

From: Ingo Molnar
Date: Friday, October 24, 2008 - 2:32 am

okay, then just to put this question to rest: i wrote the original 
32-bit highmem code ~10 years ago. I wrote the first version of fixmap 
support - in fact i coined the term. I wrote the first version of the 
atomic-kmap facility as well.

All of that code is licensed under the GPLv2. So if anyone wants to make 
any copyright claims about highmem/kmap/fixmap derivative works, 
consider it in that light.

Regarding this new API variant that Keith wrote: it would be silly and 
dangerous to export it anywhere but to in-kernel drivers. The API 
disables preemption on 32-bit and rummages deep in the guts of the 
kernel as well, uses up a precious resource (fixmap slots), etc. It's 
internal and we eventually might want to deprecate forms of it and 
concentrate on the good 64-bit performance side.

	Ingo
--

From: Thomas Hellström
Date: Friday, October 24, 2008 - 4:04 am

I fully acknowledge that. The fact that others wrote almost all of the 
code is the reason why there is no  additional copyright claims in the 
file of  the original kmap_atomic_prot_pfn() implementation.

What I'm considering very bad form is someone taking other people's 
contributions, be it code or ideas,  no matter how small they are, claim 
copyright on them and restrict the original usage. That's really the 
point here. Frankly, from my own usage point of view I don't really care 
about the specific case (whether they are exported GPL or not), but with 
the current form of this patch, I'm basically no longer allowed to use 
These are all good arguments to reserve this api for in-kernel drivers.

There are other reasons why it should be available to out of tree drivers:

* The api enables a fast facility that will be extremely important by 
graphics drivers in the future, probably not only for in-kernel drivers.
Particularly as future graphics drivers will want to get fast 
write-combined kernel mappings also on highmem pages, not only io.
This latter actually suggests keeping the form kmap_atomic_prot_pfn 
instead of iomap_atomic_prot_pfn, and make it permanent, unless the same 
goals can be achieved by the VMAP work Nick Piggin is suggesting.

* The use of k[un]map_atomic() would, following your argumentation, be 
equally dangerous to export non-GPL?

Now, considering these pros and cons one might still come to the 
conclusion that for stability reasons it is best to keep this API for in 
kernel drivers. I don't really know enough about the affected kernel 
internals to be able to judge. I just think it's important that all 
facts are considered.

/Thomas





--

From: Keith Packard
Date: Friday, October 24, 2008 - 8:48 am

Yeah, I just stuck my usual license header on it and didn't think about
authorship. I'll fix that, once we figure out what the appropriate name
is.

But, as this code is clearly a trivial adaptation of the existing kernel
code, it should carry a GPLv2 license. I'm also not particular as to the
EXPORT restriction, I was just following the EXPORT advice given for the
other newly exposed kernel symbols we're using.

--=20
keith.packard@intel.com
From: Thomas Hellström
Date: Friday, October 24, 2008 - 3:18 am

Thanks, Keith.

If there is a chance that people who do want the EXPORT_SYMBOL_GPL 
restriction are OK to go with just EXPORT_SYMBOL(), I guess that would 
fit best considering what's already exported and doable in drivers today.

/Thomas








--

From: Ingo Molnar
Date: Friday, October 24, 2008 - 2:14 am

yeah. I already asked Keith yesterday to send one coherent bundle of 
patches and i think we've got all the code sent already, you also pulled 
the latest DRM tree - so it all just needs sorting out and testing. 

[ I can possibly test the end result with a bleeding-edge Xorg which 
  supports the new DRI APIs. (Whether X comes up fine is a regular, 
  necessary and 'fun' component of the x86 regression testing we do 


hm, the thing that i find the weakest aspect of that (despite having 
suggested this approach) is that the structure and the granularity of 
the resource tree is really controlled by the hardware environment. We 
try to map the hardware circumstances accurately at bootstrap / device 
discovery time, and keep it rather static from that point on. (modulo 
hotplug and dynamic BAR sizing)

And this static property of the resource tree is _good_ IMO, because we 
can think about it as a hardware property - not something tweaked by the 
kernel. (the kernel does tweak it when need to be or when the hardware 
asks for it, but it's more of an exception)

That means that if a driver wants to map just a portion of a BAR, 
(because the hardware itself compresses various different pieces of 
functionality into the same BAR), this abstraction becomes a limitation 
on usage.

And it might even be _impossible_ to use the simplest form of 
resource->mem_cache facility with certain types of hardware: say there's 
a cacheable and an uncacheable window within the same BAR - and we'd map 
the whole thing as cacheable. The CPU is then entitled to (and will most 
likely) prefetch into the uncacheable region as well, causing hw 
lockups, etc. [In this thread it was claimed that S3 chips have such 
properties.]

And tweaking this abstraction to cover those cases, for the ioremap to 
not be a full mapping of the resource looks a bit hacky/limiting as 
well:

 - we'd either have to add 'size', 'offset' properties to the window we 
   cache in the struct resource. (and possibly ...
From: Benjamin Herrenschmidt
Date: Thursday, October 23, 2008 - 8:21 pm

Well, on powerpc, we just went (or rather, Kumar just went) through the
oops of implementing fixmap and then kmap on top of it... just because
we wanted kmap_atomic functionality on non-highmem platforms :-) (and
the fixmap approach has some other interesting features).

So yes, I agree. Typically very useful for any 32-bit processor with a
memory mapped PCI Express config space since it's something like 256M of
virtual space to map it all iirc.

Cheers,
Ben.

--

From: Ingo Molnar
Date: Monday, October 20, 2008 - 3:10 am

the downsize would be that we'd attach a runtime property to the 
IORESOURCE_MEM resource tree - which is a fairly static thing right now, 
after the point where we finalize the resource tree. (modulo 
device/bridge hotplug variances)

Another downside is that we might not want to map the whole thing. I.e. 
the structure of the IO memory space we want to map by drivers might be 
different from how it looks like in the resource tree.

the concept of introducing resource->mapping does not feel _that_ wrong 
though and has a couple of upsides: it could act as a natural mapping 
type serializer for example and drivers wouldnt have to explicitly 
manage ioremap results - they could just use the resource descriptor 
directly and "read" and "write" to/from it. readl/writel could be 
extended to operate on the resource descriptor transparently, getting 
rid of a source of resource mismatches and overmapping, etc. etc. We 
could even safety check IO space accesses this way.

and we'd get rid of the complication that your APIs introduced, the need 
to introduce a separate io_mapping type, etc.

Dunno, i might be missing some obvious downside why this wasnt done like 
that until now.

	Ingo
--

From: Yinghai Lu
Date: Saturday, October 18, 2008 - 9:28 pm

we could expand init_memory_mapping to support direct map for them...
with needed prot.
or use set init_memory_mapping() + set_memory_wb() directly.

but that is only for 64bit x86

also someone is talking about to have 6 pcie display adapters on 64bit
system. and every card will have 4g ram.

32bit could use fix map for 1G or 2G mapping.

YH
--

From: Nick Piggin
Date: Saturday, October 18, 2008 - 8:14 pm

TLB shootdown need only be implied if the behaviour required is to
unmap the virtual address *and* cause any other CPU that subsequently
touches it to fault.

For kva, that would be a bug anyway (use after free). The only thing
it implies is that a TLB shootdown happens at some point before the
address get reused.

Still, it's always going to be faster than a global mapping, if done
properly. I was thinking about doing a vmap_atomic thing generically
in the vmap layer... why exactly do we need the FIXMAP stuff for it?
--

Previous thread: [GIT PATCHES for 2.6.28] V4L/DVB updates by Mauro Carvalho Chehab on Friday, October 17, 2008 - 2:20 pm. (1 message)

Next thread: [Slightly off topic] A question about R/B trees. by Maxim Levitsky on Friday, October 17, 2008 - 2:34 pm. (7 messages)