Re: [2.6.25-git18 => 2.6.26-rc1-git1] Xorg crash with xf86MapVidMem error

Previous thread: [PATCH] ata: ahci: power off unused ports by Kristen Carlson Accardi on Thursday, May 8, 2008 - 7:10 pm. (40 messages)

Next thread: regression fixed by using pci=rom by Dave Airlie on Thursday, May 8, 2008 - 7:54 pm. (10 messages)
To: H. Peter Anvin <hpa@...>
Cc: Rufus & Azrael <rufus-azrael@...>, Ingo Molnar <mingo@...>, Siddha, Suresh B <suresh.b.siddha@...>, Linux-kernel Mailing List <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>, Thomas Gleixner <tglx@...>
Date: Thursday, May 8, 2008 - 7:40 pm

Also, note that we only look for start while looking at fixed range MTRRs.
This is not as scary as it seems. We are finding the effective memory type
by just looking at the start of the address range. We still go through
the PAT reserve free mechanism, once we find the effective memory type
and that list will catch any other users with conflicting type anywhere
in the start to end range. And we will still keep effective type consistent
across all mappings.

For example:
0xf0000000-0xf1000000 is mapped WC by one user with appropriate MTRR setting
0xf2000000-0xf3000000 is mapped UC by another user with appropriate MTRR setting

Now if there is a request to map
0xf0000000-0xf4000000, we do mtrr_lookup and assume effective type for the
whole range is WC. Then we go through PAT list to see any conflicts. And
while parsing the list, we will find there is an overlap with conflicting
attributes and we fail the reserve.

In case the second UC mapping above is not present, we use effective memory
type for the whole range 0xf0000000-0xf4000000 as WC and track it in our list
that way.

Thanks,
Venki
--

To: Venki Pallipadi <venkatesh.pallipadi@...>
Cc: Rufus & Azrael <rufus-azrael@...>, Ingo Molnar <mingo@...>, Siddha, Suresh B <suresh.b.siddha@...>, Linux-kernel Mailing List <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>, Thomas Gleixner <tglx@...>
Date: Thursday, May 8, 2008 - 7:54 pm

I don't believe that is true in the slightest. You can iterate over the
variable MTRRs and see if any part of them overlaps the target range;
doing exhaustive enumeration is clearly bogus, especially on 64-bit

So what you're saying here is "it's bogus, but it doesn't really matter
anyway?" Why bother having it at all, then?

Seriously, if it's not unconditionally correct, then:

a. it should be *clearly* labelled a heuristic.
b. it should be *clearly* explained why having the heuristic is much
better than not having anything.

In this case, neither of those conditions appear to be addressed.

-hpa
--

To: H. Peter Anvin <hpa@...>
Cc: Venki Pallipadi <venkatesh.pallipadi@...>, Rufus & Azrael <rufus-azrael@...>, Ingo Molnar <mingo@...>, Siddha, Suresh B <suresh.b.siddha@...>, Linux-kernel Mailing List <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>, Thomas Gleixner <tglx@...>
Date: Thursday, May 8, 2008 - 8:34 pm

What I meant was:
MTRRs are not really base and size. They are defined as base and mask.
Any addr is affected by mtrr if addr & mask == base & mask.
So, MTRR entry like
base = 0xf00000, mask = 0xff00000 with 36 bit physical address covers
0xf00000-0xffffff, 0x10f00000-0x10ffffff, 0x20f00000-0x20ffffff, ....

In this case if user is trying to mmap 0x1a000000-0x2a000000, we cannot really
cover this case with single parsing of variable address ranges. We will have
to go through the sub-ranges withing single variable range, which can be page

Agree that is has to be called a heuristic. Yes. Not having that will work
may be not as optimially. Having it gives us better starting point when we
start to find a proper memory type for requests, especially when there are
no other overlapping mappings in PAT list.

One of the reason for heristic was to get proper type for /dev/mem maps for WB
region (ACPI and BIOS area). /dev/mem checks to see the mtrr type of the start
address and starts with that type for the request. As long as there are no
other conflict in PAT list, we can give WB to this /dev/mem request. Not
having this heuristic we will have to most probably default to UC.

When there are overlapping PAT list entries, the mtrr_lookup does not matter
as we have to inherit things from those PAT entries or conflicting with them.

This discussion points to - code is not sufficiently commented and/or needs
some refactoring. Will look at this afresh tomorrow morning and see whether
I can some up with some better alternative.

Thanks,
Venki
Thanks,
Venki
--

To: H. Peter Anvin <hpa@...>
Cc: Rufus & Azrael <rufus-azrael@...>, Ingo Molnar <mingo@...>, Siddha, Suresh B <suresh.b.siddha@...>, Linux-kernel Mailing List <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>, Thomas Gleixner <tglx@...>
Date: Thursday, May 29, 2008 - 3:01 pm

This is an alternate solution to the problem in this thread. The change
clarifies what is expected out of mtrr_lookup in PAT code and how we handle
failure.

Thanks,
Venki

Alternate patch to clarify the usage of mtrr_lookup() in PAT code, and to make
PAT code resilient to mtrr lookup problems.

Specifically, pat_x_mtrr_type() is restructured to highlight, under
what conditions we look for mtrr hint. pat_x_mtrr_type() uses a default
type when there are any errors in mtrr lookup (still maintaining the pat
consistency). And, reserve_memtype() highlights its usage ot mtrr_lookup for
request type of '-1' and also defaults in a sane way on any mtrr lookup failure.

pat.c looks at mtrr type of a range to get a hint on what mapping type
to request when user/API:
(1) hasn't specified any type (/dev/mem mapping) and we do not want to take
performance hit by always mapping UC_MINUS. This will be the case
for /dev/mem mappings used to map BIOS area or ACPI region which are WB'able.
In this case, as long as MTRR is not WB, PAT will request UC_MINUS for such
mappings.

(2) user/API requests WB mapping while in reality MTRR may have UC or WC.
In this case, PAT can map as WB (without checking MTRR) and still effective
type will be UC or WC. But, a subsequent request to map same region as
UC or WC may fail, as the region will get trackked as WB in PAT list. Looking
at MTRR hint helps us to track based on effective type rather than what user
requested. Again, here mtrr_lookup is only used as hint and we fallback to
WB mapping (as requested by user) as default.

In both cases, after using the mtrr hint, we still go through the memtype
list to make sure there are no inconsistencies among multiple users.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>

---
arch/x86/mm/pat.c | 49 ++++++++++++++++++++++++-------------------------
1 file changed, 24 insertions(+), 25 deletions(-)

Index: linux...

To: Venki Pallipadi <venkatesh.pallipadi@...>
Cc: H. Peter Anvin <hpa@...>, Ingo Molnar <mingo@...>, Siddha, Suresh B <suresh.b.siddha@...>, Linux-kernel Mailing List <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>, Thomas Gleixner <tglx@...>
Date: Thursday, May 29, 2008 - 5:51 pm

Hi Venki,

I try your patch above and it works fine for my config, as your previous
patch.

Thanks for your help.

Regards.
--

To: Rufus & Azrael <rufus-azrael@...>
Cc: Venki Pallipadi <venkatesh.pallipadi@...>, H. Peter Anvin <hpa@...>, Siddha, Suresh B <suresh.b.siddha@...>, Linux-kernel Mailing List <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>, Thomas Gleixner <tglx@...>
Date: Saturday, May 31, 2008 - 4:10 am

thanks - i've applied Venki's fix to tip/x86/urgent.

Ingo
--

To: Venki Pallipadi <venkatesh.pallipadi@...>
Cc: Rufus & Azrael <rufus-azrael@...>, Ingo Molnar <mingo@...>, Siddha, Suresh B <suresh.b.siddha@...>, Linux-kernel Mailing List <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>, Thomas Gleixner <tglx@...>
Date: Thursday, May 8, 2008 - 8:52 pm

Okay, now here is something...

I don't believe it is kosher for mmap() to behave differently if we
called it once or called it twice.

Therefore, the proper behaviour in the case of a range that covers
multiple types is to break it down into multiple requests each mapped
with the appropriate type, as if mmap had been called separate on each
subrange. In the limiting case, yes, this does mean the subranges can
be individual pages, but such an outcome is vanishingly improbable in
practice.

-hpa

--

To: H. Peter Anvin <hpa@...>
Cc: Rufus & Azrael <rufus-azrael@...>, Ingo Molnar <mingo@...>, Siddha, Suresh B <suresh.b.siddha@...>, Linux-kernel Mailing List <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>, Thomas Gleixner <tglx@...>
Date: Friday, May 9, 2008 - 9:46 am

May be I am missing something. Only time mmap will behave differently is
when some of the pre-existing conditions are different. If the MTRRs are
different and/or there has been conflicting mapping in between first and
second mmap. Otherwise, it will be same. We had to do this inheriting
Business if there are other conflicting maps to have compatibility with

We thought about splitting the range. But, in general the interface
requires
us to give one single type of access back. Interfaces like pci accesses
in /sysfs. If user maps the file with mmap, we give one particular type
of mapping.
returning multi type may be a possibility, but that will mean whole new
level of complications in tracking, splitting the vmas, returning this
subrange mapping info to user in some way etc.

Sub-range of a page is probably improbable. But overlaps are common.
That is exactly what is happening with the issue in this thread.
Part of the region is mmaped first (it will get UC_MINUS) and MTRR setup
for WC.
Later a bigger region that includes the first smaller part is being
mmaped. We can
safely return UC_MINUS to this bigger request (without causing any
aliases) and things
will work fine as this bigger mmap does its own MTRR for WC as well.
This is the case
where I feel just maintaining the consistency across the PAT list (which
is UC_MINUS)
here is safe irrespective of any MTRR setting we may have underneath.
Worrying about
MTRR ranges here doesn't seem to add any additional safety-ness.

Thanks,
Venki

--

To: Venki Pallipadi <venkatesh.pallipadi@...>
Cc: Rufus & Azrael <rufus-azrael@...>, Ingo Molnar <mingo@...>, Siddha, Suresh B <suresh.b.siddha@...>, Linux-kernel Mailing List <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>, Thomas Gleixner <tglx@...>
Date: Thursday, May 8, 2008 - 8:49 pm

In practice, though, such MTRRs are never seen. Even in the presence of
such pathological MTRRs, I'm sure one can figure out a *much* smarter
overlap algorithm. I'd have to sit down and think about it, but I'm
pretty sure one could; the basic observation, though, is that any set
mask bit that are in a position <= floor(log2(range_len))-1 don't matter
at all, since the range WILL end up covering both the 0 and the 1 case
in this bit position. At this point, you can test only a limited number
of points (I *believe* you can reduce it down to only the beginning and
the end, but I haven't proven that, so don't count on it yet.)

The key, of course, is to look for the case of multiple MTRRs matching
the range.

-hpa
--

Previous thread: [PATCH] ata: ahci: power off unused ports by Kristen Carlson Accardi on Thursday, May 8, 2008 - 7:10 pm. (40 messages)

Next thread: regression fixed by using pci=rom by Dave Airlie on Thursday, May 8, 2008 - 7:54 pm. (10 messages)