Re: Unable to continue testing of 2.6.25

Previous thread: 2.6.25-rc[12] Video4Linux Bttv Regression by Bongani Hlope on Sunday, February 17, 2008 - 4:36 am. (23 messages)

Next thread: uli526x link problem solved, fix not on 2.6.25-rc2 by Santiago Garcia Mantinan on Sunday, February 17, 2008 - 5:42 am. (2 messages)
To: <linux-kernel@...>
Cc: Linus Torvalds <torvalds@...>
Date: Sunday, February 17, 2008 - 5:25 am

Yesterday, after spending quite a few hours over the last days on bisecting
some serious regressions and finding workarounds for them, I thought I
could start using 2.6.25-rc2 as the new kernel for my desktop.
Unfortunately I found that I cannot because it would make my other main
activity - working on the Debian installation system - impossible.

For my work on the Debian Installer I heavily rely on emulators to run test
installs and ATM my emulator of choice is VirtualBox (the fully open "ose"
version). This requires the vboxdrv kernel module, but unfortunately:
vboxdrv: Unknown symbol change_page_attr

At first I traced this to:
e1271f68
x86: deprecate change_page_attr() for drivers
With the introduction of the new API, no driver or non-archcore code needs
to use c-p-a anymore, so this patch also deprecates the EXPORT_SYMBOL of
CPA (it's a horrible API after all).
which had:
-EXPORT_SYMBOL(change_page_attr);
+EXPORT_UNUSED_SYMBOL(change_page_attr); /* to be removed in 2.6.27 */

Which seemed entirely reasonable but left me wondering about the error I
got.

But then I found:
d1028a15
x86: make various pageattr.c functions static
change_page_attr_add is only used in pageattr.c now, so we can
make this function static.
change_page_attr() isn't used anywhere at all anymore; this function
is a really bad API anyway so just remove the bloat entirely.

Which removed the entire function (without even properly mentioning it in
the shortlog).

OF COURSE it is up to the VirtualBox developers to adjust to the new
interface (and based on past experience I expect they will with their next
version). And it may very well be that they were totally braindead to use
the function in the first place. I don't know and I really don't care.
The important fact for me is that I can no longer use a piece of software
that is essential to me and thereby lose the motivation to do any work on
the kernel.

Lesson of the day: thinking only about in-kernel users of published ...

To: Frans Pop <elendil@...>
Cc: <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Sunday, February 17, 2008 - 4:46 pm

On Sun, 17 Feb 2008 10:25:30 +0100

the initial plan was for a deprecation period. Sadly it was untenable since the API
was changing entirely to fix bugs and add a really important feature
(the ability to clflush the exact range rather than wbinvd'ing the caches of all cpus in the system),
at which point we had to pull the function right away.
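
To put the clflush-versus-wbinvd point in concrete terms, here is a rough userspace sketch of how many per-line flushes a range-based approach would issue. It is only an illustration: cache_lines_in_range() is a hypothetical helper name, not a kernel function, and the cache line size is a parameter because it is CPU specific.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the cache lines overlapped by the byte range [start, start + len).
 * A range-based flush (one CLFLUSH per line) touches exactly this many
 * lines, whereas a WBINVD-style flush writes back and invalidates whole
 * caches. line_size must be a power of two.
 */
static size_t cache_lines_in_range(uintptr_t start, size_t len, size_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    if (len == 0)
        return 0;

    /* Round the first and last byte down to their cache line boundaries. */
    uintptr_t first = start & ~(uintptr_t)(line_size - 1);
    uintptr_t last = (start + len - 1) & ~(uintptr_t)(line_size - 1);

    return (size_t)((last - first) / line_size) + 1;
}
```

Assuming a 64-byte line, cache_lines_in_range(0x1000, 4096, 64) gives 64 line flushes for a single 4 KB page, independent of how large the caches of the machine are.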

As for virtualbox; a fix for that already exists, and it consists of removing 25 lines of workarounds
for c-p-a bugs, and just using set_memory_x / set_memory_nx.

--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
--

To: Arjan van de Ven <arjan@...>
Cc: Frans Pop <elendil@...>, <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Monday, February 18, 2008 - 8:31 am

Just for the record: I posted full patches to implement clflush
support some time ago without changing any exported API. So your
claims that changing the API was needed to implement CLFLUSH are not
correct.

Also I believe some assumptions behind the new API are faulty (in
particular that the caller doesn't fully own the to-be-changed pages)
and make it actually impossible to implement the cache attribute PTE
changing operation fully correctly according to the Intel x86 manual
(which requires a temporary unmap).

-Andi
--

To: Andi Kleen <andi@...>
Cc: Frans Pop <elendil@...>, <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Monday, February 18, 2008 - 12:50 pm

On Mon, 18 Feb 2008 13:31:48 +0100

yeah of course it is possible to make things "smart" by having hidden state.

the Intel x86 manual explicitly only has a temporary unmap when going from a
cached state to a write-combining state. Any other transition does not require
an unmap. Which makes this not impossible, all a cached->WC transition needs
to do is go via an intermediate UC state and the really expensive process from
the manual is not needed.
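
The transition rule described in this message can be written down as a small planning function. This is a minimal sketch that only encodes the claim made above (a cached to write-combining change goes through an intermediate uncached state, every other change is direct); it has not been checked against the Intel manual, and the names mem_type and plan_transition() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Memory types named in this thread. */
enum mem_type { MT_WB, MT_UC, MT_WC };

/*
 * Write the sequence of types to step through (excluding 'from') into
 * steps[] and return how many entries were written. Following the
 * description above, a direct WB -> WC change is replaced by
 * WB -> UC -> WC; every other change is done in one step.
 */
static size_t plan_transition(enum mem_type from, enum mem_type to,
                              enum mem_type steps[2])
{
    size_t n = 0;

    assert(steps != NULL);
    if (from == to)
        return 0;                /* nothing to change */
    if (from == MT_WB && to == MT_WC)
        steps[n++] = MT_UC;      /* intermediate uncached state */
    steps[n++] = to;
    return n;
}
```

Whether the extra UC step is cheaper in practice than the unmap procedure is exactly what the rest of the thread argues about; the sketch takes no position on that.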

--

To: Arjan van de Ven <arjan@...>
Cc: Andi Kleen <andi@...>, Frans Pop <elendil@...>, <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Monday, February 18, 2008 - 1:11 pm

Not sure how you can call global_flush_tlb() "hidden". It was a quite
exposed interface.

Anyways there was also no reason in principle in the old interface why
the flush couldn't have been done immediately. The only reason
it wasn't done was that Linus long ago asked for separate

Ok then you're proposing to use an even more expensive operation just
to patch this over. I guess that will work as long as we assume
none of the callers cares too much about performance, but trying to describe
it as an improvement is quite a stretch.

-Andi

--

To: Andi Kleen <andi@...>
Cc: Andi Kleen <andi@...>, Frans Pop <elendil@...>, <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Monday, February 18, 2008 - 1:32 pm

On Mon, 18 Feb 2008 18:11:59 +0100

I've yet to see a user who wants WC. Let's face it, WC *sucks*. This is why
the folks who care about performance (the graphics guys) stopped using it.
WC is slow, and on modern cpus leads to really bad performance. I'm really
half tempted to just ignore WC entirely and suggest that we don't even implement
it in the kernel. Yes it's really that bad.

--

To: Arjan van de Ven <arjan@...>
Cc: Andi Kleen <andi@...>, Frans Pop <elendil@...>, <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Monday, February 18, 2008 - 2:53 pm

> I've yet to see a user who wants WC. Let's face it, WC *sucks*. This is why
> the folks who care about performance (the graphics guys) stopped using it.
> WC is slow, and on modern cpus leads to really bad performance. I'm really
> half tempted to just ignore WC entirely and suggest that we don't even implement
> it in the kernel. Yes it's really that bad.

I know of one case at least where WC is very useful. Some InfiniBand
adapters allow small messages to be written directly into the
adapter's PCI space BAR to lower latency (having the CPU write the
message avoids doing something like build descriptor, ring doorbell
register on adapter, adapter DMA message out of CPU memory). And
mapping the PCI space with WC is a pretty big win -- for example for
mlx4 hardware it gets MPI latency from ~1.8 usec to ~1.3 usec which is
a big deal. I think most real users of mlx4 hardware are using a
hacky out-of-tree patch to allow using PAT to set WC.

AFAIK mapping PCI memory WB is not allowed, so WC is really our only choice.

- R.
--

To: Roland Dreier <rdreier@...>
Cc: Andi Kleen <andi@...>, Frans Pop <elendil@...>, <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Monday, February 18, 2008 - 3:07 pm

On Mon, 18 Feb 2008 10:53:42 -0800

afaik that depends on the BAR being prefetchable or not.

(and by your argument, ioremap_cached() would not be useful, and since that was, until
2.6.25-rc1, the default behavior for ioremap(), would have caused massive problems)

--

To: Arjan van de Ven <arjan@...>
Cc: Andi Kleen <andi@...>, Frans Pop <elendil@...>, <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Monday, February 18, 2008 - 3:18 pm

> > AFAIK mapping PCI memory WB is not allowed, so WC is really our only
> > choice.

> afaik that depends on the BAR being prefetchable or not.

In my case the BAR is prefetchable.

> (and by your argument, ioremap_cached() would not be useful, and since that was, until
> 2.6.25-rc1, the default behavior for ioremap(), would have caused massive problems)

I'm not sure what ioremap_cached() would really do in my case, since
the MTRRs for PCI memory are set to UC, so without monkeying with MTRR
contents (which can't really be done safely) the only choices we have
are leaving the mapping as UC or using PAT to get WC.

Also in my case I'm more concerned about latency of finishing a small
write rather than throughput. So I'm not sure that I would really want
to do a write to a WB mapping followed by CLFLUSH anyway.

- R.
--

To: Roland Dreier <rdreier@...>
Cc: Arjan van de Ven <arjan@...>, Andi Kleen <andi@...>, Frans Pop <elendil@...>, <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Tuesday, February 19, 2008 - 3:42 pm

Even if the BAR is prefetchable, on some platforms mapping MMIO space

thanks,
suresh
--

To: Arjan van de Ven <arjan@...>
Cc: Andi Kleen <andi@...>, Frans Pop <elendil@...>, <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Monday, February 18, 2008 - 2:40 pm

WC for main memory or WC for mmio spaces ?

--

To: Alan Cox <alan@...>
Cc: Andi Kleen <andi@...>, Frans Pop <elendil@...>, <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Monday, February 18, 2008 - 2:52 pm

On Mon, 18 Feb 2008 18:40:50 +0000

if you mean 'framebuffer' with mmio space, they stopped using it there
as well afaik.

--

To: Arjan van de Ven <arjan@...>
Cc: Andi Kleen <andi@...>, Frans Pop <elendil@...>, <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Monday, February 18, 2008 - 4:15 pm

And also bursting commands to mmio fifos - eg the 3Dfx where it makes a
*huge* difference.

Alan
--

To: Arjan van de Ven <arjan@...>
Cc: Andi Kleen <andi@...>, Frans Pop <elendil@...>, <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Monday, February 18, 2008 - 2:52 pm

> I've yet to see a user who wants WC. Let's face it, WC *sucks*. This is why

I didn't know this. What do they do instead?

I understand that WC was added originally because AGP was really slow
at IO towards the CPU. You mean on PCI-E it is fast enough now?

At least the X server still uses it. In fact there are already some
performance regressions regarding this from differing kernel behaviour
in the sysfs interfaces vs /dev/mem.

What would you recommend should the X server use instead? Always
map standard WB? How about on older AGP systems?

-Andi

--

To: Andi Kleen <andi@...>
Cc: Arjan van de Ven <arjan@...>, Frans Pop <elendil@...>, <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Monday, February 18, 2008 - 3:18 pm

does this refresh your memory:

http://lkml.org/lkml/2008/1/10/99

Ingo
--

To: Ingo Molnar <mingo@...>
Cc: Andi Kleen <andi@...>, Arjan van de Ven <arjan@...>, Frans Pop <elendil@...>, <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Tuesday, February 19, 2008 - 5:35 am

I'm still far from convinced it covers all cases (see also Roland's example).
Arjan might be right for some modern graphics hardware, but there is
a lot more hardware out there than only this, old and new and non graphics
and graphics that Arjan didn't cover.

If Arjan was 100% right then PAT would not be needed at all. I wonder
why Venki/Suresh went through all the pain of resurrecting the old
PAT patchkit then!? Surely there is more about this.

-Andi
--

To: Andi Kleen <andi@...>
Cc: Andi Kleen <andi@...>, Frans Pop <elendil@...>, <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Monday, February 18, 2008 - 2:32 pm

On Mon, 18 Feb 2008 19:52:28 +0100

they use a cached mapping and use clflush for the cases they want to be sure
the GPU sees the data. This turns out to be faster than WC.

since then the graphics programming paradigm has changed as well;
WC is really bad for reading data no matter what; it focuses on group writes
so that you don't get one transaction per write, but reads are extremely slow.
And apparently in the current graphics systems (with "composite" and the like)
all of this stuff doesn't get used anymore. Instead they use cached mappings,

depends on which driver. The new generation Intel graphics ones don't.
I doubt the proprietary ones do either, those are even more performance tuned.
I understand that the radeon driver is doing or going to do something similar to the

the X drivers need to do what is best; the ones with acceleration use WB nowadays.
(For the Intel driver that is somewhat recent, so it could be that your distro
doesn't do that yet).

--

To: Frans Pop <elendil@...>
Cc: <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Sunday, February 17, 2008 - 9:16 am

I get your problem, but you are looking in the wrong direction for
a solution.

The real problem is that the kernel seems to lack functionality you
require for doing some work.

Why does your work on the Debian Installer depend on VirtualBox and

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

--

To: Adrian Bunk <bunk@...>
Cc: <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Tuesday, February 19, 2008 - 5:55 pm

Work on the installer does not strictly depend on VirtualBox (or any
other emulator), but can be done much more effectively and efficiently
using one.

It allows me to run the installer inside the emulator without the need for a
second computer (and using the same keyboard). It allows me to take
snapshots just before stages I'm interested in and then add debugging or
try changes. If what I tried does not work, I can just revert to the
snapshot and try something else without having to run the full installation
from scratch. It allows me to easily test RAID setups without having
multiple physical disks. Etc.

The fact that VirtualBox does not (yet) work for me with 2.6.25 means that
I'm unable to run 2.6.25 on my main desktop and do any real work. Rebooting
my desktop whenever I want to use VirtualBox is just not a realistic
option.
That in turn means that I cannot do any more testing of 2.6.25, because most
of the testing I do consists of just using a new kernel and critically
observing the behavior of my system. As I've caught a fair number of bugs
that way for the past few releases, I'd say that's a useful contribution.
--

To: Frans Pop <elendil@...>
Cc: <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Tuesday, February 19, 2008 - 6:19 pm

The "or any other emulator" is exactly what my question is aimed at.

Xen, KVM or even qemu come into my mind, but considering how loudly you
complained about a temporary breakage for VirtualBox there must be a
reason why your work on the Debian Installer can only be done
effectively and efficiently with an emulator module that has AFAIK not
been submitted for inclusion in the kernel.

cu
Adrian

--

To: Adrian Bunk <bunk@...>
Cc: <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Tuesday, February 19, 2008 - 6:49 pm

- Xen is currently not supported by the kernel Debian Installer uses (though
work is being done to change that).
- KVM AFAIK requires hardware support that I don't have.
- QEMU is completely useless because of its slow speed without the (also out
of tree) kqemu module, which does not work when the host system is x86_64
[1]. Also, I very much prefer the VirtualBox user interface over what qemu
has to offer.
- I've actually used VMWare for a long time (licenced), but stopped after
the 5 series stopped working with current kernels and around that time
VirtualBox became available as an alternative.

Hope that explains.

Cheers,
FJP

[1] http://bugs.debian.org/444160
--

To: Frans Pop <elendil@...>
Cc: Adrian Bunk <bunk@...>, <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Tuesday, February 19, 2008 - 6:15 pm

And it will be just as valuable when you stay with 2.6.24 finding bugs
there.

And once VirtualBox has been updated to work with 2.6.25, you can test
2.6.25.

Harvey

--

To: Frans Pop <elendil@...>
Cc: Adrian Bunk <bunk@...>, <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Tuesday, February 19, 2008 - 5:59 pm

On Tue, 19 Feb 2008 22:55:01 +0100

I assume you've read the other mails and went to the VirtualBox forum
to get the small patch to make it work on .25, right?

--

To: Adrian Bunk <bunk@...>
Cc: Frans Pop <elendil@...>, <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Sunday, February 17, 2008 - 3:24 pm

No, that's not the real problem. Even if the kernel didn't lack
any required functionality and it could all be done today without
VirtualBox, pulling the rug from underneath it like that leaves
all those who are currently relying on it without the ability to
continue testing newer kernels until they find the time to redesign
their working environment. Sticking to the removal schedule could
have avoided that.

HTH
T.

--
Tilman Schmidt E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits.
Unopened, best before: (see reverse side)

To: <tilman@...>
Cc: <bunk@...>, <elendil@...>, <linux-kernel@...>, <torvalds@...>
Date: Sunday, February 17, 2008 - 10:33 pm

From: Tilman Schmidt <tilman@imap.cc>

No, it is VirtualBox's problem. Nobody outside of the kernel
should be using that symbol, it can only be used for totally
unsupportable things as far as upstream is concerned.
--

To: David Miller <davem@...>
Cc: <tilman@...>, <bunk@...>, <elendil@...>, <linux-kernel@...>, <torvalds@...>
Date: Monday, February 18, 2008 - 8:27 am

Creating any uncacheable or write protected mappings is unsupportable?
I suspect you would have a hard time actually justifying this. That's
a relatively common and useful operation in device drivers.

The only special case I would agree with you is doing write-combined
mappings using this which is indeed generally unsupported yet on x86
before the full PAT infrastructure goes in (but
I cannot imagine VirtualBox would use WC for anything)

Most likely if they just want to write protect something
they should either use the new interface or just open code
it using lookup_address() / change ptes / flush tlbs.

-Andi

--

To: David Miller <davem@...>
Cc: <bunk@...>, <elendil@...>, <linux-kernel@...>, <torvalds@...>
Date: Monday, February 18, 2008 - 7:40 am

Then why was it exported in the first place?

Still, we are talking about two completely different types of
problem here: (a) the technical problem of why something stopped
working and how it should be fixed and (b) the practical problem
of users left standing in the rain while such fixing is taking
place. While (a) might look like the only real problem to kernel
developers, (b) often feels much more real to users.
Unfortunately, discussing alternative solutions to (a) does
little to solve (b).

The conventional way of addressing (b) is to avoid it altogether
by giving sufficient advance warning before removing a feature,
and sticking to the feature removal schedule. If (as apparently
in this case) that isn't possible, then other ways to mitigate
the problem need to be found. That may require switching off
some SEP field generators, though. :-)

HTH
T.


To: Tilman Schmidt <tilman@...>
Cc: Frans Pop <elendil@...>, <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Sunday, February 17, 2008 - 3:44 pm

Frans said that he requires VirtualBox for his work on the Debian
Installer.

If the kernel would offer everything Frans needs for his work on the
Debian Installer he wouldn't have sent his email.

So let's fix the problem (kernel lacks functionality) and not the

cu
Adrian

--

To: Adrian Bunk <bunk@...>
Cc: <tilman@...>, <elendil@...>, <linux-kernel@...>, <torvalds@...>
Date: Sunday, February 17, 2008 - 4:38 pm

That's the problem as understood by Adrian.

I hear another problem as well ...

That seems plain enough to me. It's not just the lack of functionality,
but that such lack apparently happened with too little warning.

If this is a fair representation of what happened, then it seems to me that
we could have left that EXPORT_UNUSED_SYMBOL(change_page_attr) in place
a bit longer.

Or at least, if we really did have to make Frans' life difficult like
this, we could offer more appreciation for how things might look from
his perspective.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.940.382.4214
--

To: Paul Jackson <pj@...>
Cc: Adrian Bunk <bunk@...>, <tilman@...>, <elendil@...>, <linux-kernel@...>, <torvalds@...>
Date: Sunday, February 17, 2008 - 4:51 pm

On Sun, 17 Feb 2008 14:38:51 -0600

it's not a fair representation. Again.. this export was unkeepable due to the
API being nasty and having to be fixed anyway ;(.

One of the problems was that the c-p-a API has to be followed by a cache flush function call.
Sadly that does a TOTAL flush of the caches of all CPUs in the system. As part of the -rc1
changes, it is now done only on the exact pages that need to be flushed (so you no longer
flush 12 MB of caches when you only needed to flush 4 KB), but to achieve that, it was no longer
an option to keep this as 2 separate function calls.
Add to this that some very fundamental bugs couldn't be fixed without the function underlying

I understand where he's coming from; at the same time it's a very small change to virtualbox to fix this
and has been done already... in minutes. Frans should take that up with the virtual box support forum, I'm sure
they have the patch available there. (it's mostly removing workarounds for cpa bugs and then just calling set_memory_x / set_memory_nx).
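
For readers maintaining an out-of-tree module, a change like this is typically gated on the kernel version at compile time. The following is a minimal userspace sketch of that selection, not the actual VirtualBox patch: KERNEL_VERSION uses the same encoding as the kernel macro of that name, while page_attr_api and pick_page_attr_api() are hypothetical names, and the 2.6.25 cut-off is an assumption taken from this thread.

```c
#include <assert.h>

/* Same encoding as the kernel's KERNEL_VERSION() macro in <linux/version.h>. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hypothetical labels for the two interfaces discussed in this thread. */
enum page_attr_api {
    API_CHANGE_PAGE_ATTR, /* change_page_attr() followed by global_flush_tlb() */
    API_SET_MEMORY        /* set_memory_x() / set_memory_nx() */
};

/*
 * Pick the interface for a given kernel version code.
 * Assumption: change_page_attr() is no longer available to modules from
 * 2.6.25 on, as reported in this thread.
 */
static enum page_attr_api pick_page_attr_api(int version_code)
{
    assert(version_code >= 0);
    if (version_code >= KERNEL_VERSION(2, 6, 25))
        return API_SET_MEMORY;
    return API_CHANGE_PAGE_ATTR;
}
```

In a real module the same comparison would normally be a preprocessor check of LINUX_VERSION_CODE against KERNEL_VERSION(2, 6, 25) around the two call sites, so that only the code for the running kernel series is compiled.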

--

To: Arjan van de Ven <arjan@...>
Cc: Paul Jackson <pj@...>, Adrian Bunk <bunk@...>, <tilman@...>, <linux-kernel@...>, <torvalds@...>
Date: Tuesday, February 19, 2008 - 6:41 pm

That's a much better explanation than can be found in the changelog.

The changelog effectively says: this commit makes a few (new) functions
static; oh, and by the way, let's delete the old function _because it is
unused and ugly_.

Well, I've shown that the first is not true and the second by itself is
absolutely not sufficient reason to remove an exported function that has
long been part of the kernel.
I would probably have started this thread differently if the removal had
been done in a separate commit and had included the explanation you give

The problem here is that it may be a simple change for you and other

I did actually manage to create such a patch for the breakage in the
VirtualBox source because of 2.6.24, but those really were very minor
issues (basically overlapping definitions). I posted that on their user
list (to which I am subscribed), but I never saw any response to it.

I also don't think it is really realistic to expect Innotek to provide
patches for unreleased kernel versions. Maybe some other user could provide
me with the patch, but I'm not as confident as you are.

My point still is that I feel they should not have to. That it's also the
kernel developers' responsibility to ensure backwards compatibility or at
least a grace period for conversion whenever possible.
And IMO that not only goes for user-space API, but also for interfaces used
by out-of-tree kernel modules.

Is it really impossible in this case for example to rewrite the old function
so that it becomes a wrapper around the new interface? If it is impossible,
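
What such a wrapper might look like can be sketched with userspace mocks. Everything here is hypothetical and heavily simplified: the mock_ functions only imitate the (address, number of pages) shape of set_memory_x / set_memory_nx, the compat_ functions are invented names, and the real change_page_attr() took a page pointer and a protection value rather than a flag. Whether this layering is actually workable in the kernel is the open question of this thread.

```c
#include <assert.h>

/* Mock bookkeeping so the sketch can run in userspace. */
static int mock_range_flushes;   /* flushes performed by the new-style calls */
static int mock_last_nx;         /* 1 if the last call set NX, 0 if it set X */

/* Mock stand-in with the same (addr, numpages) shape as set_memory_x(). */
static int mock_set_memory_x(unsigned long addr, int numpages)
{
    (void)addr;
    (void)numpages;
    mock_last_nx = 0;
    mock_range_flushes++;        /* the new interface flushes inside the call */
    return 0;
}

/* Mock stand-in with the same (addr, numpages) shape as set_memory_nx(). */
static int mock_set_memory_nx(unsigned long addr, int numpages)
{
    (void)addr;
    (void)numpages;
    mock_last_nx = 1;
    mock_range_flushes++;
    return 0;
}

/* Hypothetical old-style entry point layered on the new-style calls. */
static int compat_change_page_attr(unsigned long addr, int numpages, int want_exec)
{
    return want_exec ? mock_set_memory_x(addr, numpages)
                     : mock_set_memory_nx(addr, numpages);
}

/* Old callers issued a separate flush afterwards; here nothing is left to do. */
static void compat_global_flush_tlb(void)
{
}
```

The design point the sketch tries to show is that the separate flush call of the old two-step pattern becomes a no-op once the attribute change and the flush happen in one call.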

I haven't seen anything on the user mailing list yet...

If you can give me a link to that patch, I'd be grateful. As I said, I'm
subscribed to their user list but have not seen any patch there.
I've also googled for it, without result.
I've also just quickly checked their "VirtualBox on Linux" forum, but did
not see any obvious existing post about it.

Cheers,
FJP
--
