Re: Linus 2.6.23-rc1

To: Linux Kernel Mailing List <linux-kernel@...>
Date: Sunday, July 22, 2007 - 5:04 pm

Ok, right on time, two weeks after 2.6.22, there's a 2.6.23-rc1 out there.

And it has a *ton* of changes as usual for the merge window, way too much
for me to be able to post even just the shortlog or diffstat on the
mailing list (but I had many people who wanted the full logs to stay
around, so you'll continue to see those being uploaded to kernel.org).

Lots of architecture updates (for just about all of them - x86[-64], arm,
alpha, mips, ia64, powerpc, s390, sh, sparc, um..), lots of driver updates
(again, all over - usb, net, dvb, ide, sata, scsi, isdn, infiniband,
firewire, i2c, you name it).

Filesystems, VM, networking, ACPI, it's all there. And virtualization all
over the place (kvm, lguest, Xen).

Notable new things might be the merge of the cfs scheduler, and the UIO
driver infrastructure might interest some people.

Oh, and I personally like how "sendfile" is now totally gone internally,
and the kernel now ends up doing all that with splice instead. Good
riddance, although we'll obviously end up supporting the old user level
interfaces for a long time.
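
For readers who have not used the old user-level interface mentioned above, here is a minimal sketch of a file copy through sendfile(2). It assumes Linux and Python's os.sendfile wrapper; copy_with_sendfile is a hypothetical helper name, and writing to a regular file (rather than a socket) assumes a kernel of 2.6.33 or later.

```python
import os


def copy_with_sendfile(src_path, dst_path):
    """Copy src_path to dst_path with sendfile(2); return bytes copied.

    Hypothetical helper for illustration only. The data moves inside the
    kernel and never passes through a user-space buffer.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        remaining = os.fstat(src.fileno()).st_size
        offset = 0
        while remaining > 0:
            # sendfile may transfer fewer bytes than requested, so loop.
            sent = os.sendfile(dst.fileno(), src.fileno(), offset, remaining)
            if sent == 0:
                break  # source shrank underneath us; stop early
            offset += sent
            remaining -= sent
    return offset
```

The caller only ever sees the sendfile call; whether the kernel services it with splice internally is invisible from user space.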

So give it all a good whacking, and report back about all the neat new
features!

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>
Date: Saturday, July 28, 2007 - 10:52 am

Hmm

- Linus 2.6.23-rc1
+ Linux 2.6.23-rc1

Or are *you* now under versioning?
Or maybe a silent namechange of the kernel?

/ronni
-

To: Ronni Nielsen <theronni@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>
Date: Saturday, July 28, 2007 - 1:30 pm

Yeah, yeah, my fingers get confused. I type "Linux" and "Linus"
interchangeably, and _most_ of the time I notice, but then at other times I
don't look closely enough at what I wrote, so something slips through.

And sometimes when my right hand is off by a key, I'm Kubys.

My fingers have minds all their own.

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Ingo Molnar <mingo@...>, CK Mailinglist <ck@...>
Date: Friday, July 27, 2007 - 10:04 pm

(sorry for repost, but there seemed to have been some troubles..)

I'm still not so keen about this, Ingo never did get CFS to match SD in
smoothness for 3d applications, where my test subjects are quake(s),
world of warcraft via wine, unreal tournament 2004. And this is despite
many patches he sent me to try and tweak it. As far as i'm concerned, i
may be forced to unofficially maintain SD for my own systems (although
lots in the gaming community are bound to be interested, as it does make
games lots better)

<snip>

-

To: Kasper Sandberg <lkml@...>
Cc: Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, CK Mailinglist <ck@...>
Date: Sunday, July 29, 2007 - 11:04 am

hi Kasper,

hey, i thought you vanished from the face of the earth :-) The last
email i got from you was more than 2 months ago, where you said that
you'll try the latest CFS version as soon as possible but that you were
busy with work. I sent 2 more emails to you about new CFS versions but
then stopped pestering you directly - work _does_ take precedence over
games =B-)

CFS v14, v15, v16, v17, v18 and v19 were released meanwhile, CFS v20 went
upstream, there were no 3D related CFS regressions open for quite some
time and because i never heard back from you i assumed everything's
peachy.

In any case i'm glad you found the time to try CFS again, so please let
me know in what way it regresses. In your most recent emails you did not
indicate what specific problem you are having (and you did not reply to
my last emails from May) - are your old regression reports against CFS
v13 from May still true as of v2.6.23-rc1? If they are, could you please
indicate which specific report of yours describes it best and send me
(or upload to some webspace) the specific .config you are using on your
box, and the cfs-debug-info.sh snapshot taken when you are running your
game. (make sure you have CONFIG_SCHED_DEBUG=y enabled, for highest
quality debug output) You can pick the script up from:

http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh

Giving us that info would help us immensely with tracking down any CFS
problem you might still be having.
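
A quick way to confirm the debug option before taking the snapshot is to scan the kernel .config for it. The following is a minimal sketch, not part of the cfs-debug-info.sh script; config_option_enabled is a hypothetical helper name, and it assumes the usual .config layout (NAME=value lines, with disabled options written as "# NAME is not set").

```python
def config_option_enabled(config_text, option):
    """Return True if `option` is set to 'y' in kernel .config text.

    Hypothetical helper for illustration only.
    """
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set".
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name == option:
            return value == "y"
    return False
```

Typical use would be config_option_enabled(open(".config").read(), "CONFIG_SCHED_DEBUG"), run from the top of the kernel build tree.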

Or, if you feel adventurous enough to look into the internals of the
kernel (which, considering your offer to take up SD maintenance, you
must be ;-), here's my kernel latency tracer:

http://people.redhat.com/mingo/latency-tracing-patches/

( see: latency-tracer-v2.6.23-rc1-combo.patch )

the simplest way to use it is to enable CONFIG_WAKEUP_TIMING, to set
/proc/sys/kernel/preempt_max_latency back to 0 (after bootup) and to
thus measure raw worst-case scheduler latencies - if you regularly see
the...

To: Ingo Molnar <mingo@...>
Cc: Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, CK Mailinglist <ck@...>
Date: Monday, July 30, 2007 - 12:13 pm

I did respond to that one, but perhaps some mail has been getting lost,

Well, im not sure how good i would be at maintaining SD, my idea was
more or less just do the bare minimum to get the thing running on newer

Actually, now that you mention ogg123, i've had some bugs on CFS with
this, i thought it was an ogg123 bug, but now that i remember it, it's
only on CFS i have it.. when i run multiple ogg123 instances, suddenly
they will just stop playing and lock up. This happens when someone
writes a lot fast to me on kopete, where i use ogg123 to play a bling

-

To: Ingo Molnar <mingo@...>
Cc: Kasper Sandberg <lkml@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, CK Mailinglist <ck@...>
Date: Sunday, July 29, 2007 - 7:04 pm

<chuckle>

You're advocating plugsched now?
-

To: George Sescher <gesacs@...>
Cc: Kasper Sandberg <lkml@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, CK Mailinglist <ck@...>
Date: Monday, July 30, 2007 - 2:44 am

hm, the way you posited this question implies that you see an
inconsistency in my position or that it surprised you - i cannot explain
the '<chuckle>' in any other way :) Which bit do you see as inconsistent
and/or which bit surprised you and why?

Ingo

-

To: Ingo Molnar <mingo@...>
Cc: Kasper Sandberg <lkml@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, CK Mailinglist <ck@...>
Date: Monday, July 30, 2007 - 3:06 am

The idea is not good enough for mainline and has no place in mainline
yet you say it's very important to maintain it... but out of mainline.
Place the responsibility of keeping mainline's performance in check
(the "reality check", as you called it) on to someone who is forced to
develop out of mainline? I have zero interest one way or the other
myself, but how can one not chuckle?

Again I have no interest either way but if this is that important a
reality check that it needs maintaining it should be oh I don't know,
an -mm only feature or something?

Please don't jump down my throat, your position just needs clarifying. :-|
-

To: George Sescher <gesacs@...>
Cc: Kasper Sandberg <lkml@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, CK Mailinglist <ck@...>
Date: Monday, July 30, 2007 - 3:55 am

What you should realize is that _all_ future code that goes into Linux
is 'forced' to be developed 'out of mainline' today. So what you seem to
characterise via negative terms like 'forced', and what seems to make
you 'chuckle' (not meant as a compliment either i gather ;), is in fact
the _very engine_ that keeps Linux running.

And there's no exception: Linus himself creates an "out of mainline"
fork of Linux every time he develops something new. "Forks" are _the_
main mechanism to develop Linux, and it always was. External code is the
"reality check" of mainline code. It is the 'external pool of genes'
that is _competing_ against in-tree code.

Sometimes the decision to include new bits of code is easy and positive
(so it is a "fork" only very briefly and nobody actually ever has enough
time to think of that code as a "fork"), sometimes it takes some time
and the decision is positive, sometimes the decision is immediately
negative and the code is rejected, sometimes it's negative after some
time. Often code goes through several cycles of rejection before it is
merged. The larger the code, the more rejections it will see - and that
is natural. Sometimes, very rarely, out of the hundreds of thousands of
external changes that went into Linux so far, code seems to be staying
'in limbo' forever - such as the kernel debugger. So _every_ color of
the spectrum is present: immediate integration, immediate rejection,
long-term integration, long-term rejection, ping-pong of rejections
until integration, and even decisions that seem to take a near
'eternity' in very rare cases.

If a biologist took a look at these gene pool dynamic parameters alone,
without knowing a squat about kernel technology, the likely conclusion
would be that this is "a healthy, diverse gene pool that is being
affected by many many external factors. A true expert at survival, that
critter!" ;-)

For example, i'm at the moment maintaining in excess of 400 patches "out
of mainline", many ...

To: Ingo Molnar <mingo@...>
Cc: Kasper Sandberg <lkml@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, CK Mailinglist <ck@...>, Peter Williams <pwil3058@...>
Date: Monday, July 30, 2007 - 5:26 am

<permission to jump down my throat granted now>

Nope. I can't equate your soliloquy about the development process with
what it appears you are doing in the case of plugsched but you're
obviously too smart for me to argue against or I don't understand and
I've already overstepped my authority on this mailing list being an
ordinary user. I'll just end up trying to extract your boot from my
anus.
-

To: George Sescher <gesacs@...>
Cc: Kasper Sandberg <lkml@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, CK Mailinglist <ck@...>, Peter Williams <pwil3058@...>
Date: Monday, July 30, 2007 - 6:26 am

could you please be a bit more specific - what do you mean under "what
you are doing in the case of plugsched"?

In the above section which you characterised as 'soliloquy' (i guess i
must have failed to make myself clear enough) i tried to answer the

by pointing out that "developing out of mainline" (such as PlugSched or
like the 400+ patches i maintain out of tree), is not something negative
as you seem to have suggested/implied but the main mechanism of Linux
development - so not surprisingly, while i might disagree whether
something out of tree should go upstream or not, i dont disagree with
the idea of keeping out-of-tree patches - why should i? I do it myself
and always did it. Or in other words: without out-of-tree patches the
kernel 'pool of genes' would become stagnant.

If you disagree with me or if you have any other questions then feel
free and let me know. And as always, i could be mistaken so dont expect
me to "jump down on your throat" in any way, shape or form :-) Thanks,

Ingo
-

To: George Sescher <gesacs@...>
Cc: Ingo Molnar <mingo@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>, CK Mailinglist <ck@...>
Date: Sunday, July 29, 2007 - 7:18 pm

I'd suggest people here take a look at the code. It's not what Ingo was
saying, and it's not what the code is set up to do. He's just stating that
the way he split up the files, it's actually easier from a patching
standpoint to just create a new file to include instead of
"kernel/sched_fair.c".

But quite frankly, I've seen a lot of totally stupid flamage, and very
little actual sense in this discussion. People probably didn't even look
at the patches. Did you?

For example, how hard is it for people to just admit that CFS actually
does better than SD on a number of things? Including very much on the
desktop.

Ingo posted numbers. Look at those numbers, and then I would suggest some
people could seriously consider just shutting up. I've seen too many
idiotic people who claim that Con got treated unfairly, without those
people admitting that maybe I had a point when I said that we have had a
scheduler maintainer for years that actually knows what he's doing.

And no, it has never been about "desktop" vs "servers", or similar
claptrap. It's been about improving the scheduler.

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: George Sescher <gesacs@...>, Ingo Molnar <mingo@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>, CK Mailinglist <ck@...>, Bill Huey (hui) <billh@...>
Date: Tuesday, July 31, 2007 - 6:05 am

Here's the problem, *a lot* of folks can do scheduler development in and
outside community, so what's with exclusive-only attitude towards the
scheduler ?

There's sufficient effort coming from folks working on CFS from many
sources so how's sched-plugin a *threat* to stock kernel scheduler
development if it gets to the main tree as the default compile option ??

Those are the core questions that Con brought up in the APC article. Folks
are angry because nobody central to the current Linux has addressed
this, and has instead focused on a single narrow set of technical issues
to justify a particular set of actions.

I mean, I'm not the only one that has said this so there has to be some
kind of truth behind it.

bill

-

To: Bill Huey <billh@...>
Cc: George Sescher <gesacs@...>, Ingo Molnar <mingo@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>, CK Mailinglist <ck@...>
Date: Tuesday, July 31, 2007 - 11:44 am

There is no exclusive-only attitude towards the scheduler.

If you send me small and obvious improvements, they'll get applied to the
scheduler, exactly the same way they get applied to anything else.

And if you try to rewrite everything, and do it on your own, and then
don't even send me a patch, it also won't get applied.

Surprise?

Linus
-

To: Bill Huey <billh@...>
Cc: Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Tuesday, July 31, 2007 - 10:04 am

You came to us as an ex-BSD developer (which has a completely different
contribution culture) and early on i tried to explain to you (and we met
in person at OLS2004) that the Linux way of getting code upstream is not
that of social-engineering or talking the code upstream, but that of
_coding_ your stuff upstream: working with others and getting good code
accepted. I'm not sure you ever realized that point.

To counter your myth of "upstream development exclusivity", here are
some git-provided hard numbers. Since 2.6.22 was released 3 weeks ago,
over half a million lines of code were added/modified/removed in the
upstream -git kernel:

5965 files changed, 332689 insertions(+), 269500 deletions(-)

that massive amount of work was done by over 750 contributors. Out of
those 750 contributors, more than 160 are _first time contributors_.
Think about it, there's _lots_ of fresh blood, about 650 new kernel
contributors a year. The kernel/sched.c file itself, with 274 commits
and 88 unique contributors over the past ~2 years alone is one of the
most actively and most diversely developed core kernel subsystems in the
kernel.
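
The "over half a million lines" figure can be checked directly from the diffstat summary line quoted above. A minimal sketch, assuming the summary format printed by git diff --stat; diffstat_totals is a hypothetical helper name:

```python
import re


def diffstat_totals(summary):
    """Parse a diffstat summary line.

    Returns (files, insertions, deletions, lines_touched).
    Hypothetical helper for illustration only.
    """
    match = re.search(
        r"(\d+) files? changed, (\d+) insertions?\(\+\), (\d+) deletions?\(-\)",
        summary,
    )
    if match is None:
        raise ValueError("not a diffstat summary line: %r" % summary)
    files, insertions, deletions = (int(g) for g in match.groups())
    return files, insertions, deletions, insertions + deletions
```

For the line above, 332689 insertions plus 269500 deletions gives 602189 lines touched across 5965 files.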

Regarding PlugSched, i'd suggest you to read the detailed explanation
that has been offered in this and in related discussions over the past
few years on lkml. (see: http://lkml.org/lkml/2007/4/15/124 and
http://lkml.org/lkml/2007/4/15/64 and many other postings)

To recap: we dont have a pluggable TCP/IP core either. Nor do we have a
pluggable MM. Pluggable I/O schedulers are not an unconditional success
either - Nick (I/O scheduler author) recently stated that and suggested
the CPU scheduler to not be pluggable.

Whether something becomes pluggable or not depends on many
_technological_ factors but you appear to be dead-set on spinning _any_
technical decision against pluggability into a conjured-up non-sequitur
non-technical "so this means you have an exclusive-only attitude"
position.

Why do you do that? Why cannot you accept t...

To: Linus Torvalds <torvalds@...>
Cc: George Sescher <gesacs@...>, CK Mailinglist <ck@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Monday, July 30, 2007 - 1:12 am

Actually in benchmarks Ingo has quoted, SD was better on the desktop
(by a small margin).
CFS is still a bit bursty, though it has significantly improved with
age. I know, I did those benchmarks. That being said, I'm really
glad to see CFS in your git tree because the new framework around it
really improves the readability of the code, and actually makes it
easier to start experimenting with scheduler improvements from an
entire scheduler like SD to minor bits like priorities.

I have one concern - my benchmarking took load average as the common
denominator and CFS alters the way the load average is calculated, so
perhaps it wasn't a fair comparison. That being said, they still
showed CFS could scale very well and SD did not, so considering we're
dealing with everything from wristwatches to BlueGene/L I believe the
right choice was made. Those arguing for the 2% improvement that SD
would give them in their environment would be better off

a) helping port SD to the new scheduler framework
b) assisting Ingo in improving CFS to meet/exceed their requirements
c) giving practical assistance to anyone doing either of the above

I'm re-learning git and using my Copious Spare Time (tm) to do what I
can - but I have to admit I'm really in over my head. But hey, if
Jeff Garzik can do it, so can I. I remember when he couldn't grok C &
now he's got control over all our data :-)

--
Matt
-

To: Linus Torvalds <torvalds@...>
Cc: Ingo Molnar <mingo@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>, CK Mailinglist <ck@...>
Date: Sunday, July 29, 2007 - 7:38 pm

<snip long other discussion unrelated to my question>

Ingo's original comment:

He said having reality checks is a good thing. He's encouraging some
poor bastard to maintain plugsched out of mainline to have SD or
whatever to compare to. I did not say I advocated anything whatsoever.
I was asking if this is what Ingo is suggesting people use their
energy doing. Not good enough for mainline, but definitely worth
keeping around and good enough for... no idea what. I was asking Ingo
that.
-

To: George Sescher <gesacs@...>
Cc: Ingo Molnar <mingo@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>, CK Mailinglist <ck@...>
Date: Sunday, July 29, 2007 - 7:58 pm

My bad, it was me who misread that (I didn't react to the name, I was
thinking people were talking about maintaining SD that way).

Mea culpa.

Linus
-

To: Kasper Sandberg <lkml@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Ingo Molnar <mingo@...>, CK Mailinglist <ck@...>
Date: Friday, July 27, 2007 - 10:35 pm

You realize that different people get different behaviour, don't you?
Maybe not.

People who think SD was "perfect" were simply ignoring reality. Sadly,
that seemed to include Con too, which was one of the main reasons that I
never ended up entertaining the notion of merging SD for very long at all:
Con ended up arguing against people who reported problems, rather than
trying to work with them.

Andrew also reported an oops in the scheduler when SD was merged into -mm,

You know what? You can do whatever you want to. That's kind of the point
of open source. Keep people honest by having alternatives.

But the thing is, if you want to do a good job of doing that, here's a
big hint: instead of keeping to your isolated world, instead of just
talking about your own machine and ignoring other people's machines and
issues and instead of just denying that problems may exist, and instead of
attacking people who report problems, how about working with them?

That was where the SD patches fell down. They didn't have a maintainer
that I could trust to actually care about any other issues than his own.

So here's a hint: if you think that your particular graphics card setup is
the only one that matters, it's not going to be very interesting for
anybody else.

[ I realize that this comes as a shock to some of the SD people, but I'm
told that there was a university group that did some double-blind
testing of the different schedulers - old, SD and CFS - and that
everybody agreed that both SD and CFS were better than the old, but that
there was no significant difference between SD and CFS. You can try
asking Thomas Gleixner for more details. ]

I'm happy that SD was perfect for you. It wasn't for others, and it had
nobody who was even interested in trying to solve those issues.

As a long-term maintainer, trust me, I know what matters. And a person who
can actually be bothered to follow up on problem reports is a *hell* of a
lot more important than one...

To: Linus Torvalds <torvalds@...>
Cc: Kasper Sandberg <lkml@...>, CK Mailinglist <ck@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Saturday, July 28, 2007 - 5:07 pm

Once again linus blows a nut getting off about this and that. The fact
of the matter is that linus is one-sided. The fact is linus says what he
wants and people think he is god. The fact is no one gets code in unless
they are a major player in a linux distro. Ingo had much advantage by using
fedora users. The fact that Con did not take all bugs seriously, yes, that
is a player of the game, but linus is GOD so all bow before him before he
blows his back out while jacking off to his rants about how the kernel
and other projects should run.

Jory

-

To: Linus Torvalds <torvalds@...>
Cc: Kasper Sandberg <lkml@...>, CK Mailinglist <ck@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Saturday, July 28, 2007 - 9:18 am

I do recall there is one issue on which Con wouldn't budge -- anything
that involved boosting certain kinds of processes in the kernel. He
said that it would defeat the whole point of the way he had designed
it, and that nicing could work just as well. Perhaps there could have
been a better way of handling that issue, such as adding (yet another)
kernel compilation configuration option for this code (since Con

I don't know how you can blame Con for not finding a PPC oops before
SD was merged into -mm, considering he seemed to be coding solely on
an x86-based architecture. Of course, you could say that his design
should have factored in all the architectures and such, but even the
best design can fall apart if it doesn't get tested somewhere. Again,
this is probably a subjective case in that Con might have pushed SD to
-mm rather early; but considering the readership of his -ck list, I
don't think it would have been tested on anything other than X86 until
it went into -mm because I've never seen anyone on -ck report "it works
on <something other than x86/x86-64/IA64>". I don't know what made it
on to other lists, but Con tried his best to fix this oops, and unless
it was done privately, this oops was never re-reported. (Now, if SD
was _STILL_ causing this oops on PPC, I can see how this might be a

So if we found a better maintainer who would commit to maintaining the
SD patches, would you still be willing to consider merging them? Is

--
Michael Chang

Please avoid sending me Word or PowerPoint attachments. Send me ODT,
RTF, or HTML instead.
See http://www.gnu.org/philosophy/no-word-attachments.html
Thank you.
-

To: Michael Chang <thenewme91@...>
Cc: Kasper Sandberg <lkml@...>, CK Mailinglist <ck@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Saturday, July 28, 2007 - 1:25 pm

I did that myself, so that's a non-issue.

No. The complaints were about the CK scheduler not being as responsive
under load as even the _old_ scheduler was. I don't know why people ignore
this fact. It was a long thread back in March or April, and I'm pretty
sure the CK mailing list was cc'd.

Sure, most people don't actually have load-averages above ten etc, but
it's important to do those well _too_.

Linus
-

To: <ck@...>
Cc: Linus Torvalds <torvalds@...>, Michael Chang <thenewme91@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Saturday, July 28, 2007 - 2:03 pm

Of course it wasn't. The speed of tasks slows proportionally with the amount
of system usage. That's the whole point, and CFS can't fix that either, can

<sarcasm>
http://osnews.com/permalink.php?news_id=18350&comment_id=259044

Now I wonder. Apparently, one person complaining about SD was reason to keep
it out http://osnews.com/permalink.php?news_id=18350&comment_id=258997

Will this first post stop CFS from entering the kernel?
</sarcasm>

Now I'll try to be a bit more constructive. I hope your benevolent
dictatorship allows self reflection.

Sure, the difference in behaviour (not in code) between SD and CFS is small,
and for me it doesn't matter. I'm fine with CFS in the kernel, it's a huge
improvement over the previous one. But why, while there was a seemingly good
alternative, did THAT one stay in that long? And this argument goes for more
code 'out there', btw.

Some things get into the kernel, other don't. Some get in too soon, others
too late. Sure. But shouldn't we try to improve this process, instead of
saying 'it is what it is, get over it'?

For me, that's the purpose of this whole discussion. We're losing valuable
code and contributors, yet at the same time code which isn't mature yet
enters the kernel. Acknowledging there is a problem is the first step in
solving it.

Of course, I don't have answers - but I do feel strongly that you think there
is no issue. Is there, or isn't there? And if there is, what do you plan to
do about it?

Your influence on the behaviour of the people around you, your 'lieutenants',
is huge. Larger than you might think. And in many cases, ppl following
someone behave more extreme. That's a big reason why the LKML isn't very
polite nor inviting (mind you, I don't think that's necessarily a bad thing,
that's up to you to decide).

You might want to think about ways to improve the whole process. Ag...

To: jos poortvliet <jos@...>
Cc: <ck@...>, Michael Chang <thenewme91@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Saturday, July 28, 2007 - 2:28 pm

You seem to be not understanding the argument.

It wasn't about "one person complaining". Of *course* people will
complain. That always happens, and sometimes with totally bogus complaints
(the most common being "I'm not used to it").

The problem was the reaction to complaints.

Ingo got lots of complaints too. He was very responsive to them (which is
not something surprising - he's been doing this a long time), and while
some of the tangents he went off on were definitely bogus (the whole
renicing thing), they were still useful as part of the discussion.

And Ingo got other - totally unrelated - developers involved too, ie the
group fairness logic came from Vatsa. And he ended up supporting not just
scheduler people, but also talking to the block layer people about the
scheduler timer usage as a fast clock for block requests etc.

And you have to realize that to me, as the top-level maintainer and one
who seldom actually does big coding things, but just ends up making sure
that people work with others, and fix the problems that crop up, *that*
kind of behaviour is much much MUCH more important than the code itself.

Can you see that?

Actually, nobody pushed SD to me, and neither Con nor anybody else tried
to get me to merge it until some time in March of this year, I think.

Do you think I go trolling for code to merge? No. I actually _require_
that people send it to me, and that I also get the feeling that people are
asking for it!

In other words, my job is not to "merge code" (even though I sometimes
describe it that way), my job is actually largely to "say no". You
shouldn't see me as the person who goes out and tries to get everything
together - quite the reverse. My job is to say "too late for the merge
window", or "too experimental", or "you need to show numbers" or "are

Umm. The absolute *last* thing we want to do is to merge earlier. In fact,
one of our biggest problems is that people send half-cooked stuff to me
(and even mor...

To: Linus Torvalds <torvalds@...>
Cc: <ck@...>, Michael Chang <thenewme91@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Saturday, July 28, 2007 - 3:28 pm

On Saturday 28 July 2007, Linus Torvalds wrote:

Your point here seems to be: this is how it went, and it was right. Ok, got
that. Yet, Con walked away (and not just over SD). Seeing Con go, I wonder
how many did leave without this splash. How many didn't even get involved at
all??? Did THAT have to happen? I don't blame you for it - the point is that
somewhere in the process a valuable kernel hacker went away. How and why? And
is it due to a deeper problem?

--
Disclaimer:

Everything I do, think and say is based on the worldview I currently have.
I am not responsible for changes to the world, or to my view of it, nor for
my own resulting behaviour.
Everything I say is meant kindly, unless explicitly stated otherwise.

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html

   A: Because it destroys the flow of the conversation
   Q: Why is top-posting bad?

To: jos poortvliet <jos@...>
Cc: <ck@...>, Michael Chang <thenewme91@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Saturday, July 28, 2007 - 4:31 pm

But I wanted to bring out more than what you make sound like "that's what
happened, deal with it". I tried to explain _why_ the choices that were
made were in fact made.

And it's a (I think) important thing for people to be aware of. The fact
is, "personality" and "work with the other developers" is a big issue.

You cannot just go off and do your own thing in your private world, and
then expect it to be accepted without any discussion or other people
showing up and doing alternate things. That's _especially_ true in an area

We've had people go with a splash before. Quite frankly, the current
scheduler situation looks very much like the CML2 situation. Anybody
remember that? The developer there also got rejected, the improvement was
made differently (and much more in line with existing practices and
maintainership), and life went on. Eric Raymond, however, left with a
splash.

It's not common, but it's not unheard of. Anybody who thinks that
developers don't have huge egos probably hasn't ever met a software
engineer. And I suspect kernel people have bigger egos than most. No
wonder there are clashes every once in a while - it's a wonder there

Well, one part of it is that the way to make changes in the kernel
community is to do them incrementally.

Small and incremental improvements are much easier to merge. If you go off
and rewrite a subsystem, you shouldn't expect it to get merged, at least
not unless it can live side-by-side with the old one (the new firewire
stack is an example of that, and most filesystems are this way too). And
the closer to some central part you get, the harder that gets.

So the *bulk* of the kernel stuff can be handled either incrementally, or
side-by-side, and as a result, you actually seldom see issues like this.
The kernel is extremely modular, and a large reason for that is exactly to
avoid couplings.

Some (very few) things cannot be done incrementally. That's why I bring
up CML2 as a fairly good example of thi...

To: Linus Torvalds <torvalds@...>
Cc: jos poortvliet <jos@...>, <ck@...>, Michael Chang <thenewme91@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Wednesday, August 1, 2007 - 12:17 am

Hi,

Since I was directly involved I'd like to point out a key difference.

http://lkml.org/lkml/2002/2/21/57 was the very first start of Kconfig and
initially I didn't plan on writing a new config system. At the beginning
there was only the converter, which I did to address the issue that Eric
created a complete new and different config database, so the converter was
meant to create a more acceptable transition path. What happened next is
that I didn't get a single response from Eric, so I continued hacking on
it until it was complete.

The key difference is now that Eric refused the offered help, while Con
was refused the help he needed to get his work integrated.

When Ingo posted his rewrite http://lkml.org/lkml/2007/4/13/180, Con had
already pretty much lost. I have no doubt that Ingo can quickly transform
an idea into working code and I would've been very surprised if he
hadn't been able to turn it into something technically superior. When Ingo
figured out how to implement fair scheduling in a better way, he didn't
use this idea to help Con to improve his work. He decided instead to
work against Con and started his own rewrite. This is of course his right,
but then he should also accept responsibility for the fact that Con felt
his years of work were ripped apart and in vain, and that we have now lost
a developer who tried to address things from a different perspective.

bye, Roman
-

To: Roman Zippel <zippel@...>
Cc: Linus Torvalds <torvalds@...>, jos poortvliet <jos@...>, <ck@...>, Michael Chang <thenewme91@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Wednesday, August 1, 2007 - 1:46 am

When Ingo wrote something that went head-on with what Con wrote, it was his
prerogative to do so. There's no speaking here of rights to do or not to
do since, as a matter of evidence, in the open source world, that which is
superior (i.e. code, function, not person) has the right to exist and the
inferior to die away. Did Ingo have the obligation to improve Con's work?
Definitely not. Did Con have a right to get Ingo's improvements or
suggestions? Definitely not. There are no such rights in this open source
development framework (TM).

What Ingo did, I think, was what he wanted and he has the right to do that.
I believe that Ingo does not have an obligation to be responsible
for what Con felt. You feel what you feel because you choose to feel that
way. Let us remember that "Happiness is a choice, not a state."

And let's just look at the attitudes on how both Ingo and Con reacted to
the issues regarding their respective schedulers. I won't list them here
now since they're all there in the archives.

Since attitude also plays a big part in getting your code in mainline, I
think we would know the reason why one got chosen over the other.

Thank you very much.

Best Regards,

Carlo

--
Carlo Florendo
Software Engineer/Network Co-Administrator
Astra Philippines Inc.
UP-Ayala Technopark, UP Campus Diliman
1101 Quezon City, Philippines
http://www.astra.ph

--
The Astra Group of Companies
5-3-11 Sekido, Tama City
Tokyo 206-0011, Japan
http://www.astra.co.jp
-

To: 'Carlo Florendo' <subscribermail@...>, 'Roman Zippel' <zippel@...>
Cc: 'Linus Torvalds' <torvalds@...>, 'jos poortvliet' <jos@...>, <ck@...>, 'Michael Chang' <thenewme91@...>, 'Kasper Sandberg' <lkml@...>, 'Linux Kernel Mailing List' <linux-kernel@...>
Date: Wednesday, August 1, 2007 - 2:16 am

Yes, and that's where the inequality is.

Unless the maintainer does a really bad job or pisses off Linus,
anyone who wants to merge his code into mainline pretty much
has to get the blessing of the maintainer. On the other hand,

I don't think it's the code superiority that decided the fate of the two
schedulers. When CFS came out, the fate of SD was pretty much already
decided. The fact is that Linus trusts Ingo, and as such he wants to merge
Ingo's code. Of course I cannot say it's wrong, and Ingo's earned this
through years of hard work, but let's not kid ourselves and deny the
obvious fact.

I think Con was simply too frustrated after years of rejection. He could
have been more diplomatic this time round. But no matter what he'd have
done, once Ingo decided to write a new scheduler, the outcome was pretty
much already decided.

SD (and years of Con's work) inspired CFS. This is a fact. No matter how
smart and capable Ingo is, he needs inspiration to keep the good work going.
So I wish Ingo could work more closely with others and let them share a bit
more credit which would just produce even better work.

Hua

-

To: Hua Zhong <hzhong@...>
Cc: 'Carlo Florendo' <subscribermail@...>, 'Roman Zippel' <zippel@...>, 'Linus Torvalds' <torvalds@...>, 'jos poortvliet' <jos@...>, <ck@...>, 'Michael Chang' <thenewme91@...>, 'Kasper Sandberg' <lkml@...>, 'Linux Kernel Mailing List' <linux-kernel@...>
Date: Wednesday, August 1, 2007 - 8:31 am

Umm nope. As a maintainer if you feed Linus stuff you wrote that he
thinks is a bad idea it will not go in, and you'll get an explanation of
why.

The process isn't perfect (eg removing half-vanished maintainers isn't
handled well) but it isn't as you claim.

-

To: Hua Zhong <hzhong@...>
Cc: 'Roman Zippel' <zippel@...>, 'Linus Torvalds' <torvalds@...>, 'jos poortvliet' <jos@...>, <ck@...>, 'Michael Chang' <thenewme91@...>, 'Kasper Sandberg' <lkml@...>, 'Linux Kernel Mailing List' <linux-kernel@...>
Date: Wednesday, August 1, 2007 - 3:09 am

I agree with you here. It's not simply code superiority that matters but a
balance of attitude and the code's corroboration with numbers. Both had

I don't see where credit was lacking. As far as I've observed, SD's author
was given acknowledgment on what he did and even got praise.

Thank you very much.

Best Regards,

Carlo

-

To: Hua Zhong <hzhong@...>
Cc: 'Carlo Florendo' <subscribermail@...>, 'Roman Zippel' <zippel@...>, 'Linus Torvalds' <torvalds@...>, 'jos poortvliet' <jos@...>, 'Michael Chang' <thenewme91@...>, 'Kasper Sandberg' <lkml@...>, 'Linux Kernel Mailing List' <linux-kernel@...>
Date: Wednesday, August 1, 2007 - 3:05 am

I think a lot of people are missing some key things here:

It does not matter whose code gets merged.

The CFS-SD competition was a GOOD THING. Both sides were in heavy, fast
improvement mode, and competed on all fronts and borrowed heavily from
each other in terms of ideas that worked, and innovated to stay ahead.
The end result is that both were good schedulers, and Linux won by
getting the fruit of this competition. Think of it as a mini-evolution
of scheduler ideas compressed into a short time period.

Now compare this to a single patch without competition/the need to
survive in the habitat, say the first SD or the first CFS patch....
whatever your poison is. If there had been no competition element, we
would have ended up with either one of those, and it would not have been
nearly as good as they both ended up being in the end.

Who wrote the code is not relevant in the large picture, the fact that
the problem at hand (2.6 scheduler behavior) got solved is.

I wish people would focus less on who wrote the actual code that got
merged in the end, and more on the problem that got solved.... People
who care about the desktop should be happy that the scheduler improved a
lot due to the competition where the two new schedulers were hair-close
in most aspects. Again.. think about the problem being solved. Not who
wrote the code or which of the competitive patches got merged in the
end.

Let me repeat the key message:

It does not matter whose code gets merged.
It does not matter whose code gets merged.
It does not matter whose code gets merged.
It does not matter whose code gets merged.

What matters is that the problem gets solved and that the Linux kernel
innovates forward.

I've had several cases myself where I spent quite some time solving a
problem, just to get some random remark from someone smart on lkml
saying "if you had done <this simple thing> you would have had <this
simple and superior solution>". Was I pissed off that my patch didn't
get merged ...

To: Arjan van de Ven <arjan@...>
Cc: Hua Zhong <hzhong@...>, 'Carlo Florendo' <subscribermail@...>, 'Roman Zippel' <zippel@...>, 'Linus Torvalds' <torvalds@...>, 'jos poortvliet' <jos@...>, 'Michael Chang' <thenewme91@...>, 'Kasper Sandberg' <lkml@...>, 'Linux Kernel Mailing List' <linux-kernel@...>
Date: Thursday, August 2, 2007 - 4:03 pm

This attitude has risks over the long term, if outsiders with fresh
ideas are discouraged. If the community becomes known for deferring too
much to established maintainers, those fresh ideas may stop coming to linux.

- FChE
-

To: Frank Ch. Eigler <fche@...>
Cc: Arjan van de Ven <arjan@...>, Hua Zhong <hzhong@...>, 'Carlo Florendo' <subscribermail@...>, 'Roman Zippel' <zippel@...>, 'Linus Torvalds' <torvalds@...>, 'jos poortvliet' <jos@...>, 'Michael Chang' <thenewme91@...>, 'Kasper Sandberg' <lkml@...>, 'Linux Kernel Mailing List' <linux-kernel@...>
Date: Saturday, August 4, 2007 - 4:04 am

Amen to that, Frank. Driving off talented contributors is a Very Bad
Thing for Linux in the long run. This will not stop evolutionary
progress, but it slows it down and may result in an overly inbred
animal.

It is especially easy to drive off a contributor whose day job is not
Linux hacking.

Regards,

Daniel
-

To: Frank Ch. Eigler <fche@...>
Cc: Hua Zhong <hzhong@...>, 'Carlo Florendo' <subscribermail@...>, 'Roman Zippel' <zippel@...>, 'Linus Torvalds' <torvalds@...>, 'jos poortvliet' <jos@...>, 'Michael Chang' <thenewme91@...>, 'Kasper Sandberg' <lkml@...>, 'Linux Kernel Mailing List' <linux-kernel@...>
Date: Thursday, August 2, 2007 - 4:05 pm

My concern is that only "get my line of code merged" is seen as "the
ultimate thing". It's more than that. Linux is about collaboration,
where it matters more that people work together to solve a problem, far
far more than who actually types the lines in on the keyboard. Working
on the problem should be seen (and recognized) as the right thing. Who
writes the code is secondary to that.

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

-

To: Arjan van de Ven <arjan@...>
Cc: Hua Zhong <hzhong@...>, 'Carlo Florendo' <subscribermail@...>, 'Roman Zippel' <zippel@...>, 'Linus Torvalds' <torvalds@...>, 'jos poortvliet' <jos@...>, 'Michael Chang' <thenewme91@...>, 'Kasper Sandberg' <lkml@...>, 'Linux Kernel Mailing List' <linux-kernel@...>
Date: Thursday, August 2, 2007 - 4:33 pm

Unfortunately, this spirit of collaboration sometimes gets lost in
practice when feedback is asymmetric, obnoxious, or absent.

- FChE
-

To: Arjan van de Ven <arjan@...>
Cc: Hua Zhong <hzhong@...>, 'Carlo Florendo' <subscribermail@...>, 'Roman Zippel' <zippel@...>, 'Linus Torvalds' <torvalds@...>, 'jos poortvliet' <jos@...>, 'Michael Chang' <thenewme91@...>, 'Kasper Sandberg' <lkml@...>, 'Linux Kernel Mailing List' <linux-kernel@...>
Date: Thursday, August 2, 2007 - 11:22 am

Hey, it even happened to me: I had this nice and safe pte-highmem patch,
but the buggy highpte was merged instead. Go figure. Con got lucky.
-

To: Arjan van de Ven <arjan@...>
Cc: Hua Zhong <hzhong@...>, Carlo Florendo <subscribermail@...>, Roman Zippel <zippel@...>, Linus Torvalds <torvalds@...>, jos poortvliet <jos@...>, Michael Chang <thenewme91@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Wednesday, August 1, 2007 - 4:14 am

And, from a standpoint of ONGOING, long-term innovation: what matters
is that brilliant, new ideas get rewarded one way or another. Because
if you don't, the people with the 'different' ideas walk away, you end
up with only those who 'fit' the culture, and there goes innovation.

That's why I tried to get involved in this discussion. It doesn't
matter whose code gets merged. But it does matter that people get
scared away. It took the kernel folks a few years, but they managed to
get someone kicked out who's not 'in-crowd', who clearly has a
different view, and who has the intent and motivation to write and
maintain code.

And that's bad.

I've quoted this before: Reward Brilliant Failures, Punish Mediocre Successes.

Of course that's 'overdone', but it conveys a point: If you focus too
much on exploiting current code, instead of fundamentally exploring
new ideas you go down in the long run. There has to be a balance. And
in some areas of the kernel, there seems to be a good balance - new
ideas come in, code is being re-factored. But in scheduling and VM, I
wonder if there's enough exploration...

I hear 'We don't do politics' a lot in the kernel community.

Well, what are politics? Managing the way code gets into the kernel?
That's important for sure, right? And what about thinking about the
hacker culture? Nobody would object to preserving and securing that.
But those are not just technical matters. Yet they require thought. If
the kernel culture doesn't work, the code won't work. There is a
delicate balance, and a key part of what Linus has been doing is
preserving it. I think he must not ignore that there is always room
for improvement, and moments like these (where a big 'fight' is going
on, and there is a clear sense of urgency about the matter) are the
perfect times for a good discussion, and possible change.

Use it.

Love,

Jos

* Disclaimer:
- I'm no kernel hacker
- actually I help at the KDE project in the area of marketing...
- yet, i have followed ...

To: <jos@...>
Cc: Hua Zhong <hzhong@...>, Carlo Florendo <subscribermail@...>, Roman Zippel <zippel@...>, Linus Torvalds <torvalds@...>, Michael Chang <thenewme91@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Wednesday, August 1, 2007 - 10:02 am

and in this case, the reward is that the idea got used and credit was

yet at the same time if people walk away just because their code didn't
get used, even though their problem got solved, should we merge "worse"
code just to prevent that ? That's almost blackmail, and also just
stupid.

(not suggesting that SD in this case was better or worse, just trying to

And he did manage to get some of his code in, just not all. He also
managed to get people interested in his problem so much that a healthy
stint of competition happened and his problem got solved. If people walk
away because they don't 100% always get things done EXACTLY their way..

here's the thing. Fair scheduling DID get explored. deeply so.

now, getting people interested in your problem (and that is needed to
get them to pay attention to it) is a sales job, no ifs and buts there.
You need to convince them that 1) the problem is real, 2) the problem is
relevant. If you also have a proposed solution you also need to convince
them that 3) the solution solves the problem and 4) that it's the right
way to solve the problem. That isn't politics, it's part of how the
ecosystem works; people are not stupid, but you need to convince them
about your problem and solution. And that "default a bit skeptical and
overworked" approach is the foundation of the process; the same way as
you need to pass a code review before people will merge your code.

-

To: 'Arjan van de Ven' <arjan@...>, <jos@...>
Cc: 'Carlo Florendo' <subscribermail@...>, 'Roman Zippel' <zippel@...>, 'Linus Torvalds' <torvalds@...>, 'Michael Chang' <thenewme91@...>, 'Kasper Sandberg' <lkml@...>, 'Linux Kernel Mailing List' <linux-kernel@...>
Date: Wednesday, August 1, 2007 - 2:40 pm

You mean, when Ingo announced CFS he mentioned Con's name?

If it is a general point, sure, but it's hardly 1/10 of what happened
here. And note I don't agree with Con's decision either - I wish he'd
be back, but the reason I jumped in was to show some understanding, as
I see some comments in the thread that were not doing so.

When you said "it does not matter whose code got merged", I have to
disagree. Sure, for the Linux community as a whole, for Linux itself,
it may not matter, but for the individuals involved, it does. And I
think benefits of individuals are as important as benefits of the
community (or the nation).

Con has been working on the scheduler (fair or not) for years, and nothing
got merged. Yet CFS got merged in a blink despite the fact that the
competition just began to show. Have we given SD a fair chance? No.

Ingo has a unique position that nobody else could challenge. Note I
have said that he earned it through hard work and talent, so that's
not the problem. The problem is how he could have handled it better,
not blatantly "grab the food right from under others' noses".

I don't think merging CFS was a wrong decision. The problem was how
this decision was made. And I think Linus made some rather unfair
comments about Con's personality, and I don't think, deep down, that
was the reason he merged Ingo's code.

Hua

-

To: Hua Zhong <hzhong@...>
Cc: <jos@...>, 'Carlo Florendo' <subscribermail@...>, 'Roman Zippel' <zippel@...>, 'Linus Torvalds' <torvalds@...>, 'Michael Chang' <thenewme91@...>, 'Kasper Sandberg' <lkml@...>, 'Linux Kernel Mailing List' <linux-kernel@...>
Date: Wednesday, August 1, 2007 - 6:04 pm

I agree it's a nice ego boost to see your code merged.
But... do you care more about your ego boost or about your problem
getting solved? I really want to change this if you say "ego for code
merging"... "ego boost for getting linux improved and being involved in
solving an important problem" is a lot better type of ego boost..

No developer can or should expect most, or even half, of his code to
be merged. Even Linus doesn't get half the code he writes into linux :)

Con did get a whole bunch of stuff merged over the years, and for the
rest he mostly got the problem solved. That's pretty successful....


-

To: Arjan van de Ven <arjan@...>
Cc: Hua Zhong <hzhong@...>, 'Roman Zippel' <zippel@...>, 'Linus Torvalds' <torvalds@...>, 'jos poortvliet' <jos@...>, 'Michael Chang' <thenewme91@...>, 'Kasper Sandberg' <lkml@...>, 'Linux Kernel Mailing List' <linux-kernel@...>
Date: Wednesday, August 1, 2007 - 3:12 am

Very rational. I would now have to contend that CFS didn't lose and
neither did SD. Linux won.

Thank you very much.

Best Regards,

Carlo

-

To: <ck@...>
Cc: Linus Torvalds <torvalds@...>, jos poortvliet <jos@...>, Michael Chang <thenewme91@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Saturday, July 28, 2007 - 8:03 pm

Interesting... Trying to avoid reading email but with a flooded inbox it's
quite hard to do.

A lot of useful discussion seems to have been generated in response to people's
_interpretation_ of my interview rather than what I actually said. For
example, everyone seems to think I quit because CFS was chosen over SD (hint:
it wasn't). Since it's generating good discussion I'll otherwise leave it as
is.

As a parting gesture; a couple of hints for CFS.

Any difference in behaviour between CFS and SD since they both aim for
fairness would come down to the way they interpret fair. Since CFS accounts
sleep time whereas SD does not, that would be the reason.

As for volanomark regressions, they're always the sched_yield implementation.
SD addressed a similar regression a few months back.

Good luck.

--
-ck
-

To: Con Kolivas <kernel@...>
Cc: <ck@...>, Linux Kernel Mailing List <linux-kernel@...>, Linus Torvalds <torvalds@...>, Michael Chang <thenewme91@...>, Kasper Sandberg <lkml@...>
Date: Saturday, July 28, 2007 - 9:23 pm

Con, good to hear from you. Good luck with your future endeavors.

Charles

--
"Are [Linux users] lemmings collectively jumping off of the cliff of
reliable, well-engineered commercial software?"
(By Matt Welsh)

To: jos poortvliet <jos@...>
Cc: Linus Torvalds <torvalds@...>, <ck@...>, Michael Chang <thenewme91@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>, Bill Huey (hui) <billh@...>
Date: Saturday, July 28, 2007 - 4:07 pm

Absolutely, the current Linux community hasn't realized how large the
community has gotten and the internal processes for dealing with new
developers, that aren't at companies like SuSE or RedHat, haven't been
extended to deal with it yet. It comes off as elitism which it partially
is.

Nobody tries to facilitate or understand ideas in the larger community
which locks folks like Con out that try to do provocative things outside
of the normal technical development mindset. He was punished for doing
so, which is a huge failure in this community.

Con basically got caught in a scheduler philosophical argument of whether
to push a policy into userspace or to nice a process instead because
of how crappy X is. This is an open argument on how to solve, but it
should not have resulted in really one scheduler over the other. Both
were capable but one is locked out now because of the choices of
current high level kernel developers in Linux.

There are a lot of good kernel folks in many different communities that
look at something like this and would be turned off to participating
in Linux development. And I have a good record of doing rather
interesting stuff in kernel.

bill

-

To: Bill Huey <billh@...>
Cc: jos poortvliet <jos@...>, Linus Torvalds <torvalds@...>, <ck@...>, Michael Chang <thenewme91@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Saturday, July 28, 2007 - 5:06 pm

So your argument is that SD shouldn't have been merged either, because it

Well, there are two schedulers...it's obvious that "high level kernel
developers" needed to choose one.

The main problem is clearly that no scheduler was clearly better than the
other. This reminds me of LVM2/MD vs EVMS in the 2.5 days - both
of them were good enough, but only one of them could be merged. The
difference is that the EVMS developers didn't get that annoyed: not only
did they not quit, they continued developing their userspace tools to
make them work with the solution included in the kernel
(http://lwn.net/Articles/14714/)
-

To: Diego Calleja <diegocg@...>
Cc: Bill Huey <billh@...>, jos poortvliet <jos@...>, Linus Torvalds <torvalds@...>, <ck@...>, Michael Chang <thenewme91@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Tuesday, August 7, 2007 - 2:55 am

Not that I want to be in this thread, particularly since it is already
two weeks stale, but your take on the EVMS story is incorrect. The
EVMS developers (that is, Kevin) sent out a nice, conciliatory email,
the project sputtered on for a while, then basically died.

http://marc.info/?l=evms-devel&m=118240945708775&w=2

Bill is right. People who know people are right. A lot of good talent
has been lost to Linux over the years because of various, perhaps
well-intentioned, gaffes. The thing is, if you contribute to a project like
Linux for fun, when it stops being fun you walk.

Regards,

Daniel
-

To: Daniel Phillips <phillips@...>
Cc: Diego Calleja <diegocg@...>, Bill Huey <billh@...>, jos poortvliet <jos@...>, Linus Torvalds <torvalds@...>, <ck@...>, Michael Chang <thenewme91@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Tuesday, August 7, 2007 - 11:33 am

This is perfectly normal. It was outevolved and ran out of people who
cared enough to continue it. Happens all the time. In the proprietary
world it's normally one company putting another out of business and lots
of people losing jobs and money, so it's actually a good deal friendlier
this side of the fence.

When you contribute to a big project some of your stuff will get nowhere,
other stuff will eventually get kicked out and replaced. It's part of the
progress of the system.

And yes, one day the Linux kernel will probably go the same way as EVMS
when something cooler and neater replaces it.
-

To: Diego Calleja <diegocg@...>
Cc: jos poortvliet <jos@...>, Linus Torvalds <torvalds@...>, <ck@...>, Michael Chang <thenewme91@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>, Bill Huey (hui) <billh@...>
Date: Saturday, July 28, 2007 - 5:32 pm

My argument is that scheduler development is open ended. Although having
a central scheduler to hack is a good thing, it shouldn't lock out or
suppress development from other groups that might be trying to solve the
problem in unique ways.

This can be accomplished in a couple of ways:

1) scheduler modularity

Clearly Con is highly qualified to experiment with scheduler code and
this should be technically facilitated by some means if not a maintainer.
He's only a part time maintainer and nobody helped him with this stuff
nor did they try to understand what his scheduler was trying to do other
than Tong Li.

2) better code modularity

Now, cleaner code would help with this a lot. If that was in place, we
might not need (1) and pluggable scheduler. It would limit the amount
of refactoring for folks so that their code can drop in easier. There's
a significant amount of churn that it locks out developers by default
since they have to constantly clean up the code in question while another
developer can commit without consideration to how it affects others.
That's their right as a maintainer, but also as maintainer, they should
give proper amount of consideration to how others might intend to extend
the code so that development remains "inclusive".

This notion of "open source, open development" is false when working

I think that's kind of a bogus assumption from the very get go. Scheduling
in Linux is one of the most unevolved systems in the kernel that still
could go through a large transformation and get big gains like what
we've had over the last few months. This is evident with both schedulers;
both do well and it's a good thing overall that CFS is going in.

Now, the way it happened is completely screwed up in so many ways that I
don't see how folks can miss it. This is not just Ingo versus Con, this
is the current Linux community and how it makes decisions from the top down
and the current cultural attitude towards developers doing things that
are:

1) architecturally sign...

To: Bill Huey <billh@...>
Cc: Diego Calleja <diegocg@...>, jos poortvliet <jos@...>, <ck@...>, Michael Chang <thenewme91@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Saturday, July 28, 2007 - 6:18 pm

I don't think anything was suppressed here.

You seem to say that more modular code would have helped make for a nicer
way to do schedulers, but if so, where were those patches to do that?
Con's patches didn't do that either. They just replaced the code.

In fact, Ingo's patches _do_ add some modularity, and might make it easier
to replace the scheduler. So it would seem that you would argue for CFS,

I don't think so.

I think you're barking up the totally wrong tree here.

I think that what happened was very simple: somebody showed that we did
badly and had benchmarks to show for it, and that in turn resulted in a
huge spurt of coding from the people involved.

The fact that you think this is "broken" is interesting. I can point to a
very real example of where this also happened, and where I bet you don't
think the process was "broken".

Do you remember the mindcraft study?

Exact same thing. Somebody came in, and showed that Linux did really badly
on some benchmark, and that an alternate approach was much better.

What happened? A huge spurt of development in a pretty short timeframe,
that totally _obliterated_ the mindcraft results.

It could have happened independently, but the fact is, it didn't. These
kinds of events where somebody shows (with real numbers and code) that
things can be done better really *are* a good way to do development, and
it's how development generally ends up happening. It's hugely
motivational, both because competition is motivational in itself, but also
because somebody showing that things can be done so much better opens
people's eyes to it.

And if you think the scheduler situation is different, why? Was it just
because the mindcraft study compared against Windows NT, not another
version of Linux patches?

The thing is, development is slow and gradual, but at the same time, it
happens in spurts (btw, if you have ever studied evolution, you'll find
the exact same thing: evolution is slow and gradual, but it also hap...

To: Linus Torvalds <torvalds@...>
Cc: Diego Calleja <diegocg@...>, jos poortvliet <jos@...>, <ck@...>, Michael Chang <thenewme91@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>, Bill Huey (hui) <billh@...>
Date: Saturday, July 28, 2007 - 9:00 pm

They replaced code because he would have liked to have taken scheduler code
in possibly a completely different direction. This is a large conceptual
change from what is currently there. That might also mean how the notion of
bandwidth with regards to core frequency might be expressed in the system
with regards to power saving and other things. Things get dropped often
not because of pure technical reasons but because of personal preference
and the lack of willingness to ask where this might take us.

The way that Con works and conceptualizes things is quite a bit different
and more comprehensive in a lot of ways compared to how the regular kernel
community operates. He's strong in this area and weak in general kernel
hackery as a function of time and experience. That doesn't mean that he,
his ideas and his code should be subject to an either/or situation with the
scheduler and other ideas that have been rejected by various folks. He
maintained the -ck branch successfully for a long time and is a very capable
developer.

I do acknowledge that having a maintainer that you can trust is more
important, but it should not be exclusionary in this way. I totally

It's not the same as sched plugin. Some folks might not like to use the
rbtree that's in place and express things in a completely different
manner. Take, for instance, Tong Li's stuff: with CFS there is a bit of a
conceptual mismatch with his attempt at expressing rebalancing in terms of
expiry rounds, yet it would be more seamlessly integrated with something like
either the old O(1) scheduler or Con's stuff. It's also the only method posted
to lkml that can deal with fairness across SMP situations with low error. Yet
what's happening here is that his implementation is being rejected because
of size and complexity because of a data structure conceptual mismatch.

Because of this, his notion of trio as a general method of getting
aggressive group fairness (by far the most complete conceptually on lkml,
over design is a different topic altogether) may...

To: Bill Huey <billh@...>
Cc: Linus Torvalds <torvalds@...>, jos poortvliet <jos@...>, <ck@...>, Michael Chang <thenewme91@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Sunday, July 29, 2007 - 10:31 am

This is just wrong: AFAIK nobody is stopping Con or any other people from
continuing developing SD or any other scheduler, and CFS certainly is subject
to criticism. The idea that Linux can't use other innovative ideas in the scheduler

Get real: I don't think Linux development has always been "friendly". The idea
of a "GNU-hippie community" where everybody is good and helps others and
shares their pots is what the Sun bloggers seem to think that opensolaris
should resemble, but it doesn't match the real world.
-

To: Diego Calleja <diegocg@...>
Cc: Bill Huey <billh@...>, Linus Torvalds <torvalds@...>, jos poortvliet <jos@...>, <ck@...>, Michael Chang <thenewme91@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Sunday, July 29, 2007 - 4:25 pm

Absolutely.

Con quit for his own reasons. Given that Con himself has said that CFS
was _not_ why he quit, please discard this... bait. Anyone whose name
isn't Con Kolivas, who pretends to speak for him is at the very least
overstepping his bounds, and that is being _very_ generous.

-Mike

-

To: Mike Galbraith <efault@...>
Cc: Diego Calleja <diegocg@...>, Linus Torvalds <torvalds@...>, jos poortvliet <jos@...>, <ck@...>, Michael Chang <thenewme91@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>, Bill Huey (hui) <billh@...>
Date: Sunday, July 29, 2007 - 5:48 pm

I know Con personally and I completely identify with his circumstance. This
is precisely why he quit the project: because of a generally perceived
ignorance and disconnect from end users. Since you side with Ingo on many
issues politically, this response from you is no surprise.

Again, the choices that have currently been made with CFS basically lock
him out of development. If you don't understand that, then you don't
understand the technical issues he's struggled to pursue. He has a large
following, which is why this has been a repeated issue between end users
of his tree and a number of current Linux kernel developers.

bill

-

To: Bill Huey <billh@...>
Cc: Diego Calleja <diegocg@...>, Linus Torvalds <torvalds@...>, jos poortvliet <jos@...>, <ck@...>, Michael Chang <thenewme91@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Monday, July 30, 2007 - 1:03 am

You're still not Con, and I still think it's inappropriate for any

Hm. I don't recall entering the world of politics. Where's my cool
lapel button?

-Mike

-

To: <ck@...>
Cc: Diego Calleja <diegocg@...>, Bill Huey <billh@...>, Linux Kernel Mailing List <linux-kernel@...>, Michael Chang <thenewme91@...>, Linus Torvalds <torvalds@...>, Kasper Sandberg <lkml@...>
Date: Sunday, July 29, 2007 - 2:31 pm

Actually I have seen friendly communities around Linux and free software.
Like the KDE project, the ck patchset mailing list community, the
TuxOnIce aka suspend2 community, the SGI XFS community, the Bazaar
community, quite some parts of the Debian community just to name a few.

So I know that it can be different. I know that it's inaccurate to talk
about the whole Linux kernel community. I had quite friendly contacts
with core Linux developers like with Ingo (yes, with Ingo!;-) or Greg
Kroah-Hartman.

So what would be wrong with looking at how this worked out and why and how
it would be possible to avoid losing a talented developer?

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7

To: <ck@...>
Cc: Linus Torvalds <torvalds@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Saturday, July 28, 2007 - 6:05 am

Linus, I have seen something *completely* different on the CK mailinglist. Con
Kolivas worked up to his limits and likely beyond them to fix any and all
issues reported. Heck, he maintained that thing out of the kernel tree
for a long time and the version number 1.0 does not come from nothing, it
has gone through at least 50 iterations.

The only thing I know of where Con did not want to "fix" a problem, was
with renicing X, cause he didn't want to introduce a special case in the
scheduler, where a simple nice would do the trick. That said I never saw
serious problems with X unreniced at all.

So I think your statements here are simply not accurate and also not fair,
cause I have the impression that you did not look carefully before
writing them.

You speak about working together, but now I ask you: Did you ever have a
personal word with Con, did you ever tell him that you don't trust that
he can maintain the SD scheduler when it's in mainline? Did you ever voice
your concerns to *him*?

Granted, from a health point of view and maybe also from looking at how
much time a maintainer will be able to spend on the scheduler
Ingo *may* be able to do more than Con - if he doesn't do too much else;-). But
looking at personal commitment actually I saw no difference between Con
and Ingo.

So while it may be good that CFS went in from that point of view, the way
the decision was made was very suitable to piss off a very talented
developer.

Anyway, the decision is done, Con resigned already, he gave up on it. And
actually when I read your mail I can understand why he did so[1]. Sure,
he is involved as well and I think he felt hurt on some things that in my
perception were meant neutral or even supporting and positive, but still I
disagree a lot on the tone in LKML and understand exactly why Linux
users, Linux desktop users stay away from it as much as they can. Actually I
do not get that as you state in one of your late...

To: <ck@...>
Cc: Martin Steigerwald <Martin@...>, Linus Torvalds <torvalds@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Saturday, July 28, 2007 - 7:06 am

It is interesting, he mentions a lesson to learn from Microsoft:
"'Well, historically, the most important lesson from Microsoft - and one they
themselves seem to have forgotten - is simply 'Give your customers what they
want'."
But as i see all the discussion here, that's what's _not_ being honored. People
request swap prefetch, it wouldn't be hard to give it to them but they
probably won't get it (or it takes a 5-day, 200+ message discussion (on the
ck list alone 190 messages were already posted about this)).
Give the people plugsched so everyone can happily be using SD instead of CFS -
no way!
There sure are more examples to be given.

Dirk.

To: Linus Torvalds <torvalds@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Ingo Molnar <mingo@...>, CK Mailinglist <ck@...>
Date: Saturday, July 28, 2007 - 5:44 am

I'm not saying it's perfect, not at all, neither am i saying CFS is bad,
surely CFS is much better than the old one, and i agree with what that
university test you mentioned on kerneltrap says, that it is basically
impossible to feel a difference between CFS and SD, EXCEPT for 3d under load,
where CFS simply can not compete with SD, there's no but, this is how it
has acted on every system i've tested, and YES, others reported it too,
whether you choose to see it or not. and other people who run games on
linux tell me the exact same thing, and i have had quite a few people

And what's the point here? If you are trying to pull the old "Con just
runs away", forget it, it's a certainty that he would have put the

First off, i've personally run tests on many more machines than my own,
i've had lots of people try on their machines, and i've seen totally
unrelated posts to lkml, plus i've seen the experiences people are

As i recall, there was only 1 person whose reports were attacked, and
that was because the person repeatedly reported the EXPECTED behavior as
broken, simply because it was FAIRLY allocating the cpu time, and this
did not meet the dude's expectations. And it was after multiple

You may not have been able to trust Con, but that's because you haven't
taken the time to actually really see what's been going on, if you just
read the threads for SD you'd realize that he was more than willing to
maintain it, after all, why do you think he wrote and submitted it? you

as explained earlier, it's not just my particular setup, but actually

Okay, i wasn't going to ask, but i'll do it anyway, did you even read the
threads about SD? Con was extremely polite to everyone, and he did work
with a multitude of people, you seem to be totally deadlocked into the
ONE incident with a person that was unhappy with SD, simply for being a

-

To: Kasper Sandberg <lkml@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Ingo Molnar <mingo@...>, CK Mailinglist <ck@...>
Date: Saturday, July 28, 2007 - 1:50 pm

Ok, good. Has anybody tried to figure out why 3D games seem to be such a
special case?

I know Ingo looked at it, and seemed to think that he found and fixed

I don't _ever_ go on specialty mailing lists. I don't read -mm, and I
don't read the -fs mailing lists. I don't think they are interesting.

And I tried to explain why: people who concentrate on one thing tend to
become this self-selecting group that never looks at anything else, and
then rejects outside input from people who hadn't become part of the "mind
meld".

That's what I think I saw - I saw the reactions from where external people
were talking and cc'ing me.

And yes, it's quite possible that I also got a very one-sided picture of
it. I'm not disputing that. Con was also ill for a rather critical period,

Hey, maybe that one incident just ended up being a rather big portion of
what I saw. Too bad. That said, the end result (Con's public gripes about
other kernel developers) mostly reinforced my opinion that I made the right
choice.

But maybe you can show a better side of it all. I don't think _any_
scheduler is perfect, and almost all of the time, the RightAnswer(tm) ends
up being not "one or the other", but "somewhere in between".

It's not like we've come to the end of the road: the baseline has just
improved. If you guys can show that SD actually is better at some loads,
without penalizing others, we can (and will) revisit this issue.

So what you should take away from this is that: from what I saw over the
last couple of months, it really wasn't much of a decision. The difference
in how Ingo and Con reacted to people's reports was pretty stark. And no, I
haven't followed the ck mailing list, and so yes, I obviously did get just
a part of the picture, but the part I got was pretty damn unambiguous.

But at the same time, no technical decision is ever written in stone. It's
all a balancing act. I've replaced the scheduler before, I'm 100% sure
we'll replace it again. Schedulers ...

To: Linus Torvalds <torvalds@...>
Cc: Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>, Ingo Molnar <mingo@...>, CK Mailinglist <ck@...>
Date: Saturday, July 28, 2007 - 3:13 pm

Is it specific to 3D? I would not think so. dosbox, bochs should have
the same issue. Games with "a lot of motion" usually implement their event
handling and screen drawing in a busy loop to get the maximum possible
frame rate.

Usually, only the GL thread would need to run at full power, and the input
subsystem could be reduced to a simple event-based loop (for example reading
a pipe in blocking mode). This could IMO make games a bit more responsive.

However, most games combine the input subsystem and graphics output in one
thread. Due to the way CFS works, it may mean that processes get scheduled
too fairly, though I'd suspect that a GL busy loop has no interactivity bonus at
all anyway in the old scheduler or SD.

I/O is also something that can hurt games in their framerate and/or handling
(something the user cares most about). Since I have not tried 2.6.23-rc yet, I
can only speak for the old scheduler. I have always turned cron off so that
updatedb does not run, because it makes games sluggish for some reason,
even though updatedb (or subordinate processes) don't take a lot of CPU time
according to `top`. What's more, running BOINC in the background (nice 20)
while running unreal (nice 0), everything is ok.
(But not if BOINC is at nice 0).

Time to investigate...

Jan
--
-

To: Jan Engelhardt <jengelh@...>
Cc: Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>, Ingo Molnar <mingo@...>, CK Mailinglist <ck@...>
Date: Saturday, July 28, 2007 - 3:34 pm

Well, one thing that would be worth doing is to simply create a trace of
time-slices for both schedulers.

It could easily be some hacky thing that just saves the process name and
TSC at each scheduling event in some fairly small fixed-sized per-CPU
circular buffer, and have a /sys interface that reads it out, and then you
do

sleep 60 ; cat /sys/cpubuffer > buffer

and play the game for 60 seconds (so that you get a buffer that represents
perhaps the last 10 seconds of play).
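
A rough user-space sketch of the fixed-size circular trace buffer idea described above. Everything named here is hypothetical: the class `TraceRing`, its methods, and the use of `time.perf_counter_ns()` as a stand-in for the TSC are assumptions for illustration. A real version would be kernel code, one buffer per CPU, read out through a /sys file.

```python
import time
from collections import deque


class TraceRing:
    """Fixed-size circular buffer of (timestamp_ns, task_name) events."""

    def __init__(self, capacity):
        # deque with maxlen silently drops the oldest entry once full,
        # which is the circular-buffer behaviour wanted here.
        self.events = deque(maxlen=capacity)

    def record(self, task_name, timestamp_ns=None):
        """Store one scheduling event (which task was switched in, and when)."""
        if timestamp_ns is None:
            timestamp_ns = time.perf_counter_ns()  # stand-in for the TSC
        self.events.append((timestamp_ns, task_name))

    def slices(self):
        """Return (task_name, duration_ns) for each completed time slice."""
        ev = list(self.events)
        return [(name, next_ts - ts)
                for (ts, name), (next_ts, _) in zip(ev, ev[1:])]
```

Comparing the `slices()` output from both schedulers for the same game session would show directly whether one of them hands out noticeably longer slices.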

It could *literally* just be an effect of the time quanta used, and CFS
just deciding that it's not interactive and giving things too long of a
CPU slice.

Yes, it's what "/proc/sys/kernel/sched_granularity_ns" is supposed to
tweak, but maybe there's some misfeature there, or maybe the default is
just bad for games, or whatever.

Ingo: that sysctl_sched_granularity initialization doesn't make sense. You
talk about it being in units of nanoseconds, but then you do

2000000000ULL/HZ

which is nonsensical. That value is "2 seconds" (not 2ms like the comment
says) in nanoseconds, but then divided by HZ, so what's the meaning of
that HZ thing? Nothing in the scheduler should care about jiffies, why is
that related to HZ? All the scheduler clocks are in ns.

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>, Ingo Molnar <mingo@...>, CK Mailinglist <ck@...>
Date: Wednesday, August 1, 2007 - 5:21 am

Well it really is different.

Simple test:
- run Unreal Tournament 99 (nice 0, it gets 98%,99% CPU most of the time)
- in a shell, `renice 20 $$; while :; do date; done;`

The shell only produces one or two outputs per second.
This seems different from the old-2.6 behavior, where a nice-20
process seemed to get a bit more share. (Due to interactivity bonus)

Does anyone have a cpu hog test program that spreads its cpu time
over the second rather than doing 300 ms wake and 700 ms sleep cycles?
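
One possible shape for such a test program, as a rough sketch only. The function name `spread_hog` and its parameters are hypothetical, and the 30% duty cycle and 10 ms period are illustrative assumptions. It burns the same overall share of CPU as a 300 ms / 700 ms cycle, but in many short bursts spread across the second.

```python
import time


def spread_hog(duty=0.3, period_s=0.01, total_s=1.0):
    """Burn roughly `duty` of one CPU in short bursts; return the burst count."""
    busy_s = duty * period_s
    end = time.monotonic() + total_s
    bursts = 0
    while time.monotonic() < end:
        start = time.monotonic()
        while time.monotonic() - start < busy_s:
            pass  # busy-wait: this is the CPU-hogging part
        time.sleep(period_s - busy_s)  # give the CPU back for the rest of the period
        bursts += 1
    return bursts
```

Running it at different nice levels next to the game, in place of the `date` loop, would show how much of its requested share it actually gets.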

Jan
--
-

To: Jan Engelhardt <jengelh@...>
Cc: Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>, Ingo Molnar <mingo@...>, CK Mailinglist <ck@...>
Date: Saturday, July 28, 2007 - 5:33 pm

Btw, people who actually have 3D games installed (I have exactly one:
ppracer, and I can't really say that I care about how it feels), if you
don't have CONFIG_HZ=1000, this really is worth testing.

I think Ingo probably ran with CONFIG_NO_HZ and HZ_1000, but the default
timer tick is actually 250Hz, which makes all the default scheduler values
come out four times bigger than they are documented/supposed to be.

On SMP, that scheduler granularity then gets doubled once more if you have
two CPU's, so rather than 2ms by default, it ends up being 16ms (and the
time slices themselves end up being bigger than that).
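
The scaling described above can be checked with a few lines of arithmetic. This is only a sketch: the function name is hypothetical, the doubling is modeled for the two-CPU case only (how it grows beyond two CPUs is not stated here), and HZ=100 is included simply as another available CONFIG_HZ setting.

```python
NSEC_PER_SEC = 1_000_000_000


def default_granularity_ns(hz, two_cpus=False):
    """Evaluate the default 2000000000ULL/HZ for a given timer tick.

    two_cpus=True applies the doubling described for a two-CPU machine.
    """
    ns = (2 * NSEC_PER_SEC) // hz  # integer division, as in the C expression
    return ns * 2 if two_cpus else ns


for hz in (1000, 250, 100):
    print(f"HZ={hz}: {default_granularity_ns(hz) / 1e6} ms")
print(f"HZ=250, two CPUs: {default_granularity_ns(250, two_cpus=True) / 1e6} ms")
```

With HZ=1000 this gives the documented 2 ms; with the default 250Hz it gives 8 ms, and 16 ms on a two-CPU machine.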

So doing some testing with a simple

echo 2000000 > /proc/sys/kernel/sched_granularity_ns
echo 1000000 > /proc/sys/kernel/sched_batch_wakeup_granularity_ns
echo 8000000 > /proc/sys/kernel/sched_runtime_limit_ns

might be worth doing (and if you vary numbers to see if it matters,
please do let people know!)

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>, Ingo Molnar <mingo@...>, CK Mailinglist <ck@...>
Date: Saturday, July 28, 2007 - 5:55 pm

I generally run with CONFIG_HZ=100, CONFIG_NO_HZ=n, CONFIG_PREEMPT_NONE.

Jan
--
-

To: Jan Engelhardt <jengelh@...>
Cc: Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>, Ingo Molnar <mingo@...>, CK Mailinglist <ck@...>
Date: Saturday, July 28, 2007 - 6:22 pm

Ok, so HZ=100 is likely the worst case, as it effectively multiplies
all the scheduler latencies by 10 (rather than by 4, which is what the
default 250Hz does).

That said, I think most testing showed that the CFS scheduler tunables
didn't have a huge amount of impact on how things felt, so that
factor-of-ten might not even matter that much. The 3D game issues may well
be totally elsewhere.

But it's certainly worth looking at.

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Ingo Molnar <mingo@...>, CK Mailinglist <ck@...>
Date: Saturday, July 28, 2007 - 2:07 pm

Yes, but the various patches i've received seem to not solve it, they
simply changed the load at which CFS seemed to perform well.

On irc there has been wild speculation as to whether it's the
sched_yield() stuff in most 3d drivers, but my tests with stubbing it

well, as far as my tests show, the only real difference between SD and
CFS in terms of performance, is 3d, where both will deliver basically
the same FPS in a given application, SD does it smooth, which is the
best way to explain it, what happens with CFS, as i experience it, is

I really think you should try reading the SD and RSDL threads on lkml
again, the only place where Con hasn't been extremely forthcoming was
deep in the thread where Mike was unhappy with SD not giving X more

-

To: Linus Torvalds <torvalds@...>
Cc: Kasper Sandberg <lkml@...>, CK Mailinglist <ck@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Saturday, July 28, 2007 - 3:36 am

Not even Con thought SD was perfect, so this is being more than a
little dishonest.
One of his parting comments on the ck list was a list of things that
could be fixed/improved.

My experience is vastly different to yours, perhaps because I have
been subscribed to his mailing list for many years (too many to count)
and have run his patchset in various environments in that period - and
you have not. Con was always very helpful to people experiencing
problems and did in fact work with them to get them resolved. The
list is web-archived so everyone is free to go see that for
themselves. He also tried to get others interested and involved in
kernel development at large. SD itself went through 46 revisions
because of things people encountered using it, and it would have been
more still considering what Con had in the works had he not been
pushed out.

I can see how on LKML your viewpoint differs, though to be fair in my
recollection there was only one person Con argued with, and that man
is a belligerent troll. It's my honest opinion that the problems that
troll encountered were completely made up, which is backed by the
evidence that no-one else had encountered or indeed could even
reproduce them. I recall Con himself catching the troll out in a
lie-based "proof" on one occasion. I'll hunt gmane for the link as I
believe people like that need to be exposed and stopped. There
certainly was a lot of hot air and handwaving, and now that one other
tiny portion of Con's work has been raised it's still going on. It's
interesting that the same cycle repeats even when Con is no longer
involved, which proves Con could not have been the issue.

I'm sorry you in particular haven't been able to have the same
experience with Con as so many others have, especially considering who
you are and the weight your words have. You've lost a really great
asset and aren't even aware of it. That's really sad for everyone.

(fwiw the -ck list did a lot of the testing for CFS recently, and over
the...

To: <ck@...>
Cc: Matthew Hawkins <darthmdh@...>, Linus Torvalds <torvalds@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Saturday, July 28, 2007 - 6:40 am

I fully agree to this. Being part of the ck mailing list community was fun
and I really appreciate the friendly and supporting tone here. From the
mails I have read from the LKML - I do not read it regularly at all - I
got the impression that its members can learn *a lot* from the ck mailing
list community. And also from the TuxOnIce mailing list community.

For example on how to encourage users to send in their feedback and test
kernel subsystems. And from what I heard again and again it's exactly
testing that is lacking to a great degree.

Actually even CFS was helped by the ck mailinglist community.

The tone on the ck mailinglist community encouraged me to compile kernel
patches, try out latest ck patchsets and then when CFS could not do the
same smooth music playback on my Amarok machine as SD try out a ton of
patches from Ingo Molnar to get those regressions (compared to SD) fixed.

But all the times I stayed away from LKML and still do not feel that much
motivation to read in it regularly. Actually my own perception matches
what Con said in his goodbye interview[1]: It *scares* me away. It's this
elitist "I know it better than you and what do you want anyway" that in
my eyes demotivates a lot of users to bring in their feedback.

There are just about 9000 bugs in the kernel bugtracker and about 150000
bugs in the KDE bugtracker. Granted KDE bugtracker includes a lot of
applications, but still I think the number of bug reports in the kernel
bugtracker is ridiculously low. And I think that's because many users
don't bother to report bugs upstream for the Linux kernel, not because
those bugs aren't there.

I hope that the ck mailing list community will continue to be active and
possibly try to get swap prefetch and some other goodies of the ck
patchset into mainline. And I think it would also be a good idea for ck
mailing list community to report desktop related issues in the kernel
bugtracker. I think I will take the courage next time I find a...

To: Martin Steigerwald <Martin@...>
Cc: <ck@...>, Matthew Hawkins <darthmdh@...>, Linus Torvalds <torvalds@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Saturday, July 28, 2007 - 12:10 pm

A word of caution about bugzilla.kernel.org, to those who don't know
already: By far not all maintainers and developers use bugzilla.
I don't know for which subsystems it makes sense to file a report in
bugzilla. I think your best bet is to report at the mailinglists
listed in linux/MAINTAINERS.
--
Stefan Richter
-=====-=-=== -=== ===--
http://arcgraph.de/sr/
-

To: Stefan Richter <stefanr@...>
Cc: Martin Steigerwald <Martin@...>, <ck@...>, Matthew Hawkins <darthmdh@...>, Linus Torvalds <torvalds@...>, Kasper Sandberg <lkml@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Saturday, July 28, 2007 - 12:21 pm

Hi,

Please CC all bug reports to LKML. I've got a large mailbox, but I
don't want to subscribe all linux-* mailing lists :).

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/
-

To: Linus Torvalds <torvalds@...>
Cc: Kasper Sandberg <lkml@...>, CK Mailinglist <ck@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Saturday, July 28, 2007 - 3:09 am

I don't really want to keep all that -ck flamewar going but this sum-up is
a little strange for me:

If Con was thinking SD was "perfect" why did he release 30+ versions of it?
And who knows how many versions of his previous scheduler?

Besides Con always tried to help people and improve his code if some bugs
or problems were reported. Archives of this list prove that. I reported
several problems (on list and privately) and all were fixed very fast and
with very kind responses. I had run -ck for months and years and it was
always very stable (I remember one broken "stable" version).

I don't know what exactly you are referring to when you talk about those
unaddressed reports but maybe it depends on who was asking, how and to do
what (for example - purely theoretical one, I don't remember exact emails
you are referring to so I am not saying it happened - stating at the beginning
that the whole design is unacceptable and interactivity hacks are a
must-have won't make a friend from any maintainer and for sure lowers his
desire to get anything fixed for that guy). Or maybe Con had some bad day
or was depressed. Happens. But I really don't remember Con ignoring too
many valuable user reports in last 3 years...

And no - I am not thinking that SD was "perfect". Nothing is perfect,
especially not software. But it was based on months and years of Con's
experience with desktop and gaming workloads and extensively tested in
similar uses by _many_ others. In nearly all possible desktop
configurations, with most games and all video drivers. This is why it was
perfectly designed and tuned for such workloads while still being general
enough and without any ugly hacks. And because of these tests and Con's
belief that the desktop is very (most?) important all bugs and problems
in this area were probably killed long ago. I think even design was
changed and tuned a little at the early stages to help solve such
interactivity/desktop/gaming problems.

So it does not surprise me that ...

To: Linus Torvalds <torvalds@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Ingo Molnar <mingo@...>, <ck@...>
Date: Friday, July 27, 2007 - 7:43 am

I'm still not so keen about this, Ingo never did get CFS to match SD in
smoothness for 3d applications, where my test subjects are quake(s),
world of warcraft via wine, unreal tournament 2004. And this is despite
many patches he sent me to try and tweak it. As far as i'm concerned, i
may be forced to unofficially maintain SD for my own systems (although
lots in the gaming community are bound to be interested, as it does make
games lots better)

<snip>

-

To: Kasper Sandberg <lkml@...>
Cc: Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, <ck@...>
Date: Sunday, July 29, 2007 - 1:06 pm

here's an update: checking whether Wine could be a factor in your
problem i just tested latest CFS against latest SD with a 3D game
running under Wine: v2.6.22-ck1 versus v2.6.22-cfsv19 (to get the
most comparable kernel), using Quake 3 Arena Demo under Wine (0.9.41).
Here are the results in a pretty graph:

http://people.redhat.com/mingo/misc/cfs-vs-sd-wine-quake.jpg

or, in text:

       2.6.22-ck1                     2.6.22-cfs-v19
------------------------        ------------------------
quake + 0 loops | 41 fps        quake + 0 loops | 41 fps
quake + 1 loop  |  3 fps        quake + 1 loop  | 41 fps
quake + 2 loops |  2 fps        quake + 2 loops | 32 fps
quake + 3 loops |  1 fps        quake + 3 loops | 24 fps
quake + 4 loops |  0 fps        quake + 4 loops | 20 fps
quake + 5 loops |  0 fps        quake + 5 loops | 16 fps

Quake3-under-Wine behavior under SD/-ck: framerate breaks down massively
during any kind of load. The game is completely unusable with 1 CPU loop
running already!

Quake3-under-Wine behavior under CFS: framerate goes down gently with
load, gameplay remains smooth. Framerate is still pretty acceptable and
the game is playable even with a 500% CPU overload. The graph looks good
and the framerate reduction goes roughly along the expected 1/n
'fairness curve' - so it all looks pretty healthy. [Note: quake3 keeps
its full 41 fps even with 1 competing loop running on the CPU due to
"sleeper fairness".]

[ i've re-tested this using other SD and ck versions and other CFS
versions such as v2.6.23-rc1 and the results are the same. To get the
fps result i started a simple game scene: Single Player /
Q3DM1 / I Can Win, turned on the fps display of Quake3, and did not
move the player at all, just looked at the framerate that is
displayed. (i also tried other scenes and other gameplay sections and
they all behave consistently with the above results.) The system was
otherwise ...

To: Ingo Molnar <mingo@...>
Cc: Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, <ck@...>
Date: Monday, July 30, 2007 - 7:46 pm

I believe the responsibility for my situation is both IO and cpu load. i
don't know why SD does this. my test is to make spamassassin process mails
while i have these applications running (and wine is most sensitive, the
difference is almost negligible in the native applications, but very
much noticeable with wine+wow)

could perhaps be filesystem related, i have my maildir (extremely large)
on reiserfs, and /home on xfs. what my mail client will do is download
mail, spamassassin it (loading database from home), then it will put to
imap server placing it on reiserfs, and then a "local" copy in my home.

while i only see the spamassassin thread as hogging cpu, i suspect IO is

-

To: Kasper Sandberg <lkml@...>
Cc: Ingo Molnar <mingo@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, <ck@...>
Date: Tuesday, July 31, 2007 - 2:31 am

Ooh, do you perchance have PREEMPT_BKL=y?

If so, try on another filesystem than reiserfs (or disable PREEMPT_BKL,
but that is obviously the lesser of the two choices).

Ingo traced a 1+ second latency at my end to BKL priority inversion
between tty and reiserfs.

-

To: Peter Zijlstra <peterz@...>
Cc: Kasper Sandberg <lkml@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, <ck@...>
Date: Tuesday, July 31, 2007 - 4:57 am

ah, indeed, that makes quite a bit of sense. Almost all of the Reiser3
code runs under the BKL, and the only other major kernel infrastructure
that has BKL dependencies is the TTY code. Kasper, as a debugging
matter, could you try to move that spamassassin workload off into a
non-Reiser3 filesystem and/or disable PREEMPT_BKL? If that makes a
noticeable difference (for the better ;) then we can continue figuring
out what's happening exactly.

Ingo
-

To: Ingo Molnar <mingo@...>
Cc: Peter Zijlstra <peterz@...>, Kasper Sandberg <lkml@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, <ck@...>
Date: Wednesday, August 1, 2007 - 10:35 pm

Also NFS:

$ grep -rIi lock_kernel kernel-source/linux-2.6.17/fs/nfs/ | wc -l
94

Lee
-

To: Lee Revell <rlrevell@...>
Cc: Ingo Molnar <mingo@...>, Peter Zijlstra <peterz@...>, Kasper Sandberg <lkml@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, <ck@...>
Date: Thursday, August 2, 2007 - 9:03 am

All the file locking code (the nfs-related stuff in fs/lockd/, and also
the vfs code in fs/locks.c) is under the kernel lock. I doubt it's held
very long unless you have ridiculous numbers of processes requesting
locks on the same file, but I don't know.

--b.
-

To: Lee Revell <rlrevell@...>
Cc: Peter Zijlstra <peterz@...>, Kasper Sandberg <lkml@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, <ck@...>
Date: Thursday, August 2, 2007 - 7:45 am

yeah - but i never saw NFS cause really big BKL latencies. IIRC it uses
the BKL mostly for archaic reasons, most of the NFS code is SMP-safe.
Almost all of the reiser3 code runs under the BKL on the other hand.

Ingo
-

To: Ingo Molnar <mingo@...>
Cc: Lee Revell <rlrevell@...>, Peter Zijlstra <peterz@...>, Kasper Sandberg <lkml@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, <ck@...>
Date: Thursday, August 2, 2007 - 9:39 am

We're still working on fixing the NFS case, but as everyone knows,
finding those last few obscure code sections which still depend on BKL
protection can be tedious work...

Trond

-

To: Ingo Molnar <mingo@...>
Cc: Peter Zijlstra <peterz@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, <ck@...>
Date: Wednesday, August 1, 2007 - 7:43 pm

sorry late response.

nope, i run totally without preemption, i did however test with it, and it
seemed to not matter in terms of smoothness, but reduced the throughput

the process is like this:
mail client fetches mail
mail client invokes spamassassin
if spam -> spam
else filtering
if it matches certain filters, it gets put into my imap server, which is

-

To: Kasper Sandberg <lkml@...>
Cc: Peter Zijlstra <peterz@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, <ck@...>
Date: Thursday, August 2, 2007 - 8:10 am

do you have any filesystem that is not reiserfs? If yes, could you, as a
test, check whether file activities on _that_ file system still cause
these lags, or is the lag purely connected to the reiser3 filesystem?

Ingo
-

To: Kasper Sandberg <lkml@...>
Cc: Peter Zijlstra <peterz@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, <ck@...>
Date: Friday, August 3, 2007 - 2:31 am

i still have little debug info from you to start from: no
cfs-debug-info.sh output of the problematic workload and no kernel
.config.

i tried to reproduce your problems based on your existing description: i
did a lot of reiser3 testing yesterday and i also wrote a 'BKL latency
simulator' (which does a faux lock_kernel() + unlock_kernel() so that
the testcode runs into the BKL all the time) - but still this had no
visible effect on desktop latencies so either i have some subtle
difference in my setup or this aspect of your workload is not the cause
of the smoothness problem.

could you please give us more debug info and try to simplify the "bad" case
down to something that can be pinpointed and triggered more exactly? Do
you see any particular 'ruckle' in the 3D game when you see a smoothness
problem? Anything that we could clearly label as 'anomalous latency' in
a tracer output? (in that case i'll send you tracing patches so that we
can catch a trace of that 'hiccup')

You said the imap stuff could be causing the smoothness problem: as a
debugging thing could you try to renice all the imap activities (imap
daemon / mailer) to nice +19, does that make the game magically smooth
again? If yes then this is an indicator that the problem is interaction
between the game and the imap activities.

Ingo
-

To: Kasper Sandberg <lkml@...>
Cc: Peter Zijlstra <peterz@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, <ck@...>
Date: Thursday, August 2, 2007 - 11:42 am

Kasper,

could you please try the "chew-max" latency-printing utility:

http://people.redhat.com/mingo/cfs-scheduler/tools/chew-max.c

if you start it on an idle system it prints a single line:

$ ./chew-max
pid 14506, prio 0, interval of 99984800 nsec

and prints nothing else. It continues looping and looping (using up 100%
of CPU time), and the moment it's preempted, it prints a line about that
preemption latency. Under higher load it will print something like this:

out for 63 ms [max: 66], ran for 5 ms, load 7
out for 85 ms [max: 85], ran for 4 ms, load 5
out for 7 ms [max: 85], ran for 0 ms, load 0
out for 105 ms [max: 105], ran for 3 ms, load 3
out for 174 ms [max: 174], ran for 6 ms, load 3
out for 219 ms [max: 219], ran for 3 ms, load 1
out for 78 ms [max: 219], ran for 3 ms, load 3

so that we get a picture of your latencies, could you run this tool while
you are seeing those 'bad' desktop latencies? (Since your CPU has two
cores it might make sense to run two instances of chew-max.)

record the latencies like this:

./chew-max > chew1.out &
./chew-max > chew2.out &

and send us the chew1.out and chew2.out files (bzip2 -9 compressed).
Thanks!
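
For readers without the tool at hand, the measuring idea described above can be approximated in user space. This is a minimal sketch and not the actual chew-max.c: it assumes that a large gap between two back-to-back clock readings means the task was off the CPU, and the names probe_max_gap_ns, run_ns and threshold_ns are made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Spin for at least run_ns nanoseconds, reading the clock back to back.
 * A gap between two consecutive readings that exceeds threshold_ns is
 * assumed to be time spent off the CPU and is printed.
 * Returns the largest gap seen, in nanoseconds.
 */
uint64_t probe_max_gap_ns(uint64_t run_ns, uint64_t threshold_ns)
{
	uint64_t start = now_ns();
	uint64_t prev = start;
	uint64_t max_gap = 0;

	for (;;) {
		uint64_t t = now_ns();
		uint64_t gap = t - prev;

		if (gap > max_gap)
			max_gap = gap;
		if (gap > threshold_ns)
			printf("out for %llu us [max: %llu us]\n",
			       (unsigned long long)(gap / 1000),
			       (unsigned long long)(max_gap / 1000));
		prev = t;
		if (t - start >= run_ns)
			break;
	}
	return max_gap;
}
```

Calling probe_max_gap_ns(10ull * 1000000000ull, 1000000ull) from a main() would spin for ten seconds and print every gap longer than one millisecond. Unlike the real tool it reports neither the run time between preemptions nor the load.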

Ingo
-

To: Ingo Molnar <mingo@...>
Cc: Peter Zijlstra <peterz@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, <ck@...>
Date: Wednesday, August 8, 2007 - 10:38 am

First off, sorry for the late response.

bad is not the exact word, it's pretty good, certainly better than old

i've attached it(bzip2'ed)

i've come to think it is IO related, but not entirely related to
reiserfs, perhaps xfs.

To: Ingo Molnar <mingo@...>
Cc: Peter Zijlstra <peterz@...>, Kasper Sandberg <lkml@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, <ck@...>
Date: Tuesday, July 31, 2007 - 5:11 am

And half the ioctls some of which trigger long code sections.

For the tty layer I'm waiting for the revoke code to get finished up and
move from -mm into Linus tree. At that point the real evil lock_kernel
related stuff in the tty layer can switch to using the revoke code for
hangup paths and then other bits can be tackled.

Alan
-

To: Alan Cox <alan@...>
Cc: Peter Zijlstra <peterz@...>, Kasper Sandberg <lkml@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, <ck@...>
Date: Tuesday, July 31, 2007 - 5:13 am

the tty layer has relatively short BKL latencies, but reiser3 has long
ones, so those got transposed over to the tty layer, making it all quite
noticeable to the user.

BKL contention is not a big issue on the desktop _unless_ there's at
least one workload that creates really long BKL latencies. That
multiplexes it out to all the other BKL-using subsystems too.

the DRI/DRM BKL use was a problem some time ago, but i think it's now
using unlocked_ioctl(), correct? All the other ioctls are rare enough to
not really matter.

with PREEMPT_BKL there's also some sort of random effect of priority
inversion that makes the actual latencies depend on the scheduler - but
we dont understand that effect exactly, yet. (hopefully Kasper can help
us out with that. Peter got rid of his reiser3 partition the moment the

oh, wonderful! Alan, you are a true wizard :-) The tty layer is one of
the very few pieces of kernel code that scares the hell out of me :-)

Ingo
-

To: Ingo Molnar <mingo@...>
Cc: Alan Cox <alan@...>, Peter Zijlstra <peterz@...>, Kasper Sandberg <lkml@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, <ck@...>
Date: Tuesday, July 31, 2007 - 5:19 am

Maybe it should be kept crufty then. Every kernel developer should have
at least one part of the kernel he's afraid to go into ;-)

--
error compiling committee.c: too many arguments to function

-

To: Avi Kivity <avi@...>
Cc: Ingo Molnar <mingo@...>, Peter Zijlstra <peterz@...>, Kasper Sandberg <lkml@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, <ck@...>
Date: Tuesday, July 31, 2007 - 5:44 am

I'm not too fond of the way it does some stuff either especially the open

floppy.c is sufficient
-

To: Linus Torvalds <torvalds@...>
Cc: <linux-kernel@...>, <akpm@...>
Date: Monday, July 23, 2007 - 2:38 pm

Managed to hit BUG_ON() in kmap_atomic_prot() three times while doing
nothing unusual for this box (two times it was under X, so I can't
guarantee, one time while trying to reproduce via ./configure in gdb
tarball)

Box has 2.5G of RAM. 2.6.22 was OK.

[dives into framebuffer console setup for complete oops]

CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=15
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_EMBEDDED=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_SHMEM=y
CONFIG_SLAB=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_BLOCK=y
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_CFQ=y
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_X86_PC=y
CONFIG_MPENTIUM4=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=7
CONFIG_X86_XADD=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=4
CONFIG_PREEMPT=y
CONFIG_PREEMPT_BKL=y
CONFIG_X86_MCE=y
CONFIG_VM86=y
CONFIG_HIGHMEM4G=y
CONFIG_VMSPLIT_3G=y
CONFIG_PAGE_OFFSET=...

To: Linus Torvalds <torvalds@...>
Cc: <linux-kernel@...>, <akpm@...>
Date: Monday, July 23, 2007 - 3:01 pm

kernel BUG at arch/i386/mm/highmem.c:38
PREEMPT DEBUG_PAGEALLOC SLAB
EIP at kmap_atomic_prot+0x32/0x93
get_page_from_freelist
__alloc_pages
cache_alloc_refill
cache_alloc_refill
kmem_cache_alloc
dst_alloc
dst_alloc
__ip_route_output_key
[some junk I don't trust]

eax: 0000000c
ebx: 00000003
ecx: c065efe0
edx: 00000003
edi: 00000163

c010cc9b <kmap_atomic_prot>:
c010cc9b: 57 push %edi
c010cc9c: 56 push %esi
c010cc9d: 53 push %ebx
c010cc9e: 89 c6 mov %eax,%esi
c010cca0: 89 d3 mov %edx,%ebx
c010cca2: 89 cf mov %ecx,%edi
c010cca4: b8 01 00 00 00 mov $0x1,%eax
c010cca9: e8 dd 1b 00 00 call c010e88b <add_preempt_count>
c010ccae: e8 b1 ac 0e 00 call c01f7964 <debug_smp_processor_id>
c010ccb3: 6b c0 0d imul $0xd,%eax,%eax
c010ccb6: 8d 14 03 lea (%ebx,%eax,1),%edx
c010ccb9: 8d 04 95 00 00 00 00 lea 0x0(,%edx,4),%eax
c010ccc0: 8b 0d 30 a1 3e c0 mov 0xc03ea130,%ecx
c010ccc6: 29 c1 sub %eax,%ecx
c010ccc8: 83 39 00 cmpl $0x0,(%ecx)
c010cccb: 74 04 je c010ccd1 <kmap_atomic_prot+0x36>
c010cccd: 0f 0b ud2a

-

To: Alexey Dobriyan <adobriyan@...>
Cc: Linus Torvalds <torvalds@...>, <linux-kernel@...>, <netdev@...>
Date: Monday, July 23, 2007 - 4:24 pm

On Mon, 23 Jul 2007 23:01:52 +0400

Yeah, I hit this several times a few days ago. Same story: it just
randomly went splat in response to no obvious stimulus. Reported it to

I had more complete info: http://article.gmane.org/gmane.linux.network/66966

You're using DEBUG_PAGEALLOC, but I was not, so I think we can rule that out.

I haven't worked out where that kmap_atomic() call is coming from yet.
Both traces point up into the page allocator, but I _think_ that's stack
gunk.
-

To: Andrew Morton <akpm@...>
Cc: Alexey Dobriyan <adobriyan@...>, Linus Torvalds <torvalds@...>, <linux-kernel@...>, <netdev@...>
Date: Tuesday, July 24, 2007 - 6:01 am

My box bugged during boot the first time I booted 23-rc1, but nothing
made it to the console, and I didn't have a serial console running. I

I just enabled all debug options, and was just rewarded with the below.

[ 119.079531] eth1: link up, 100Mbps, full-duplex, lpa 0x45E1
[ 119.558867] ------------[ cut here ]------------
[ 119.572197] kernel BUG at arch/i386/mm/highmem.c:38!
[ 119.585804] invalid opcode: 0000 [#1]
[ 119.598013] PREEMPT SMP DEBUG_PAGEALLOC
[ 119.610103] Modules linked in: edd button battery ac ip6t_REJECT xt_tcpudp ipt_REJECT xt_state iptable_mangle iptable_nat nf_nat iptable_filter ip6table_mangle nf_conntrack_ipv4 nf_conntrack nfnetlink ip_tables ip6table_filter ip6_tables x_tables nls_iso8859_1 nls_cp437 nls_utf8 snd_intel8x0 snd_ac97_codec ac97_bus snd_mpu401 snd_pcm prism54 snd_timer snd_mpu401_uart snd_rawmidi snd_seq_device snd intel_agp agpgart soundcore snd_page_alloc i2c_i801 fan thermal processor
[ 119.698063] CPU: 1
[ 119.698065] EIP: 0060:[<c011cd2d>] Not tainted VLI
[ 119.698067] EFLAGS: 00010006 (2.6.23-rc1-smp #75)
[ 119.736358] EIP is at kmap_atomic_prot+0xa7/0xab
[ 119.749647] eax: 3d07f163 ebx: c166db80 ecx: c0750e60 edx: 00000007
[ 119.765417] esi: 00000022 edi: 00000163 ebp: c069dcd4 esp: c069dcc8
[ 119.781273] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
[ 119.796378] Process udevd (pid: 4775, ti=c069d000 task=f31aea60 task.ti=f477d000)
[ 119.804068] Stack: c166db80 00000000 c166db80 c069dcdc c011cd3f c069dd40 c015b6e0 00000001
[ 119.822272] 00000044 00000163 00000000 00000001 c165f4e0 00000001 c165f4e0 00000001
[ 119.840762] 00000000 00028020 c061e71c c166db80 00000046 00000080 00000001 c011e4de
[ 119.859389] Call Trace:
[ 119.881302] [<c0105144>] show_trace_log_lvl+0x1a/0x30
[ 119.896319] [<c01051ff>] show_stack_log_lvl+0xa5/0xca
[ 119.911171] [<c0105420>] show_registers+0x1fc/0x343
[ 119.925756] [<c0105689>] die+0x122/0x249
[ 119.939241] ...

To: Mike Galbraith <efault@...>
Cc: Alexey Dobriyan <adobriyan@...>, Linus Torvalds <torvalds@...>, <linux-kernel@...>, <netdev@...>, Christoph Lameter <clameter@...>
Date: Tuesday, July 24, 2007 - 12:28 pm

See, networking's kmem_cache_alloc(..., __GFP_ZERO) ended up calling into
the page allocator with __GFP_ZERO. This is the bug - slab isn't supposed
to do that: the __GFP_ZERO is supposed to be removed.

Now, it's not a highmem page, so prep_zero_page() won't actually establish
a kmap, but it will check that the kmap slot is presently unused on this
CPU.

But networking calls in here from softirq context (illegal for KM_USER0)
and sometimes that KM_USER0 slot *will* be in use, so kmap_atomic_prot()
will go BUG.
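
The slot collision can be modelled in plain user-space C. This is only a sketch under stated assumptions: one boolean per (cpu, type) pair stands in for the kmap pte that the real BUG_ON inspects, and every model_ name is hypothetical rather than a kernel interface.

```c
#include <stdbool.h>

/* Hypothetical model: one fixed mapping slot per (cpu, type) pair. */
enum model_km_type { MODEL_KM_USER0, MODEL_KM_SOFTIRQ0, MODEL_KM_NR };
#define MODEL_NR_CPUS 2

static bool model_slot_busy[MODEL_NR_CPUS][MODEL_KM_NR];

/* Returns false where the real code would BUG: the slot is already in use. */
bool model_kmap_atomic(int cpu, enum model_km_type type)
{
	if (model_slot_busy[cpu][type])
		return false;
	model_slot_busy[cpu][type] = true;
	return true;
}

void model_kunmap_atomic(int cpu, enum model_km_type type)
{
	model_slot_busy[cpu][type] = false;
}

/*
 * Process context maps with MODEL_KM_USER0; a softirq then interrupts it
 * on the same CPU and maps with softirq_type.  Returns true if the nested
 * mapping found its slot free.
 */
bool model_softirq_during_user_map(int cpu, enum model_km_type softirq_type)
{
	bool ok;

	if (!model_kmap_atomic(cpu, MODEL_KM_USER0))
		return false;
	ok = model_kmap_atomic(cpu, softirq_type);	/* softirq arrives here */
	if (ok)
		model_kunmap_atomic(cpu, softirq_type);
	model_kunmap_atomic(cpu, MODEL_KM_USER0);
	return ok;
}
```

In this model a softirq that reuses MODEL_KM_USER0 finds the slot taken, while one that uses its own MODEL_KM_SOFTIRQ0 slot does not. That is why the crash only shows up when the interrupt happens to land inside a KM_USER0 mapping.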

I must say it's really really scary that such a low-level function as
prep_zero_page() is using KM_USER0. I don't think it has enough debugging
checks in there to prevent Bad Stuff from going undetected.

I guess this was the bug:

--- a/mm/slab.c~a
+++ a/mm/slab.c
@@ -2776,7 +2776,7 @@ static int cache_grow(struct kmem_cache
 	 * 'nodeid'.
 	 */
 	if (!objp)
-		objp = kmem_getpages(cachep, flags, nodeid);
+		objp = kmem_getpages(cachep, local_flags, nodeid);
 	if (!objp)
 		goto failed;

_
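
The intent of the one-liner, sketched as a user-space model: the slab level strips the zeroing request before it asks the page level for memory, and (this is an assumption of the sketch) clears the object itself. The flag values and all model_ names are invented for illustration; they are not the kernel's.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag values, for illustration only. */
#define MODEL_GFP_WAIT 0x1u
#define MODEL_GFP_ZERO 0x2u

/* Records the flags the model "page allocator" was last asked for. */
unsigned int model_last_page_flags;

static void *model_getpages(size_t size, unsigned int flags)
{
	model_last_page_flags = flags;
	return malloc(size);
}

/*
 * Slab-level allocation: remove the zeroing request before going to the
 * page level, then zero the object here if the caller asked for it.
 */
void *model_cache_alloc(size_t size, unsigned int flags)
{
	unsigned int local_flags = flags & ~MODEL_GFP_ZERO;
	void *objp = model_getpages(size, local_flags);

	if (objp && (flags & MODEL_GFP_ZERO))
		memset(objp, 0, size);
	return objp;
}
```

Passing flags instead of local_flags to model_getpages() would reproduce the bug in miniature: the page level would see the zeroing bit that it was never meant to receive from slab.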

I don't see why you later got fs corruption - afaict we won't actually
-

To: Andrew Morton <akpm@...>
Cc: Mike Galbraith <efault@...>, Alexey Dobriyan <adobriyan@...>, <linux-kernel@...>, <netdev@...>, Christoph Lameter <clameter@...>
Date: Tuesday, July 24, 2007 - 2:25 pm

Looks very likely to me. Mike, Alexey, does this fix things for you?

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, Alexey Dobriyan <adobriyan@...>, <linux-kernel@...>, <netdev@...>, Christoph Lameter <clameter@...>
Date: Wednesday, July 25, 2007 - 1:09 am

I don't have very much runtime on it yet, but yes, it seems to have.

-Mike

-

To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, Mike Galbraith <efault@...>, <linux-kernel@...>, <netdev@...>, Christoph Lameter <clameter@...>
Date: Tuesday, July 24, 2007 - 4:05 pm

Yeah, the box has been running for more than an hour, survived LTP, gdb
testsuite, portage sync and so on.

What's new to me is that someone decided TSC is unstable, but that's
probably OK on a full debug kernel:

Clocksource tsc unstable (delta = 91724418 ns)
Time: pit clocksource has been installed.

-

To: Alexey Dobriyan <adobriyan@...>
Cc: LKML <linux-kernel@...>
Date: Wednesday, July 25, 2007 - 1:44 pm

[Alexey Dobriyan - Wed, Jul 25, 2007 at 12:05:28AM +0400]
| On Tue, Jul 24, 2007 at 11:25:22AM -0700, Linus Torvalds wrote:
| > On Tue, 24 Jul 2007, Andrew Morton wrote:
| > > I guess this was the bug:
| >
| > Looks very likely to me. Mike, Alexey, does this fix things for you?
|
| Yeah, box is running for more than hour, survived LTP, gdb testsuite,
| portage sync and so on.
|
| What new to me is that someone decided TSC is unstable but that's
| probably OK on full debug kernel:
|
| Clocksource tsc unstable (delta = 91724418 ns)
| Time: pit clocksource has been installed.
|

Hi Alexey,

actually I've had tsc marked unstable even with 2.6.21.3 ;)
Some people think that is because of dynamic clock speed
changing

http://ussg.iu.edu/hypermail/linux/kernel/0707.1/0873.html

Cyrill

-

To: Andrew Morton <akpm@...>
Cc: Alexey Dobriyan <adobriyan@...>, Linus Torvalds <torvalds@...>, <linux-kernel@...>, <netdev@...>
Date: Tuesday, July 24, 2007 - 6:37 am

Hm. I just also experienced filesystem corruption when I tried to send
from that kernel, and it bugged in the process. My mount table ended up
in /etc/resolv.conf along with some binary goop, making nscd rather
unhappy after reboot. fsck time.

.Mike

-

To: Andrew Morton <akpm@...>
Cc: Linus Torvalds <torvalds@...>, <linux-kernel@...>, <netdev@...>
Date: Monday, July 23, 2007 - 4:40 pm

Ahh, you suspect networking.

Here, setup is 2 cheap-ass 100Mb realtek 8139 NICs, one to campus network
receiving ~20 junk packets per second, one gathering netconsole output
and ssh to it, no conntracks and fancy stuff.

[reboots with cables physically unplugged]

-

To: Andrew Morton <akpm@...>
Cc: Linus Torvalds <torvalds@...>, <linux-kernel@...>, <netdev@...>
Date: Monday, July 23, 2007 - 5:01 pm

OK, I ran a gdb recompile and cat(1)'d every file in /usr/portage (shitload
of small files) with both cables unplugged. It all went fine for ~5 minutes;
after that it crashed in exactly the same way, 10 secs after plugging in one
of them.

-

To: Alexey Dobriyan <adobriyan@...>
Cc: Linus Torvalds <torvalds@...>, <linux-kernel@...>, <netdev@...>
Date: Monday, July 23, 2007 - 5:11 pm

On Tue, 24 Jul 2007 01:01:53 +0400

It'd be nice to get a clean trace. Are you able to obtain the full
trace with CONFIG_FRAME_POINTER=y?
-

To: Andrew Morton <akpm@...>
Cc: Linus Torvalds <torvalds@...>, <linux-kernel@...>, <netdev@...>
Date: Monday, July 23, 2007 - 6:04 pm

Sorry, no camera shot, finding camera requires waking up M. :)

It took longer than usual, but here it is

kmap_atomic
get_page_from_freelist
__alloc_pages
cache_alloc_refill
__alloc_pages
cache_alloc_refill
kmem_cache_alloc
dst_alloc
ip_route_input
ip_rcv
netif_receive_skb
rtl8139_poll
net_rx_action
__do_softirq
do_softirq
irq_exit
do_IRQ
common_interrupt
handle_mm_fault
do_page_fault
error_core

A much more loaded x86_64 box nearby, also running 2.6.23-rc1 with debugging
turned on and using the atl1 driver, doesn't experience any crashes.

And I found 2.6.22-b91cba52e9b7b3f1c0037908a192d93a869ca9e5-x entry on
top of grub config which means b91cba52e9b7b3f1c0037908a192d93a869ca9e5
_without_ any debugging was OK.

-

To: Alexey Dobriyan <adobriyan@...>
Cc: Linus Torvalds <torvalds@...>, <linux-kernel@...>, <netdev@...>
Date: Monday, July 23, 2007 - 6:27 pm

On Tue, 24 Jul 2007 02:04:46 +0400

I worked out that the crash I saw was in

BUG_ON(!pte_none(*(kmap_pte-idx)));

in the read of kmap_pte[idx]. Which would be weird as the caller is using
a literal KM_USER0.

So maybe I goofed, and that BUG_ON is triggering (it scrolled off, and I am
unable to reproduce it now).

If that BUG_ON _is_ triggering then it might indicate that someone is doing
a __GFP_HIGHMEM|__GFP_ZERO allocation while holding KM_USER0.

If they're holding an atomic kmap then they'll be running in_atomic so it
is unlikely that they accidentally added __GFP_WAIT because lots of people
would be getting lots of might_sleep() warnings.

Hence that first VM_BUG_ON in prep_zero_page() _should_ be triggering.

Do you have CONFIG_DEBUG_VM enabled?

Also, it might be useful to apply -mm's kmap_atomic-debugging.patch. It
will detect lots of abuse.

-

To: Andrew Morton <akpm@...>
Cc: Alexey Dobriyan <adobriyan@...>, Linus Torvalds <torvalds@...>, <linux-kernel@...>, <netdev@...>, <mark.fasheh@...>
Date: Tuesday, July 24, 2007 - 4:17 am

Or doing double kunmaps, or doing a kunmap_atomic() on the page, not the
address. I've seen both of those end up triggering that BUG_ON() in a
later kmap.

Looking over the 2.6.22..2.6.23-rc1 diff, I found one such error in
ocfs2 at least. But you are probably not using that, so I'll keep
looking...

---

[PATCH] ocfs2: bad kunmap_atomic()

kunmap_atomic() takes the virtual address, not the mapped page as
argument.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 5727cd1..c4034f6 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -2153,7 +2153,7 @@ static int ocfs2_splice_write_actor(struct pipe_inode_info *pipe,
 	src = buf->ops->map(pipe, buf, 1);
 	dst = kmap_atomic(page, KM_USER1);
 	memcpy(dst + offset, src + buf->offset, count);
-	kunmap_atomic(page, KM_USER1);
+	kunmap_atomic(dst, KM_USER1);
 	buf->ops->unmap(pipe, buf, src);

 	copied = ocfs2_write_end(file, file->f_mapping, sd->pos, count, count,

--
Jens Axboe

-

To: Andrew Morton <akpm@...>
Cc: Alexey Dobriyan <adobriyan@...>, Linus Torvalds <torvalds@...>, <linux-kernel@...>, <netdev@...>, <mark.fasheh@...>, <dan.j.williams@...>
Date: Tuesday, July 24, 2007 - 4:22 am

What about the new async crypto stuff? I've been looking, but is it
guaranteed that async_memcpy() always runs in process context with
interrupts enabled? If not, there's a km type bug there.

In general, I think the highmem stuff could do with more safety checks:

- People ALWAYS get the atomic unmaps wrong, passing in the page instead
of the address. I've seen tons of these. And since kunmap_atomic()
takes a void pointer, nobody notices until it goes boom.
- People easily get the km type wrong - they use KM_USERx in interrupt
context, or one of the irq variants without disabling interrupts.

If we could just catch these two types of bugs, we've got a lot of these
problems covered.
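
A sketch of how the first class of bug could be caught at compile time, in user-space C. It assumes the mapping call returns a distinct wrapper type instead of a void pointer; struct model_kaddr and the other model_ names are hypothetical and do not exist in the kernel.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct page. */
struct model_page { char data[64]; };

/*
 * Distinct wrapper type for a mapped address.  Passing a
 * struct model_page * where this is expected is a compile error,
 * which a void * parameter would silently accept.
 */
struct model_kaddr { char *ptr; };

int model_mapped_count;	/* outstanding mappings, for checking balance */

struct model_kaddr model_kmap(struct model_page *page)
{
	struct model_kaddr k = { page->data };

	model_mapped_count++;
	return k;
}

void model_kunmap(struct model_kaddr k)
{
	(void)k;
	model_mapped_count--;
}

/* Copy n bytes into the page; returns 0 on success, -1 if n is too large. */
int model_copy_into_page(struct model_page *page, const char *src, size_t n)
{
	struct model_kaddr dst;

	if (n > sizeof(page->data))
		return -1;
	dst = model_kmap(page);
	memcpy(dst.ptr, src, n);
	model_kunmap(dst);	/* model_kunmap(page) would not compile */
	return 0;
}
```

The cost is that every caller has to go through the wrapper's ptr member, which is the kind of churn that makes such a change hard to carry across a large tree.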

--
Jens Axboe

-

To: Jens Axboe <jens.axboe@...>
Cc: Andrew Morton <akpm@...>, Alexey Dobriyan <adobriyan@...>, Linus Torvalds <torvalds@...>, <linux-kernel@...>, <netdev@...>, <mark.fasheh@...>
Date: Tuesday, July 24, 2007 - 9:55 am

Currently the only user is the MD raid456 driver, and yes, it only
performs copies from the handle_stripe routine which is always run in
process context with interrupts enabled. However this is not
documented. Would it be advisable to add a WARN_ON for this
--
Dan
-

To: Jens Axboe <jens.axboe@...>
Cc: Alexey Dobriyan <adobriyan@...>, Linus Torvalds <torvalds@...>, <linux-kernel@...>, <netdev@...>, <mark.fasheh@...>, <dan.j.williams@...>, Nelson, Shannon <shannon.nelson@...>
Date: Tuesday, July 24, 2007 - 4:34 am

yeah, it's a real trap. For a while I had a patch which converted
kmap_atomic() to return a char*, and kunmap_atomic() to take a char*, so
misuse got compile warnings. But it was a pig to maintain so I tossed it.
It'd be somewhat easier to do now we've converted a lot of callers to

Here's the -mm debug patch:

diff -puN arch/i386/mm/highmem.c~kmap_atomic-debugging arch/i386/mm/highmem.c
--- a/arch/i386/mm/highmem.c~kmap_atomic-debugging
+++ a/arch/i386/mm/highmem.c
@@ -30,7 +30,44 @@ void *kmap_atomic(struct page *page, enu
 {
 	enum fixed_addresses idx;
 	unsigned long vaddr;
+	static unsigned warn_count = 10;

+	if (unlikely(warn_count == 0))
+		goto skip;
+
+	if (unlikely(in_interrupt())) {
+		if (in_irq()) {
+			if (type != KM_IRQ0 && type != KM_IRQ1 &&
+			    type != KM_BIO_SRC_IRQ && type != KM_BIO_DST_IRQ &&
+			    type != KM_BOUNCE_READ) {
+				WARN_ON(1);
+				warn_count--;
+			}
+		} else if (!irqs_disabled()) {	/* softirq */
+			if (type != KM_IRQ0 && type != KM_IRQ1 &&
+			    type != KM_SOFTIRQ0 && type != KM_SOFTIRQ1 &&
+			    type != KM_SKB_SUNRPC_DATA &&
+			    type != KM_SKB_DATA_SOFTIRQ &&
+			    type != KM_BOUNCE_READ) {
+				WARN_ON(1);
+				warn_count--;
+			}
+		}
+	}
+
+	if (type == KM_IRQ0 || type == KM_IRQ1 || type == KM_BOUNCE_READ ||
+	    type == KM_BIO_SRC_IRQ || type == KM_BIO_DST_IRQ) {
+		if (!irqs_disabled()) {
+			WARN_ON(1);
+			warn_count--;
+		}
+	} else if (type == KM_SOFTIRQ0 || type == KM_SOFTIRQ1) {
+		if (irq_count() == 0 && !irqs_disabled()) {
+			WARN_ON(1);
+			warn_count--;
+		}
+	}
+skip:
 	/* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
 	pagefault_disable();

_

-

To: Andrew Morton <akpm@...>
Cc: Jens Axboe <jens.axboe@...>, Alexey Dobriyan <adobriyan@...>, Linus Torvalds <torvalds@...>, <linux-kernel@...>, <netdev@...>, <mark.fasheh@...>, Nelson, Shannon <shannon.nelson@...>
Date: Tuesday, July 24, 2007 - 10:00 am

On 7/24/07, Andrew Morton <akpm@linux-foundation.org> wrote:

I am looking after the async_tx API, I will send a patch to update
MAINTAINERS shortly.

--
Dan
-

To: Andrew Morton <akpm@...>
Cc: Linus Torvalds <torvalds@...>, <linux-kernel@...>, <netdev@...>
Date: Tuesday, July 24, 2007 - 1:20 am

I hit it only once with this patch applied, but there were no additional
warnings.

-

To: Andrew Morton <akpm@...>
Cc: Alexey Dobriyan <adobriyan@...>, <linux-kernel@...>, <netdev@...>
Date: Monday, July 23, 2007 - 5:28 pm

If you are talking about

http://userweb.kernel.org/~akpm/dsc03659.jpg

then I think that _is_ a full trace. It's certainly not very messy, and it
seems accurate. It's just that inlining makes it much harder to see the
call-graphs, but that's what inlining does..

For example, missing from the call graph is

get_page_from_freelist ->
buffered_rmqueue -> [ missing - inlined ]
prep_new_page -> [ missing - inlined ]
prep_zero_page -> [ missing - inlined ]
clear_highpage -> [ missing - inlined ]
kmap_atomic -> [ missing - tailcall ]
kmap_atomic_prot

but I'm pretty sure the call trace is good (and I'm also pretty sure gcc
is overly aggressive at inlining, and that it causes us pain for
debugging, but whatever)

The earlier part of the trace looks fine too.

The only odd part I see is the existence of "dput()" there, so maybe it's
not *quite* clean and enabling frame pointers might get rid of a few bogus
entries, but it looks pretty close to clean.

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, Alexey Dobriyan <adobriyan@...>, <linux-kernel@...>, <netdev@...>
Date: Tuesday, July 24, 2007 - 1:59 pm

For prep_zero_page() and clear_highpage() we can't blame gcc since we
force gcc to always inline them.

buffered_rmqueue() and prep_new_page() are static functions with only
one caller each, and for the normal non-debug case it's a really nice
optimization to have them inlined automatically. But it might make sense
to add -fno-inline-functions-called-once to the CFLAGS depending on some

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

-

To: Adrian Bunk <bunk@...>
Cc: Andrew Morton <akpm@...>, Alexey Dobriyan <adobriyan@...>, <linux-kernel@...>, <netdev@...>
Date: Tuesday, July 24, 2007 - 2:14 pm

I'm not at all sure I agree.

Inlining big functions doesn't actually tend to generally generate any
better code, so if gcc's logic is "single callsite - always inline", then
that logic is likely not right.

Linus
-

To: Jeff Garzik <jeff@...>
Cc: Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Alexey Dobriyan <adobriyan@...>, <linux-kernel@...>, <netdev@...>
Date: Thursday, July 26, 2007 - 2:09 am

Only up to a threshold, as far as I know.

-hpa
-

To: Linus Torvalds <torvalds@...>
Cc: Adrian Bunk <bunk@...>, Alexey Dobriyan <adobriyan@...>, <linux-kernel@...>, <netdev@...>
Date: Tuesday, July 24, 2007 - 2:28 pm

On Tue, 24 Jul 2007 11:14:21 -0700 (PDT)

fwiw, -fno-inline-functions-called-once (who knew?) takes i386 allnoconfig
vmlinux .text from 928360 up to 955362 bytes (27k larger).

A surprisingly large increase - I wonder if it did something dumb. It
appears to still correctly inline those things which we've manually marked
inline. hm.

It would be nice to defeat the autoinlining for debug purposes though.
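
One per-function way to do that, sketched here with gcc's noinline attribute behind a debug switch. MODEL_DEBUG_NOINLINE is a hypothetical macro for this illustration, not an existing config option, and the two functions are toy stand-ins that only borrow their names from the trace discussed in this thread.

```c
/* Hypothetical debug switch; define it to keep helpers out of line. */
#ifdef MODEL_DEBUG_NOINLINE
#define model_debug_noinline __attribute__((noinline))
#else
#define model_debug_noinline
#endif

/*
 * Static helper with a single caller: gcc may inline it on its own,
 * which hides it from backtraces unless the attribute is applied.
 */
static model_debug_noinline int model_prep_new_page(int order)
{
	return 1 << order;
}

int model_buffered_rmqueue(int order)
{
	return model_prep_new_page(order);
}
```

Built with -DMODEL_DEBUG_NOINLINE the helper stays a separate function and shows up in a backtrace; without it the compiler is free to fold it into its caller. The result is the same either way, only the code layout differs.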
-

To: Andrew Morton <akpm@...>
Cc: Adrian Bunk <bunk@...>, Alexey Dobriyan <adobriyan@...>, <linux-kernel@...>, <netdev@...>
Date: Tuesday, July 24, 2007 - 3:15 pm

I think inlining small enough functions is worth it, and the thing is, the
kernel is actually pretty damn good at having lots of small functions.
It's one of the few things I really care about from a coding style
standpoint.

So I'm not surprised that "-fno-inline-functions-called-once" makes things
larger, because I think it's generally a good idea to inline things that
are just called once. But it does make things harder to debug, and the
performance advantages become increasingly small for bigger functions.

And that's a balancing act. Do we care about performance? Yes. But do we
care so much that it's worth inlining something like buffered_rmqueue()?

So I would not be surprised if "-fno-inline-functions-called-once" will
disable *all* the inlining heuristics, and say "oh, it's not an inline
function, and it's only called once, so we won't inline it at all".

So "called once" should probably make the inlining weight bigger (ie
inline *larger* functions than you would otherwise), it just shouldn't
make it "infinite". It's not worth it.
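
Put as a toy model, with made-up numbers: the sketch below is not gcc's actual heuristic, and the limit, the bonus and the function name are all assumptions chosen only to show "bigger budget, not unlimited".

```c
#include <stdbool.h>

#define MODEL_INLINE_LIMIT	40	/* hypothetical size budget */
#define MODEL_CALLED_ONCE_BONUS	4	/* hypothetical multiplier */

/*
 * Decide whether a function of the given size should be inlined.
 * A single call site raises the budget but does not remove it.
 */
bool model_should_inline(unsigned int insns, bool called_once)
{
	unsigned int limit = MODEL_INLINE_LIMIT;

	if (called_once)
		limit *= MODEL_CALLED_ONCE_BONUS;
	return insns <= limit;
}
```

With these numbers a 100-instruction function is inlined only if it has a single caller, and a 1000-instruction one is never inlined.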

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, Alexey Dobriyan <adobriyan@...>, <linux-kernel@...>, <netdev@...>
Date: Tuesday, July 24, 2007 - 3:40 pm

When using CONFIG_CC_OPTIMIZE_FOR_SIZE=y we even actively tell gcc that

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

-

To: Adrian Bunk <bunk@...>
Cc: Andrew Morton <akpm@...>, Alexey Dobriyan <adobriyan@...>, <linux-kernel@...>, <netdev@...>
Date: Tuesday, July 24, 2007 - 3:48 pm

In this case, it was a pain to just even try to find the call chain, or
read the asm.

I would encourage lots of kernel hackers to read the assembler code gcc
generates. I suspect people being aware of code generation issues (and
writing their code with that in mind) is a *much* bigger performance
impact than gcc inlining random functions.

So maybe I'm old-fashioned and crazy, but "readability of the asm result"
actually is a worthwhile goal. Not because we care directly, but because
I'd like to encourage people to do it, due to the *indirect* benefits.

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, Alexey Dobriyan <adobriyan@...>, <linux-kernel@...>, <netdev@...>
Date: Thursday, July 26, 2007 - 2:07 pm

Optimization versus debugging is a common issue...

As I said, it might make sense to disable this optimization depending on

This would lead to people trying to optimize code for one gcc version -
and the code might stay this way for 10 years.

People should write readable C code. This also has the best chances of
resulting in good performance with the next gcc version on the next

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

-

To: Adrian Bunk <bunk@...>
Cc: Andrew Morton <akpm@...>, Alexey Dobriyan <adobriyan@...>, <linux-kernel@...>, <netdev@...>
Date: Thursday, July 26, 2007 - 2:19 pm

No. The fact is, code that is easy to optimize is easy to optimize.

It has _nothing_ to do with gcc versions.

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, Adrian Bunk <bunk@...>, Alexey Dobriyan <adobriyan@...>, <linux-kernel@...>, <netdev@...>
Date: Tuesday, July 24, 2007 - 4:27 pm

There's probably a --param where it can be tweaked exactly. The
problem is that --params tend to be very gcc version specific
and might do something completely different on a newer or
older version. So it's better not to use them.

-Andi
-

To: Andi Kleen <andi@...>
Cc: Andrew Morton <akpm@...>, Adrian Bunk <bunk@...>, Alexey Dobriyan <adobriyan@...>, <linux-kernel@...>, <netdev@...>
Date: Tuesday, July 24, 2007 - 3:45 pm

I agree wholeheartedly with that sentiment. We've tried at times (because
some gcc snapshots made some *truly* insane choices for a while), and
maybe we still have some around. Not worth the pain.

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, Alexey Dobriyan <adobriyan@...>, <linux-kernel@...>, <netdev@...>
Date: Monday, July 23, 2007 - 5:37 pm

mm/page_alloc.c:static inline void prep_zero_page(struct page *page, int order, gfp_t gfp_flags)
include/linux/highmem.h:static inline void clear_highpage(struct page *page)

So at least two were explicitly marked inline.
Whether that made a difference, I don't know.

Sam
-

To: Linus Torvalds <torvalds@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, <linux-acpi@...>, <len.brown@...>
Date: Monday, July 23, 2007 - 12:43 pm

I get some ACPI Exception.

...

[ 33.075429] ACPI Exception (processor_throttling-0084): AE_NOT_FOUND, Evaluating _PTC [20070126]
[ 33.075437] ACPI Exception (processor_throttling-0147): AE_NOT_FOUND, Evaluating _TSS [20070126]
[ 33.075490] ACPI Exception (processor_throttling-0084): AE_NOT_FOUND, Evaluating _PTC [20070126]
[ 33.075497] ACPI Exception (processor_throttling-0147): AE_NOT_FOUND, Evaluating _TSS [20070126]
[ 33.075529] ACPI Exception (processor_throttling-0084): AE_NOT_FOUND, Evaluating _PTC [20070126]
[ 33.075536] ACPI Exception (processor_throttling-0147): AE_NOT_FOUND, Evaluating _TSS [20070126]
[ 33.075563] ACPI Exception (processor_throttling-0084): AE_NOT_FOUND, Evaluating _PTC [20070126]
[ 33.075570] ACPI Exception (processor_throttling-0147): AE_NOT_FOUND, Evaluating _TSS [20070126]

...

Config attached.

Regards,

Gabriel C

To: Gabriel C <nix.or.die@...>
Cc: Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, <linux-acpi@...>, <len.brown@...>
Date: Monday, July 23, 2007 - 12:57 pm

Same here, I was about to blame my holy Vaio, but the latest ACPI merge is
to blame instead.

Regards,
ismail

--
Perfect is the enemy of good
-

To: Ismail Dönmez <ismail@...>
Cc: Gabriel C <nix.or.die@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, <linux-acpi@...>, <len.brown@...>
Date: Monday, July 23, 2007 - 4:44 pm

Add me too, Dell D610, 2.6.23-rc1 on top of Fedora 7.

--alessandro

"Did you get married but forgot to get divorced ?"

(Danny and Dusty, 'The Good Old Days')
-

To: Alessandro Suardi <alessandro.suardi@...>
Cc: Ismail <ismail@...>, Gabriel C <nix.or.die@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, <linux-acpi@...>, <len.brown@...>
Date: Tuesday, July 24, 2007 - 10:49 am

Ignore it -- it is a new patch looking for optional hooks,
and it is simply too verbose when it doesn't find them.
The verbosity will be gone in rc2.

thanks,
-Len
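
The idea of probing for optional hooks without complaining when they are absent can be sketched in plain C as below. This is an illustration only: the status codes, evaluate_method() and evaluate_optional() are hypothetical stand-ins, not the ACPICA API and not the change that will land in rc2.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical status codes, standing in for ACPI status values. */
enum status { ST_OK = 0, ST_NOT_FOUND = 1, ST_ERROR = 2 };

/* Hypothetical lookup: pretend the firmware only provides "_PSS". */
static enum status evaluate_method(const char *name)
{
	assert(name != NULL);
	return strcmp(name, "_PSS") == 0 ? ST_OK : ST_NOT_FOUND;
}

/*
 * Evaluate an optional method. A missing method is expected, so it is
 * reported to the caller without logging; only other failures are logged.
 * Returns the number of log lines emitted, which keeps the behaviour
 * easy to check.
 */
static int evaluate_optional(const char *name, enum status *out)
{
	enum status st = evaluate_method(name);

	*out = st;
	if (st == ST_OK || st == ST_NOT_FOUND)
		return 0;	/* quiet: nothing unusual happened */

	fprintf(stderr, "exception %d evaluating %s\n", (int)st, name);
	return 1;
}
```

With this shape, a caller that asks for an absent _PTC or _TSS still learns that the method is missing, but nothing is written to the log.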
-

To: Linus Torvalds <torvalds@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Jeremy Fitzhardinge <jeremy@...>
Date: Monday, July 23, 2007 - 11:52 am

Subject: xen: fix process_msg() use-after-kfree
From: Ingo Molnar <mingo@elte.hu>

fix an obvious use-after-kfree bug in Xen.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
drivers/xen/xenbus/xenbus_xs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/drivers/xen/xenbus/xenbus_xs.c
===================================================================
--- linux.orig/drivers/xen/xenbus/xenbus_xs.c
+++ linux/drivers/xen/xenbus/xenbus_xs.c
@@ -782,8 +782,8 @@ static int process_msg(void)
msg->u.watch.vec = split(body, msg->hdr.len,
&msg->u.watch.vec_size);
if (IS_ERR(msg->u.watch.vec)) {
- kfree(msg);
err = PTR_ERR(msg->u.watch.vec);
+ kfree(msg);
goto out;
}
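
Why the order matters: PTR_ERR(msg->u.watch.vec) reads a field of msg, so it has to run before kfree(msg). A userspace sketch of the same pattern follows. ERR_PTR, PTR_ERR and IS_ERR here are simplified stand-ins for the kernel helpers, and struct msg and handle_split_result() are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical message type; only the field that matters is modelled. */
struct msg {
	char **vec;
};

/*
 * Takes ownership of msg. If msg->vec holds an error pointer, the error
 * code is copied out first and msg is freed afterwards, in the same order
 * as the patch above.
 */
static long handle_split_result(struct msg *msg)
{
	long err = 0;

	assert(msg != NULL);
	if (IS_ERR(msg->vec)) {
		err = PTR_ERR(msg->vec);	/* read while msg is still valid */
		free(msg);			/* only now release the memory */
		return err;
	}
	free(msg);
	return err;
}
```

Swapping the two statements in the error branch would read msg->vec from freed memory, which is the bug the patch removes.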

-

To: Linus Torvalds <torvalds@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, <venkatesh.pallipadi@...>, <davej@...>, <jhoblitt@...>, <auke-jan.h.kok@...>
Date: Monday, July 23, 2007 - 5:50 am

This was seen on a machine on test.kernel.org;

Unable to handle kernel NULL pointer dereference at 0000000000000000
RIP:
[<ffffffff8037379b>] acpi_processor_throttling_seq_show+0xa7/0xd6
PGD 3bd9e067 PUD 3bc6a067 PMD 0
Oops: 0000 [1] SMP
CPU 3
Modules linked in: video output button battery floppy ac lp parport_pc
parport nvram amd_rng rng_core i2c_amd756 i2c_core
Pid: 1522, comm: head Not tainted 2.6.23-rc1-autokern1 #1
RIP: 0010:[<ffffffff8037379b>] [<ffffffff8037379b>]
acpi_processor_throttling_seq_show+0xa7/0xd6
RSP: 0018:ffff81003c4a5e48 EFLAGS: 00010246
RAX: 0000000000000020 RBX: ffff810037ea1800 RCX: 0000000000000000
RDX: 000000000000002a RSI: ffffffff80599c02 RDI: ffff810037c6a9c0
RBP: ffff810037c6a9c0 R08: ffff81003c3e3051 R09: ffff810037c6a9c0
R10: ffffffffffffffff R11: ffffffff80373e66 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 00007fffd59dcd40
FS: 00002b7ad50e36f0(0000) GS:ffff81003ee56b40(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000003bd9b000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process head (pid: 1522, threadinfo ffff81003c4a4000, task ffff81003ee81040)
Stack: ffff81003eed7180 ffff810037c6a9c0 0000000000000001 0000000000000001
0000000000002000 ffffffff802a77b3 ffff81003c4a5f50 ffff81003e2f8ec0
ffff810037c6a9f0 ffff81003de65000 0000000000000000 fffffffffffffffb
Call Trace:
[<ffffffff802a77b3>] seq_read+0x105/0x28e
[<ffffffff802a76ae>] seq_read+0x0/0x28e
[<ffffffff802c8501>] proc_reg_read+0x80/0x9a
[<ffffffff8028eb3d>] vfs_read+0xcb/0x153
[<ffffffff8028eed9>] sys_read+0x45/0x6e
[<ffffffff8020bc6e>] system_call+0x7e/0x83
FATAL: Error inserting acpi_cpufreq
(/lib/modules/2.6.23-rc1-autokern1/kernel/arch/x86_64/kernel/cpufreq/acpi-cpufreq.ko):
No such device

Full oops is at http://tes...

To: Mel Gorman <mel@...>
Cc: Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, <venkatesh.pallipadi@...>, <davej@...>, <jhoblitt@...>, <auke-jan.h.kok@...>, <linux-acpi@...>
Date: Monday, July 23, 2007 - 1:15 pm

try this,
thanks,
-Len

Subject: fix oops due to typo in new throttling code
From: Luming Yu <luming.yu@gmail.com>

Signed-off-by: Luming Yu <luming.yu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>

---

drivers/acpi/processor_throttling.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)

Index: linus/drivers/acpi/processor_throttling.c
===================================================================
--- linus.orig/drivers/acpi/processor_throttling.c
+++ linus/drivers/acpi/processor_throttling.c
@@ -658,18 +658,20 @@ static int acpi_processor_throttling_seq
pr->throttling.state_count - 1);

seq_puts(seq, "states:\n");
- if (acpi_processor_get_throttling == acpi_processor_get_throttling_fadt)
+ if (pr->throttling.acpi_processor_get_throttling ==
+ acpi_processor_get_throttling_fadt) {
for (i = 0; i < pr->throttling.state_count; i++)
seq_printf(seq, " %cT%d: %02d%%\n",
(i == pr->throttling.state ? '*' : ' '), i,
(pr->throttling.states[i].performance ? pr->
throttling.states[i].performance / 10 : 0));
- else
+ } else {
for (i = 0; i < pr->throttling.state_count; i++)
seq_printf(seq, " %cT%d: %02d%%\n",
(i == pr->throttling.state ? '*' : ' '), i,
(int)pr->throttling.states_tss[i].
freqpercentage);
+ }

end:
return 0;
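
Reading the diff, the old condition compares a global symbol against acpi_processor_get_throttling_fadt, while the fixed one compares the per-processor function pointer stored in pr->throttling. The sketch below reproduces that difference in userspace. All names in it are hypothetical, and the explanation is an interpretation of the diff rather than something stated in the thread.

```c
#include <assert.h>
#include <stddef.h>

struct throttling;
typedef int (*get_throttling_fn)(const struct throttling *);

struct throttling {
	get_throttling_fn get_throttling;	/* back end chosen per processor */
};

/* Two hypothetical back ends, standing in for the FADT and _PTC variants. */
static int get_throttling_fadt(const struct throttling *t) { (void)t; return 1; }
static int get_throttling_ptc(const struct throttling *t) { (void)t; return 2; }

/* Global wrapper that shares its name with the struct member. */
static int get_throttling(const struct throttling *t)
{
	assert(t != NULL && t->get_throttling != NULL);
	return t->get_throttling(t);
}

/* Old-style check: compares the wrapper itself, so it can never match. */
static int uses_fadt_buggy(const struct throttling *t)
{
	(void)t;
	return get_throttling == get_throttling_fadt;
}

/* Fixed check: compares the pointer stored for this processor. */
static int uses_fadt_fixed(const struct throttling *t)
{
	return t->get_throttling == get_throttling_fadt;
}
```

If the first check is always false, the code always takes the else branch, which would fit an oops on a table that was never set up for the FADT case.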
-

To: Len Brown <lenb@...>
Cc: Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, <venkatesh.pallipadi@...>, <davej@...>, <jhoblitt@...>, <auke-jan.h.kok@...>, <linux-acpi@...>
Date: Tuesday, July 24, 2007 - 6:37 am

This works. When applied, the output to console is

FATAL: Error inserting acpi_cpufreq
(/lib/modules/2.6.23-rc1-autokern1/kernel/arch/x86_64/kernel/cpufreq/acpi-cpufreq.ko): No such device

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
-

To: Linux Kernel Mailing List <linux-kernel@...>
Date: Monday, July 23, 2007 - 3:14 am

Compared to 2.6.22:

# alpha/defconfig: broke

LD .tmp_vmlinux1
arch/alpha/kernel/built-in.o(.text+0xcdf8): In function `module_frob_arch_sections':
include/linux/slub_def.h:154: undefined reference to `__kmalloc_size_too_large'
arch/alpha/kernel/built-in.o(.text+0xcdfc):include/linux/slub_def.h:154: undefined reference to `__kmalloc_size_too_large'
arch/alpha/kernel/built-in.o(.text+0x190d8): In function `srmcons_get_private_struct':
include/linux/slub_def.h:154: undefined reference to `__kmalloc_size_too_large'
arch/alpha/kernel/built-in.o(.text+0x190dc):include/linux/slub_def.h:154: undefined reference to `__kmalloc_size_too_large'
arch/alpha/kernel/built-in.o(.init.text+0x948): In function `register_cpus':
include/linux/slub_def.h:154: undefined reference to `__kmalloc_size_too_large'
arch/alpha/kernel/built-in.o(.init.text+0x94c):include/linux/slub_def.h:154: more undefined references to `__kmalloc_size_too_large' follow
make[1]: *** [.tmp_vmlinux1] Error 1
make: *** [_all] Error 2

# i386/allmodconfig: broke

CC [M] drivers/misc/asus-laptop.o
drivers/misc/asus-laptop.c: In function `asus_led_exit':
drivers/misc/asus-laptop.c:1076: error: structure has no member named `class_dev'
drivers/misc/asus-laptop.c:1076: error: structure has no member named `class_dev'
drivers/misc/asus-laptop.c:1077: error: structure has no member named `class_dev'
drivers/misc/asus-laptop.c:1077: error: structure has no member named `class_dev'
drivers/misc/asus-laptop.c:1078: error: structure has no member named `class_dev'
drivers/misc/asus-laptop.c:1078: error: structure has no member named `class_dev'
drivers/misc/asus-laptop.c:1079: error: structure has no member named `class_dev'
drivers/misc/asus-laptop.c:1079: error: structure has no member named `class_dev'
drivers/misc/asus-laptop.c:1080: error: structure has no member named `class_dev'
drivers/misc/asus-laptop.c:1080: error: structure has no member named `class_dev'
make[3]: *** [drivers/misc/asus-laptop.o] Error 1
make[2]...

To: Jan Dittmer <jdi@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>
Date: Monday, July 23, 2007 - 9:57 am

This is the HOTPLUG one that GregKH knows about and has a patch for I
think. Not PPC specific.

josh
-

To: Josh Boyer <jwboyer@...>
Cc: Jan Dittmer <jdi@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Monday, July 23, 2007 - 10:02 am

Yes CONFIG_HOTPLUG=n

Regards,

Gabriel C
-

To: Jan Dittmer <jdi@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>
Date: Monday, July 23, 2007 - 3:56 am

tion 'vio_enable_interrupts'

Patch sent to Paulus today.

--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

To: Linus Torvalds <torvalds@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>
Date: Sunday, July 22, 2007 - 10:48 pm

allyesconfig has a lot of 'Section mismatch' warnings

...

LD vmlinux.o
MODPOST vmlinux.o
WARNING: vmlinux.o(.text+0x183): Section mismatch: reference to .init.text.1:start_kernel (between 'is386' and 'check_x87')
WARNING: vmlinux.o(.data+0x4b38): Section mismatch: reference to .init.text.3:powernow_cpu_init (between 'powernow_driver' and 'minimum_speed')
WARNING: vmlinux.o(.data+0x4c2c): Section mismatch: reference to .init.text.3:longhaul_cpu_init (between 'longhaul_driver' and 'numscales')
WARNING: vmlinux.o(.data+0x4cf4): Section mismatch: reference to .init.text.3:longrun_cpu_init (between 'longrun_driver' and 'max_duration')
WARNING: vmlinux.o(.data+0x57f4): Section mismatch: reference to .init.text.3:native_smp_prepare_boot_cpu (between 'smp_ops' and 'call_lock')
WARNING: vmlinux.o(.data+0x57f8): Section mismatch: reference to .init.text.3:native_smp_prepare_cpus (between 'smp_ops' and 'call_lock')
WARNING: vmlinux.o(.data+0x5800): Section mismatch: reference to .init.text.3:native_smp_cpus_done (between 'smp_ops' and 'call_lock')
WARNING: vmlinux.o(.data+0x6c00): Section mismatch: reference to .init.text.5:machine_specific_memory_setup (between 'paravirt_ops' and 'reserve_ioports')
WARNING: vmlinux.o(.data+0x6c08): Section mismatch: reference to .init.text.3:native_init_IRQ (between 'paravirt_ops' and 'reserve_ioports')
WARNING: vmlinux.o(.data+0x6c0c): Section mismatch: reference to .init.text.3:hpet_time_init (between 'paravirt_ops' and 'reserve_ioports')
WARNING: vmlinux.o(.data+0x6c10): Section mismatch: reference to .init.text.4:native_pagetable_setup_start (between 'paravirt_ops' and 'reserve_ioports')
WARNING: vmlinux.o(.data+0x6c14): Section mismatch: reference to .init.text.4:native_pagetable_setup_done (between 'paravirt_ops' and 'reserve_ioports')
WARNING: vmlinux.o(.data+0x6c18): Section mismatch: reference to .init.text.3:default_banner (between 'paravirt_ops' and 'reserve_ioports')
WARNING: vmlinux.o(.data+0x6cdc): Section mismatch: reference to .init.text.3:se...

To: Linus Torvalds <torvalds@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, <corentincj@...>
Date: Sunday, July 22, 2007 - 9:20 pm

allmodconfig is broken

...

drivers/misc/asus-laptop.c: In function 'asus_led_exit':
drivers/misc/asus-laptop.c:1076: error: 'struct led_classdev' has no member named 'class_dev'
drivers/misc/asus-laptop.c:1076: error: 'struct led_classdev' has no member named 'class_dev'
drivers/misc/asus-laptop.c:1077: error: 'struct led_classdev' has no member named 'class_dev'
drivers/misc/asus-laptop.c:1077: error: 'struct led_classdev' has no member named 'class_dev'
drivers/misc/asus-laptop.c:1078: error: 'struct led_classdev' has no member named 'class_dev'
drivers/misc/asus-laptop.c:1078: error: 'struct led_classdev' has no member named 'class_dev'
drivers/misc/asus-laptop.c:1079: error: 'struct led_classdev' has no member named 'class_dev'
drivers/misc/asus-laptop.c:1079: error: 'struct led_classdev' has no member named 'class_dev'
drivers/misc/asus-laptop.c:1080: error: 'struct led_classdev' has no member named 'class_dev'
drivers/misc/asus-laptop.c:1080: error: 'struct led_classdev' has no member named 'class_dev'
make[2]: *** [drivers/misc/asus-laptop.o] Error 1
make[2]: *** Waiting for unfinished jobs....

...

Regards,

Gabriel C
-

To: Linus Torvalds <torvalds@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Greg KH <greg@...>
Date: Sunday, July 22, 2007 - 9:23 pm

Some of the driver model changes that went in result in a link error:

CC init/version.o
LD init/built-in.o
LD .tmp_vmlinux1
make: *** [.tmp_vmlinux1] Error 1

Haven't bisected it yet, but I suppose it's pretty obvious to whoever made the
changes. ;-)

.config follows:

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.23-rc1
# Mon Jul 23 10:02:46 2007
#
CONFIG_SUPERH=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_LOCKDEP_SUPPORT=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
# CONFIG_SWAP is not set
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_POSIX_MQUEUE is not set
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
# CONFIG_TASKSTATS is not set
# CONFIG_USER_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
# CONFIG_BLK_DEV_INITRD is not set
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_EMBEDDED=y
CONFIG_UID16=y
# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
# CONFIG_HOTPLUG is not set
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SH...

To: Paul Mundt <lethal@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Monday, July 23, 2007 - 12:11 am

Yes, the patch is on the list (and has been pointed out already) and is in
my queue to send to Linus in the next few days.

thanks,

greg k-h
-

To: Paul Mundt <lethal@...>
Cc: Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, Greg KH <greg@...>
Date: Sunday, July 22, 2007 - 9:27 pm

CONFIG_HOTPLUG=n :)

Try this patch :
http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/...

Regards,

Gabriel
-

To: Gabriel C <nix.or.die@...>
Cc: Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, Greg KH <greg@...>
Date: Sunday, July 22, 2007 - 9:40 pm

Yup, that fixes it. I'll just enable it across the defconfigs for now, thanks.
-

To: Linus Torvalds <torvalds@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Andi Kleen <ak@...>, Roland McGrath <roland@...>
Date: Sunday, July 22, 2007 - 7:33 pm

I'm fairly sure this is already known about on SPARC64 (see David Miller's
email ""build-id" changes break sparc64"), but I just thought I'd let people
know the warnings are also visible on x86_64:

"ld: warning: Cannot create .note.gnu.build-id section, --build-id ignored."

gcc (GCC) 4.1.3 20070718 (prerelease) (Debian 4.1.2-14)
GNU assembler (GNU Binutils for Debian) 2.17.50.20070718

The kernel boots and works fine, however. The above tools are from Debian
unstable.

--
Cheers,
Alistair.

137/1 Warrender Park Road, Edinburgh, UK.

-

To: Alistair John Strachan <alistair@...>
Cc: Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, Andi Kleen <ak@...>
Date: Sunday, July 22, 2007 - 7:51 pm

This ld build, whatever it is, is suspect.
I have no idea what code is in there.

Thanks,
Roland
-

To: Roland McGrath <roland@...>
Cc: Alistair John Strachan <alistair@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, Andi Kleen <ak@...>
Date: Sunday, July 22, 2007 - 8:07 pm

That's the Debian unstable package of binutils containing what was on
20070718 in the upstream binutils CVS (the version number comes from

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

-

To: Adrian Bunk <bunk@...>
Cc: Alistair John Strachan <alistair@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, Andi Kleen <ak@...>
Date: Sunday, July 22, 2007 - 8:31 pm

At what time on July 18? Before or after the commits I made that day?
You see, I can't tell from the information at hand.
-

To: Roland McGrath <roland@...>
Cc: Alistair John Strachan <alistair@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, Andi Kleen <ak@...>
Date: Sunday, July 22, 2007 - 9:43 pm

The information comes directly from bfd/version.h

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

-

To: Linus Torvalds <torvalds@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Andi Kleen <ak@...>
Date: Sunday, July 22, 2007 - 6:10 pm

Does overlapping sections count as a new feature? ;)

gcc -m elf_x86_64 -nostdlib -fPIC -shared -Wl,-soname=linux-vdso.so.1 -Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096 -Wl,-T,arch/x86_64/vdso/vdso.lds arch/x86_64/vdso/vdso-start.o arch/x86_64/vdso/vdso-note.o arch/x86_64/vdso/vclock_gettime.o arch/x86_64/vdso/vgetcpu.o arch/x86_64/vdso/vvar.o -o arch/x86_64/vdso/vdso.so
/usr/bin/ld: section .text [ffffffffff700500 -> ffffffffff7007e3] overlaps section .gnu.version_d [ffffffffff7004d8 -> ffffffffff70050f]
collect2: ld returned 1 exit status
make[3]: *** [arch/x86_64/vdso/vdso.so] Error 1
make[2]: *** [arch/x86_64/vdso] Error 2
make[1]: *** [_all] Error 2
make: *** [all] Error 2

This is gcc (GCC) 3.3.5 (Debian 1:3.3.5-13)
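
The size of the overlap can be read off the two ranges in the ld message. Below is a small sketch of that arithmetic, assuming ld prints inclusive end addresses; overlap_bytes() is a helper made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of bytes shared by two inclusive address ranges
 * [a_start, a_end] and [b_start, b_end]; 0 if they do not overlap.
 */
static uint64_t overlap_bytes(uint64_t a_start, uint64_t a_end,
			      uint64_t b_start, uint64_t b_end)
{
	uint64_t lo = a_start > b_start ? a_start : b_start;
	uint64_t hi = a_end < b_end ? a_end : b_end;

	assert(a_start <= a_end && b_start <= b_end);
	return lo <= hi ? hi - lo + 1 : 0;
}
```

For the ranges in the error message, .text starts at ...700500 while .gnu.version_d runs up to ...70050f, so under that assumption the two sections share 0x10 bytes.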

.config below.

Regards
Andre

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.23-rc1
# Mon Jul 23 00:04:38 2007
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_ZONE_DMA32=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_NR_QUICK=2
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_DMI=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=...

To: Andre Noll <maan@...>
Cc: Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Sunday, July 22, 2007 - 6:22 pm

Does this patch fix it?

-Andi

Increase VDSO_TEXT_OFFSET for ancient binutils

For some reason old binutils generate larger headers, so
increase the text offset of the vdso to avoid linker errors.

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux/arch/x86_64/vdso/voffset.h
===================================================================
--- linux.orig/arch/x86_64/vdso/voffset.h
+++ linux/arch/x86_64/vdso/voffset.h
@@ -1 +1 @@
-#define VDSO_TEXT_OFFSET 0x500
+#define VDSO_TEXT_OFFSET 0x600
-

To: Andi Kleen <ak@...>
Cc: Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Sunday, July 22, 2007 - 7:23 pm

Nope, with 0x600 I still get the same error. But it helped to further
increase VDSO_TEXT_OFFSET to 0xc00. I tried 0x700, 0x800,... and 0xc00
is the smallest value in this series that makes the error go away, i.e.
the patch below works for me.

Thanks
Andre

diff --git a/arch/x86_64/vdso/voffset.h b/arch/x86_64/vdso/voffset.h
index 5304204..61667d5 100644
--- a/arch/x86_64/vdso/voffset.h
+++ b/arch/x86_64/vdso/voffset.h
@@ -1 +1 @@
-#define VDSO_TEXT_OFFSET 0x500
+#define VDSO_TEXT_OFFSET 0xc00

--
The only person who always got his work done by Friday was Robinson Crusoe

To: Andre Noll <maan@...>
Cc: Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Sunday, July 22, 2007 - 7:31 pm

Can you send (privately) readelf -a output from your vdso.so ?
Your linker must be doing something weird.

0xc00 is quite wasteful.

-Andi
-

To: Andi Kleen <ak@...>
Cc: Andre Noll <maan@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Monday, July 23, 2007 - 2:07 am

I think Roland's --build-id doesn't create a very big section; the likely
culprit would be a hacked-up ld that e.g. defaults to --hash-style=both.
Can you retry with --hash-style=sysv? The vdso really has to include the
traditional .hash section, otherwise it wouldn't be compatible with
old glibcs, and an additional .gnu.hash might be overkill for it
- doesn't the vdso define only very few symbols?

Jakub
-
