what's brewing in x86.git for v2.6.26?
too many topics to list them all - there are 884 patches from 74 authors
at the moment.
a few highlights:
- 4096 CPUs support. (Yes, such big boxes exist, and they run Linux.)
- mmiotrace feature: trace accesses to hw components to help figure out
how they are programmed.
- kmemcheck feature: Valgrind for the native Linux kernel in essence -
detects access to uninitialized memory.
- ftrace plugin for sysprof
- fixed StackProtector security feature (these fixes were too intrusive
for v2.6.25)
- lazy FPU allocation/speedup - offloaded/large FPU/SSE state support
- SMP-boot, mpparse, DMA ops unification
- PAT support - first step towards phasing out MTRRs for cache
attribute control
- enable GBPAGES - faster TLB misses on CPUs that support it
- debug helper: view kernel pagetable layout via debugfs
- lots of paravirt work, unification and cleanups
- generalized bitops - they are faster and smaller.
- tons of code cleanups either via the unifications or via explicit
cleanup patches - the metrics [via checkpatch] look like this today:
                                         errors   lines of code   errors/KLOC
[pre-unification:]
v2.6.23 arch/i386/                         5954           83593          71.2
v2.6.23 arch/x86_64/                       2899           31830          91.0
[mechanic unification:]
v2.6.24-rc1 arch/x86/                      8695          117423          74.0
[post-unification cleanup work-in-progress:]
v2.6.24-x86.git arch/x86/ [21 Nov 2007]    5190          117156          44.2
v2.6.24-x86.git arch/x86/ [18 Dec 2007]    4057          117213          34.6
v2.6.24-x86.git arch/x86/ [ 4 Feb 2008]    3334          133542          24.9
v2.6.25-x86.git arch/x86/ [21 Feb 2008]    2724          136963          19.8
v2.6.25-x86.git arch/x86/ [ 1 Mar 2008]    2155          136404          15.7
v2.6.25-x86.git arch/x86/ [ 7 Apr 2008]    1868          136793           ...
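The errors/KLOC column is checkpatch errors per thousand lines of code. A small helper for recomputing it is sketched below; the function name is hypothetical. As an observation from the numbers rather than something the mail states, the published figures appear to be truncated, not rounded, to one decimal (5190 errors over 117156 lines is 44.299..., listed as 44.2), so the sketch truncates too and returns the value in tenths to avoid floating-point comparisons.

```c
#include <assert.h>

/* errors/KLOC expressed in tenths: errors * 1000 / lines, times 10,
 * truncated toward zero by integer division. Using long long keeps
 * errors * 10000 well away from overflow. */
static long long errors_per_kloc_tenths(long long errors, long long lines)
{
    assert(errors >= 0 && lines > 0);
    return errors * 10000 / lines;
}
```

For display, a result t can be printed as t / 10 and t % 10 with a dot in between, e.g. 712 becomes 71.2 for the v2.6.23 arch/i386/ row.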
> - PAT support - first step towards phasing out MTRRs for cache
> attribute control
I just grabbed your tree and I see that there is no
pgprot_writecombine() for x86. I see that pci_mmap_page_range() handles
write combining, but there's no way for a PCI driver to allow userspace
to mmap() part of a BAR with WC enabled. Doing this gives a big
performance boost for running low-latency apps on InfiniBand hardware
driven by the mlx4 driver (look for the FIXME about it in
drivers/infiniband/hw/mlx4/main.c). Are there any plans to handle this
somehow?
Thanks,
Roland
--
Yes. We are planning to fix this by tracking the attribute usage specified
in the pgprot field of remap_pfn_range(). Once the attribute is tracked
properly, we can add pgprot_writecombine() for the WC case.
This should also help other drivers, like video fb etc., which specify
the UC/WC attribute for user-level mmap.
thanks,
suresh
--
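Suresh's plan is to remember which cache attribute each remap_pfn_range() mapping requests, so that conflicting aliases of the same physical range can be refused before pgprot_writecombine() is offered. A minimal userspace model of that bookkeeping, assuming a fixed-size table, is sketched below. The enum, the table and track_memtype() are hypothetical names for illustration, not the x86 PAT code, and the sketch ignores freeing, locking and any hardware programming.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache attributes a mapping might request. */
enum memtype { MT_WB, MT_UC, MT_WC };

struct memtype_range {
    uint64_t start, end;   /* physical range [start, end) */
    enum memtype type;
};

#define MAX_RANGES 16
static struct memtype_range ranges[MAX_RANGES];
static size_t nr_ranges;

/* Record that [start, end) is mapped with @type. Returns 0 on success, or -1
 * if the range overlaps an existing mapping with a different attribute (an
 * illegal cache-attribute alias) or if the table is full. */
static int track_memtype(uint64_t start, uint64_t end, enum memtype type)
{
    assert(start < end);
    for (size_t i = 0; i < nr_ranges; i++) {
        int overlaps = start < ranges[i].end && ranges[i].start < end;
        if (overlaps && ranges[i].type != type)
            return -1;
    }
    if (nr_ranges == MAX_RANGES)
        return -1;
    ranges[nr_ranges++] = (struct memtype_range){ start, end, type };
    return 0;
}
```

Two WC mappings of the same BAR region are accepted, while a UC mapping that overlaps an existing WC one is rejected; that check is what would make handing out WC mappings to userspace safe.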
Faster in a 100% bogus benchmark with the most unrealistic input data
set one can imagine with some effort. They might be faster or they
might be slower; nobody really knows currently.
-Andi
--
On Wed, 16 Apr 2008 22:50:51 +0200, "Andi Kleen" <andi@firstfloor.org>
Hello Andi, Ingo,
The input for the first 'benchmark' was indeed completely unrealistic.
They did show a very convincing speedup, though. This program was
really written to verify the implementation and was later converted
to a benchmark. Many benchmarks are unrealistic. I also wrote a
benchmark for find_first_bit and find_next_bit:
http://heukelum.fastmail.fm/find_first_bit
My conclusion would be: the speed of the generic bitmap implementation
is either better than or at least comparable to the current private
implementations in i386/x86_64. The generic version is out-of-line,
while the private implementation of i386 was inlined: this causes a
regression for very small bitmaps. However, if the bitmap size is
a constant and fits in a long integer, the updated generic code should
inline an optimized version, as x86_64 currently does.
I think the change is a good one.
Greetings,
Alexander
--
Alexander van Heukelum
heukelum@fastmail.fm
--
http://www.fastmail.fm - The professional email service
--
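Alexander's point about constant, single-word bitmaps can be sketched in plain C: an out-of-line generic search that returns size when nothing is found, plus an inline wrapper that takes a one-word fast path when the size is a compile-time constant no larger than a long. This is a userspace illustration under those assumptions, not the kernel's generic bitops code; the function names are made up, and __builtin_constant_p() and __builtin_ctzl() are GCC/Clang builtins.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Out-of-line generic search: index of the first set bit among the first
 * @size bits of @addr, or @size if none of them is set. */
static unsigned long generic_find_first_bit(const unsigned long *addr,
                                            unsigned long size)
{
    for (unsigned long i = 0; i * BITS_PER_LONG < size; i++) {
        if (addr[i]) {
            unsigned long bit = i * BITS_PER_LONG + __builtin_ctzl(addr[i]);
            return bit < size ? bit : size;  /* ignore bits past @size */
        }
    }
    return size;
}

/* Inline wrapper: a constant size that fits in one long needs only one
 * masked word; anything else falls back to the generic loop. Without
 * optimization __builtin_constant_p() yields 0 and the generic path is
 * taken, which returns the same results. */
static inline unsigned long find_first_bit_sketch(const unsigned long *addr,
                                                  unsigned long size)
{
    if (__builtin_constant_p(size) && size <= BITS_PER_LONG) {
        unsigned long word = addr[0];
        if (size < BITS_PER_LONG)
            word &= (1UL << size) - 1;  /* keep only bits 0..size-1 */
        return word ? (unsigned long)__builtin_ctzl(word) : size;
    }
    return generic_find_first_bit(addr, size);
}
```

Both paths share the convention of returning size on "not found", which is what lets callers skip any extra range check.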
I think a realistic benchmark would be to run a real kernel,
profile the input values of the bitmap functions and then
test these cases.
I actually started that when I complained last time by writing
a systemtap script for this that generates a histogram, but for some
reason systemtap couldn't tap all bitmap functions in my kernel and
missed some completely, and I ran out of time tracking that down.
My gut feeling is the only interesting cases are cpumask/nodemask sized
(which can be one word, two words but now up to 8 words on a NR_CPU=4096
Ok.
Yes, it probably should. cpumask walks are relatively common.
I remember profiling mysql some time ago, which did bad overscheduling
due to dumb locking. The funny thing was that the mask walking in the
scheduler actually stood out. No, I don't claim extreme overscheduling is an
interesting case to optimize for, but then there are more realistic
workloads which also do a lot of context switching.
BTW if you do generic work on this: one reason the generated code for
for_each_cpu etc. is so ugly is that the code has checks for
find_next_bit returning >= max size. If you can generalize the
code enough to make sure no arch does that anymore, these checks
could be eliminated.
-Andi
--
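Andi's suggested methodology is to record which bitmap sizes real workloads actually pass in, and then benchmark those cases. His attempt used a systemtap script; as an alternative illustration, a hypothetical in-code probe that buckets calls by bitmap length in words might look like this. It is a sketch only: record_bitmap_size() and the bucket cap are made up, and the counters are not SMP-safe.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define MAX_BUCKET 64  /* hypothetical cap: bucket 64 collects 64+ words */

/* size_histogram[w] counts calls that scanned a bitmap of w longs. */
static unsigned long size_histogram[MAX_BUCKET + 1];

/* Hypothetical probe, called at the top of each bitmap search with the
 * size (in bits) the caller passed. A real probe would need per-CPU or
 * atomic counters. */
static void record_bitmap_size(unsigned long size_in_bits)
{
    unsigned long words = (size_in_bits + BITS_PER_LONG - 1) / BITS_PER_LONG;

    if (words > MAX_BUCKET)
        words = MAX_BUCKET;
    size_histogram[words]++;
}
```

Dumping the array after a run shows whether one- and two-word cpumasks dominate, which is the distribution a benchmark would then need to reproduce.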
On Thu, 17 Apr 2008 12:51:09 +0200, "Andi Kleen" <andi@firstfloor.org>
Hi,
The version that is in x86#testing _will_ do this optimization. For
32 node SMP on x86_64 this results in:
<__first_cpu>:
    mov $0x20,%edx            (inlined...)
    mov $0x100000000,%rax
    or (%rdi),%rax
    bsf %rax,%rax             (... find_first_bit)
    cmp $0x20,%eax            (superfluous paranoia...)
    cmovg %edx,%eax           (... for broken find_first_bit)
    retq
for_each_cpu code looks fine:
    mov $cpumapaddress,%rdi
    callq <__first_cpu>
    jmp end_of_body
start_of_body:
    ...
end_of_body:
    mov $cpumapaddress,%edi   ($mapaddress often cached in register)
    callq <__next_cpu>
    cmp $0x1f,%eax
    jle start_of_body
On the other hand, it would be nice to change __first_cpu and
__next_cpu into inline functions. If all implementations of
find_first_bit and find_next_bit reliably returned max_size
if no bits were found, that would be a good thing to do. The
generic one does return max_size.
Greetings,
--
Alexander van Heukelum
heukelum@fastmail.fm
--
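The __first_cpu listing relies on a sentinel: OR-ing bit 32 into the 32-bit cpumask before bsf guarantees a result of at most 32, which is exactly "no CPU found". The model below reproduces that trick in userspace C for a fixed 32-bit map and shows why a search that always returns max_size lets a for_each-style loop test only bit < 32, with no extra clamp. The names are hypothetical and __builtin_ctzll() is a GCC/Clang builtin; this is not the kernel's cpumask code.

```c
#include <assert.h>
#include <stdint.h>

/* First set bit of a 32-bit map, or 32 if the map is empty. The sentinel
 * at bit 32 (the "or" with 0x100000000 in the listing) guarantees the scan
 * finds something, so no separate "not found" branch is needed. */
static unsigned first_bit32(uint32_t map)
{
    uint64_t v = (uint64_t)map | (UINT64_C(1) << 32);
    return (unsigned)__builtin_ctzll(v);
}

/* Next set bit strictly after @prev, or 32 if there is none. */
static unsigned next_bit32(uint32_t map, unsigned prev)
{
    if (prev >= 31)
        return 32;
    /* clear bits 0..prev, then add the same sentinel */
    uint64_t v = ((uint64_t)map >> (prev + 1) << (prev + 1)) | (UINT64_C(1) << 32);
    return (unsigned)__builtin_ctzll(v);
}

/* Because both helpers return exactly 32 when nothing is left, the loop
 * condition needs no ">= max size" clamp. */
#define for_each_bit32(bit, map) \
    for ((bit) = first_bit32(map); (bit) < 32; (bit) = next_bit32((map), (bit)))
```

With the guarantee in place, the "superfluous paranoia" cmp/cmovg pair in the listing has nothing left to correct.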
quite so. Your change was promising from the get-go and the latest
iteration was definitely a good one and is very much mergeable. That it
also helps improve other architectures is the icing on the cake.
Andi will have to prove his points by coming up with competing benchmark
results - you certainly did your fair share to back up your change with
numbers. (and Andi, if/when you do so, please Cc: Alexander too in the
future. Don't you want him to be able to reply to your complaints ASAP?)
I don't really understand the negativism that comes from Andi - he was
very much aware of the various iterations and benchmarks you did when
developing this rather cool feature: he participated in those threads
and was Cc:-ed as well. The "100% bogus benchmark with the most
unrealistic input data set one can imagine" remark from Andi with no Cc:
was a nasty and unprovoked hit below the waistline.
Ingo
--
My point was really: "don't merge based on bogus benchmarks", or
perhaps better put: every time you see a benchmark result, turn on your
brain and make sure it is really measuring something that makes sense;
and also "don't put results from bogus benchmarks into change logs".
I actually don't have a big issue with the patches themselves (they
seem reasonably clean, so they don't make the code worse, although I
don't think they are a significant improvement over the previous code).
The initial "1...n" benchmark after which you merged the patch
definitely fit my "bogus" description. If there was a later, better one,
I had indeed missed it, sorry, and I don't remember being cc'ed on one
(except for Alexander's latest answer, which satisfied me).
-Andi
--
Why can't I find this patch on the mailing list?
It's an MM patch which touches sched.h. It should not be in git-x86.
It generates over a megabyte of warnings on sparc64.
--
From: Andrew Morton <akpm@linux-foundation.org>
If you're going to touch generic code, test build on other
platforms or get it into a tree where such build testing is
done for you.
--
we do an automatic build test (of both vmlinux and of modules) of over
80 non-x86 architecture configs, amongst them are sparc64 and sparc
configs:
http://www.tglx.de/autoqa-cgi/index?run=86&tree=1
for the latest version of our trees all the 80 non-x86 configs (and all
96 configs in total) built successfully.
if in the sparc64 row you click on the green "OK" button (which signals
that the build was successful) you can get the build log and see all 7
warnings that get triggered on the sparc64 defconfig with
sched-devel/latest [which embeds x86/latest].
so i'm not sure where those "over a megabyte of warnings" come from.
Ingo
--
allmodconfig.
In file included from include/linux/mm.h:39,
from include/linux/scatterlist.h:6,
from include/asm/dma-mapping.h:4,
from include/linux/dma-mapping.h:52,
from include/asm/pci.h:6,
from include/linux/pci.h:945,
from drivers/ata/ata_generic.c:21:
include/asm/pgtable.h: In function `set_pte_at':
include/asm/pgtable.h:670: warning: `init_mm' is deprecated (declared at include/linux/sched.h:1616)
--
Why is it deprecated anyway? I think we all agreed that killing the
export is good, but I don't see any way we could kill the core
usage.
--
From: Christoph Hellwig <hch@infradead.org>
set_pte_at() is done inside drivers and what-not, you're not
going to be able to get rid of this export so easily.
--
ok, good point, i zapped this change [it was in the x86/testing section
anyway, i.e. not to be pushed upstream as an x86 change] - and if it's
resubmitted it should be sent to -mm anyway. Arjan, what do you think?
Ingo
--
probe_kernel_address() should be removed and reimplemented using the new
probe_kernel_read().
--
How much of this has not been in -mm?
There were serious objections that this is weaker than and duplicative of
oprofile, which were not adequately addressed.
Also, I (and apparently only I) actually reviewed the implementation and
found it to be riddled with bugs and shortcomings. afaict this was
completely ignored and you propose to merge it anyway?
I obviously don't have time to go through it all, but I'm afraid I cannot
be very confident in it. All I can say is "the parts which have been in
-mm seem to compile and run". A quick grep indicates that only 644 of
these 884 patches are in -mm. And a lot of them only turned up a week or
two ago.
--
On Thu, Apr 17, 2008 at 10:25 AM, Andrew Morton
No, I fixed some of those and I think Ingo/Soren did the rest. But it
needs another round of review on LKML for sure.
--
afaict none of my review comments have been addressed in the sysprof.c which
is in -mm. It has the following commits:
x86, sysprof: merge header to compilation unit
x86, sysprof: clean up stack frame copying
x86, sysprof: remove dead code
sysprof: user pointer verification
sysprof: minor cleanups
x86: add the debugfs interface for the sysprof tool
I received no email even acknowledging the review comments.
--
Oh sorry, I guess I fixed up the problems I spotted myself. There indeed
seem to be some comments by Andrew that have not been addressed:
http://lkml.org/lkml/2008/2/23/68
Pekka
--
at a quick glance most of the fundamental ones are addressed (it now
uses per-cpu hrtimers, not the timer hook, etc.), but i'll go over it
with a fine comb as well to make sure everything Andrew pointed out is
addressed.
Ingo
--
Apparently that code is not being proposed for merge and it all got
reimplemented and moved elsewhere. I knew nothing of this.
--
It's not just you; I haven't heard anything either :) I'm also still
not comfortable with adding another quick-hacker interface instead of
making sure the existing one doesn't suck.
--
it's all in debugfs so no stable interface worries. Not much has changed
since we last posted it to lkml. Posting the full patchset is
impractical as it's in excess of 100 commits (but we did post various
versions of it) - you can pick up the latest from:
http://people.redhat.com/mingo/sched-devel.git/README
check out kernel/trace/. Usage: check out
/sys/kernel/debug/tracing/README :-)
Ingo
--
afaik the sysprof-vs-oprofile issue still hasn't been settled. Maybe it's
no longer a relevant question with the new code - I just don't know.
Everything went all quiet and then this stuff happened.
--
i don't think there's any big issue here. Sysprof is a time and stack
system-wide tracer/profiler, oprofile profiles CPU events - deep
stacktracing is an afterthought there. And how do you set up oprofile to
do precise time events?
with sysprof you can do:
  cd /sys/kernel/debug/tracing
  echo sysprof > current_tracer
  cat trace_pipe
and you'll see the trace events go by, live. The user-space bits of
sysprof have been ported over to ftrace/sysprof already and it's a
really nice tool that shows a deep stack-trace based hierarchical
"vertical" profile instead of the usual fine-grained profile.
It certainly helps that the author of the tracer plugin (Soeren
Sandmann) is the author of the userspace app too - so there's a rather
well-working feedback loop here ;-)
With oprofile all these things are rather indirect, the API is more
complex, it forces per-CPU buffers, etc. etc. I think for
instrumentation the driving force must be usability, and sysprof/ftrace
is hands down more usable - to me at least.
Ingo
--
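Ingo's three shell commands can also be driven from a program. The sketch below assumes debugfs is mounted so that the tracing directory is /sys/kernel/debug/tracing, as in the mail; the helper names are hypothetical, and the only kernel interfaces it touches are the current_tracer and trace_pipe files shown above. Because stdio buffers the write, any error from the kernel surfaces at fclose().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<dir>/<file>" into @buf; returns 0 on success, -1 if it doesn't fit. */
static int tracing_path(char *buf, size_t len, const char *dir, const char *file)
{
    int n = snprintf(buf, len, "%s/%s", dir, file);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Equivalent of `echo <tracer> > current_tracer`. Returns 0 on success. */
static int set_current_tracer(const char *dir, const char *tracer)
{
    char path[512];
    if (tracing_path(path, sizeof(path), dir, "current_tracer"))
        return -1;
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int written = fprintf(f, "%s\n", tracer) > 0;
    int closed = fclose(f) == 0;   /* buffered write errors show up here */
    return (written && closed) ? 0 : -1;
}

/* Read back the active tracer name, without the trailing newline. */
static int get_current_tracer(const char *dir, char *buf, size_t len)
{
    char path[512];
    if (tracing_path(path, sizeof(path), dir, "current_tracer"))
        return -1;
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *line = fgets(buf, (int)len, f);
    fclose(f);
    if (!line)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Equivalent of `cat trace_pipe`: copy events to stdout as they arrive.
 * On the real file each read blocks until events are available. */
static int stream_trace_pipe(const char *dir)
{
    char path[512], line[1024];
    if (tracing_path(path, sizeof(path), dir, "trace_pipe"))
        return -1;
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        fputs(line, stdout);
        fflush(stdout);   /* show each event immediately */
    }
    fclose(f);
    return 0;
}
```

For example, set_current_tracer("/sys/kernel/debug/tracing", "sysprof") followed by stream_trace_pipe() on the same directory mirrors the echo and cat steps; root privileges are normally required.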
Does this mean Linux is giving up on implementing the DTrace way of
tracing/instrumentation? Lately I observe more and more signs of
parallel tracing/instrumentation infrastructures being introduced into
the Linux kernel, where all of this could be rolled into just one common
one. Strangely, some of these tracing/instrumentation facilities do not
use "zero cost probes" but "near zero cost probes" (like this one), and
this will only result in more and more bloated kernel code with
statically injected instrumentation.
(DTrace has very hermetic source code. In this case it means DTrace has
*very* few points of entry into all the other source code. In Linux the
number of points of entry into instrumented code seems to be constantly
growing by introducing sometimes duplicated instrumentation
infrastructures: oprofile, Text editor, sysprof/ftrace, utrace, blktrace
.. what will be next?)
Is there any explanation for this (seemingly completely) ad hoc/chaotic
Linux way?
kloczek
--
the goal of having more generic markers is still possible and being
aimed for - for in-kernel utilization like SystemTap, lttng, utrace,
ftrace and similar. The latest iteration of markers looks rather
promising in terms of giving us near-zero-cost probe points.
(and last i checked dtrace was not capable of doing something like
mmiotrace - so it's a different thing.)
so don't worry :)
Ingo
--
On Thu, 17 Apr 2008 20:51:58 +0200
Well, that's all good to hear but I don't know where you're getting your
information from. In the past month and a half I've seen zero email from
Soeren and a single ftrace-related patch.
So right now I do not have enough information to understand what ftrace
does, let alone to compare it with oprofile.
And this is a problem. I, probably more than anyone else, work with bug
reporters on kernel problems and I am not in a position to be able to
direct them to use an important new tool. There isn't even a documentation
file I can point them at.
I'd imagine that a large number of the current kernel development team only
vaguely know of ftrace's existence, let alone how to use it and what its
Well. You know how to use it.
John, Phillippe: have you had a chance to take a look at the latest ftrace
code?
Thanks.
--
... well, this all originates from the latency tracer in -rt, which goes
back years; it's frequently used and it has helped us fix many bugs.
Ftrace has been frequently posted to lkml and is being developed in
sched-devel.git - and we don't repost the series on lkml because it's
lots of patches now (and the concept didn't change much anyway, we just
got more plugins). The URL to monitor sched-devel.git changes is:
http://people.redhat.com/mingo/sched-devel.git/README
(see the shortlog below) [ Note: the commit logs are not tidied up and
backmerged yet - and the x86 commits are there too - this will get
cleaner once the first, largest phase of x86.git goes upstream. ]
Regarding utility, off the top of my head here are a few recent
fixes/improvements we did with the help of ftrace's scheduler tracer
component or with other ftrace components:
| commit f540a6080a092e2ab69fd146c308022db7347b0a
| Author: Ingo Molnar <mingo@elte.hu>
| Date: Sat Mar 15 17:10:34 2008 +0100
|
| sched: wakeup-buddy tasks are cache-hot
| commit 4ae7d5cefd4aa3560e359a3b0f03e12adc8b5c86
| Author: Ingo Molnar <mingo@elte.hu>
| Date: Wed Mar 19 01:42:00 2008 +0100
|
| sched: improve affine wakeups
| commit aa2ac25229cd4d0280f6174c42712744ad61b140
| Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
| Date: Fri Mar 14 21:12:12 2008 +0100
|
| sched: fix overload performance: buddy wakeups
| commit bead9a3abd15710b0bdfd418daef606722d86282
| Author: Ingo Molnar <mingo@elte.hu>
| Date: Wed Apr 16 01:40:00 2008 +0200
|
| mm: sparsemem memory_present() fix
and that's just me. So i'm not worried at all whether it will be used.
Ingo
------------------->
Adrian Bunk (1):
x86: remove the write-only timer_uses_ioapic_pin_0
Akinobu Mita (6):
x86: avoid redundant loop in io_apic_level_ack_pending()
x86: use ioapic_read_entry() and ioapic_write_entry()
x86: remove unnecessary memset()
x86: re...
all of these are in linux-next, and most of them are in -mm.
the for-akpm branch has 646 commits at the moment (these are the ones
that are in -mm), out of 890 patches. These are the "pure arch/x86"
topic patches, which will be offered in the first wave of pull
requests.
The remaining patches will be offered under different topics, in
different temporary branches (or trees) depending on which subsystem
they interact with. There will be no "take it or leave it" big pull
request.
Some patches are later in the queue because they depend on generic
infrastructure.
Some won't be offered for a pull at all because they belong in other
subsystems and we just track them via x86.git because it's some
important topic or dangerous-looking patch we'd like to see the effects
of first-hand.
[ sorry about not having described this in detail in my mail - i spent
the last 3 work days on a 2.6.25 regression almost non-stop, so x86
queue cleanup lagged behind a bit and my description of the changes
none.
but we do much more testing than just getting code into other trees. We
cross-build 96 different configurations on other non-x86 architectures:
http://www.tglx.de/autoqa-cgi/index?run=81&tree=1
last night's run was: 96 out of 96 configs built successfully.
This covers: alpha, arm, mips, powerpc, sparc64, x86, m32r, powerpc,
xtensa, mips, sh, sparc, parisc, powerpc. We test the various branches
(amongst them for-akpm) and combination trees as well.
and the backbone of arch/x86 QA we do are the build, boot and stress
tests we do on x86: we ran and booted thousands of x86 randconfigs in
the past few days alone. x86/latest boots and works from the smallest
boxes up to a 64-way testbox. On the 64-way box i did a 1 week burn-in
you mean the original hack? Sure, that had a number of problems and we
are not offering that for a merge.
But have you seen the latest code we are offering for merge? Check out
sched-devel/latest and kernel/tr...
That's a relief. Please keep it this way - I plan on basing -mm on
That's nowhere near as useful as it could be.
By keeping all this code out of -mm you haven't solved any of the
merge/integration problems which we had in 2.6.24-rcX. They're all still
there. All you did was to push them out of the two-month
integrate-and-test period and put them into the 2.6.25 merge window.
Would prefer to not have to go fishing in git trees to find code to review.
--
... hm, i think there's really no problem here at all: most of the merge
problems you cited were due to clearly out-of-tree patches that sit in
x86.git/testing for the convenience of our testers and contributors,
and that we have no intention of pushing upstream.
(Again, sorry about the terse shortlog which might have contributed to
this misunderstanding, that's all i could do yesterday evening.)
Ingo
--
Are those out-of-tree patches also in linux-next? The page-flags and prctl
changes are there. And those are planned for 2.6.26, aren't they?
--
you mean kmemcheck? Yes, that's planned. We've been working 4 months
non-stop on kmemcheck to make it mergeable and usable, it's at version 7
right now, and it caught a handful of real bugs already (such as
63a7138671c - unfortunately not credited in the log to kmemcheck). But
because it touches SLUB (because it has to - and they are acked by
Pekka) i never had the chance to move it into the for-akpm branch.
i guess this will all sort itself out when you rebase -mm to linux-next.
Stephen Rothwell is doing an excellent job of resolving interactions
between trees.
Ingo
--
btw: ftrace. I didn't pay much attention to the early patches when they
flew past a couple of months ago and there's been basically no email about
it since (unless it's on some other list?)
Changelogs don't seem to explain it much and it seems to be undocumented.
It's nicely commented, but it would be useful to at least present the
kernel<->userspace API in a way which can be reviewed.
You'd think that uninlining ftrace_ip_in_hash() would save a few bytes, but
it saves zero. Even noinline doesn't change it. Weird.
<checks>
hm, gcc has gone and ignored the `inline'. heh.
--
Does it really really really need to consume one of our few remaining page
Yep I expect it'll help in several ways. (That's why I suggested it!)
--
No, we're not. Just the (imho always misguided) "encode zone/node number
into flags" optimization has to be removed again or made 64bit only.
Then there will be plenty of flags again.
Really I see no real reason this can't be done with a small hash table
again like x86-64 originally did.
-Andi
--
hm. Or we add a new nid&zone field to the pageframe for 32bit NUMA. Just
don't tell Paul Mundt ;)
Need to work out what's going on with ia64's use of the upper 32 bits too.
I have a feeling it's using less than it used to but at 3AM I can't be
How did that work? A pfn->zone-id hash table would be huge?
--
Most (all?) NUMA archs have some way to get from phys->nid. Getting from
pfn->nid is then easy.
Originally this was all optimized for text size when this stuff was
still inlined, but at some point it was all moved out of line anyway
(unless on FLAT iirc) so a lot of the old design decisions became obsolete.
BTW I should disclose that my mask allocator that I'm still planning
to push needs one flag bit on 32/64bit and another one on 64bit
Sorry, I meant it used the hash table to look up the node. In fact
that code is still in there, although used less because a lot of these
lookups are resolved from the flags.
Once you have the node from the pfn it is at most three range checks
to get to the zone (usually fewer). The most efficient way to do that is
to just open code it.
-Andi
--
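Andi's "at most three range checks" can be sketched as follows, assuming the node is already known (via the architecture's phys->nid mapping) and each zone of that node is a contiguous pfn range. The struct and zone indices are hypothetical. The example layout used with it below corresponds to a typical 32-bit setup with 4 KB pages: a 16 MB DMA zone (pfn 4096), the 896 MB lowmem boundary (pfn 229376) and 1 GB of RAM (pfn 262144).

```c
#include <assert.h>

/* Hypothetical zone indices for a 32-bit node. */
enum { Z_DMA, Z_NORMAL, Z_HIGHMEM, NR_Z };

/* Hypothetical per-node layout: zone z covers pfns [start[z], end[z]). */
struct node_layout {
    unsigned long start[NR_Z];
    unsigned long end[NR_Z];
};

/* Map a pfn (already known to belong to this node) to its zone with at
 * most NR_Z range checks; returns -1 if the pfn is outside every zone. */
static int pfn_to_zone(const struct node_layout *node, unsigned long pfn)
{
    for (int z = 0; z < NR_Z; z++)
        if (pfn >= node->start[z] && pfn < node->end[z])
            return z;
    return -1;
}
```

The loop is just the compact form; open-coding the three comparisons, as Andi suggests, is equivalent and lets the compiler drop the arrays entirely when the boundaries are constants.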
On Thu, Apr 17, 2008 at 12:36 PM, Andrew Morton
FYI, the initial version of kmemcheck didn't have a separate page flag
(it abused SLUB internals) but it got really hairy and I think I
finally convinced Vegard to switch over to page flags after some
hair-pulling when we hit a bug. So yes, from SLUB maintainer point of
view, we _really, really_ want to use a page flag here.
--
Thank you, whoever wrote kmemcheck.txt.
How come slub uses one byte to track the status of each byte when it could
use a single bit?
We (still!) have not made the decision whether to proceed with slab or
slub. How hard would it be to port kmemcheck into slab?
--
Hi Andrew,
On Thu, Apr 17, 2008 at 1:33 PM, Andrew Morton
That would be Vegard.
On Thu, Apr 17, 2008 at 1:33 PM, Andrew Morton
A single bit is not enough, as we track use-after-free as well (after
we've added delayed freeing to the beast).
On Thu, Apr 17, 2008 at 1:33 PM, Andrew Morton
Not hard.
--
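Pekka's answer implies at least four states per byte (unallocated, uninitialized, initialized, freed), which is why one bit cannot do. A userspace model of that per-byte bookkeeping follows; the state names and helpers are hypothetical, and it models only the shadow accounting, not how kmemcheck actually intercepts loads and stores. Two bits per byte would also suffice in principle; a whole byte simply avoids shifting and masking, a trade-off the thread itself does not discuss.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-byte shadow states. Four states need two bits, so one
 * bit per byte could not tell a freed byte from an uninitialized one. */
enum shadow_state {
    SHADOW_UNALLOCATED,
    SHADOW_UNINITIALIZED,
    SHADOW_INITIALIZED,
    SHADOW_FREED,
};

#define BUF_SIZE 64

/* One shadow byte per data byte: shadow[i] describes data[i]. */
struct tracked_buf {
    unsigned char data[BUF_SIZE];
    unsigned char shadow[BUF_SIZE];
};

static void tb_init(struct tracked_buf *b)
{
    memset(b->shadow, SHADOW_UNALLOCATED, sizeof(b->shadow));
}

/* Allocation: the first @n bytes exist but hold garbage. */
static void tb_alloc(struct tracked_buf *b, size_t n)
{
    assert(n <= BUF_SIZE);
    memset(b->shadow, SHADOW_UNINITIALIZED, n);
}

/* A store initializes exactly the byte it touches. */
static void tb_write(struct tracked_buf *b, size_t off, unsigned char v)
{
    assert(off < BUF_SIZE);
    b->data[off] = v;
    b->shadow[off] = SHADOW_INITIALIZED;
}

/* State a load of data[off] would see; anything but SHADOW_INITIALIZED is
 * what a checker would report (uninitialized read or use after free). */
static enum shadow_state tb_check_read(const struct tracked_buf *b, size_t off)
{
    assert(off < BUF_SIZE);
    return (enum shadow_state)b->shadow[off];
}

/* Free: keep the bytes marked so later accesses can be caught. */
static void tb_free(struct tracked_buf *b, size_t n)
{
    assert(n <= BUF_SIZE);
    memset(b->shadow, SHADOW_FREED, n);
}
```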
i think slab is clearly out, unless some catastrophic regression is
found. But Nick's SLQB might replace SLUB ;-)
Ingo
--
Hey, we all have our favorite replacements for kmalloc():
http://www.kernel.org/pub/linux/kernel/people/penberg/patches/binalloc/2...
But it seems unrealistic to expect any of them to replace SLUB or SLAB
in the near future.
Pekka
--
Obviously that one won't because it is totally unsuitable to replace
SLAB. It may be a good choice to replace SLOB (if it is found to be
technically better). Just the same as SLQB might replace SLAB if it
is found to be technically better. That's (one main criterion for) how
we merge/decide between things.
--
On Thu, 17 Apr 2008 12:38:00 +0200
SLUB still has that several percent TPC-C regression....
Christoph has a small reproducer testcase.
--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
--
btw., several percent of TPC-C is bad... so i take back the 'SLAB is
out' observation for now :-/
Ingo
--
any URL to that small reproducer testcase?
Ingo
--
well AFAICS the shortage really mostly affects 32-bit platforms. And
there we've got 19 bits used, out of 23 available, right?
whether we track a page or not is rather fundamental to kmemcheck, i
don't see any easy way to get rid of that usage. (and since kmemcheck is
a transparent add-on, i don't see any obvious other candidate like
page->private either - all those fields might be utilized)
if we run out of that in the future: the high bits get used by sparse
section and numa node ID bits, worst-case we could live with restricting
the max number of NUMA nodes on 32-bit from 64 to 32? [NUMA on 32-bit is
an afterthought anyway.] Or we could do a CONFIG_KMEMCHECK=y only
page->flags_debug.
Ingo
--
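The arithmetic behind Ingo's fallback: a node-number field needs ceil(log2(MAX_NUMNODES)) bits of page->flags, so halving the maximum from 64 to 32 nodes shrinks it from 6 to 5 bits and frees exactly one bit. A small helper that computes this is sketched below; the function names are hypothetical, and the 19-of-23 figure is Ingo's own unconfirmed estimate, so it is not encoded here.

```c
#include <assert.h>
#include <limits.h>

/* Bits needed to store a value in 0..n-1, i.e. ceil(log2(n)) for n >= 1. */
static unsigned bits_needed(unsigned long n)
{
    unsigned bits = 0;

    assert(n >= 1);
    while (bits < sizeof(n) * CHAR_BIT && (1UL << bits) < n)
        bits++;
    return bits;
}

/* page->flags bits freed by lowering the node limit from @old_nodes to
 * @new_nodes (both hypothetical MAX_NUMNODES values). */
static unsigned node_bits_saved(unsigned long old_nodes, unsigned long new_nodes)
{
    assert(new_nodes >= 1 && new_nodes <= old_nodes);
    return bits_needed(old_nodes) - bits_needed(new_nodes);
}
```

node_bits_saved(64, 32) evaluates to 1, the single bit Ingo's suggestion would give back to kmemcheck.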
Unfortunately the answer to that question is surprisingly hard to generate
and it probably has changed over time. Christoph sat down and worked it
all out a few months ago.
One surprising problem is ia64 which uses (or used to use?) basically all
Yes, I think it's only NUMAQ and one of the old IBM machines. I don't
think numaq ever went beyond 8 nodes. superh uses (or will use) NUMA, but
Yes, that'd be OK. We could do that now, or just pop a comment in there.
--
On Thu, Apr 17, 2008 at 11:36 AM, Andrew Morton
Actually it doesn't. I attach a patch which gets rid of the page flag,
and we rely instead on the PTE flag for page-trackedness.
The reason we didn't do this at once is that the making of kmemcheck
has been pretty much my first introduction to SLUB, x86, page flags,
etc., and the actual semantics of the various introduced flags have
varied since the first version of kmemcheck. At this point, the struct
page flags weren't actually needed anymore, but they were convenient.
My apologies for not inlining the patch -- I don't have a mail client
that won't mess up whitespace. It can also be downloaded at:
http://folk.uio.no/vegardno/linux/0001-kmemcheck-remove-use-of-tracked-p...
The patch has received minimal testing, but I've
double-checked the logic. It boots fine on my laptop, boot log at:
http://folk.uio.no/vegardno/linux/kmemcheck-20080417.txt
Ingo, will you take this for some additional testing?
Vegard
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
thanks Vegard, i've applied it - looks good to me too.
Ingo
--
x86.git randconfig testing found a build bug - fix below.
Ingo
------------>
Subject: kmemcheck: fix build
From: Ingo Molnar <mingo@elte.hu>
Date: Thu Apr 17 21:20:43 CEST 2008
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
include/linux/kmemcheck.h | 3 +++
1 file changed, 3 insertions(+)
Index: linux/include/linux/kmemcheck.h
===================================================================
--- linux.orig/include/linux/kmemcheck.h
+++ linux/include/linux/kmemcheck.h
@@ -1,6 +1,8 @@
 #ifndef LINUX_KMEMCHECK_H
 #define LINUX_KMEMCHECK_H
 
+#include <linux/types.h>
+
 #ifdef CONFIG_KMEMCHECK
 extern int kmemcheck_enabled;
 
@@ -24,6 +26,7 @@ void kmemcheck_mark_uninitialized_pages(
 #ifndef CONFIG_KMEMCHECK
 #define kmemcheck_enabled 0
 static inline void kmemcheck_init(void) { }
+static inline bool kmemcheck_page_is_tracked(struct page *p) { return false; }
 #endif /* CONFIG_KMEMCHECK */
 
 #endif /* LINUX_KMEMCHECK_H */
--
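The fix follows the usual config-stub pattern: when CONFIG_KMEMCHECK is off, the header supplies a static inline stub so that callers build unchanged, and bool needs a header that defines it (<linux/types.h> in the kernel). A userspace model of the same pattern is sketched below. CONFIG_KMEMCHECK is left undefined so that the stub branch, the one the fix extends, is the one compiled; count_tracked() is a hypothetical caller.

```c
#include <assert.h>
#include <stdbool.h>  /* userspace source of bool; the kernel fix uses <linux/types.h> */
#include <stddef.h>

struct page;  /* only pointers are passed around, so an opaque type suffices */

/* Left undefined so that the !CONFIG_KMEMCHECK branch is compiled. */
/* #define CONFIG_KMEMCHECK */

#ifdef CONFIG_KMEMCHECK
bool kmemcheck_page_is_tracked(struct page *p);  /* defined in the feature's .c file */
#else
/* Stub: with the feature configured out, no page is ever tracked. */
static inline bool kmemcheck_page_is_tracked(struct page *p)
{
    (void)p;
    return false;
}
#endif

/* Hypothetical caller that must compile in both configurations. */
static int count_tracked(struct page **pages, int n)
{
    int tracked = 0;

    assert(pages != NULL || n == 0);
    for (int i = 0; i < n; i++)
        if (kmemcheck_page_is_tracked(pages[i]))
            tracked++;
    return tracked;
}
```

Without the stub, a caller like count_tracked() breaks the build exactly in the configurations where the feature is disabled, which is the kind of failure randconfig testing tends to catch.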
Oh, oops. Of course... Thanks, you shouldn't have had to do that :-(
--
On Thu, 17 Apr 2008 20:47:06 +0200
kmemcheck= is documented in at least three places, which is nice, but it
isn't mentioned in the place where we document kernel-parameters:
Documentation/kernel-parameters.txt. A brief section there which directs
the user to the extended docs would be fine.
early_param() is unusual - we normally use __setup(). I assume there's a
reason for using early_param(), but that reason cannot be discerned from
This is not the preferred way of laying out function declarations but I've
(void *)address
Perhaps we should get all this code onto the list(s) for re-review. It's
been a while..
--
Hi Andrew,
Thank you very much for the review of this patch. Those are hard to
come by, and I've posted kmemcheck to LKML already 3 or 4 times, with
relatively sparse response. I mean, the fact that they were ALL
whitespace damaged, but discovered by nobody, quite plainly tells me
that nobody actually tried to apply it (except perhaps Daniel Walker,
but we never realized it was whitespace damage causing the problems).
The patches that Ingo took into x86 were probably sent as an
attachment...
On Thu, Apr 17, 2008 at 9:43 PM, Andrew Morton
Yes, sorry. I had actually already e-mailed this as a separate patch
to Ingo before sending this to the list, so he should know. But it
The reason is that we need to set this before kmalloc() is ever
called. A comment will come.
But it seems that __setup() is what is really missing a comment. I
don't know what it is or how it works, and the comments around the
This will be turned into unsigned long with 64-bit support. (Hopefully
we can get that working too.)
Changing these to match the rest of the kernel is no problem for me.
It is not the way I would write it, but Pekka and Ingo have already
forced me to write if () instead of if(), so there should be no reason
I'm not sure it would make much of a difference, except perhaps for
you, if you want to review it all. (My latest post to LKML had 0
replies in total. Well, except private e-mail exchange with Ingo and
Pekka; they should know the code already. Once again, thanks to them
for helping me.) Do you still want me to post it again?
Thank you.
Vegard
PS: And it's not that I do that much testing/reviewing myself. But I
do think I have the excuse of being a newbie at this :-)
--
On Thu, 17 Apr 2008 22:39:55 +0200
It makes my head spin too. Reading through the first bit of
These things are OK as-is, I think. It'd be somewhat less nice in
situations where newly-added code was inconsistent with surrounding
mm... I wouldn't mind taking a closer look at it all. That documentation
file makes it _much_ easier to review the code, and the review becomes more
You're in good company ;)
--
From: Andrew Morton <akpm@linux-foundation.org>
I think things would have been a lot easier if you had adopted
basing -mm on top of linux-next from the very start.
The responsiveness to problems and merge hassles has been so much
better and more efficient from the linux-next folks for my networking
stuff, for example. It's never been like that with -mm.
--
linux-next ramped up across the 2.6.25-rc window and I wanted to give it
I don't know what you mean by this. But linux-next integrates only the
other subsystem trees and they have rarely caused me integration problems
against git-net.
--
Well. git-lblnet was a big problem but it was a one-off.
git-wireless used to cause problems but that's now merging into git-net
more often.

git-netdev-all was sometimes a problem but that's gone altogether.
So git-net integration became much simpler during 2.6.25-rcX.
--
From: Andrew Morton <akpm@linux-foundation.org>
Fair enough, thanks for the explanation.
--
There are maybe as many as 100 "subsystem trees" hosted in -mm. Stuff like
md, ipmi, tty, elf, keys, procfs, char drivers, nbd, fbdev, aoe, fuse, edac,
and the list goes on.

Once I get -mm based on linux-next, the next step is to somehow feed those
trees (well, the "stable" parts thereof) back into linux-next while not
losing track of all the patches. I haven't a clue how I'll do this ;)

But I haven't thought about it much yet.
--
I did a test merge.
- About 13 patch rejects against git-kvm
- Multiple minor rejects against the IDE tree, the PCI tree
- Minor bustage of git-semaphore
- Several core MM patches which I also had merged. I didn't check fully
whether they are the same.
- extensive damage to the page-flags patches

Did you check that all architectures and configurations still have
sufficient page flags for us to be able to consume another one for
kmemcheck? The MM developers have put much, much effort into avoiding
running out of flags over numerous years and afaik none of them even know
that this debug feature is using one of the few remaining ones.

What do we do when we run out?

- rejects in capabilities-implement-per-process-securebits.patch
- The proposed PR_GET_TSC and PR_SET_TSC have the same values as
PR_GET_SECUREBITS and PR_SET_SECUREBITS. Because we never knew this
before, we didn't get to discuss which one needs to be altered, as we
normally would.
- several rejects in
x86-olpc-add-one-laptop-per-child-architecture-support.patch
- several bitops patches (use-__fls-for-fls64-on-64-bit-archs, etc) were
also in -mm. I did not check for differences between the two versions.

These are not x86 patches.

- maybe ten-odd minor rejects in other places.
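Whether every configuration still has room for one more page flag comes down to bit arithmetic. page->flags must hold all the PG_* bits plus the section/node/zone fields that the kernel packs into the same word. The sketch below shows that budget as a build-time check. Every width in it is a hypothetical placeholder, not the real count for any architecture or config. The kernel's own headers define the real widths and their own check; this only illustrates the arithmetic.

```c
#include <assert.h>
#include <limits.h>

/* Width of page->flags in bits. */
#define BITS_PER_LONG ((int)(CHAR_BIT * sizeof(unsigned long)))

/* Illustrative numbers only, NOT the real values for any config. */
#define NR_PAGEFLAGS    20 /* hypothetical count of PG_* bits */
#define SECTIONS_WIDTH   0 /* hypothetical: no sparsemem section field */
#define NODES_WIDTH      6 /* hypothetical: up to 64 NUMA nodes */
#define ZONES_WIDTH      2 /* hypothetical: up to 4 zones */

/* Bits still unused once the PG_* bits (bottom of the word) and the
 * section/node/zone fields (top of the word) are accounted for. */
#define FREE_PAGEFLAG_BITS \
	(BITS_PER_LONG - NR_PAGEFLAGS - SECTIONS_WIDTH - NODES_WIDTH - ZONES_WIDTH)

/* Fail the build, not the boot, if the layout does not fit. */
_Static_assert(FREE_PAGEFLAG_BITS >= 0, "page->flags layout overflows");

/* Would claiming `extra` more PG_* bits (e.g. one for kmemcheck) still fit? */
static int can_add_pageflags(int extra)
{
	assert(extra >= 0);
	return FREE_PAGEFLAG_BITS - extra >= 0;
}
```

With these placeholder widths, a 32-bit build has only 4 spare bits, while a 64-bit build has 36. That gap is why a flag that fits easily on x86_64 can still break a 32-bit NUMA configuration.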
So not as bad as it might have been. kvm and page-flags are the major
problems. Of course, none of this has been compiled and this proposed code
combination has never been tested by anyone at runtime.

--
Hi,
Would it be feasible to add another unsigned long to struct page? I
mean, extending such a common structure always sucks, but for
emergency...

#define PageFoobar(page) test_bit(PG_foobar, &(page)->flags2)

Of course the essential core flags should always be in ->flags but
perhaps we could have a symbol CONFIG_NEED_EXTRA_PAGE_FLAGS that gets
selected by kmemcheck (and other candidates that are unlikely to be
enabled most of the time) and then #ifndef ->flags2 out.

Hannes
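Hannes's suggestion can be sketched as a second flags word that exists only when a config symbol selects it. CONFIG_NEED_EXTRA_PAGE_FLAGS and PG_foobar come from his mail. Everything else is hypothetical. struct page is cut down to two fields, and test_bit_ul()/set_bit_ul() are plain, non-atomic stand-ins for the kernel's test_bit()/set_bit().

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

/* Hypothetical: would be selected by kmemcheck and similar debug options. */
#define CONFIG_NEED_EXTRA_PAGE_FLAGS 1

enum { PG_locked };  /* core flags (illustrative) stay in ->flags */
enum { PG_foobar };  /* hypothetical extra flag, lives in ->flags2 */

/* Cut-down, hypothetical struct page. */
struct page {
	unsigned long flags;
#ifdef CONFIG_NEED_EXTRA_PAGE_FLAGS
	unsigned long flags2;  /* only present when the symbol is selected */
#endif
};

/* Non-atomic stand-ins for the kernel bitops. */
static inline int test_bit_ul(unsigned int nr, const unsigned long *word)
{
	assert(nr < BITS_PER_LONG);
	return (int)((*word >> nr) & 1UL);
}

static inline void set_bit_ul(unsigned int nr, unsigned long *word)
{
	assert(nr < BITS_PER_LONG);
	*word |= 1UL << nr;
}

#ifdef CONFIG_NEED_EXTRA_PAGE_FLAGS
#define PageFoobar(page)    test_bit_ul(PG_foobar, &(page)->flags2)
#define SetPageFoobar(page) set_bit_ul(PG_foobar, &(page)->flags2)
#else
/* Compiled out: the extra word and its flags vanish entirely. */
#define PageFoobar(page)    ((void)(page), 0)
#define SetPageFoobar(page) ((void)(page))
#endif
```

When the symbol is not defined, PageFoobar() evaluates to 0 and ->flags2 disappears, so kernels without kmemcheck pay nothing. The cost is one extra unsigned long per page frame whenever the symbol is enabled.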
--
Yes, but I think that only applies to PG_tracked.
We may be able to reclaim PG_buddy by putting various fields in the
pageframe into idiotic otherwise-can't-happen states. Like

static inline bool PageBuddy(struct page *page)
{
	return page->mapping == (void *)&page->private;
}

or something. But these things are so overloaded it gets tricky.
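The idea of encoding PG_buddy as an otherwise-impossible field state, rather than spending a flag bit, can be tried out in userspace. The sketch assumes a cut-down, hypothetical struct page with only flags, private and mapping. It uses the self-pointer sentinel from the example above. SetPageBuddy() and ClearPageBuddy() are hypothetical helpers, not the layout or API of any kernel release.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct address_space;  /* opaque here, as it is to most page users */

/* Cut-down, hypothetical struct page. */
struct page {
	unsigned long flags;
	unsigned long private;
	struct address_space *mapping;
};

/* Sentinel: ->mapping points at the page's own ->private field, a value
 * no real address_space pointer can take, so no flag bit is consumed. */
static inline void SetPageBuddy(struct page *page)
{
	page->mapping = (struct address_space *)&page->private;
}

static inline void ClearPageBuddy(struct page *page)
{
	page->mapping = NULL;
}

static inline bool PageBuddy(const struct page *page)
{
	return page->mapping == (const struct address_space *)&page->private;
}
```

The catch is the overloading mentioned above. The sentinel is only safe if no other code interprets ->mapping while the page sits in the buddy allocator.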
--
Once git-x86 is pulled, I will rebase kvm.git against -linus, retest,
and submit (have to wait for s390 and ia64 anyway).

--
error compiling committee.c: too many arguments to function

--
Did this ever get posted publicly somewhere?
--
People will sometimes send me offlist patches and I'll usually notice it and
will always ask them not to, often requesting a resend. This one was on
lkml, subject "[PATCH 3/3] OLPC: add One Laptop Per Child architecture
support"
--