This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.24. Please verify if it still should be listed.

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10391
Subject : 2.6.25-rc7/8: Another resume regression
Submitter : Mark Lord <lkml@rtr.ca>
Date : 2008-04-03 15:06 (6 days old)
References : http://lkml.org/lkml/2008/4/3/283
--
Today I've been using 2.6.25-rc8 with an old embedded build system here
for my empegs. One shell script calls out to /usr/bin/ftp to transfer
an image to a remote system, and then read it back again and compare.

The compare is failing, most (but not all) of the time,
but only on 2.6.25-rc8, not on 2.6.24. Verified by switching
back and forth between kernel versions for a short spell.

The ftp client is netkit-ftp 0.17-16 on Kubuntu feisty.
Switching to ncftpput/ncftpget avoids it on 2.6.25,
but I wonder where the problem is.

Too many things in the chain to easily debug.
-ml
--
..
Now verified that the data loss occurs in the outbound direction.
The readback data is the same, regardless of which client s/w is used.

So something in 2.6.25 is incompatible with the ftp client binary, or libs,
that are installed here. Or some other problem.

??
--
Or maybe it uses sendfile, and that is broken?
Also, try using ethtool to turn off TSO and/or checksumming on your NIC
(if it is not wireless), and see if behavior changes...

Jeff
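The suggestion above can be tried with a few ethtool calls. What follows is only a minimal sketch: the interface name eth0 and the RUN switch are assumptions made for illustration and do not come from the original report. By default the script prints the commands instead of running them; setting RUN=1 (as root) applies them.

```shell
#!/bin/sh
# Assumption: eth0 is the wired NIC used for the FTP transfer.
IFACE="${IFACE:-eth0}"

# Print the command unless RUN=1 is set, because changing offload
# settings needs root and affects live traffic.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

run ethtool -k "$IFACE"                 # show the current offload settings
run ethtool -K "$IFACE" tso off         # disable TCP segmentation offload
run ethtool -K "$IFACE" tx off rx off   # disable TX/RX checksum offload
```

Repeating the failing transfer after each change shows whether an offload feature is involved; the same commands with "on" restore the settings.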
--
..
No, it uses read()/write() calls (from the strace).
..
The failing FTP client software issues a close() on the socket after
the final data write(). This close seems to be propagated to the other
end before the data is fully received.

I suppose a wireshark capture is next, once I dig out my ancient hub
so we can sniff it from an independent box.

-ml
--
..
Meanwhile, here is the strace of the FTP client from the host side.
Nothing strange -- it opens the socket, the file, and does read()/write()
pairs to move all of the data down the line to the remote.
It then close()s both of them, and gets the "426 Connection failed"
response after the remote end sees a premature -EPIPE from sock_recvmsg().

Cheers
execve("/usr/bin/ftp", ["ftp", "10.0.0.26"], [/* 39 vars */]) = 0
brk(0) = 0x9b36000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f70000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=104744, ...}) = 0
mmap2(NULL, 104744, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f56000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/libreadline.so.5", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\0\317\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=196560, ...}) = 0
mmap2(NULL, 199764, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7f25000
mmap2(0xb7f51000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2c) = 0xb7f51000
mmap2(0xb7f55000, 3156, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f55000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/libncurses.so.5", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\240\362"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=268600, ...}) = 0
mmap2(NULL, 273860, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7ee2000
mmap2(0xb7f1c000, 36864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x39) = 0xb7f1c000
close(3) ...
Or, you could do "git-bisect" if it is reproducible.
--yoshfuji
--
..
If I had the time right now, maybe.
But it would be far more useful for whoever has been working on the stack
to suggest some possible/likely commits to look at instead.

-ml
--
'git bisect run <cmd>' will automatically find your problem for you, if
it's reproducible, scriptable, and you have a second box.

Jeff
--
From: Mark Lord <lkml@rtr.ca>
Personally all I see is that one side closes the socket before all
data packets received have been read into the application, resulting
in a (correct) reset going out.

I can't think of any change we've made over the course of this
release that would change behavior in that area.

So you will likely need to bisect.
--
..
Or I can ignore it, like the net developers, since I have a workaround.
And then we'll see what other apps are broken upon 2.6.25 final release.

Really, folks. Bug reports are intended to *help* the developers,
not something to be thrown back in their faces.

There do seem to have been a *lot* of changes around the tcp closing/close
code (as I see from diff'ing 2.6.24 against latest -git).

*Somebody* is responsible for those changes.
That particular *somebody* ought to volunteer some help here,
reducing the mountain of commits to a big handful or two.

Cheers
--
Sure, if you count in all whitespace/indentation/code moving changes to
I might help if would add netdev on cc list in case you really want to
Those touching fin/close are mostly whitespace/move things, so I doubt
that you find these useful but in case you insist, here's the list:

056834d9f6f6eaf4cc7268569e53acab957aac27 [TCP]: cleanup tcp_{in,out}put.c style
058dc3342b71ffb3531c4f9df7c35f943f392b8d [TCP]: reduce tcp_output's indentation levels a bit
490d5046930276aae50dd16942649bfc626056f7 [TCP]: Uninline tcp_set_state

In addition, there's this one (...though I have read it number of times
through and still cannot catch something that would cause the wrongness
you're seeing):

e870a8efcddaaa3da7e180b6ae21239fb96aa2bb [TCP]: Perform setting of common control fields in one place

There's very little really on interesting side I can think of, mostly
things are congestion control related changes... ...maybe either one of
these could cause something unpleasant in some corner case:

bd515c3e48ececd774eb3128e81b669dbbd32637 [TCP]: Fix TSO deferring
0e3a4803aa06cd7bc2cfc1d04289df4f6027640a [TCP]: Force TSO splits to MSS boundaries

...e.g., if the latter causes a return with zero limit under some
conditions, tso_fragment might generate, well, interesting packets and
never finish if the condition persists but.

--
i.
--
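A list like the one above can also be produced mechanically by limiting history to a path. The sketch below shows the idea on a toy repository; the file names, the six commits, and the choice of net/ipv4 as the suspect directory are all assumptions for illustration, and the root commit merely stands in for the known-good release.

```shell
#!/bin/sh
# Toy history: six commits touching three unrelated areas of a tree.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.email "demo@example.invalid"
git config user.name "Demo"
mkdir -p net/ipv4 drivers sound

n=0
for path in drivers/a.c net/ipv4/tcp.c sound/b.c \
            net/ipv4/tcp.c drivers/a.c sound/b.c; do
    n=$((n + 1))
    echo "change $n" >> "$path"
    git add "$path"
    git commit -q -m "change $n: $path"
done

good=$(git rev-list --max-parents=0 HEAD)   # root commit = "last good release"

# Every commit between good and bad, versus only those touching net/ipv4.
all=$(git rev-list --count "$good"..HEAD)
net=$(git rev-list --count "$good"..HEAD -- net/ipv4)

echo "commits to consider: $all in total, $net under net/ipv4"
git log --oneline "$good"..HEAD -- net/ipv4
```

The same path limit can be given to bisect itself ('git bisect start <bad> <good> -- net/ipv4'), which shortens the search at the cost of missing a culprit that lives outside the chosen path.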
..
Oh.. I didn't know about that list. How does that differ from linux-net ?
..
That matches my own assessment there, too: lots of whitespace changes,
and not much real code difference on most paths. Bummer. :)

-ml
--
On Thu, 10 Apr 2008, Mark Lord wrote:
> Ilpo J
From: Mark Lord <lkml@rtr.ca>
It's a two way street, we asked for a bisect which helps us a lot.
In fact, lately I notice a strong unwillingness to bisect on your
part, in particular.
--
Bisecting is a time-consuming process. If unwillingness to bisect
is unacceptable in a bug reporter then people who don't have the
time to bisect must stop reporting the problems they encounter.

--
Tilman Schmidt E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits.
Unopened, best before: (see reverse side)
On 10/04/2008, Tilman Schmidt <tilman@imap.cc> wrote:
I hope that was a joke and that I just don't get it.
Are you really saying that if somebody finds a bug they shouldn't
bother reporting it unless they are willing to spend hours and hours
of work to get it fixed?

The way I see it, the burden of debugging and fixing bugs is mainly on
the developers of the code that breaks. You can't blame users for
using the code, triggering bugs and then reporting the breakage.
Users who report bugs are doing us all a great service regardless of
their ability or willingness to do more work than just the initial
report.

If bugs don't get reported they'll never get fixed. Even a bad bug
report with no follow up at all still allows us to use it to gauge how
often a specific bug is being hit and thus how important it may be to
fix it.

You can't expect users to know how to debug a problem or even bisect
it. A user may not even be able to compile a custom kernel but she may
still hit a bug and do us the favour of reporting it. It should be the
job of the developer of the code to investigate the bug following a
user's report.

Sure it's great when users can bisect, provide test cases, debug the
problem completely themselves or even provide a patch, but you can't
expect that. And in my opinion you certainly can't just hide behind
"the user doesn't want to bisect so I won't fix this" and use that as
an excuse for the code being buggy. I hope most people take bug
reports more seriously than that.

When people discover bugs in my own code I thank them and feel a bit
ashamed that I didn't do my work properly and it then becomes very
important to me to make sure I squash the bug. The more the user can
help the better, but if they cannot help beyond telling me what broke
and how, then that's fine too. I still want to nail the bug and I'll
just have to do more work myself, but it becomes a matter of personal
and professional pride to hunt down the bug.

We need to be grateful to users who r...
From: "Jesper Juhl" <jesper.juhl@gmail.com>
[ The person you are replying to was being sarcastic, BTW. ]
That's not the case we're talking about in this specific instance. In
this particular case the user is more than capable of bisecting, he
just isn't willing to invest the time.

And I'm supposed to be willing to invest the time to analyze the TCP
dumps or whatever to diagnose the problem? And I guess I should do
this for every single networking bug report or issue? Who is
going to clone me and the rest of the core networking developers
so that this is actually tenable?

That's ludicrous, I don't have a reproducer, this person does. And
if they bisect, we'll know _exactly_ what change introduced the
problem. Then I can use my brain to figure out the correct way
to resolve the problem.

Bisecting is a mindless activity that saves developers tons of time.
What people don't get is that this is a situation where the "end node
principle" applies. When you have limited resources (here:
developers) you don't push the bulk of the burden upon them. Instead
you push things out to the resource you have a lot of, the end nodes
(here: users), so that the situation actually scales.
--
..
Duh.. more like, "If I take 5-8 hours to attempt a bisect (which may not
even work), then that's 5-8 hours I do not get paid for."

Gotta eat, dude.
Anyways, here's five hours of free consulting for you:
git-bisect start
# bad: [7180c4c9e09888db0a188f729c96c6d7bd61fa83] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6
git-bisect bad 7180c4c9e09888db0a188f729c96c6d7bd61fa83
# good: [49914084e797530d9baaf51df9eda77babc98fa8] Linux 2.6.24
git-bisect good 49914084e797530d9baaf51df9eda77babc98fa8
# bad: [e5dfb815181fcb186d6080ac3a091eadff2d98fe] [NET_SCHED]: Add flow classifier
git-bisect bad e5dfb815181fcb186d6080ac3a091eadff2d98fe
# good: [00e0b8cb74ed7c16b2bc41eb33a16eae5b6e2d5c] b43: reinit on too many PHY TX errors
git-bisect good 00e0b8cb74ed7c16b2bc41eb33a16eae5b6e2d5c
# good: [42d545c9a4c0d3faeab658a40165c3da2dda91b2] x86: remove depends on X86_32 from PARAVIRT & PARAVIRT_GUEST
git-bisect good 42d545c9a4c0d3faeab658a40165c3da2dda91b2
# good: [6232665040f9a23fafd9d94d4ae8d5a2dc850f65] Merge git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86
git-bisect good 6232665040f9a23fafd9d94d4ae8d5a2dc850f65
# good: [e5723b41abe559bafc52591dcf8ee19cc131d3a1] [ALSA] Remove sequencer instrument layer
git-bisect good e5723b41abe559bafc52591dcf8ee19cc131d3a1
# good: [461e2c78b153e38f284d09721c50c0cd3c47e073] [ALSA] hda-codec - Add Conexant 5051 codec support
git-bisect good 461e2c78b153e38f284d09721c50c0cd3c47e073
# good: [1987e7b4855fcb6a866d3279ee9f2890491bc34d] [AX25]: Kill ax25_bind() user triggable printk.
git-bisect good 1987e7b4855fcb6a866d3279ee9f2890491bc34d
# good: [58a3c9bb0c69f8517c2243cd0912b3f87b4f868c] [NETFILTER]: nf_conntrack: use RCU for conntrack helpers
git-bisect good 58a3c9bb0c69f8517c2243cd0912b3f87b4f868c
# good: [32948588ac4ec54300bae1037e839277fd4536e2] [NETFILTER]: nf_conntrack: annotate l3protos with const
git-bisect good 32948588ac4ec54300bae1037e839277fd4536e2
# bad: [e83a2ea850bf0c0c81c6754440809...
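The listing above has the shape of 'git bisect log' output, which means a session recorded this way can be saved and re-run by someone else with 'git bisect replay'. The following is a minimal sketch on a toy repository; the five commits and the temporary log file are assumptions for illustration and have nothing to do with the kernel tree.

```shell
#!/bin/sh
# Toy repository with five linear commits.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.email "demo@example.invalid"
git config user.name "Demo"
for i in 1 2 3 4 5; do
    echo "$i" > counter
    git add counter
    git commit -q -m "commit $i"
done

# Record a partial session by hand: endpoints first, then one verdict.
git bisect start >/dev/null
git bisect bad HEAD >/dev/null
git bisect good HEAD~4 >/dev/null
git bisect good >/dev/null          # verdict for the commit bisect checked out

# Save the session, throw it away, and replay it from the saved log.
log=$(mktemp)
git bisect log > "$log"
before=$(grep -c '^git bisect' "$log")

git bisect reset >/dev/null 2>&1
git bisect replay "$log" >/dev/null 2>&1
after=$(git bisect log | grep -c '^git bisect')

echo "recorded $before bisect steps, replayed $after"
```

Editing the saved log before replaying it is also the usual way to undo a wrong good/bad verdict without starting over.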
From: Mark Lord <lkml@rtr.ca>
Thanks Mark.
Pavel can you take a look? I suspect that the namespace
changes or gets NULL'd out somehow and this leads to the
resets because the socket can no longer be found. Perhaps
it's even a problem with time-wait socket namespace
propagation.
--
..
My system here is now set up for quick/easy retest, if you have any
suggestions or patches to try out.

Thanks guys.
--
Please try this, from net-2.6.26 tree.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
----
From 8d9f1744cab50acb0c6c9553be533621e01f178b Mon Sep 17 00:00:00 2001
From: Daniel Lezcano <dlezcano@fr.ibm.com>
Date: Fri, 21 Mar 2008 04:12:54 -0700
Subject: [PATCH] [NETNS][IPV6] tcp - assign the netns for timewait sockets

Copy the network namespace from the socket to the timewait socket.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/ipv4/inet_timewait_sock.c | 1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index 876169f..717c411 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -124,6 +124,7 @@ struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk, const int stat
 		tw->tw_hash = sk->sk_hash;
 		tw->tw_ipv6only = 0;
 		tw->tw_prot = sk->sk_prot_creator;
+		tw->tw_net = sk->sk_net;
 		atomic_set(&tw->tw_refcnt, 1);
 		inet_twsk_dead_node_init(tw);
 		__module_get(tw->tw_prot->owner);
--
1.4.4.4

--
YOSHIFUJI Hideaki @ USAGI Project <yoshfuji@linux-ipv6.org>
GPG-FP : 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA
--
Too late, but still
Acked-by: Pavel Emelyanov <xemul@openvz.org>

Sorry, guys, but my timezone does not allow me to react in time
to found bugs :( So, when I wake up in the morning I usually just
find out that someone has caught a BUG made by me and someone
--
..
Works perfectly, thanks. Looks obvious, too.
Push it out to Linus now for 2.6.25.

Thanks!
--
From: Mark Lord <lkml@rtr.ca>
Will do, thanks for testing.
--
From: Mark Lord <lkml@rtr.ca>
And if I invest my spare time on your bug how does this statement
apply to me? Or does it only apply to you?

Every single argument you make that supports why you should not be
investing the necessary time into the bug applies equally to the
very developers you are so quick to quip at and want help from.
--
I think you got it backwards. Mark and other bug reporters (including,
at times, yours truly) are helping you and other developers to make
Linux better. Most of the times I report a bug, I am not asking for help
- I have no personal need to get it fixed, as I can easily avoid it, and
I only report it to give developers like you a chance to fix it before
it really hurts someone - and I gather that Mark has been in a similar
position wrt to the bug in question.

So what would you have us do? Not report the bugs we find so that you
don't have to invest your spare time on "our" bugs? Report them and
accept a rebuke for our "unwillingness" to do even more benevolent work
than we already did? Report only those for which we really need a fix,
and are consequently willing to invest additional time?

Thanks,
Tilman
From: Tilman Schmidt <tilman@imap.cc>
I appreciate the bug reports, believe me.
The issue is which of the limited developer resources get put onto
which bugs.

A developer who does this for fun is going to prioritize to things
that are pleasant and interesting to work on, and also a good
effective use of their time.

So people prioritize.
Therefore, my point is, the net result is that users have a direct
influence on which bugs get worked on with the highest priority and
thus get fixed faster. And those are the ones that have the most
information available, and in particular bisect results when
appropriate.
--
..
It's not "my bug". I'm just the first person to notice,
take time to report it, and even hand it to you on a platter (bisect).

It's *your* bug -- you signed off on the commit.
Cheers
--
From: Mark Lord <lkml@rtr.ca>
I sign off on basically every networking commit, does that mean I have
to fix every networking bug and every networking bug is "mine"?

Of course not, that doesn't scale at all. What does scale is a
combination of good fully formed bug reports from users combined with
the efforts of the global developer pool.

Linus signs off on every patch from Andrew Morton he puts into the
tree, which is a lot, but does Linus work on every bug introduced by
one of those patches and are such bugs "his" bugs? Of course he
doesn't, and of course not. They get pushed up to the person
who wrote the patch once identified as such, and the patch is
reverted if the developer is unresponsive and this will have
consequences for patches they submit in the future.

I still think you have a very self-centered attitude about things.
This is about distributing effort, not forcing it upon individuals
or a constrained resource.

If I get hit by a bus, networking bugs would still get fixed if
handled properly.

And it's a win-win situation. The incentive for a capable user to do
a bisect or whatever else is that if they do it their bug gets fixed
quickly. That is the free market economy of Linux kernel bug
reporting.

It addresses the issue that in reality we'll never fix all bugs, and
therefore we prioritize. And therefore if there is a bisected bug
report and also another one from a user who refuses to do that, guess
which bug gets worked on with a higher priority and which bug gets
fixed first?
--
this argument is a fallacy because it assumes that the Linux kernel is a
closed ecosystem and i'm really surprised to see you advance this
economic argument.

i remind you: Linux is very much not a closed ecosystem.
... and hence, your "free market economy of bugs" that in essence
strongly suggests users to do bisections when they find bugs in
networking, works exactly the way you did not intend it to work: it
pushes users towards other OSs.

It pushes them towards Solaris, FreeBSD, MacOS and even Windows. That
happens because the barrier to getting bugs fixed is _increased_ - and
users might find it easier to participate in the ecosystem of other OSs
- instead of having to compete with "each other" for the attention of
the head honcho (you).

You have a unique position within Linux: through a decade of hard and
excellent work you have built a quasi-monopoly to all things networking
commits: if you say about something that it should go into networking it
will, if you say that it should stay out, it wont go in.

So it is fundamentally _you_ who determines the feature/fix ratio in the
networking code, and it is _you_ who determines the amount of bugs users
have to find! There's no real competition for your position - it would
take years for anyone to replace you. (and it would be a shame and a
loss - you do your job so well)

No doubt about it: bisection is very nice, it's one of the best things
that happened to Linux debuggability in the past 2 years, i use it
heavily myself, but please do _not_ require it from testers and users.
They dont have nice 32-way Niagara's to build a kernel in 1 minute. They
dont have nice virtualization to do easy bisection. Take bisection as an
additional gift/tool but dont make it a semi-required aspect of your
subsystem. Pretty please.

And _PLEASE_ realize that the networking bug-count has been created
primarily by _you_, because it is you who throttles the amount of new
code in new kernel releases. If you cannot cop...
From: Ingo Molnar <mingo@elte.hu>
I don't. I ask for a bisection when it is appropriate and I
think other avenues will not bear fruit in a reasonable
amount of time.

Thanks for the arbitrary diatribe about my contributions over
the years and accusations that I have some kind of monopoly
over the networking code and fixes to it. I really appreciate
that.
--
i'm glad i misunderstood you. My impression from reading this thread was
that you preferred reporters who do bisection (which is fine so far), to

you certainly do have a fair amount of exclusivity in determining the
dosage of networking commits. Dont get me wrong, you earned it and you
deserve it - not the least because you do it best.

Ingo
--
..
Absolutely, though to a varying degree. That's the responsibility
that goes with the role of a subsystem maintainer. I once had
such a role, and gave it up when I felt I could no longer keep up.

You still keep referring to it as "your (my) bug".
It's not. I had nothing to do with it, other than stumbling over it.

When people stumble over a libata bug, I look hard to see if my code
could possibly cause it. Jeff looks even harder, because he's the
current subsystem dude for libata.

I never suggest a user search through a mountain of unrelated commits
for something I've screwed up on. I give more directed help, patches
to collect more relevant information, and patches to try and resolve it.

The last thing I'd ever do, is diss the reporter.
Regards.
--
Like it or not, when you're the owner of the only box that can reliably
reproduce an error condition, it's your bug.

Been there, done that, plenty of times.
Thanks for the advice. I'll keep it in mind next time I have to decide
whether to report a bug I'm stumbling over.

T.
Well, the fact is, reporting bugs is always welcome.
However, it may not be immediately obvious what causes the bug to appear
as well as the bug need not be readily reproducible on any other system than
yours, at least at the moment.

In which case whether or not the bug will be fixed depends on the reporter.
Namely, if the reporter wants and has the time to provide developers with
additional information, the bug has a good chance to be fixed. Otherwise,
it'll probably stay there until there's a more persistent reporter or it's
fixed as a result of a related change.

So, if people ask you to do a bisection, they probably mean "we don't see
what the problem is and can't reproduce it, so please get us more information,
otherwise we won't know how to fix it". In that case, you could provide them
with a reproducible test case just as well.

That said, there may be some developers who just don't want to spend time on
analysing code and put the burden of finding the offending change on the
reporter, but I don't think it's common practice.

Thanks,
Rafael
--
Very true. One other thing which might get confusing/frustrating on the
user side is that currently, Linux is the *only* product which requires
the bug reporter to find the fault change (yes, I know, it's scalable).
All other products the reporter uses work differently: the reporter
contacts the editor/author/support/... and briefly describes his problem.
Support asks him for a bit more details, remains silent for some time,
then comes up with a patched version to confirm that the bug is fixed.

So it is understandable from the user's standpoint that Linux appears
quite complex to report bugs. But we should remind users that LKML is
*not* a place to get free kernel support, but it's a *development*
mailing list, and that it is somewhat expected that developers ask
reporters for more development related contribution.

But if the reporter does not want to/cannot do much more, we should
not aggress him, and point him to other places instead (eg: at least
create an entry in bugzilla so that their report is not lost, and
they have a chance to get contacted when the fix is known).

Regards,
Willy

--
It's a pretty common procedure for compilers (gcc, llvm) too, although
they have the advantage that given a test case usually someone else
can run the bisect procedure because they do not depend on the underlying
hardware

That's unfortunately not the case for most kernel bugs, although
sometimes it is possible given a hardware independent test case. And
while most of the kernel code is drivers and arch, a lot of it is
still pretty hardware independent, so at least in some cases it is
possible to submit test cases and then let someone else (like a bug
master) do the bisect.

Of course it is unclear if producing a submittable test case will be
actually any faster than just running bisect for the user.

That said I agree it's a big burden to run bisect for everything
because it can take very long (especially if the problem
is not trivially reproducible)

It would be fair at least if maintainers always gave some candidate
commit ids when asking for bisect for likely changes that could
have matched the bug. Then those could be checked quickly first
before doing the full run.

While that will not always work it would be still a useful short cut
and save a lot of time for the reporter.

-Andi
--
And most of all, the reporter would not feel like the bisection is
Willy
--
Well it is proportional to the quality of the bug report. If it
is vague enough, often there is no other good answer. If it
comes with already some debugging or good logs or a good test case
etc. I agree just saying "please bisect" is not very nice (but
sometimes it might be still needed if code review doesn't find
anything)

Perhaps there should be a document somewhere explaining this which
can be easily pointed to.

-Andi
--
That's not true, for several regressions I reported to the Wine Bugzilla
I had been asked to git bisect for the commit that broke it.

And I'd actually assume that it's quite common for git using open source

cu
Adrian

--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

--
hm, who does this - i've seen networking folks do it but does anyone
else do it? Such cases are _clear_ abuse of users and they'll do the
obvious thing: vote with their feet.

I only ask people to bisect it when all other avenues fail - and even
then i try to make it clear that bisection is just something they can
_optionally_ do to speed things up (it's never required), and that it's
a pure opt-in.

doing _kernel_ bisection is totally hard at the moment - it disrupts the
user way too much and causes many hours of work for most users. [
Requiring bisection for userspace projects might be more doable. (but
even there it's wrong when it's not automated completely and where a
failure pattern is not deterministic.) ]

Ingo
--
It depends. Sometimes the bisection can be done in qemu/kvm/xen or
similar tools. At least if the problem is not too hardware
dependent. And more and more people actually run in such environments.

I can also do it faster with autoboot or nfs root/powerswitch, but
admittedly that's a very specialized setup most people don't have.

Still I agree with your basic point that it should be only
last resort.

-Andi
--
Hi.
Bugs are bugs, they either depend on hardware or do not.
There is no perfect world where after reporting subtle bug it will be
fixed. It is not Linux, it is everywhere. Bugs are only fixed when
they have major impact. Only. Either by having exploit, or crash,
or good testcase. Or bisect result.

This is just a tool to help both parties. And a huge help for regressions.
Yeah, spent two weeks kicking all possible stuff around and eventually
drop that namespace patch at all to find where the problem was. We
started to move further.

Bisect is just a tool. It is not something developers throw into user
when they do not want to work. This _is_ a help, which allows both to
solve problem in the fastest way.

If the same would be done on developers machine and huge patches would
be sent to jump between changesets, that would be a real 'work closely
with the reporter working out why the reporter's failure was occurring'?

You pointed it yourself: several days of back-and-forth.
With this helping automation tool called bisect bug was resolved in 15

There is also global warming tendency. IIRC.
Bugs _are_ fixed, Andrew. And developers did not change suddenly to
selfish bastards who do not care for users. They just developed a tool,
which greatly helps to both and saves lots of users time, since
regression gets fixed with this tool really quickly. Bisect is not asked
to be performed without a reason. For subtle bug it is the fastest way,
but otherwise there might be a long conversation. And even in this
really subtle case there was a dialog.

Bisect automation does not add kind relations though, but we can ask
Linus to add couple of smiles into the output.

--
Evgeniy Polyakov
--
From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
In fact, this is what Andrew's so-called "back and forth with the bug
reporter" used to mainly consist of. Asking the user to try this patch or
that patch, which most of the time were reverts of suspect changes.

Which, surprise surprise, means we were spending lots of time
bisecting things by hand.

We're able to automate this now and it's not a bad thing.
--
To be honest, at least in one case no one reacted to my report(s) until I ran
a bisection and then it turned up an obviously broken patch. The breakage
was so obvious that if anyone had actually looked at the code in question, he
would have seen it immediately.

Things like this are very disappointing and have a very negative impact on bug
reporters. We should do our best to avoid them.

Thanks,
Rafael
--
Shit happens. This is a matter of either bug report or those who were in
the copy list. There are different people and different situations, in
which they do not reply.

--
Evgeniy Polyakov
--
Well less shit would happen if developers would take the time to at least test
their patches before they were submitted. It's like we will just have the poor
user do our testing for us. What kind of testing do developers do? I have been a
linux user and have followed the LKML for a number of years and have yet to see
any test plans for any submitted patches.

My $.02
Steve Clark

--
"They that give up essential liberty to obtain temporary safety,
deserve neither liberty nor safety." (Ben Franklin)

"The course of history shows that as a government grows, liberty
decreases." (Thomas Jefferson)

--
You haven't looked closely then. While it's not very common there
is a non trivial number of patches who describe how they got tested
in the patch description.

-Andi
--
cross-posted to git for the suggestion at the bottom
I've been reading LKML for 11 years now, I've tested kernels and reported
a few bugs along the way.

the expectation is that the submitter should have tested the patches
before submitting them (where hardware allows). but that "where hardware
allows" is a big problem. so many issues are dependent on hardware that
it's not possible to test everything.

there are people who download, compile and test the tree nightly (with
farms of machines to test different configs), but they can't catch
everything.

expecting the patches to be tested to the point where there are no bugs is
unreasonable.

bisecting is a very powerful tool, but I do think that sometimes
developers lean on it a bit much. taking the attitude (as some have) that
'if the reporter can't be bothered to do a bisection I can't be bothered
to deal with the bug' is going way too far.

if a bug can be reproduced reliably on a test system then bisecting it may
reveal the patch that introduced or unmasked the bug (assuming that there
aren't other problems along the way), but if the bug takes a long time to
show up after a boot, or only happens under production loads, bisecting it
may not be possible. that doesn't mean that the bug isn't real, it just
means that the user is going to have to stick with an old version until
there is a solution or work-around.

even in the hard-to-test situations, the reporter is usually able to test
a few fixes, but there's a big difference between going to management and
saying "the kernel gurus think that this will help, can we test it this
weekend" 2-3 times and doing a bisection that will take 10-15 cycles to
find the problem.

it's very reasonable to ask the reporter if they can bisect the problem,
but if they say that they can't, declaring that they are out of luck is
not reasonable, it just means that it's going to take more thinking to
find the problem instead of being able to let the mechanical bisect
pr...
[...]
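To put rough numbers on the "10-15 cycles" figure in the message above: a bisection over N candidate commits needs about log2(N) build-and-test cycles. The sketch below assumes a linear history with a single first-bad commit, which real kernel history (with merges) only approximates; the function name is made up for illustration.

```python
def bisect_cycles(num_commits: int) -> int:
    """Worst-case number of test cycles needed to find the first bad
    commit among num_commits candidates.

    Assumption: a linear history with exactly one first-bad commit.
    Merges make real bisections less regular than this.
    """
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    # ceil(log2(n)) computed with integers to avoid float rounding.
    return (num_commits - 1).bit_length()


if __name__ == "__main__":
    for n in (1_000, 10_000, 30_000):
        print(n, "commits ->", bisect_cycles(n), "cycles")
```

Under these assumptions, a range of one thousand to thirty thousand commits works out to 10 to 15 cycles. If every cycle needs its own production test window, that is 10-15 windows against the 2-3 needed to try a few candidate fixes.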
Agreed. The difficulty is that only the developer knows how confident
he is in his code. Even the subsystem maintainer does not know, which
is the real issue since as long as the code is not identified, he does
not know whom to ping.

And I think that it might help if we could add a "Trust" rating to the
patches we submit, similarly to "Tested-By" or "Signed-off-by". We could
use 1 to 5. Basically, when the patch was completed at 3am and just builds,
it's more likely 1/5. When it has been stressed for 1 week, it would be
4/5. 5/5 would only be used in backports of known working code, for some
widely used external patches, or for trivial patches (eg: doc/whitespace
fixes). The goal would clearly not be to just trust patches with a high
rating (since they might break when associated with others), but for the
subsystem maintainer to quickly check if there are some of them the
author does not 100% trust, in which case he could ping the author to
check if his patch *may* cause the reported problem.

What makes this rating system delicate is that the rating cannot be changed
afterwards. But after all, that's not much of a problem. A bug may very
well reveal itself one year after the code was merged, so it's really the
developer's estimation which matters.

For this to be efficiently used, we would need git-commit to accept a
new "-T <rating>" argument with the following possible values:

0: untested (default)
1: builds
2: seems to be working
3: passed basic non-regression tests
4: survived stress testing at the developer's
5: known to be working for a long time somewhere else

I'm sure many people would find this useless (or in fact reject the
idea because it would show that most code will be rated 1 or 2),
but I really think it can help subsystem maintainers make the connection
between a reported bug and a possible submitter.

Willy
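As an illustration of how a maintainer might consume such a rating, here is a minimal sketch. It assumes the rating ends up in the commit message as a "Trust:" line next to Signed-off-by; that trailer name, the function names and the threshold are all hypothetical, and git has no "-T <rating>" option of this kind.

```python
import re

# The 0-5 scale exactly as proposed in the message above.
TRUST_LEVELS = {
    0: "untested",
    1: "builds",
    2: "seems to be working",
    3: "passed basic non-regression tests",
    4: "survived stress testing at the developer's",
    5: "known to be working for a long time somewhere else",
}

# Hypothetical trailer: "Trust: 4" or "Trust: 4/5" on a line of its own.
_TRUST_RE = re.compile(r"^Trust:[ \t]*([0-5])(?:/5)?[ \t]*$", re.MULTILINE)


def trust_rating(commit_message: str) -> int:
    """Return the author's self-declared rating, or 0 (the proposed
    default, untested) when no trailer is present."""
    match = _TRUST_RE.search(commit_message)
    return int(match.group(1)) if match else 0


def worth_pinging(commit_message: str, threshold: int = 3) -> bool:
    """True when the author did not fully trust the patch, so the
    maintainer may want to ask whether it *may* cause a reported bug."""
    return trust_rating(commit_message) < threshold
```

For example, `worth_pinging("foo: rework timeouts\n\nTrust: 1/5\n")` is true, while a message carrying `Trust: 4` is left alone at the default threshold.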
--
On Mon, Apr 14, 2008 at 06:39:39AM +0200, Willy Tarreau wrote:
I have a related proposal: let us require all patches to be stamped
with Discordian *and* Eternal September dates. In triplicate. While
we are at it, why don't we introduce new mandatory headers like, say
it,

X-checkpatch: {Yes,No}
X-checkpatch-why-not: <string>
X-pointless: <number from 1 to 69, going from "1: does something useful" all
the way to "68: aligns right ends of lines in comments">
X-arbitrary-rules-added-to-CodingStyle: <number> (should be present if
and only if X-pointless: 69 is present).

Come to think of that, we clearly need a new file in Documentation/*,
documenting such headers. Why don't we organize a subcommittee^Wnew maillist
devoted to that? That would provide another entry route for contributors,
lowering the overall entry barriers even further...

Seriously, looks like Andi is right - we've got ourselves a developing
bureaucracy. As in "more and more ways of generating activity without
doing anything even remotely useful". Complete with tendency to operate in
the ways that make sense only to bureaucracy in question and an ever-growing
set of bylaws...
--
No. The problem we're discussing here is the apparently-large number of
bugs which are in the kernel, the apparently-large number of new bugs which
we're adding to the kernel, and our apparent tardiness in addressing them.

Do you agree with these impressions, or not?
If you do agree, what would you propose we do about it?
--
Does that mean you're not going to take patches that align the right end of
lines in comments? :-(

Rene.
--
On Mon, 14 Apr 2008 21:13:41 +0200
erm, was that ":-(" supposed to be a ":-)"?
I don't like to merge patches which fix typos and spellos and grammaros
in comments, simply because I'd be buried in the things. I do take such
fixes for user-visible text (Documentation/, kerneldoc comments and
printks).

Right-justification of comments would fall rather a long way below spelling
fixes.
--
The ":-(" was supposed to add to the implicitly obvious ":-)". That is, was
You, particularly, seem to be very good at picking up trivia. I've posted
completely trivial patches from time to time for small things I encounter
while looking at something else. Things at the "are people going to look
funny at me for even bothering or..." level but you picking them up means
it's still useful to post, so I sometimes do.

Now, in fact, Linux as a _whole_ doesn't seem bad at accepting that kind of
small janitorial stuff but I have been noticing some backlash to it as well.
I'm not sure it's worse or better than historically, but the "checkpatch
syndrome" certainly triggers more of it.

Al specifically wanted more new eyes but the way to reward those new eyes is
accepting their small changes. Al also specifically doesn't like those small
changes when at the automated and semi-brainless checkpatch level.

I believe the janitorial work has been over-organized, both through the
kernel-janitors and checkpatch since while these are very useful in guiding
a newbie in _what_ to do they cause "automated" huge tree-wide trivia storms
which people then don't react overly favourably to and the new eyes who did
all that work of generating it all dim again...

Frankly, the kernel really is fairly complex these days when starting at 0.
Much more complex certainly than, say, back in 2.0 or 2.2 days and while
Al's scenario of per-subsystem reviews might be good, I don't believe it's
very realistic. Companies don't pay to have those done and for newbies it's
generally too complex since understanding most parts of the kernel fully
requires understanding most of the rest of the kernel rather well also.

So you get the really promising newbies? Yeah, that, or you don't get anyone
and if some promising newbies are building up 137 part checkpatch inspired
patchsets that don't help none.

So, what am I saying (what _am_ I saying?!?) ...
I seemed to observe somewhat of an interna...
In addition to obvious "we need testing and something better than bugzilla
to keep track of bugs"? Real review of code in tree and patches getting into
the tree.

And the latter part _must_ be done on each entry point. Any git tree
that acts as injection point really needs a working mechanism of some
sort that would do that; afterwards it's too late, since review of
the stuff getting into mainline on a massive merge is sadly impractical.

I don't know any formal mechanism that could take care of that; no more
than making sure that no backdoors are injected into the tree. It really
has to be a matter of trust for tree maintainers and community around
the subsystem.

Git is damn good at killing the merge bottleneck. Too good, since it
hides the review bottleneck. And we get equivalents of self-selected
communities that had been a problem for "here's our CVS, here's monthly
dump from it, apply" kind of setups. It _is_ better, since one can
get to commit history (modulo interesting issues with merge nodes and
conflict resolution). But in practice it's not good enough - the patches
going in during a merge (especially for a tree that collects from
secondaries) are not visible enough. And it's too late at that point,
since one has to do something monumentally ugly to get Linus to revert
a large merge. On the scale of Great IDE Mess in 2.5...

linux-next might help with the last part, but I don't think it really
deals with the first one. It certainly helps to some extent, but...

We need higher S/N on l-k. We need people looking into the subsystem
trees as those grow and causing a stench when bad things are found,
with design issues getting brought to l-k if nothing else helps. We
need tree maintainers understanding that review, including out-of-community
one, is needed (the need of testing is generally better understood - I
_hope_).

We need more people reading the fscking source. Subsystem by subsystem.
Without assumption that code is not broken. With mechanism collating
...
There is currently little incentive for developers to perform review.
It's difficult work, and is generally not rewarded or recognized, except
in often quite negative ways. There is a small handful of people who do a
lot of review, but they are exceptional in various ways.

OTOH, writing code is relatively simple, and is much more highly rewarded:

- People tend to get paid to write kernel code, but not so much to review
  it.

- Things like "who made the kernel" statistics and related articles ignore
  code review.

- Creating new features is perceived as the highest form of contribution
  for general developers, and likely important as career currency
  (similar to the publish or perish model in the academic world).

I don't know how to solve this, but suspect that encouraging the use of
reviewed-by and also including it in things like analysis of who is
contributing, selection for kernel summit invitations etc. would be a
start. At least, better than nothing.

- James
--
James Morris
<jmorris@namei.org>
--
Would it be hard to keep count of the number of errors introduced by
author and reviewer?
--
I'm not subscribed to the kernel mailing list, so please include me in
the cc if you don't reply to the git list (which I am subscribed to).

Git is participating in Google Summer of Code this year and I've
proposed to write a 'git statistics' command. This command would allow
the user to gather data about a repository, ranging from "how active
is dev x" to "what did x work on in the last 3 weeks". Its main
feature however, would be an algorithm that ranks commits as being
either 'buggy', 'bugfix' or 'enhancement'. (There are several clues
that can aid in determining this, a commit msg along the lines of
"fixes ..." being the most obvious.)
In the light of this recent discussion, especially the part on
"keeping count of the number of errors introduced by
author and reviewer?", I thought it might be useful for the kernel
mailing list to be aware of this. Also mentioned in this thread was that
reviewers don't get enough credit. As long as patches are signed with, say,
'reviewed-by:', 'acked-by:' or 'signed-off-by:' the command I suggest
to implement would be able to give more accurate statistics on who
"works on the kernel". This way reviewers get the credit they deserve.
The knife cuts on both sides of course, if someone reviews a patch
that is later determined to introduce a bug, they can be recorded to
have acked a buggy commit. This is especially interesting in
determining who are the good reviewers, but also in determining who
are the good contributors. A distinction could be made between parts
of the source, say, a maintainer might excel in patches related to
driver foo, but when they submit a patch for driver bar it usually
contains bugs. Armed with these statistics reviewers might decide to
be more careful before acking a patch from that maintainer if it's on
driver bar, but when that same maintainer sends in a patch for driver
foo it is probably ok and needs less attention.
My application, and a more extended description, can be found here:
http://alturin.googlepages.com/gsoc2008
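The credit-counting part of this proposal can be sketched in a few lines. This is only an illustration, not the proposed 'git statistics' command: it assumes the commit messages have already been collected as plain strings, and the function name and sample addresses are made up.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Sign-off style trailers as they appear at the end of commit messages.
_TRAILER_RE = re.compile(
    r"^(Signed-off-by|Acked-by|Reviewed-by|Tested-by):[ \t]*(.+?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def credit_by_role(messages: Iterable[str]) -> Dict[str, Counter]:
    """Count, per trailer type, how often each person is named.

    Returns a mapping such as {"reviewed-by": Counter({...}), ...}.
    """
    credit: Dict[str, Counter] = {}
    for message in messages:
        for role, who in _TRAILER_RE.findall(message):
            credit.setdefault(role.lower(), Counter())[who] += 1
    return credit


if __name__ == "__main__":
    sample = [
        "foo: fix leak\n\nSigned-off-by: A Dev <a@example.org>\n"
        "Reviewed-by: R Viewer <r@example.org>\n",
        "bar: add quirk\n\nSigned-off-by: A Dev <a@example.org>\n"
        "Acked-by: R Viewer <r@example.org>\n",
    ]
    for role, people in credit_by_role(sample).items():
        print(role, dict(people))
```

Counting reviewers this way only credits review that was recorded in a trailer; review that happened on the list without a tag stays invisible.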
...
On 16/04/2008, Sverre Rabbelier <alturin@gmail.com> wrote:
Interesting. Just be careful results are produced for the big picture
One thing I thought of is that the more "Acked-by", "Reviewed-by" and
"Signed-off-by" lines a patch has, the better reviewed we can probably
assume it to be and thus the probability of it having introduced a bug
probably drops slightly compared to other less-reviewed patches... or
maybe not, but at least it's something to think about :-)
--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html
--
If there are individuals at whom a finger needs to be pointed, this
system will highlight them, and fingers will (and should) be pointed.
Contributors of poor-quality code need to be weeded-out.
Finger-pointing, in these extreme cases, gives incentive to improve
quality. It's a positive thing.
--
Sorry, but I have to disagree. Negative finger-pointing is never a good thing.
Also, it doesn't give any incentive to anyone. It only makes people feel bad
and finally discourages them from contributing anything.

If you want to give people incentives, reward them for doing things you'd like
them to do.

Thanks,
Rafael
--
Correct, but let's be careful here. The original suggestion was,
effectively, to get better metrics on the quality of contributions.
Those metrics *could* be used for finger pointing, or (my preference)
they could be used to direct and allocate our scarce resources: code
reviews and mentoring.

There's no way to know what the metrics will tell us until we have
them. Arguing against metrics because they *may* be used to point
fingers at people is a silly argument; anything can be subverted to do
that.

Let's get some measurements and see what they say. In the meantime,
try to believe that they could be put to good purposes, such as
identifying code areas that are tricky for contributors to get right
(independent of contributor), or contributors that could benefit from
code reviews, etc.
--
There already is one: reputation with people working on the tree,
be it actively modifying/reviewing/bug hunting/etc. _We_ _already_ _know_;
generally one gets a decent idea of what to expect pretty soon.

And frankly, that's the only thing that matters anyway; I suspect
I'd do rather well by proposed criteria, but you know what? I don't give
a flying f*ck through the rolling doughnut for self-appointed PHBs and
their idea of performance reviews.

Think of it as a modified Turing test: convince me that you are
not a script piped through an Eng.Lit. wanker or an MBA, then I might care
for your opinion.

Al, who never had problems with pointing fingers and laughing, but
likes an informed human brain to be the source of it...
--
Sigh. No, you already know. I don't. This is not a rhetorical point.
I've just bid out another project that'd involve getting linux running
on another embedded hardware platform. If that happens, I get to spend
paid time to work on the kernel, and as a by-product spend more time
looking at patches and code coming across the list.

(Geez, conflate the issue much?) No one is saying you should. But
also, I haven't seen anyone saying it'd be used for performance

<shrug> Shockingly enough, I actually don't care. I'm just trying to
scratch my own itch, which is figure out where in the kernel (if
anywhere!) it'd be best to donate my time.

And your point is likely about the metrics, and yes, they'll be
computer generated. So? Perhaps they'll be crap. Who knows until we
look at them and match them up with what everyone already knows? If,
by some one in a thousand chance, they turn out to be good and useful,
then it'll either be a one-off eye-opener, or perhaps something useful
more than once.

Who knows? And to the larger point, why put effort into stopping

<shrug> Shame and Guilt, two major motivators of human behavior, it's
true. But, one last time, *you're* the one saying the stats would be
used for finger pointing at people. Perhaps, instead, the stats will
show that we should all collectively point our fingers at some random
area in the tree, where everyone, despite their track record, ends up
making mistakes.

Let the kid find out, that's all I'm saying.
--
|| If there are individuals at whom a finger needs to be pointed, this
|| system will highlight them, and fingers will (and should) be pointed.
|| Contributors of poor-quality code need to be weeded-out.

Not really. Unless you are trying to imply that David is my sock puppet, that
is...
--
Ah, I failed reading comprehension, yet again. Well, sounds like you
have a beef to take up with David, then. That's still not an argument
against trying to gather statistics and to see if they're worth

Momentarily amusing to think so, but no :-).
--
This especially is an area that I plan to focus on and should be very
reliable when finished. As can be read in my application, I plan to
look at how often a piece of code is changed, in what timespan and by
how many different authors.

Thanks for the reply!
Cheers,
Sverre
--
At least with the data we have currently in git it's impossible to
figure that out automatically.

E.g. if you look at commit f743d04dcfbeda7439b78802d35305781999aa11
(ide/legacy/q40ide.c: add MODULE_LICENSE), how could you determine
automatically that it is a bugfix, and the commit that introduced
the bug?

You can always get some data, but if you want to get usable statistics
you need explicit tags in the commits, not some algorithm that tries

cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
--
yes, and doing that would get back to the bureaucracy some people are
trying to reduce in order to save time to do the real work.

However, in another project of mine, I've got used to systematically
indicating the type of change in the subject line. It does not get any
slower for the author, and it appears in shortlogs. And quite amazingly
the principle has immediately been adopted by several contributors :

-----
Note to contributors: it's very handy when patches come with a properly
formatted subject. Try to put one of the following words between brackets
to indicate the importance of the patch followed by a short description:

[MINOR]    minor fix, very low risk of impact
[MEDIUM]   medium risk, may cause unexpected regressions of low importance or
           which may quickly be discovered
[MAJOR]    major risk of hidden regression. This happens when I rearrange large
           parts of code, when I play with timeouts, with variable
           initializations, etc...
[BUG]      fix for a minor or medium-level bug.
[CRITICAL] medium-term reliability or security is at risk, an upgrade is
           absolutely required.
[RELEASE]  release a new version
[BUILD]    fix build issues. If you could build, no upgrade required.
[CLEANUP]  code cleanup, silence of warnings, etc... theoretically no impact
[TESTS]    added regression testing configuration files or scripts
[DOC]      documentation updates, no need to upgrade
[LICENSE]  licensing updates (may impact distro packagers)

Example: "[DOC] document options forwardfor to logasap"
-----

Nothing is mandatory, and I (as the maintainer) can still choose to
adjust the prefix if I want. But in fact, I only had to do it when
contributors did not classify their patch themselves. Several other
tags may be added for LKML, such as "RFC" which is already used,
etc...

The advantages of this usage are multiple. Nothing needs to be changed
in the tools, no header needs to be added, it's still very compatible
with the mailing-list usages ...
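Since the tag sits in the subject line, a shortlog can be sorted by it without touching git itself. The sketch below assumes the subjects are already available as strings; the tag set is the one from the note above, while the function names are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# Tags listed in the note to contributors above.
KNOWN_TAGS = {
    "MINOR", "MEDIUM", "MAJOR", "BUG", "CRITICAL", "RELEASE",
    "BUILD", "CLEANUP", "TESTS", "DOC", "LICENSE",
}

_SUBJECT_RE = re.compile(r"^\[([A-Z]+)\]\s+(.*)$")


def split_subject(subject: str) -> Tuple[Optional[str], str]:
    """Split "[TAG] description" into (tag, description).

    The tag is None when the subject is untagged or uses an unknown tag,
    in which case the maintainer would have to classify it by hand.
    """
    subject = subject.strip()
    match = _SUBJECT_RE.match(subject)
    if match and match.group(1) in KNOWN_TAGS:
        return match.group(1), match.group(2)
    return None, subject


def tag_histogram(subjects: Iterable[str]) -> Counter:
    """Count subjects per tag; untagged ones are counted under None."""
    return Counter(split_subject(s)[0] for s in subjects)
```

With the example from the note, `split_subject("[DOC] document options forwardfor to logasap")` yields the tag `DOC` and the bare description.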
I don't quite agree, as I explained in my proposal there are several
ways to detect that a commit was a bugfix. From thereon you can deduce
that if it was a bugfix, that the commit that introduced the fixed
change was a bug! From thereon you can start sifting and get more
confirmations. Junio has made several suggestions as to how this could
be implemented and I'm confident that an algorithm can be devised
that is at least capable of 'guessing' what type a commit is. Aside
from the guessing part I think a lot of information can be gathered
from commit msgs.

Of course, some commits might not be able to be typed (as there might
not be any 'follow up' information on them). Those commits can be
marked as 'unknown' and be ignored. Agreed, should all commits be
'unknown' then the command wouldn't be very useful, but especially on
large repos there is a very large dataset. As the size of the dataset
increases I estimate that the correlation between commits increases
(fewer commits that add new code which then is never changed
thereafter). The higher the degree of correlation between individual

Well, a dead giveaway would be:
As said above, I don't agree, you can 'guess' very reliably on a large
dataset. Also, most commits are already 'tagged' in some way or
another. The trick is to find the pattern in this tagging and use it.

I hope this clears things up a bit,
Cheers,
Sverre Rabbelier
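To make the two positions in this exchange concrete, here is a deliberately naive message-only guesser. It is not the algorithm from the SoC application (which is not shown in this thread); the keyword lists and the function name are assumptions chosen for illustration.

```python
import re

# Keyword lists are assumptions, not taken from the proposal.
_FIX_RE = re.compile(
    r"\b(fix(es|ed)?|bug(fix)?|regression|oops)\b", re.IGNORECASE
)
_NEW_RE = re.compile(
    r"\b(add(s|ed)? support|new|implement(s|ed)?|introduce(s|d)?)\b",
    re.IGNORECASE,
)


def guess_type(message: str) -> str:
    """Guess 'bugfix', 'enhancement' or 'unknown' from the commit
    message alone, without looking at the diff or at later commits."""
    if _FIX_RE.search(message):
        return "bugfix"
    if _NEW_RE.search(message):
        return "enhancement"
    return "unknown"


if __name__ == "__main__":
    for subject in (
        "fixes oops in foo driver",
        "add support for another cpu id",
        "ide/legacy/q40ide.c: add MODULE_LICENSE",
    ):
        print(guess_type(subject), "<-", subject)
```

The q40ide subject quoted earlier in the thread comes out as 'unknown' here, which is the objection in miniature: a message-only heuristic cannot tell that this commit is a bugfix, so anything better has to draw on follow-up information or explicit tags.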
--
I hope you are aware of the non-technical implications if the results
don't match reality?

E.g. I am proud that my commits do virtually never introduce bugs, so
any results someone publishes about what I do should better be right

cu
Adrian

[1] my actual reaction might only be an angry email, but I hope you
get the point that wrong results can really piss off people
--
To avoid any misunderstandings:
This is not in any way meant against you personally.
But saying things like "X% of your commits introduced bugs" is not a
friendly thing, and wrong data could be quite hurtful.

Especially in the open source world where much motivation comes from
people being proud of their work.

Even correct data can do harm.
And bad data can have really bad effects.
cu
Adrian
--
Yes, it could be, and I agree that conclusions shouldn't be based on
the details, but on the bigger picture. Also, I think it should (at
first) be used mainly as an indicator, of where attention might be
required. I mean, if it points out that one contributor almost always
commits buggy code, you don't have to present them with those
statistics right away. Instead you can ask the program what it bases
its conclusions on, and research them yourself. If it does indeed
turn out that they are slacking that much you have good ground to have

Yes, that is very true, I very much agree with that, but on the other
hand it might also point out contributors that are particularly
skillful in a certain section that was previously not noted. As with
all statistics, it's up to interpretation, misinterpreting statistics

True, both, but as said, if properly interpreted it could be very useful.
Cheers,
Sverre Rabbelier
--
Sorry, I was a bit overreacting since I see too often people putting
some data into some statistics or graph and drawing conclusions without

I would assume that in all projects the main maintainers already have an
impression of how good the quality of the patches of each main
contributor is.

Sooner or later someone will run the program for the Linux kernel,
cu
Adrian
--
On Wed, 16 Apr 2008 16:26:34 +0300
Well yes. One outcome of the project would be to tell us what changes we'd
need to make to our processes to make such data gathering more effective.

Of course, we may not actually implement such changes. That would depend
upon how useful the output is to us.
--
The interesting (and answerable) questions are:
1) How many bugs one non-merge commit brings on average
2) What is the average time between a buggy commit entering Linus's tree
   and the fix entering the same tree.
3) Graphs of #1 and #2 over time.
4) rough division of bugs a-la refcounting, locking, hw, hw workaround.
5) if other OSes have such statistics, comparison with them
   (little finger for this)

#1 alone can shred OSDL and LWN induced PDFs into innumerable pieces!
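Questions 1 and 2 are plain averages once the hard part, pairing each fix with the commit that introduced the bug, has been done by some other means. A sketch under that assumption, with made-up function names:

```python
from datetime import date
from statistics import mean
from typing import Iterable, Tuple


def bugs_per_commit(num_bugs: int, num_nonmerge_commits: int) -> float:
    """Question 1: average number of bugs brought by one non-merge commit."""
    return num_bugs / num_nonmerge_commits


def mean_days_to_fix(pairs: Iterable[Tuple[date, date]]) -> float:
    """Question 2: average time, in days, between a buggy commit entering
    the tree and its fix entering the same tree.

    Each pair is (date the buggy commit was merged, date the fix was
    merged); how those pairs are obtained is outside this sketch.
    """
    return mean([(fixed - introduced).days for introduced, fixed in pairs])
```

Question 3 then amounts to computing the same two numbers per release or per month and plotting them.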
--
On Wed, 16 Apr 2008 12:02:47 -0700
also.. "what is a bugfix" is an interesting thing... for some things it's very easy.
For others.. it's really hard to draw a solid line where bugs stop and features start.
(for example, is a missing cpu id in oprofile a bugfix ("oprofile doesn't work") or
a feature ("new cpu support"). This one is one of the more simple ones even...)
--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
--
That you can add this information through tags is clear, but according
to his SoC application that's not what he wants to do.

According to his application he wants to determine automatically whether
a commit was a fix or whether a commit introduced a bug by doing stuff
like tracking whether a changed line was modified again shortly after a
commit.

This plan of his will simply not result in accurate numbers.
Sure, you will get some numbers, but if anyone would e.g. wrongly accuse
me that 2% of my commits last year introduced bugs I would get
***really*** angry.

cu
Adrian
--
They won't be completely accurate, but who knows, maybe they'd turn out
to have a higher rate of accuracy than we'd expect. (I assume you could
do a closer manual study of a small random sample of the results to

It's just an experiment; reasonable people won't take it as the final
word.

--b.
--
Take e.g. [1] as an example how git statistics about the Linux kernel
cu
Adrian

[1] http://digitalvampire.org/blog/index.php/2008/04/11/lies-d-oh-forget-it/
--
On Wed, Apr 16, 2008 at 9:02 PM, Andrew Morton
I definitely agree here, the command's reliability could be increased
by always specifying bugfixes in a certain way. 'fixed-bug:' for

Ah yes, free will and whatnot. Then again, everybody already does
'signed-off-by:', if there's an easy command in git to mark a bugfix,
it would increase the odds of people using it. Perhaps something like
'git commit -b 10256' which would then automagically append a
predefined message to the commit users would feel more inclined?

Cheers,
Sverre Rabbelier
--
I've found quite a few errors in kernel-userland APIs, but I'm not
sure that this sort of negative statistic would be helpful -- e.g.,
more productive developers probably also introduce more errors.
--
I'll likely only see replies if they are CCed to mtk.manpages at gmail dot com
--
We can already see which developers are more active. What we can't see
is who is careless, which would be useful to know. It would also be
useful to know who is careless in approving changes, because they share
responsibility for those changes. It would be a good thing if this
highlighted that some people are behind frequent buggy changes.
--
Well, even if someone introduces bugs relatively frequently, but then also
works with the reporters and fixes the bugs timely, it's about okay IMO.

The real problem is when patch submitters don't care for their changes any
more once the patches have been merged.

Thanks,
Rafael
--
This really is not okay. Even if bugs are fixed a version or two later,
the impact those bugs have on users makes the system look bad and drives
them away. We do not, I believe, want Linux to top the list for "most
bugs". It's unprofessional, unreliable and quite undesirable.
--
that's what -rc are for, and it's unprofessional to use them in production :-)
--
Exactly.
And BTW, by saying "timely" I meant "in -rc" or "before the next major release".
Thanks,
Rafael
--
timely frequently means the code was merged in -rc1/2 and was fixed before
the final release of the same version.

given the huge variety of hardware and workloads, it's just too easy for
there to be cases where any trade-off you make (code size, performance,
memory usage, common case definitions) can turn around and bite you. In
addition frequently hardware doesn't work quite the way the design specs
say that it should (completely ignoring the fact that many drivers are
reverse engineered). what's most important is that when a case shows up it
gets addressed promptly.

I'd rather have a developer/maintainer who introduces 100 bugs,
but fixes them promptly, as opposed to one who only introduces one bug,
but refuses to consider fixing the code 'because they don't make mistakes
like that' (sadly a common attitude from people who produce very
good code much of the time)

best of all is a developer/maintainer who writes very good code and is
willing to accept the fact that they make mistakes and fixes the code
promptly, but those people are extremely rare, and usually they emerge
from the pool of people who make more mistakes and fix them promptly,
which is an added reason I'm more tolerant of that group.

David Lang
--
Having been a Linux user since the late 90's the problem I see is that
developers decide to re-design stuff that is already working and then things
that used to work don't work anymore.

Libata is a good example. I had an older laptop that eventually got working
again - but the old ide stuff wasn't studied enough to find out what had to be
brought forward and supported in libata.

Regards,
Steve
--
And I'd rather be able to see that that person introduced 100 bugs than
to have no idea. As has been said before, the current situation rewards
people for sloppy work.
--
A common issue in the kernel is code that works with a wide
range of hardware and firmware with varying quality. The original
code is written to spec but then in the real world the hardware
and firmware have all kinds of interesting quirks not quite
matching the spec that need additional updates to handle. I don't think
it's fair to say in this case the original developer was sloppy.

Then there is also code which is just hard to tune. Examples for this
are the CPU scheduler and the VM, but also other areas. They have to
handle a lot of different workloads with often subtle side effects.
Lots of people have put a lot of excellent work into tuning these
subsystems as users report issues with their workloads. Would you say
the original developers were sloppy? I don't think that would be a fair
description. Some problems are just hard and need many
iterations to get right. And then often also the requirements change over
time and need further updates.

There are more such examples in the kernel.
Grading programmers is a hard problem and I don't think the software
industry has really solved it so far, even though there was a lot of
effort trying to do it over several decades. I doubt it will be solved
for the Linux kernel either.

-Andi
--
From: James Morris <jmorris@namei.org>
Note the apparent irony in that the person who ends up often on the
top of those lists, Al Viro, is also someone who does a
significant amount of code review.

I think this is no accident.
--
On Mon, 14 Apr 2008 15:01:05 -0700 (PDT)
"who made the kernel" was an interesting and useful exercise, but if you
like irony then...

- The way to boost your commit count is to submit buggy patches and to
  then fix your own bugs.

- The way to lower your commit count is to fix things in other people's
  patches, then fold your fix into the base patch. I've lost over 1000
  commits that way. Unless they are counting '^ [akpm' as a commit.
--
And if Dave speaks about these stats: http://lwn.net/Articles/237768/
then Al does not even appear in it, which proves your point.

Willy
--
Stats such as those above, while useful, are flawed.
IMO James Morris has (probably more than anybody else) hit on the core
issue. To extend his view: theres more than just code review that
deserves respect. Testing is one. Commenting, not necessarily on code,
but on architecture is another. Documenting. Yes, running sparse or even
Lindent or checkpatch.
In the old/current Linux thinking (pun intended) work equates to
churning code. That thought process derives from Linus actually then
propagates down stream to other folks.
I think the Linus approach is still excellent - but its definition of
"work" is no longer valid. Work must include all these other things
and visible credit is important if the revolution is to continue.

If you look at it from a software engineering or production resource
management perspective, the Linux development model has gotta be one of
the most inefficient[1] - with a reward system geared to developers mostly.
If you want to look at it from an investment of time (ROI) perspective,
developers get way too much credit riding on everybody else's back.
Why should Mark Lord report another bug to us?
Put yourself in his shoes:
- he is a clever guy who has already worked around the bug. So a proper
fix is only a convenience for him.
- Blessed as he was - he got to do more and more work after reporting.
- he got slapped for claiming he had to go and get lunch and therefore
didn't have time to do more bisect for a bug that wasn't just unique to
his setup.
- he spent a gazillion electrons responding to people and justifying his
stance
- he got no credit for his time whatsoever when the bug was fixed (he
won't be showing up on lwn list).

I think perspective and credit for people's time needs to change.

cheers,
jamal

[1] With current momentum, there's an infinite resource of developers
and testers and documenters in Linux, i.e
resource management is only valid as a metric if you had finite
resources. So the point i am making is moot - but I do strongly believe
the momentum ...
Swapping out bugzilla for something else wouldn't help. We'd end up with
lots of people ignoring a good bug tracking system just like they were
ignoring a bad one.

(And I don't think developers and maintainers _should_ spend time mucking
in bug-tracking systems. They should have helpers who do all the
triaging/tracking/routing/closing work for them, and then provide other
developers with the results, letting them know what they should be spending

That all sounds good and I expect few would disagree. But if it is to
happen, it clearly won't happen by itself, automatically. We will need to
force it upon ourselves and the means by which we will do that is process
changes. The thing which is being disparaged as "bureaucracy".

The steps to be taken are:

a) agree that we have a problem
b) agree that we need to address it
c) identify the day-to-day work practices which will help address it (as
   you have done)
d) identify the process changes which will force us to adopt those practices
e) implement those process changes.
I have thus far failed to get us past step a).
--
I for one do not agree that we have a problem.
Based on actual data on oopses (which very clearly excludes other kinds of bugs, so I know I only see part of the story)
we are doing reasonably well. Let's look at the 2.6.25 cycle.
We got a total of roughly 2700 reports of oopses/warn_ons from users. (This may sound high to those of you only reading
lkml, but this includes automatically collected oopses from Fedora 9 beta testers).
Out of these 2700, the top 20 issues account for 75% of the total reports.

Out of these 20 issues, 9 were from still out of tree drivers (wireless.git and drm.git included in F9). These were
caught before they even got close to mainline.
The remaining 11 issues can be split in
1) The ones we caught and fixed
2) TCP/IP warnings that DaveM and co are chasing down hard (but have trouble finding reproducers)
3) An EXT3 bug that in theory can cause data corruption, but in practice seems to happen after you yank out a USB stick
with an EXT3 filesystem on (so it can't corrupt the disk data). Ted is working on this
4) A bug (double free) that hits in the skb layer, probably caused by a bug in the ipv4 code
(a first analysis + potential patch was mailed to netdev this weekend)
5) sysfs "existing file added" warning, mostly in the USB stack
(gregkh claims he fixed this recently, I'm not entirely sure he got all cases)

And when I look beyond the first 20, the same pattern arises, we fixed the majority of the issues before -rc9.
At position 25 we have less than 20 reports per bug. At position 35 we have less than 10 reports per bug.
At position 50 we have less than 5 reports per bug. Conclusion there: the bugs people actually hit fall off dramatically;
there's a core set of issues that gets hit a lot, the rest quickly gets reduced to noise levels.

To me this does not sound like we have a huge quality problem because
1) The distribution of the bugs is such that there is a relatively small set of core issues
that are widely hit, and then there's a near e...
Well OK. But I don't think we can generalise from oops-causing bugs all
the way to all bugs. Very few bugs actually cause oopses, and oopses tend
to be the thing which developers will zoom in on and pay attention to.

If we had metrics on "time goes backwards" or anything containing "ASUS",
things might be different.
--
Even oopses have pitfalls, like in 25-rcs where those WARN_ON TCP
backtraces were due to three different bugs (there might be fourth one
still remaining). ...kerneloops.org didn't even make difference between
different WARN_ONs in a function though that would have helped only little
in the case of 25-rc TCP because of different bugs causing failures in the
same invariant.

--
i.
--
On Mon, 14 Apr 2008 10:51:52 -0700
Sounds really like we need to add more strategic WARN_ON's and other diagnostics in
the kernel to track these issues down.

Because another thing that I found so far is that what hits LKML is by far not representative
of what happens for users. The most obvious example was the whole input layer refcounting disaster
in 2.6.25-rc; this was about 1/3rd of TOTAL reports for a few weeks in a row, but there
was hardly an LKML posting for it (in fact there was only 1 half one).
We need diagnostics and stuff the kernel spits out so that automated tools can detect these,
otherwise we'll very likely not get good information on what is actually wrong with the kernel.

In case you want to see the 2.6.25-rc data, the top 100 list is at
http://www.kerneloops.org/twentyfive.html

(I'm still working on annotating the individual items, but since there's 100
that does take time)

--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
--
Speaking as the one who was for a few years going again and again
through all open bugs in the kernel Bugzilla:

The manpower problem isn't in handling the bugs in Bugzilla.
I'd claim that even if all bugs in the kernel would be reported in the
kernel Bugzilla I alone would be able to do all the handling of incoming
bugs, bug forwarding and doing all the cleanup stuff like asking
submitters whether a bug is still present in the latest kernel.

The manpower problem is at the developers and maintainers who could
actually debug the problems.

One problem is unmaintained areas.
Do we have anyone who would debug e.g. APM bugs?
And if I want to be really nasty, I'll ask whether we have anyone who
understands our floppy driver... ;)

And who would debug problems with old and unmaintained drivers, e.g.
some old net or SCSI driver?

Note that I do not blame James or Jeff or whoever else for the latter -
they might simply not have the time to spend a day or two for debugging
some obscure problem on some obscure hardware.

And it could happen everywhere that maintainers simply don't have
the time to cope with all incoming bug reports.

We have many people who write new bugs^Wcode.
But too few people who review code.
And too few people willing to maintain the existing code.

cu
Adrian

--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
--
From: Andrew Morton <akpm@linux-foundation.org>
A lot of people, myself included, subconsciously don't want to
get past step a) because the resulting "bureaucracy" or whatever
you want to call it is perceived to undercut the very thing
that makes the Linux kernel fun to work on.

It's still largely free form, loose, and flexible. And that's
a notable accomplishment considering how much things have changed.
That feeling is why I got involved in the first place, and I know
it's what gets other new people in and addicted too.

Nobody is "forced" to do anything, and I notice you used the
word "force" in d) :-)

And I realize this relaxed attitude goes hand in hand with reduced
quality and occasionally more bugs. In many ways, I'm happy with
that tradeoff at least wrt. how that works out for the subsystems
I'm responsible for.

We can ask more subsystem tree maintainers to run their trees more
strictly, review patches more closely, etc. But, be honest, good luck
getting that from the guys who do subsystem maintenance in their
spare time on the weekends. The remaining cases should know better,
or simply don't care.
--
OK, I was going to let this pass, but I changed my mind.
You carefully deleted my text so that you could misquote it, thereby
flagrantly misrepresenting everything I said.

Forcing a discipline upon oneself is totally different from having it
forced upon you by someone else.

Each step will need general agreement and buyin, otherwise none of it will
(or should) work.

--
From: Andrew Morton <akpm@linux-foundation.org>
The "force" is to "us" which is a group.
And I imagine that newcomers will be expected to adopt these
"practices". So in effect, they will be "forced" into the process
changes as well.

I'm getting more and more sensitive to issues on this level over time,
because I realize that the fundamental issue in all human group issues
is getting people to "want" to do things. And "force", in any form,
tends to be incompatible with "want". And in particular, people will
often even shun things they "want" when it is "forced" to them.

--
Just wanted to add my 2c by mentioning my favorite example of
"virtual Tom Sawyering" as far as a tedious review process goes:
http://en.wikipedia.org/wiki/Knuth_reward_check

Which is also quite cheap -- AFAIK very few of those have ever
been cashed.

Thanks,
Roman.

--
I think Al's point was that we need far more "free form, loose and
flexible" work for reviewing code. As in people going over trees and
just checking it for anything suspicious and going over existing code
and checking it for anything suspicious and going also over mailing
list patch posts. And also maintainers who appreciate such review.

And checking it for anything suspicious does not mean running
only checkpatch.pl or even just sparse, but actually reading it
and trying to make sense of it.

I don't see that really as conflicting with your goals.
It would be some more work for the maintainers to handle more such
feedback because they would need to process comments from such "free
form reviewers". Some of them will undoubtedly be wrong and that will
take some time away from processing features (and bugs) but I suspect
it would be still worth it.

On the other hand it would also take some work away from
processing bugs, but as Andrew mentions earlier it looks
like significant parts of the boring areas of bug reports
(like getting basic information from reporter etc.)
could be "out-sourced" to bug masters.

And I think being a bug master is an excellent way for someone who isn't
a great coder to contribute in excellent ways to Linux
(far more than someone e.g. running checkpatch.pl ever could)

The challenging thing is also to make sure that the quality of
comments stays high. That means more focus on logic and functionality
than on form. If the reviewer just goes over the coding style or
trivialities I don't think that will improve Linux really. I think the
problem is often that people think kernel code must be very
complicated and they don't even dare try to understand it. But
frankly a lot of the kernel code is not really that complicated logic
wise and also doesn't need too specialized knowledge to understand.
So I am optimistic that there are a lot of people out there who would
be qualified to do some logic review.

Really Linux needs a better "reviewing cultur...
If you really want to get more such review, then it would be very
useful when someone asks about some obtuse portion of kernel code
or makes a suggested improvement, that the reviewer then not be
flamed as being dense for not understanding the code or some kernel
coding concept. It would be much better to treat it as an opportunity
to educate rather than belittle, thus eventually enlarging the base
of people who can assist with various aspects of kernel development.
For what's supposed to be an open, engaging community, and which
generally is, there sometimes seems to be some level of dismissal
of newcomers (not sure it's intended that way but nevertheless it
can tend to discourage newcomers from getting more involved).

-Bill
--
Actually my impression is that spare-time maintainers produce much better
code and subsystem trees than corporate-drones. But of course there's
a lot of shades between those two extremes.
--
PS: net/* is actually pretty sane in that respect - the huge volume
being what it is, of course, but still, my impression is that it's
pretty far from the worst sources of crap. OTOH, I might be missing
secondary tree problems - e.g. net/sctp is much worse off in that
respect, AFAICT; there might very well be more of such areas.
--
From: Andrew Morton <akpm@linux-foundation.org>
I think things are improving.
I wrote or merged in ~10 bugs in the last hour, for example.
And I also agree with Al's point, which was embedded in his humorous
and obviously sarcastic suggestions, in that adding bureaucracy isn't
the answer. We already have too much and it scares developers away.

Sure you don't want crap getting into the tree (for too long), but it
is important to be careful to define crap properly. For example,
inundating patch submitters with more requirements, especially ones
involving automatons like checkpatch, is in the end bad.

We can improve the quality of stuff going in and be flexible at the
same time.
--
From: David Miller <davem@davemloft.net>
Bug fixes! I meant "fixes" I swear!
That's quite a Freudian slip if I ever saw one.
--
Errr... the synopsis of git-bisect contains the following:

    git bisect start [<bad> [<good>...]] [--] [<paths>...]

so you can limit bisection to commits affecting specified subsystem.

P.S. Unfortunately git currently doesn't deal with directory renames,
so if there was some big code restructuring one has to provide all
historic pathspecs.

--
Jakub Narebski
Poland
ShadeHawk on #git
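
As a rough illustration of the path-limited form quoted above, here is a toy model (not how git implements it) of what restricting a bisection to pathspecs does to the candidate list. The commit ids, file names and the `limit_to_paths` helper are all made up for the example; the `olddir`/`newdir` pair stands in for the directory-rename caveat in the P.S.

```python
from typing import Dict, List, Sequence


def touches(path: str, pathspec: str) -> bool:
    """True if path is the pathspec itself or lies underneath it."""
    spec = pathspec.rstrip("/")
    return path == spec or path.startswith(spec + "/")


def limit_to_paths(commits: Sequence[str],
                   touched: Dict[str, List[str]],
                   pathspecs: Sequence[str]) -> List[str]:
    """Keep, in order, the commits that changed a file under any pathspec."""
    return [c for c in commits
            if any(touches(p, s) for p in touched.get(c, []) for s in pathspecs)]


# Hypothetical history: commit ids and file names are invented.
COMMITS = ["c1", "c2", "c3", "c4", "c5"]
TOUCHED = {
    "c1": ["olddir/proto.c"],          # written before the directory rename
    "c2": ["drivers/foo/foo.c"],
    "c3": ["newdir/proto.c"],          # same code after the rename
    "c4": ["fs/bar/inode.c"],
    "c5": ["newdir/proto.c", "include/proto.h"],
}

if __name__ == "__main__":
    # Only the current directory name: the pre-rename commit c1 is skipped.
    print(limit_to_paths(COMMITS, TOUCHED, ["newdir"]))
    # Current and historic names together: c1 is a candidate again.
    print(limit_to_paths(COMMITS, TOUCHED, ["newdir", "olddir"]))
```

In this model, fewer candidates means fewer kernels to build, but a culprit that lives outside the given paths (or under a historic name that was left out) is never tested at all.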
--
The (apparent) lack of test plans doesn't imply the patches not being tested,
actually.

My experience indicates that they are tested in the majority of cases. Still,
sometimes they are not and that's when the most damage is done.

Thanks,
Rafael
--
From: Andrew Morton <akpm@linux-foundation.org>
The ratio of bug reports to developers was significantly different
back then.

I pine for the "good ole' days" of kernel development sometimes too,
but I try to be realistic and understand why things are different now.
--
IMHO we should try to make that difficult.
--
This bug is perfect example where bisect clearly was useful :-). Nobody
But it is ok for you to ask an innocent net developer to do that (even
with your terms as I hadn't signed off _anything_ related to that one),
hmm?...Sure I could use similar words, but you might use the not-mine
bug approach again to deflect... :-( ...No, I don't mind really :-).

I well understand that I occasionally end up chasing things
which are bugs that other people have caused, that's part of the

Now that you have, as stated earlier, first looked at the diffs (tcp*.c stuff
mainly I suppose?!?), and then bisected it and found the breaker, and even
a patch is available already... Seriously, knowing all that's now available,
how could we have solved _this particular case_ without that very useful
help (bisect) from your side?

Yes, I went through the commit list (maybe you did as well), I'm not sure
if Dave did as well. In addition, I checked a number of individual diffs
too but this just isn't something very obvious (I have to admit though
that I don't really understand all those namespace things, so I didn't
even know how to look at them too carefully).

--
i.
--
..
That's not demanding, that's quite relaxed. I had a good workaround,
and didn't really care any more at that point. Just thought it was rather
odd that none of the developers seemed interested in tracking it down.
I offered tons of help, gave it, and said I didn't have time for a full
bisect at that juncture.

For that, I get repeatedly slammed by the netdev folks.
Even after I put aside *paid* work to submit to your demands.

Next time around, I won't bother reporting bugs to you folks,
that's for damned sure.

Cheers
--
Actually that will be the best decision from evolutional point of view.
Bugs, which 'are thrown back to your face' like what you did with this
one, are useless. Developers already know that bugs exist.

If you do not care about the bug, why did you even bother filing it?
You expected that anyone will start running to fix it for you.
You were wrong. Developers only fix bugs, which do not require
mind-reading and magnetic quantification of your brain.

If you do not want to help fixing it, do not expect it will be fixed at
all. Sentence, that you will probably understand better: no one gets paid
to fix it.

No one gets fun fixing something with the description you provided
(not sure about David though, probably he has some masochistic
propensities doing that and trying to get some bits of information
from reporters for years).
You were suggested some simple checks, they did not help.
Developers can not remotely control electrons in your wires, so the next
suggestion was bisecting, which ended up with some crap from your point.

If you want the bug to get fixed, provide info and if it is not enough, help by
trying what you are being suggested, if you do not want, stop.

--
Evgeniy Polyakov
--
So I was right after all? Bug reports from people who (for whatever
reason, including having to earn their living) cannot do a bisect are
T.
From: Tilman Schmidt <tilman@imap.cc>
You need to qualify this with: when a bisect is asked of them
You seem to be quite eager to harp on this specific point, to make it
seem as if a bug report is useless if the person cannot or will not
bisect in all cases. And that simply is not what we are saying here.
--
You got it wrong.
If bug is subtle and developers can not reproduce it, there are only two
ways out of the problem: to help developers or not to help.

In the latter case the bug report is useless (except to show that it
exists, since practically no one can fix it until some new details
are added).

In the former case there is a discussion between developers and
reporters, so things progress. In this particular case there was
no healthy discussion, that is what all this is about.

Bisection was just an example of the help a reporter can provide. In this
case there were no other suggestions remotely useful or they were
already tried. If you can not proceed with what was suggested, then do
not piss anyone off because you were told to do something to help.

If you go to the doctor because of an aching throat and he asks you to
open your mouth, you will not blame him for asking you to do that.

--
Evgeniy Polyakov
--
Looks like you're saying I was right after all. Useless bug reports
shouldn't be submitted.

So please answer this simple question: If I know beforehand that I won't
have the time to do a bisect (or other similarly time-consuming task the
maintainers might ask from me), should I report the bug, or should I
keep my knowledge to myself?

This question is not theoretical. It's a situation I find myself in
quite regularly, because I allow myself the luxury of building most rc
kernels and even the odd mm kernel just for fun even though I have a
daytime job and a family to feed. It would be quite easy to look the
other way if I encounter a problem in one of those, hoping someone else
with more time on his or her hands will also come across it and report
it. So far my conscience told me not to do that. But if reporting it
without being able to follow up on it is considered useless then my
conscience was apparently wrong. Just say the word, and I'll stop what

Sure. It's not about bisection specifically, but about the time a
reporter is able to invest in addition to what went into the report
already. But bisection is a good example, because it's the most

If a polite "sorry, I don't have the time" already counts as pissing
off, the only choice left is to avoid the situation in which I'd have to
say that. IOW, don't report bugs if I don't have the time to follow

That analogy is wrong on so many accounts. It is not my throat that's
aching. A doctor would not insult me for not wanting to open my mouth
but rather ask if there was perhaps a valid reason for that. Not to
mention that opening my mouth takes substantially less time than a Linux
kernel bisection ...

A better analogy would be if I see an object lying on the highway, and I
stop at the next service area to call the police and alert them about
the possible danger. If they'd ask me to drive back to the place where I
saw it in order to describe precisely where it lay and what it looked
like, I think I might indeed become a b...
...No, useless bug reports don't lead to a solution, ie., that particular
bug won't get fixed as a result of the report! That's what these people
are trying to say. Sure the point of bug reports is to get the bugs fixed,
don't you think? :-/ ...Or do you think it's only secondary to get them

I'm asking the same thing from you as I did from Mark (it still remains
unanswered)... What's your suggestion, how should we have solved this
particular case? Do you join those that ask for developers to "invest"
time to repeatedly go through the commits that are not guilty? ...One
would never find the solution by that method :-/.

Yes, I'm fine that you don't want to help (or would want but cannot help
like have been with many of nearly impossible to reproduce bugs with TCP
lately) but the sole consequence is that the bug remains unsolved, it's
plain simple. That's until somebody else is affected and reports and we
get the necessary information. Or alternatively somebody just reads the
offending code (possibly much later) and begins to wonder why there's this
particular thing missing there (this is in fact not related to the bug
reports at all, many bugs are found this way but it's not a thing one

Please reread the thread, this couldn't be farther from the truth...
...Dave had suggested Mark would have to bisect, I suppose this was after
finding out that there wasn't anything particular that should cause this
kind of behavior, or at least he couldn't find anything even suspicious
looking. Mark, with rather demanding tone, was _also_ asking for that
"somebody" who did all those TCP fin/closing changes (that would be me) to
be responsible over them, ie., those parts that Dave had checked and found
not suspicious (and bisect also proved them innocent later on). Yes, I
then went through that "mountain of commits" which Mark "was not willing
to do" himself. I invested the time even after Dave had also come to the
same conclusion as I again did, that there is nothing wron...
We want bug reports, but we definitely do not want, when we ask for
additional help, to listen to crap that the reporter does not have to help and

It is your throat, since the doctor's one is ok, and no one else came with
the same problems. Doctor will not insult you if you will not piss him

Yeah, they first asked how it looked and where it was, then they asked
to move here, and you told them, that it is they who have to do that,
that it is exactly their problem, that you are not paid to do that.

Did I miss something, yeah, probably part, where you then tell, that you
care about highways, so you moved there and did what was asked.
Although, no, you first tell to police, that you spend your paid time
and will teach them how to do thing or similar crap. Only few hours
later, to some other people, you will tell that you care about highways,
global warming, Uganda children and hadron collider danger.

Ugh, just removing that object, when you were there 'takes
substantially less time than a Linux kernel bisection'.
Fortunately flooding developers with tons of urine during the whole day
is much more comfortable and ego-boosting, since it allows to close eyes
and do not see, that this took much more time than bisection.

Hope you got it right: we want bug reports and help. If you do not want
to provide some help, do not expect the bug will be fixed, although bug
existence is a significant sign. So, be cool, and everything will be ok.

--
Evgeniy Polyakov
--
From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
ROFL
--
..
Because I care, about Linux's reputation and performance.
I care about basic networking operations, and knew that this bug
would probably affect other applications once widely deployed.

Where the hell did I *ever* say that?
I did nothing but offer help, and respond quickly.

The one thing I did not have time for initially,
was a painstaking blunt instrument binary search of
every commit since v2.6.24.

There are other ways to debug things and find the causes
quickly, with less impact upon the reporters of bugs.

The current generation of kernel "code submitters" here
seems to have never learned those. Bummer.

Cheers
--
The nice thing about binary search is that it's by definition an
O(log2(N)) operation, which isn't bad at all as far as algorithms go.

The truly blunt instrument here would be a linear search of every commit...
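
To put rough numbers on the O(log2(N)) point, here is a small sketch that compares the worst-case number of build-and-test rounds for a binary search against a commit-by-commit linear search. It assumes a linear history with exactly one first-bad commit among the candidates; real kernel history is a merge graph, so git's own step estimate can differ slightly. The range sizes below are hypothetical and are not the real commit count between v2.6.24 and 2.6.25-rc8.

```python
def bisect_steps(n_candidates: int) -> int:
    """Worst-case test rounds for a binary search that must single out one
    first-bad commit among n_candidates, i.e. ceil(log2(n_candidates))."""
    if n_candidates < 1:
        raise ValueError("need at least one candidate commit")
    # (n - 1).bit_length() equals ceil(log2(n)) for every n >= 1,
    # computed with integers only so there is no rounding error.
    return (n_candidates - 1).bit_length()


def linear_steps(n_candidates: int) -> int:
    """Worst-case test rounds when walking the candidates oldest first.
    The newest candidate is already known bad, so it needs no test."""
    if n_candidates < 1:
        raise ValueError("need at least one candidate commit")
    return n_candidates - 1


if __name__ == "__main__":
    # Hypothetical range sizes, chosen only to show the growth rate.
    for n in (16, 1024, 4096):
        print(n, bisect_steps(n), linear_steps(n))
```

Doubling the number of candidate commits adds only one more round to the binary search, while it doubles the linear walk.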
Blah-blah-blah, you care so much, that pissed people off which suggested
you how to really help Linux. And then you returned with bisect

Or I can ignore it, like the net developers, since I have a workaround.
And then we'll see what other apps are broken upon 2.6.25 final release.

*Somebody* is responsible for those changes.
That particular *somebody* ought to volunteer some help here,
reducing the mountain of commits to a big handful or two.

-----
If you do not know math, binary search takes log2(N), so you
would only need to check at most around dozen commits. That's lot
of time to run 'git bisect good/bad', especially for man, who

I know one. It is guessing.
I will start: did you start hearing voices after 2.6.24 upgrade?
Next time I will ask soothsayer, she really knows how to debug network
bug with following description: "it worked before I changed a kernel
version. you have to return my puppets back".

--
Evgeniy Polyakov
--
From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Every time I see someone play the "I care about Linux" card, they are
typically being a hypocrite. It's a knee jerk, defensive gesture, and

Thanks for your support Evgeniy, it is truly appreciated.
We had Mark's bug fixed in 15 minutes once the bisect result was
known, even after Ilpo and myself had scanned through the changesets.

This proves the utility of bisect and in fact that trying to intuit
the cause by continuing to study changesets and code would have been a
complete waste of time.

Yes, Mark, we used to do things that way for every bug in the kernel.
And as a result many bugs sat unfixed for weeks if not months. Many
of us have left the cave, feel free to join us.
--
Hi guys,
I've read quite a bunch of this thread, and I think there's some
misunderstanding between both parts, as well as inappropriate
expectations in both cases.

We should be very careful about git-bisect. First, it does not necessarily
point to the bug, but to the commit which exhibits the bug, so simply
reverting the commit might just hide the bug again. I want to ensure that
people do not forget that it does not replace a brain, it enhances your
eyes by pointing to a change related to the problem.

While it is a powerful tool, we must accept that it cannot efficiently
work in some circumstances, such as :

- the machine cannot be rebooted often. I've been used to working for
customers who plan changes once a week, and change absolutely
nothing on their production if unplanned. This means one bisect
step per week. Often, those people even require that your changes
pass through a week of non-regression testing on a pre-production
system (which was my case), with no overlapping between changes,
so then you can count on one git-bisect iteration every two weeks.

- the problem only happens in peak traffic hours on production, and
the loss of service has already gone far beyond the annual quota.
The only case they will accept an upgrade if you engage your full
responsibility that it will definitely fix the problem. I've already
been in such a situation, you say to the guy in front of you that
you're putting your balls on the table, it will work (and sometimes
you're only 90% confident). You obviously cannot do this to just
check if the current bisect exhibits the problem or not.

- the reporter has very little spare time. I do have friends in this
situation. Basically, when your schedule is full of customers
visits one month ahead, it's very hard to find several consecutive
hours to track the problem down. Sometimes you're happy if you can
spend two hours on it in a week. BTW, many developers are also in
...
From: Willy Tarreau <w@1wt.eu>
Everyone is well aware of all of this, that's why I specifically asked
for a bisect, because I knew it would be crucial to pinpointing this
particular bug.

And lo' and behold in 15 minutes after the bisect results were
available it got fixed.

Yes it takes judgment, and nobody ever suggested that a revert
is the way to go. Git bisect results must be parsed by a human
brain. Nothing else was ever implied in any way shape or form.
--
I can't find the fix, btw. Can you please point me to it?
Thanks,
Rafael
--
there you go
http://lkml.org/lkml/2008/4/10/409

-sergio
--
Thanks a lot,
Rafael
--
No problemo :)
--
Evgeniy Polyakov
--
I don't know that anyone has even collected packet captures.
--
I know that lack of developers is a problem and that the more users
(especially capable ones) help the better. Apparently I misunderstood
the post and read it as really saying "if users don't want to do the
work then we don't want the bug reports" and that's what made me
Again I can't do anything but agree with you. You are right. When
it's possible to do the work this way everyone wins.
I was just trying to say that when it can't be done that way or the
user won't, then the bug report still has value and still deserves to
be taken seriously (although it probably goes lower in the pile than
the bugs where the end users actually do bisect or whatever).

--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html
--
From: "Jesper Juhl" <jesper.juhl@gmail.com>
Absolutely. If, for example, someone has a clean OOPS I usually won't
request a bisect, that's stupid.

However if the OOPS is hard to diagnose, a bisect might be necessary
still.
--
I am an end user, I do not know precisely what bisecting means, but I
have spent some time on bug 8895, I suppose I have totally bisected it,
but it seems that it has been lost.
It is clearly a bug and I am still patching every kernel to avoid the
fib6 crash, obviously I am the only one to get it.

It is true that kernel developers' time is more important than users'
time, but staying modest and respectful of the brainless bisecting
users is a must! Our time is just like yours, not extensible.

Anyway, keep on the good kernel work, we all need it.
--
"bisect" refers to the "git bisect" command of the git tool.
For information on git, look here: http://git.or.cz/
For information on "git bisect" look here:
http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html
and here: http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#using-b...

Basically what bisect does is this; you tell it about your
last-known-good kernel version and your first-known-bad kernel
version. Then git finds all the changesets between those two versions,
cuts the set in half and produces the kernel source matching the
middle point. You can then build and test that kernel and then you
tell git if it was good or bad - it'll then use that good/bad info to
cut the set of patches in half again etc etc until you eventually end
up with the exact changeset that caused your problem. It's very
powerful and can often narrow a problem down to a single commit, but
it does require that your problem is completely reproducible so that
you can reliably test it on each kernel 'git bisect' produces for you.

[please don't top-post : http://www.catb.org/~esr/jargon/html/T/top-post.html ]
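The halving procedure that 'git bisect' performs can be sketched in a few
lines of Python. This is only an illustration under simplifying assumptions:
the history is treated as a straight line ordered oldest to newest (real git
history can branch and merge), the names bisect_first_bad and is_bad are made
up for this example, and is_bad stands in for "build that kernel, boot it,
and run the test by hand".

```python
from typing import Callable, Sequence, Tuple


def bisect_first_bad(commits: Sequence[str],
                     is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Find the first bad commit in a linear history.

    Assumes commits[0] is the last-known-good version and commits[-1] is
    the first-known-bad version, and that is_bad() gives the same answer
    every time it is asked about the same commit.
    Returns (first bad commit, number of kernels that had to be tested).
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")

    good = 0                  # index of the newest commit known to be good
    bad = len(commits) - 1    # index of the oldest commit known to be bad
    tests = 0

    while bad - good > 1:
        mid = (good + bad) // 2   # cut the remaining range in half
        tests += 1
        if is_bad(commits[mid]):
            bad = mid             # the regression is at or before mid
        else:
            good = mid            # the regression is after mid

    return commits[bad], tests


# Hypothetical history of 1000 changesets with the regression at "c613".
history = [f"c{i}" for i in range(1000)]
culprit, tests = bisect_first_bad(history, lambda c: int(c[1:]) >= 613)
print(culprit, "found after", tests, "test builds")
```

The number of test builds grows with the logarithm of the number of
changesets, which is why about ten builds are enough for a thousand
changesets. It is also why an unreliable test is so damaging: one wrong
good/bad answer sends every later step into the wrong half.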
<...snip...>
--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html
--
From: vincent-perrier <vincent-perrier@club-internet.fr>
I remember this bug.
The analysis is incorrect and the patch adds new errors.
--
Even if the patch is not good, calling dst_free(&rt->u.dst) while rt
is still in the tree leads to a crash. Skipping the dst_free while rt
is in the tree may hide other bugs, but at least I can keep working.

I never said my patch was good, but it does the minimum to avoid my bug:

	if (fn->leaf == NULL) {
		bug_8895_clownix_provisional_workaround = 1;
		fn->leaf = rt;
		atomic_inc(&rt->rt6i_ref);
	}
...
ip6_fib.c, line 796:

	if (!bug_8895_clownix_provisional_workaround)
		dst_free(&rt->u.dst);

That way at least it does not crash.
I cannot provide more than the line and the reason for the crash, I am
one of the numerous brainless users.

--
From: vincent-perrier <vincent-perrier@club-internet.fr>
Date: Fri, 11 Apr 2008 01:32:14 +0200

[ Please use netdev@vger.kernel.org so that this discussion
Now that the discussion has reached the mailing list, it won't die in
bugzilla like most such bugs do, and very likely will get fixed
quickly as a result.

Thank you.
--
Thanks to you, and also to Jesper for the "git bisect" explanation.
You have powerful tools, it is all for the best, millions of users
are relying on you!

--
..
No, that's a one-way street, where the developers insist that the bug discoverer
do 100% of the work of finding/fixing the bug.

A two-way street is when we help each other, where the developers
might point out some likely commits, and the bug discoverer takes
additional time to patch/retest around them.

Years ago, Linus suggested that he opposed an in-kernel debugger
mainly because he preferred that we *think* more about the problems,
rather than just finding/fixing symptoms.

This 100% reliance upon git-bisect is worse than that.
It has people now just tossing regressions into the code left and right,
knowing that they can toss all of the testing back at the poor folks
..

I just happen to find/report a lot of regressions.
And every single damn one so far has taken days of my
time to track down, and in most cases fix by myself.

All of them, up until this one, have not been bisectable.
Yes, this one does seem to reproduce reliably now,
so it could be bisected if I wanted to take another day
out of my life for it.

I did bisect one other issue back in 2.6.24 (or .23?),
and that took about 9 hours in total.

I'd really like a hand or two with some of these bugs
at some point from the folks who keep breaking things.

Cheers
--
Could you do a poor-man's bisect and test 2.6.25-rc1 and -rc2? That
would probably help a lot to narrow it down.

Harvey
--
Seeing as how you're Canadian... any way to make sure it's not ISP (Bell or
Bell wholesale) / Rogers traffic shaping?

Gerhard
--
Gerhard Mack<>< As a computer I find your faith in technology amusing.
--
..
Heh.. that's an idea our local ISPs got from the USA ISPs. :)
But it doesn't affect my internal networks, so, no.

Cheers
--
I only brought it up because my http and ftp connections have started
randomly dropping, but only during the evening, and I'm still on 2.6.24.

Gerhard
--
Gerhard Mack<>< As a computer I find your faith in technology amusing.
--
..
Recap, with more info:

The host system is running 2.6.25-rc8-git. It uses netkit-ftp to
send a file to the remote system. Using strace shows that the
entire file was read, and passed to write() for the outbound socket.

The remote system is running linux-2.2.xx, and is reporting -EPIPE
from net/socket.c::sock_recvmsg() before all of the data has been received,
and thus ends up with a short file, missing data at the end.

This exact sequence, with the exact same software,
works fine when the host system is NOT running 2.6.25-*,
(e.g. 2.6.11 through 2.6.24 are fine).

Something may be broken here.
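For anyone trying to reproduce this, it can help to confirm that a failed
transfer really is a pure truncation (data missing at the end) rather than
corruption somewhere in the middle. A minimal sketch of such a check follows.
The function name describe_mismatch is invented for this example, and it is
not what the build script mentioned in this thread actually runs.

```python
def describe_mismatch(sent: bytes, received: bytes) -> str:
    """Classify how the read-back data differs from what was sent."""
    common = min(len(sent), len(received))

    # Look for the first differing byte inside the part both sides have.
    for offset in range(common):
        if sent[offset] != received[offset]:
            return f"corrupt at offset {offset}"

    # The shared prefix matches, so any difference is purely in length.
    if len(received) < len(sent):
        return f"truncated: missing last {len(sent) - len(received)} bytes"
    if len(received) > len(sent):
        return f"extra: {len(received) - len(sent)} trailing bytes"
    return "identical"


# Example with made-up data: the receiver lost the final 2 bytes.
print(describe_mismatch(b"abcdef", b"abcd"))
```

A "truncated" result on every failure would be consistent with the remote
side giving up before the tail of the data arrives. A "corrupt" result
would point somewhere else entirely.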
--
..
It happened again, once, today.
That's three times over a week of heavy use.

Worth tracking, I suppose, but not enough information to resolve it.
--
