Adrian Bunk posted a list of known regressions in the latest 2.6.20-rc4 Linux kernel relative to the previous stable release, 2.6.19 [story [1]]. In two emails, he listed six regressions that do not yet have fixes and six regressions whose fixes have not yet been merged.
In another email thread, Linux creator Linus Torvalds noted that his goal for 2.6.20 is to focus primarily on stability. He added that he intends to release the stable kernel at some point after linux.conf.au [2], which takes place this year in Sydney, Australia, from January 15th to 20th. He explains, "hopefully 'final -rc' before LCA, but I'll do the actual 2.6.20 release afterwards. I don't want to have a merge window during LCA, as I and many others will all be out anyway. So it's much better to have LCA happen during the end of the stabilization phase when there's hopefully not a lot going on. (Of course, often at the end of the stabilization phase there is all the 'ok, what about regression XyZ?' panic)"
From: Adrian Bunk [email blocked]
To: Linus Torvalds [email blocked], Andrew Morton [email blocked]
Subject: 2.6.20-rc4: known unfixed regressions (v2)
Date: Tue, 9 Jan 2007 06:25:10 +0100
This email lists some known regressions in 2.6.20-rc4 compared to 2.6.19
that are not yet fixed in Linus' tree.
If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of you caused a breakage or I'm considering you in any other way possibly
involved with one or more of these issues.
Due to the huge amount of recipients, please trim the Cc when answering.

Subject    : BUG: at mm/truncate.c:60 cancel_dirty_page() (reiserfs)
References : http://lkml.org/lkml/2007/1/7/117 [3]
Submitter  : Malte Schröder [email blocked]
Status     : unknown

Subject    : BUG: at fs/inotify.c:172 set_dentry_child_flags()
References : http://bugzilla.kernel.org/show_bug.cgi?id=7785 [4]
Submitter  : Cijoml Cijomlovic Cijomlov [email blocked]
Status     : unknown

Subject    : BUG: scheduling while atomic: hald-addon-stor/...
             cdrom_{open,release,ioctl} in trace
References : http://lkml.org/lkml/2006/12/26/105 [5]
             http://lkml.org/lkml/2006/12/29/22 [6]
             http://lkml.org/lkml/2006/12/31/133 [7]
Submitter  : Jon Smirl [email blocked]
             Damien Wyart <damien.wyart@free.fr>
             Aaron Sethman [email blocked]
Status     : unknown

Subject    : problems with CD burning
References : http://www.spinics.net/lists/linux-ide/msg06545.html [8]
Submitter  : Uwe Bugla <uwe.bugla@gmx.de>
Status     : unknown

Subject    : USB keyboard unresponsive after some time
References : http://lkml.org/lkml/2006/12/25/35 [9]
             http://lkml.org/lkml/2006/12/26/106 [10]
Submitter  : Florin Iucha [email blocked]
Handled-By : Jiri Kosina [email blocked]
Status     : problem is being debugged

Subject    : Acer Extensa 3002 WLMi: 'shutdown -h now' reboots the system
References : http://lkml.org/lkml/2006/12/25/40 [11]
Submitter  : Berthold Cogel [email blocked]
Handled-By : Alexey Starikovskiy <alexey.y.starikovskiy@linux.intel.com>
Status     : problem is being debugged

From: Linus Torvalds [email blocked]
Subject: Re: 2.6.20-rc4: known unfixed regressions (v2)
Date: Tue, 9 Jan 2007 09:58:19 -0800 (PST)
On Tue, 9 Jan 2007, Adrian Bunk wrote:
>
> Subject : BUG: at mm/truncate.c:60 cancel_dirty_page() (reiserfs)
> References : http://lkml.org/lkml/2007/1/7/117 [12]
> Submitter : Malte Schröder [email blocked]
> Status : unknown
Adrian, this is also available as
http://lkml.org/lkml/2007/1/5/308 [13]
But, at worst, I don't think this is a show-stopper (oh, well: I actually
liked it better when "WARN_ON()" said _warning_, not BUG, since it
separates out the two cases visually much better, but others disagreed.
Crud).
It does show that something is wrong in reiserfs-land, although probably
not any worse than it ever was before, so in that sense this is not a
"regression", it's actually an _improvement_. Now it warns about reiserfs
trying to clear the dirty bit on a page cache that is still mapped (and
that _may_ be dirty in the page tables, although it almost certainly isn't
in practice).
That warning just didn't exist before.
Now, that said, the call stack is interesting:
BUG: at mm/truncate.c:60 cancel_dirty_page()
[<c0137371>] cancel_dirty_page+0x45/0x7b
[<df944b18>] reiserfs_cut_from_item+0x7cc/0x7fd [reiserfs]
[<c01e5eba>] __kfree_skb+0x9b/0xf7
[<df9316a0>] make_cpu_key+0x3f/0x46 [reiserfs]
[<df944efa>] reiserfs_do_truncate+0x3b1/0x515 [reiserfs]
[<df949901>] journal_begin+0x3f/0xd0 [reiserfs]
[<df9322fc>] reiserfs_truncate_file+0x1c1/0x2ad [reiserfs]
[<df938172>] reiserfs_file_release+0x35f/0x379 [reiserfs]
[<c013be42>] free_pgtables+0x70/0x7c
[<c01491f1>] __fput+0xa5/0x14d
[<c0146e7a>] filp_close+0x51/0x58
[<c0147de8>] sys_close+0x55/0x8a
[<c0102ab2>] sysenter_past_esp+0x5f/0x85
in that a final "sys_close()" that releases the file and causes it to be
truncated (which is apparently what is going on) should NOT have any
mappings of that file active any more!
If there are mappings active, the reiserfs_truncate_file() thing should
have been delayed until the mappings are gone!
So something interesting is definitely going on, but I don't know exactly
what it is. Why does reiserfs do the truncate as part of a close, if the
same inode is actually mapped somewhere else? And if it's a race with two
different CPU's (one doing a "munmap()" and the other doing a "close()",
then the unmap should _still_ have actually unmapped the pages before it
actually did _its_ "release()" call.
In general, a filesystem should never do a truncate at "release()" time
_anyway_. It should do it at "drop_inode" time.
So I think this does show some confusion in reiserfs, but it's not
anything new. The only new thing is that the _message_ happens.
So I don't personally consider this a regression. Just a sign of old and
preexisting confusion that is now uncovered by new code (and it will print
out the scary message at most four times, and then stop complaining about
it. So apart from the scary message, nothing new and bad has really
happened).
Linus
From: Malte Schröder [email blocked]
Subject: Re: 2.6.20-rc4: known unfixed regressions (v2)
Date: Tue, 9 Jan 2007 19:08:40 +0100
On Tuesday 09 January 2007 18:58, Linus Torvalds wrote:
> On Tue, 9 Jan 2007, Adrian Bunk wrote:
> > Subject : BUG: at mm/truncate.c:60 cancel_dirty_page() (reiserfs)
> > References : http://lkml.org/lkml/2007/1/7/117 [14]
> > Submitter : Malte Schröder [email blocked]
> > Status : unknown
>
> Adrian, this is also available as
>
> http://lkml.org/lkml/2007/1/5/308 [15]
>
> But, at worst, I don't think this is a show-stopper (oh, well: I actually
> liked it better when "WARN_ON()" said _warning_, not BUG, since it
> separates out the two cases visually much better, but others disagreed.
> Crud).
--8<--
> So something interesting is definitely going on, but I don't know exactly
> what it is. Why does reiserfs do the truncate as part of a close, if the
> same inode is actually mapped somewhere else? And if it's a race with two
> different CPU's (one doing a "munmap()" and the other doing a "close()",
> then the unmap should _still_ have actually unmapped the pages before it
> actually did _its_ "release()" call.
This was on a single core. But with CONFIG_PREEMPT_VOLUNTARY=y.
It didn't happen again since then.
>
> In general, a filesystem should never do a truncate at "release()" time
> _anyway_. It should do it at "drop_inode" time.
>
> So I think this does show some confusion in reiserfs, but it's not
> anything new. The only new thing is that the _message_ happens.
>
> So I don't personally consider this a regression. Just a sign of old and
> preexisting confusion that is now uncovered by new code (and it will print
> out the scary message at most four times, and then stop complaining about
> it. So apart from the scary message, nothing new and bad has really
> happened).
I also didn't reboot the machine afterwards and did not notice any problems
beside that one message.
--
---------------------------------------
Malte Schröder
MalteSch@gmx.de [16]
ICQ# 68121508
---------------------------------------
From: Linus Torvalds [email blocked]
Subject: Re: 2.6.20-rc4: known unfixed regressions (v2)
Date: Tue, 9 Jan 2007 10:30:43 -0800 (PST)
On Tue, 9 Jan 2007, Malte Schröder wrote:
>
> > So something interesting is definitely going on, but I don't know exactly
> > what it is. Why does reiserfs do the truncate as part of a close, if the
> > same inode is actually mapped somewhere else? And if it's a race with two
> > different CPU's (one doing a "munmap()" and the other doing a "close()",
> > then the unmap should _still_ have actually unmapped the pages before it
> > actually did _its_ "release()" call.
>
> This was on a single core. But with CONFIG_PREEMPT_VOLUNTARY=y.
> It didn't happen again since then.
Yeah, PREEMPT would be able to show most races like this too. In fact,
some races show up much better with preemption than they do with real SMP.
But I haven't looked at what exactly reiserfs does. I did check that the
VM layer definitely does the remove_vma() stuff (that actually closes the
files) _after_ it has unmapped everything. It would have surprised me if
we had had that kind of bug, but still..
Linus
From: Adrian Bunk [email blocked]
Subject: 2.6.20-rc4: known regressions with patches (v2)
Date: Tue, 9 Jan 2007 06:51:01 +0100
This email lists some known regressions in 2.6.20-rc4 compared to 2.6.19
with patches available.
If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of you caused a breakage or I'm considering you in any other way possibly
involved with one or more of these issues.
Due to the huge amount of recipients, please trim the Cc when answering.

Subject    : BUG: at mm/truncate.c:60 cancel_dirty_page() (XFS)
References : http://lkml.org/lkml/2007/1/5/308 [17]
Submitter  : Sami Farin [email blocked]
Handled-By : David Chinner [email blocked]
Patch      : http://lkml.org/lkml/2007/1/7/201 [18]
Status     : patch available

Subject    : bluetooth oopses because of multiple kobject_add()
References : http://lkml.org/lkml/2007/1/2/101 [19]
Submitter  : Pavel Machek [email blocked]
Handled-By : Marcel Holtmann [email blocked]
Patch      : http://lkml.org/lkml/2007/1/2/147 [20]
Status     : patch available

Subject    : ftp: get or put stops during file-transfer
References : http://lkml.org/lkml/2006/12/16/174 [21]
Submitter  : Komuro [email blocked]
Caused-By  : YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
             commit cfb6eeb4c860592edd123fdea908d23c6ad1c7dc
Handled-By : Craig Schlenter [email blocked]
             YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Patch      : http://lkml.org/lkml/2007/1/9/5 [22]
Status     : patch available

Subject    : nf_conntrack_netbios_ns.c causes Oops
References : http://lkml.org/lkml/2007/1/7/188 [23]
Submitter  : Peter Osterlund [email blocked]
Caused-By  : Patrick McHardy [email blocked]
             commit 92703eee4ccde3c55ee067a89c373e8a51a8adf9
Handled-By : Patrick McHardy [email blocked]
Patch      : http://lkml.org/lkml/2007/1/8/290 [24]
Status     : patch available

Subject    : forcedeth.c 0.59: problem with sideband managment
References : http://bugzilla.kernel.org/show_bug.cgi?id=7684 [25]
Submitter  : Michael Reske [email blocked]
Handled-By : Ayaz Abdulla [email blocked]
Patch      : http://bugzilla.kernel.org/show_bug.cgi?id=7684 [26]
Status     : patch available

Subject    : nVidia CK804 chipset: not detecting HT MSI capabilities
References : http://lkml.org/lkml/2007/1/5/215 [27]
Submitter  : Brice Goglin [email blocked]
             Robert Hancock [email blocked]
Handled-By : Brice Goglin [email blocked]
Patch      : http://lkml.org/lkml/2007/1/5/215 [28]
Status     : patch available

Related Links:
- Archive of above thread [29]