So yet another week, another -rc. This one should be the last one: we're
certainly not running out of regressions, but at the same time, at some
point I just have to pick some point, and on the whole the regressions
don't look _too_ scary. And -rc8 obviously does fix more of them.
Most of the changes since -rc7 are pretty small, and there aren't even a
whole lot of them. The shortlog (appended) is just a couple of pages, and
the diffstat is even smaller, but since the dirstat is a dense overview,
I'll just put that here instead:
   4.6% arch/m32r/kernel/
   5.7% arch/m32r/
   9.5% arch/mips/pci/
  10.4% arch/mips/
   4.2% arch/x86/kernel/
   4.4% arch/x86/
  26.0% arch/
   3.5% drivers/usb/storage/
  10.4% drivers/usb/
   3.6% drivers/watchdog/
  23.8% drivers/
  11.5% fs/xfs/
  13.5% fs/
   3.7% kernel/
   9.8% net/9p/
  10.6% net/
   5.4% scripts/kconfig/
   5.9% scripts/
   7.4% sound/soc/codecs/
   8.4% sound/soc/
  10.1% sound/
and it's actually more spread out than usual. Arch and drivers are just
half of the patch even when combined.
Give it a try,
Linus
---
Adrian Bunk (5):
      m32r: remove the unused NOHIGHMEM option
      m32r: don't offer CONFIG_ISA
      m32r: export empty_zero_page
      m32r: export __ndelay
      m32r/kernel/: cleanups

Adrian Hunter (2):
      UBIFS: TNC / GC race fixes
      UBIFS: remove incorrect assert

Akinobu Mita (2):
      [WATCHDOG] ibmasr: remove unnecessary spin_unlock()
      ibmasr: remove unnecessary spin_unlock()

Alan Cox (1):
      pcmcia: Fix broken abuse of dev->driver_data

Alan Stern (2):
      USB: unusual_devs addition for RockChip MP3 player
      USB: revert recovery from transient errors

Alex Chiang (1):
      [IA64] Ski simulator doesn't need check_sal_cache_flush

Alexander Beregalov (1):
      UBIFS: fix printk format warnings

Alexander Duyck (1):
      netdev: simple_tx_hash shouldn't hash inside fragments

Andrea Righi (1):
      x86, oprofile: BUG schedu...

Hi....
Dealing with my Aspire One setup, I found this (so obvious I don't send a patch:)
arch/x86/kernel/cpu/mtrr/main.c:

static int __init disable_mtrr_cleanup_setup(char *str)
{
	if (enable_mtrr_cleanup != -1)
		enable_mtrr_cleanup = 0;
	return 0;
}
early_param("disable_mtrr_cleanup", disable_mtrr_cleanup_setup);

static int __init enable_mtrr_cleanup_setup(char *str)
{
	if (enable_mtrr_cleanup != -1)
		enable_mtrr_cleanup = 1;
	return 0;
}
early_param("enble_mtrr_cleanup", enable_mtrr_cleanup_setup);
             ^^^^^^
Nice ;)
--
J.A. Magallon <jamagallon()ono!com> \ Software is like sex:
\ It's better when it's free
Mandriva Linux release 2009.0 (Cooker) for i586
Linux 2.6.25-jam18 (gcc 4.3.1 20080626 (GCC) #1 SMP
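To see why a typo like this goes unnoticed, here is a minimal user-space sketch of a name-to-handler table. It is only a model: `early_param_entry`, `dispatch_early_param` and `cleanup_enabled` are hypothetical names, and it assumes exact string matching rather than reproducing the kernel's actual parsing. The point is that a misspelled registration makes the correctly spelled option a silent no-op.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a boot-parameter table; not the kernel's code. */
struct early_param_entry {
	const char *name;
	int (*setup)(char *arg);
};

int cleanup_enabled;	/* model state: 0 = off, 1 = on */

static int disable_setup(char *arg) { (void)arg; cleanup_enabled = 0; return 0; }
static int enable_setup(char *arg)  { (void)arg; cleanup_enabled = 1; return 0; }

static const struct early_param_entry table[] = {
	{ "disable_mtrr_cleanup", disable_setup },
	{ "enble_mtrr_cleanup",   enable_setup },	/* misspelled, as in the report */
};

/* Returns 1 if a handler ran for this word, 0 if the word was ignored. */
int dispatch_early_param(const char *word)
{
	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (strcmp(word, table[i].name) == 0) {
			table[i].setup(NULL);
			return 1;
		}
	}
	return 0;
}
```

In this model, `dispatch_early_param("enable_mtrr_cleanup")` returns 0 and leaves `cleanup_enabled` untouched, while the misspelled `"enble_mtrr_cleanup"` is the only spelling that actually runs the handler.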
--heh. Could you send a patch with a changelog please? Ingo --
These options are also named inconsistently with all other options. The standard way to name a boolean option is "foo" versus "nofoo", in this case, "mtrrcleanup" vs "nomtrrcleanup". -hpa --
ok, we could change it... YH --
If we're fixing a typo anyway I'd suggest so. We know we're not breaking anyone's working setup... -hpa --
mtrr_cleanup and no_mtrr_cleanup? YH --
Dashes seem to be used more than underscores, so it probably should be "mtrr-cleanup" and "nomtrr-cleanup" if you want a separator. -hpa --
i need to document the mtrr_cleanup_debug too...change it to mtrrcleanup_debug ? just like initcall_debug? YH --
I would prefer "mtrr-cleanup-debug" if the main one is "mtrr-cleanup"; mixing dashes and underscores is a bit sick. Unfortunately we have had very few attempts at consistency with command line options... some in the early days were even StudlyCaps (yuck...) -hpa --
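The "foo" versus "nofoo" convention discussed above can be sketched as a single helper that handles both spellings from one base name. This is a hypothetical user-space illustration (`parse_bool_option` is not a kernel function), and "mtrr-cleanup" is only one of the names floated in the thread, not a settled choice.

```c
#include <string.h>

/*
 * Hypothetical helper: returns 1 and stores the value if `word` is `name`
 * (true) or "no" followed by `name` (false); returns 0 and leaves *value
 * alone if `word` is neither.
 */
int parse_bool_option(const char *word, const char *name, int *value)
{
	if (strcmp(word, name) == 0) {
		*value = 1;
		return 1;
	}
	/* word + 2 is safe here: the strncmp proved word starts with "no". */
	if (strncmp(word, "no", 2) == 0 && strcmp(word + 2, name) == 0) {
		*value = 0;
		return 1;
	}
	return 0;
}
```

With one base name, the enable and disable spellings cannot drift apart the way two separately registered strings can.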
Here it goes...I hope it's right.
==================
Correct typo for 'enable_mtrr_cleanup' early boot param name.
Signed-off-by: J.A. Magallon <jamagallon@ono.com>
diff -p -up linux/arch/x86/kernel/cpu/mtrr/main.c.orig linux/arch/x86/kernel/cpu/mtrr/main.c
--- linux/arch/x86/kernel/cpu/mtrr/main.c.orig 2008-09-30 09:57:46.000000000 +0200
+++ linux/arch/x86/kernel/cpu/mtrr/main.c 2008-09-30 09:57:55.000000000 +0200
@@ -834,7 +834,7 @@ static int __init enable_mtrr_cleanup_se
 		enable_mtrr_cleanup = 1;
 	return 0;
 }
-early_param("enble_mtrr_cleanup", enable_mtrr_cleanup_setup);
+early_param("enable_mtrr_cleanup", enable_mtrr_cleanup_setup);
 struct var_mtrr_state {
 	unsigned long range_startk;
--applied to tip/x86/urgent, thanks! Ingo --
Ingo, why did you require a patch? Wouldn't it really have been simpler and easier for everyone to write it yourself? Since I am sure it was not only a matter of laziness (really?), I am very curious to know the reason. Thank you, Domenico -----[ Domenico Andreoli, aka cavok --[ http://www.dandreoli.com/gpgkey.asc ---[ 3A0F 2F80 F79C 678A 8936 4FEE 0677 9033 A20E BC50 --
I see two things : - preserve authorship of the code - "laziness" as you call it, is the only way to scale for a maintainer. Willy --
yeah, correct. Also, i asked (not required) J.A. Magallón whether he could send a patch - if he didn't (no time, etc.) i'd have fixed it myself (crediting him in the changelog). But it's also a general principle: maintainers don't 'own' the code in any way and there should be no asymmetry in the ability to modify the code. So if people are willing to fix bugs they notice, i prefer that far more than me doing it. Ingo --
I think I got the lesson, although the asymmetry matter is still not that clear to me. Anyway, I also know that when you talk about code you prefer patches to plain English, so I expect you'd like others to do the same ;) Thank you, Domenico --
If 2.6.27 is released with e1000e driver corrupting EEPROM contents on many systems out there, rendering the cards unusable for most of the i-am-not-a-hacker users (and remember, even Dave Airlie bricked his laptop completely to death, when trying to restore eeprom contents), well, I personally find that very scary. Intel is working with us on tracking down and resolving the issue, but this is not going as well as one would like to see (one attempt, one card with completely hosed EEPROM contents ... and restoring the contents is not *that* trivial). Intel has some patches to mitigate the symptoms (even though we still don't know who is causing the breakage, but Xorg is the biggest suspect in my eyes), but they are neither in your tree nor in any other maintainer's queue yet, as far as I know. -- Jiri Kosina SUSE Labs --
What's the magic to trigger it? I've got a laptop with that e1000e chip in it, and am obviously running a recent kernel on it. Do people have a handle on it? Is it actually verified to be kernel-related, and not related to the X server etc? Linus --
So far it seems to be that you need 1) something close to xorg 7.4 and 2) 2.6.27-rcX kernel to trigger it. Not every system having e1000e is affected. Apparently it is some kind of race, as it usually takes multiple cycles to trigger (on one of our testing machines this took three attempts to trigger for the first time, and then after unbricking the machine and restarting testing, the reproduction tests have been running for several hours). It always seems to happen when X is probing/initializing the graphics card. So it really seems to be some badness in Xorg intel driver initialization code, and kernel/hardware allows bad things to happen. Last time I heard, our X developers are suspecting vbeinit initialization code in Intel driver and are looking into it. Also, we are going to release next opensuse/SLES beta with patches that should mitigate the problem (Jesse has posted a new version of them), so hopefully we will then receive some stacktraces from the users who are able to trigger the problem more easily. -- Jiri Kosina SUSE Labs --
And this e1000e must be ICH*, right? I.e. not a separate e1000e chip/card? -- Krzysztof Halasa --
So far all the affected systems I am aware of were ICH. -- Jiri Kosina SUSE Labs --
Ditto here, i.e. we have no similar reports on other parts. --
my current status mail was posted earlier today to lkml from this address, since then we've had a local reproduction and are going for number two. The reproduction seems racy, i.e. it doesn't happen every time, so we put it in a loop doing detect, check eeprom, detect, etc, and we'll see if it fails. Reproduction seems to consistently be around X probing time, no firm leads yet. As for Intel we have keithp and jbarnes as well as arjan, auke, myself and a few others involved. We have some patches to lock the nvm down, we'll be posting those tonight and tomorrow, I also have some debug logic (and fixes) to help prove that we don't think it's a race in e1000e. -- Jesse --
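The thread does not say what the "check eeprom" step in that loop looked like. As one plausible sketch, assuming the convention used by Intel's e1000-family drivers that the first 0x40 NVM words (with the checksum word at offset 0x3F) must sum to 0xBABA, a corruption check over a dumped image could be written as follows. The function names and the in-memory image are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define NVM_WORDS        0x40	/* words 0x00..0x3F; the last one is the checksum */
#define NVM_SUM_EXPECTED 0xBABA	/* assumed e1000-family convention */

/* Returns 1 if the 16-bit sum of the first 0x40 words equals 0xBABA. */
int nvm_checksum_ok(const uint16_t *words)
{
	uint16_t sum = 0;

	for (size_t i = 0; i < NVM_WORDS; i++)
		sum = (uint16_t)(sum + words[i]);
	return sum == NVM_SUM_EXPECTED;
}

/* Computes the value word 0x3F must hold so that the image passes the check. */
uint16_t nvm_checksum_word(const uint16_t *words)
{
	uint16_t sum = 0;

	for (size_t i = 0; i < NVM_WORDS - 1; i++)
		sum = (uint16_t)(sum + words[i]);
	return (uint16_t)(NVM_SUM_EXPECTED - sum);
}
```

A loop in the spirit of the one described would read the image after each probe cycle and stop at the first pass where `nvm_checksum_ok()` returns 0. Note that a checksum only catches corruption that changes the sum; comparing against a saved copy is stricter.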
Can we get the simple debug patches including the fixes which resulted from them pushed upstream ASAP ? Thanks, tglx --
On Tue, Sep 30, 2008 at 11:56 AM, Linus Torvalds If we had the magic we'd have fixed it by now, the current working theory is it's X server related. This hasn't been proven, though my ATI GPU e1000e seems fine so it may have some legs. If it is X related then it's both a kernel + X server issue, the e1000e driver opens the barn door, the X server drives the horses through it. Of course until someone produces a way to fix the hw after it breaks, reproducing this isn't something for the faint-hearted. I'm hoping my laptop comes back today with a brand new motherboard in it. Dave. --
Are you sure? There was a mandriva report about NVM corruption on an e100 too (that one apparently just caused PXE failure, the networking worked fine). So I wonder if it's _purely_ X-server-related, and the reason people blame 2.6.27-rc1 is just timing of some X update and then people just look at the kernel because the 'network card failed' looks so kernel-related. The reason I mention that is right now it looks like the distros are just running around disabling the e1000e module, or perhaps downgrading it. Which may not even work! The discussions in some of the bug-trackers seem to be full of people who have no actual information, but are perfectly willing to flail around wildly saying obviously crazy things. The Ubuntu people are some of the crazier ones (should I be surprised?), but that one also has Ben Collins claiming they use the same e1000e driver for the 2.6.26/27 kernels (from Intel's sf.net project). That may be bogus, but if true it would indicate that it's possibly not so kernel-related, or at least not so e1000e-driver-related. Linus --
That is very probably a completely separate issue, and should have been fixed already by 78566fecb. I think that not many people are suspecting a bug in e1000e directly. Rather a combination of an X bug, the kernel allowing X to do bad things (for example the missing check in drivers/pci/pci-sysfs.c:pci_mmap_resource() looks particularly suspicious) and "bug-friendly" hardware behavior. -- Jiri Kosina SUSE Labs --
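The message does not spell out what the missing check in pci_mmap_resource() would be. One natural candidate, sketched here in user space under that assumption, is a bounds test that rejects any mapping request reaching past the end of the PCI resource. `mmap_request_fits` and `SKETCH_PAGE_SHIFT` are hypothetical names, and 4 KiB pages are assumed.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumes 4 KiB pages */

/*
 * Returns 1 if a mapping of nr_pages pages starting at page offset pgoff
 * lies entirely inside a resource of resource_len bytes, 0 otherwise.
 */
int mmap_request_fits(uint64_t resource_len, uint64_t pgoff, uint64_t nr_pages)
{
	uint64_t res_pages;

	if (resource_len == 0)
		return 0;
	/* Round the resource length up to whole pages. */
	res_pages = ((resource_len - 1) >> SKETCH_PAGE_SHIFT) + 1;
	/* Written as a subtraction so that pgoff + nr_pages cannot overflow. */
	return pgoff < res_pages && res_pages - pgoff >= nr_pages;
}
```

For a 128 KiB resource (32 pages), a request for 32 pages at offset 0 fits, while 33 pages at offset 0, or 2 pages at offset 31, would spill into whatever follows the resource and is refused.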
Likely not, you are mentioning a patch for e1000, while the Mandriva bug report is about e100: https://qa.mandriva.com/show_bug.cgi?id=44192 See you, Eric --
On Tue, 30 Sep 2008 09:58:56 +0200 Eric Piel <eric.piel@tremplin-utc.net> wrote:
| Jiri Kosina wrote:
| > On Mon, 29 Sep 2008, Linus Torvalds wrote:
| >
| >>> If it is X related then it's both a kernel + X server issue, the e1000e
| >>> driver opens the barn door, the X server drives the horses through it.
| >> Are you sure? There was a mandriva report about NVM corruption on an e100
| >> too (that one apparently just caused PXE failure, the networking worked
| >> fine).
| >
| > That is very probably completely separate issue, and should have been
| > fixed already by 78566fecb.
| Likely not, you are mentioning a patch for e1000, while the Mandriva bug
| report is about e100:
| https://qa.mandriva.com/show_bug.cgi?id=44192
Yes, also the reporter has said that he has got the problem with -rc7 and this fix is available since -rc6. Jiri, doesn't e100 need that fix as well? Anyway, it is not clear to us whether this is a kernel problem. We could not reproduce it here and the reporter is now checking his network. -- Luiz Fernando N. Capitulino --
He finished his checks and discovered that the e100 issue was in reality a hardware problem in the switch being used, which started to have problems now, coincidentally with this e1000e issue getting more attention. After swapping the switch the problem stopped, so just a false alarm. I closed https://qa.mandriva.com/show_bug.cgi?id=44192, which was the original report. -- []'s Herton --
On Mon, 29 Sep 2008 19:21:02 -0700 (PDT) btw, we're also working on making some parts of the kernel more robust against certain types of bugs; for example the ioremap checks and sysfs resource checks. There's a set of checks and API changes we can do to make it less likely that drivers end up doing bad stuff; but that's obviously more for 2.6.28 than for .27 -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org --
On Tue, Sep 30, 2008 at 12:21 PM, Linus Torvalds Well from a purely empirical standpoint, I've been running new X against that laptop for a long time, and others have the same laptop, so I think it's a problem with the e1000e driver putting the card into a state which allows X to do bad things. I think X may be causing issues on other hw, like e100 and some realtek. Also when we say X I think it looks like Intel driver interaction issues, as I said I'm running the same stuff on my ATI gpu laptop with e1000e and haven't had any problems. But I'm leaving this up to Intel, I don't think HP will take it too kindly if I keep returning my laptop. Dave. --
On Tue, 30 Sep 2008 11:59:58 +1000 we have a patch to save/restore now, in final testing stages (obviously we want to be really careful with this) Note that so far it seems to mostly hit with "new" distros, so both new kernel and new X... ;( -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org --
Btw, the _real_ bug is clearly in the hardware design that allows you to brick those things without apparently even having a lock bit. I'm hoping Intel doesn't treat this as just a software bug. Some hw designer should be thinking hard about which orifice they put their head up in. It used to be that you could fry some monitors by feeding them out-of-range signals. The _monitors_ got fixed. Linus --
I am confident they will, because right now some more malicious virus writers will be thinking 'whoopeee party time'. --
The hardware has a lock bit, and we're trying to figure out why the BIOS writers guide doesn't say to set it. Probably because of the MAC address. We will post a patch to e1000e tomorrow that sets a lock bit that prevents the registers memory mapped by 0:19.0 BAR1 from causing flash write cycles. The patches I've just posted don't quite do that yet. --
Mostly. I think you can still do bad things to internal LCDs on at least some laptops. Although I hope I'm wrong. Linus --
You still can in some cases. You can also erase many video card firmwares, trash disks, brick DVD drives and the like fairly easily too but you do tend to have to try to be evil in these cases, not just get an address wrong. Alan --
unless there is news that I missed, the E1000 bricking bug is still out there. that is a particularly nasty one. --
