"Incidentally i was thinking about using KVM for automated testing. Important pieces of hardware should get an in-KVM simulator/emulator, that way developers who do not own that hardware can do functionality testing too," Ingo Molnar suggested during a thread discussing a SCSI driver bug fix. Linus Torvalds was originally unimpressed by the idea:
"Using emulators to test device drivers is almost certain to be pointless. The problem with device drivers tends to be timing issues, odd hardware interactions, and lots of strange (and sometimes undocumented) behaviour and dependencies (eg things like 'you have to wait 50us after setting the reset bit until the hardware has actually reset'). These are all things that you'd generally not catch in emulation - because the emulation by necessity is only going to be a very weak picture of the real thing."
Alan Cox countered, "for some things. I do it a bit because you can use it to fake failures that are tricky to do in the real world. It won't tell you the driver works but its surprisingly good for testing for races (forcing IRQ delivery at specific points), buggy hardware you don't posses, and things like media failures and timeouts your real hardware refuses to do." Linus acquiesced conditionally, "I do agree that you likely find bugs, even if quite often it's exactly because the behaviour is something that will never happen on real hardware," then acknowledged previous debugging efforts by Alan, "but failure testing is very useful - I forget who it was who debugged some driver by taking a CD and just scratching it mercilessly to induce read errors ;)" Ingo added, "something like that wont enable 100% coverage (or even reasonable coverage for most hardware), so it's no replacement for actual hard testing, but it could push out the domain of minimally tested code quite a bit and increase the quality of the kernel."
"Highlights include in-kernel pic/lapic/ioapic emulation, improved guest support, preemptibility, an improved x86 emulator, and a fair amount of cleanup.
"The changes outside drivers/kvm/ and include/linux/kvm*.h fix the CR8 mask definition (which is not otherwise used in the kernel) and expose some ioapic register definitions even if ioapic support is not compiled in. The diff is appended below."
A recent attempt to push some V4L/DVB updates for inclusion in the 2.6.24 Linux kernel met with some resistance. Linus Torvalds summarized the problems affecting the
em28xx video driver:
"I've talked to various people, and none of the main kernel people end up being at all interested in a kernel that has external dependencies on binary blobs for tuners.
"So right now it seems like while I would personally want to have more vendors supprt their own drivers, if that in this case means that we'd have to have user-space and unmaintainable binaries to tune the cards, everybody seems to hate that idea."
"Finally. Yeah, it got delayed, not because of any huge issues, but because of various bugfixes trickling in and causing me to reset my 'release clock' all the time. But it's out there now, and hopefully better for the wait," Linus Torvalds said announcing the 2.6.23 kernel. He noted few changes since the last release candidate, "not a whole lot of changes since -rc9, although there's a few updates to mips, sparc64 and blackfin in there. Ignoring those arch updates, there's basically a number of mostly one-liners (mostly in drivers, but there's some networking fixes and soem VFS/VM fixes there too)." Source level changes can be viewed via the gitweb interface. A nice overview of all changes can be found at Kernel Newbies. Linus went on to describe his plan going forward:
"I want this to be what people look at for a few days, but expect the x86 merge to go ahead after that. So far, all indications are still that it's going to be all smooth sailing, but hey, those indicators seem to always say that, and only after the fact do people notice any problems ;)"
When a Linux user reported a repeatedly high load average on an idle server, tracking the problem to a specific patch labeled, "user of the jiffies rounding code", Andrew Morton replied, "this is unexpected. High load average is due to either a task chewing a lot of CPU time or a task stuck in uninterruptible sleep." Linus Torvalds disagreed, explaining:
"We saw high loadaverages with the timer bogosity with 'gettimeofday()' and 'select()' not agreeing, so they would do things like
'date = time(..); select(.. , timeout = );'and when 'date' wasn't taking the jiffies offset into account, and thus mixing these kinds of different time sources, the select ended up returning immediately because they effectively used different clocks, and suddenly we had some applications chewing up 30% CPU time, because they were in a loop that *tried* to sleep."
Linus offered what he described as an "idiotic patch" to cause the load average to not be calculated exactly once every 5 seconds to prevent it from being in sync with something else waking up every 5 seconds, noting, "the load average is not calculated every tick, because that's not just expensive, but we also want to have some time-based decay." Arjan van de Ven pointed out that this shouldn't help, "I mean, the load gets only updated in actual timer interrupts... and on a tickless system there's very few of those around..... and usually at places round_jiffies() already put a timer on." Linus agreed with this reasoning, suggesting, "maybe Anders' problem stems partly from the fact that he really is using the tweaks to make that tickless theory more true than it tends to be on most systems?" Arjan pointed out that a lot of work has been successful in making tickless kernels wake up less, "we fixed a TON of stuff over the last months.. standard desktops (F8 / next Ubuntu) will be around 10 wakeups/sec, in a lab environment you can get below 2 ;)"
A bug report filed by Ingo Molnar regarding a procfs crash in the recently released 2.6.23-rc9 kernel was quickly tracked down by Linus Torvalds as a compiler bug. The bug was ultimately determined to be from a compiler optimization generated with an older version of GCC. Ingo was skeptical at first, "it's 4.0.2. Not the latest & greatest but I've been using it for 2 years and this would be the first time it miscompiles a 32-bit kernel out of tens of thousands of successful kernel bootups." Linus replied, "I am 100% sure. I can look at the disassembly, and point to the fact that your Oops happens on code that is simply totally bogus." He continued on to offer an interesting review of the crash, explaining line by line what should have been generated versus what actually was, causing the crash. In the end, Ingo switched to a distribution compiled GCC 4.1.2 and confirmed that the crash went away, "so you are completely right, it's a compiler bug in 4.0.2."
During the thread, Linus suggested that the optimization made by the compiler wasn't "legal", to which Alan Cox retorted, "pedant: valid. Almost all optimizations are legal, nobody has yet written laws about compilers. Sorry but I'm forever fixing misuse of the word 'illegal' in printks, docs and the like and it gets annoying after a bit." Linus playfully responded, "heh. When I'm ruler of the universe, it *will* be illegal. I'm just getting a bit ahead of myself." When asked how long until he expected to be ruler, Linus added, "I'm working on it, I'm working on it. I'm just as frustrated as you are. It turns out to be a non-trivial problem."
In a continuing discussion about the difference between pluggable security and pluggable schedulers, Linus Torvalds quoted himself:
"Another difference is that when it comes to schedulers, I feel like I actually can make an informed decision. Which means that I'm perfectly happy to just make that decision, and take the flak that I get for it. And I do (both decide, and get flak). That's my job."
He added, "which you seem to not have read or understood (neither did apparently anybody on slashdot)". Linus continued, "the arguments that 'servers' have a different profile than 'desktop' is pure and utter garbage, and is perpetuated by people who don't know what they are talking about." He then asked and answered his own question, "really: tell me what the difference is between 'desktop' and 'server' scheduling. There is absolutely *none*," going on to explain:
"Yes, there are differences in tuning, but those have nothing to do with the basic algorithm. They have to do with goals and trade-offs, and most of the time we should aim for those things to auto-tune (we do have the things in /proc/sys/kernel/, but I really hope very few people use them other than for testing or for some extreme benchmarking - at least I don't personally consider them meant primarily for 'production' use)."
"I said I was hoping that -rc8 was the last -rc, and I hate doing this, but we've had more changes since -rc8 than we had in -rc8. And while most of them are pretty trivial, I really couldn't face doing a 2.6.23 release and take the risk of some really stupid brown-paper-bag thing," Linus Torvalds said, announcing the 2.6.23-rc9 kernel. He added, "so there's a final -rc out there, and right now my plan is to make this series really short, and release 2.6.23 in a few days. So please do give it a last good testing, and holler about any issues you find!" He went on to warn developers that the first thing planned for 2.6.24 was to merge the unified x86 architecture:
"This is also a good time to warn about the fact that we're doing the x86 merge very soon (as in the next day or two) after 2.6.23 is out, so if you have pending patches for the next series that touch arch/i386 or x86-64, you should get in touch with Thomas Gleixner and Ingo Molnar, who are the keepers of the merge scripts, and will help you prepare..
"Doing it as early as possible in the 2.6.24-rc4 series (basically I'll do it first thing) will mean that we'll have the maximum amount of time to sort out any issues, and the thing is, Thomas and Ingo already have a tree ready to go, so people can check their work against that, and don't need to think that they have to do any fixups after it his *my* tree. It would be much better if everybody was just ready for it, and not taken by surprise."
Recent efforts to add additional details to
Documentation/CodingStyle prompted Linus Torvalds to reply, "I'm not very happy with this." He explained:
CodingStyle' should be about the big issues, not about details. Yes, we've messed that up over the years, but let's not continue that. In other words, I'd suggest *removing* lines from CodingStyle, not adding them. The file has already gone from a 'good general principles' to 'lots of stupid details'. Let's not make it worse."
Erez Zadok pointed out, "there's a lot of good value in having all those details, as it helps people standardize on more common practices, including details. I think removing all those details may only increase the amount-of less-accepted code to be posted, resulting in more lkml people having to repeat themselves on what not to do." He went on to ask, "would you prefer if CodingStyle was reorganized or even split into (1) general principles and (2) details? Perhaps we need a CodingStylePrinciples and a CodingStyleDetails?" Linus responded favorably, "I'm certainly ok with the split into two files. What I'm not ok with is really important stuff (indentation), and then mixing in silly rules (`parenthesis are bad in printk's`?)"
"I think the decision to merge Smack is something that needs to be considered in the wider context of overall security architecture," suggested James Morris following Andrew Morton's recent comment that he plans to merge the functionality in the upcoming 2.6.24 kernel. While James had no complaints about Smack itself, he expressed concerns regarding the pluggable nature of LSM, which is used by Smack, cautioning, "if LSM remains, security will never be a first class citizen of the kernel," adding, "on a broader scale, we'll miss the potential of Linux having a coherent, semantically strong security architecture." He noted that he'd rather see SELinux as the sole Linux security framework, "merging Smack, however, would lock the kernel into the LSM API. Presently, as SELinux is the only in-tree user, LSM can still be removed."
Linus Torvalds firmly stated, "LSM stays in. No ifs, buts, maybes or anything else." He explained, "you security people are insane. I'm tired of this 'only my version is correct' crap. The whole and only point of LSM was to get away from that." Linus continued, "I guess I have to merge AppArmor and SMACK just to get this *disease* off the table. You're acting like a string theorist, claiming that t here is no other viable theory out there. Stop it. It's been going on for too damn long." Stephen Smalley responded, "you argued against pluggable schedulers, right? Why is security different?" Linus explained:
"Schedulers can be objectively tested. There's this thing called 'performance', that can generally be quantified on a load basis.
"Yes, you can have crazy ideas in both schedulers and security. Yes, you can simplify both for a particular load. Yes, you can make mistakes in both. But the *discussion* on security seems to never get down to real numbers. So the difference between them is simple: one is 'hard science'. The other one is 'people wanking around with their opinions'."
"Ok, I think I'm getting close to releasing a real 2.6.23," began Linus Torvalds in his release announcement for the eighth release candidate of the upcoming 2.6.23 kernel. "Things seem to have calmed down, and I think Thomas Gleixner may have found the suspend/resume regression that has dogged us for a while, so I'm feeling happy about things." Linus continued:
"Of course, me feeling happy is usually immediately followed by some nasty person finding new problems, but I'll just ignore that and enjoy the feeling anyway, however fleeting it may be.
"The shortlog really is pretty short, and I'm appending the diffstat at the end too in case anybody cares, but basically it's just a number of fairly small but real fixes, with some support for a few new chips to the sky2 network driver.."
"It took me quite a while to realize the real root cause of the VAIO - and probably many other machines - suspend/resume regressions, which were unearthed by the dyntick / clockevents patches," Thomas Gleixner explained regarding two patches for fixing suspend issues that Andrew Morton experienced with his VAIO laptop. He continued, "we disable a lot of ACPI/BIOS functionality during suspend, but we keep the lower idle C-states functionality active across suspend/resume. It seems that this causes trouble with certain BIOSes, but I assume that the problem is more wide spread and just not surfacing due to the various scenarios in which a machine goes into suspend/resume." Thomas concluded, "I really hope that this two patches finally set an end to the 'jinxed VAIO heisenbug series', which started when we removed the periodic tick with the clockevents/dyntick patches."
Linus Torvalds expressed some concerns, "the patches look fine, but I somehow have this slight feeling that you gave up a bit too soon on the '*why* does this happen?' question." He agreed that at that point there was a problem with ACPI, but cautioned that this could be triggered by another bug, "in particular, I also suspect that this may not really fix the problem - maybe it just makes the window sufficiently small that it no longer triggers. Because we don't necessarily understand what the real background for the problem is, I'm not sure we can say that it is solved." Linus concluded, "but hey, I think I'll apply the patches as-is. I'd just feel even better if we actually understood *why* doing the CPU Cx states is not something we can do around the suspend code!"
A short thread on the lkml discussed the lack of a
memzero function in the Linux kernel. Cyrill Gorcunov asked, "could anyone tell me why there is no official
memzero function (or macros) in the kernel?" Arjan van de Ven explained, "it doesn't add value....
memset with a constant 0 is just as fast (since the compiler knows it's 0) than any wrapper around it, and the syntax around it is otherwise the same." Linux creator Linus Torvalds went on to explain:
"The reason we have '
clear_page()' is not because the value we're writing is constant - that doesn't really help/change anything at all. We could have had a '
fill_page()' that sets the value to any random byte, it's just that zero is the only value that we really care about.
"So the reason we have '
clear_page()' is because the *size* and *alignment* is constant and known at compile time, and unlike the value you write, that actually matters. So '
memzero()' would never really make sense as anything but a syntactic wrapper around '
"sched_yield() is not - and should not be - about 'recalculating the position in the scheduler queue' like you do now in CFS," Linus Torvalds stated in a discussion with Completely Fair Scheduler author Ingo Molnar, pointing to the man pages to back up his argument that sched_yield should instead move a thread to the end of its queue, adding, "quite frankly, the current CFS behaviour simply looks buggy. It should simply not move it to the 'right place' in the rbtree. It should move it *last*."
Ingo described how it worked with the pre-2.6.23 scheduler, "the O(1) implementation of yield() was pretty arbitrary: it did not move it last on the same priority level - it only did it within the active array. So expired tasks (such as CPU hogs) would come _after_ a yield()-ing task." He went on to compare this to the new process scheduler , "so the yield() implementation was so much tied to the data structures of the O(1) scheduler that it was impossible to fully emulate it in CFS. In CFS we dont have a per-nice-level rbtree, so we cannot move it dead last within the same priority group - but we can move it dead last in the whole tree. (then they'd be put even after nice +19 tasks.) People might complain about _that_." He also noted that this would change the behavior for some desktop applications that call sched_yield(), "there will be lots of regression reports about lost interactivity during load."
"Ahoy me laddies (and beauties)," Linux creator Linus Torvalds began, announcing the seventh release candidate for the upcoming 2.6.23 kernel, "time for the traditional 'Talk Like a Pirate Day' kernel release!" He noted, "now, last year we had a full release (2.6.18 was immortalized on TLAP-2006), but this year I'm chickening out, and we're just doing what is hopefully going to be the last -rc release for the 2.6.23 series." Full source changes can be viewed via the gitweb interface. Linus also offered a brief summary of the changes:
"I'm not including the diffstat, because it got blown up by the resurrection of the sk98lin driver - because skge that is supposed to supplant it doesn't handle some of the hardware. Oh well.
"Apart from that, we had some mips, powerpc and xtense updates, and various driver tweaks. Things like the USB autosuspend revert should make people happier, and some more clockevents fixes should help suspend/restore on i386."