
Quote: I Don't Care About AppArmor

Submitted by Jeremy
on October 22, 2007 - 9:40pm

"Frankly I don't care about apparmor, I don't see it as a serious project. Smack is kind of neat but looks like a nicer way to specify selinux rules."

Unloadable vs. Static

Submitted by Jeremy
on October 22, 2007 - 6:46pm
Linux news

"In a nutshell, there is no safe way to unload an LSM. The modular interface is thus unnecessary and broken infrastructure. It is used only by out-of-tree modules, which are often binary-only, illegal, abusive of the API and dangerous, e.g. silently re-vectoring SELinux," explained James Morris in an October 17th commit message converting LSM to a static interface. Andreas Gruenbacher countered, "LSM can be abused ... so what, this doesn't mean the interface is bad. Non-LSM loadable modules have been known to do lots of bad things, and yet nobody made them non-loadable either (yet)." Linus Torvalds explained that he was willing to unmerge the commit if a valid use for unloadable modules was demonstrated, "I repeat: we can undo that commit, but I will damn well not care one whit about yet another pointless security model flamewar."

Jan Engelhardt pointed to his multiadm security framework, which provides multiple "root" users each with a unique UID, as an example of an LSM that benefits from support for loading and unloading modules. "The use case is so that profs (taking the role of sub-admins), can operate on students' data/processes/etc. (quite often needed), but without having the full root privileges," Jan explained, adding, "this LSM basically grants extra rights unlike most other LSMs, which is why modprobe makes much more sense here. (It also does not have to do any security labelling that would require it to be loaded at boot time already.)" James acknowledged, "based on Linus' criteria, this appears to be a case for reverting the static LSM patch."

Debugging Multiple CPUs

Submitted by Jeremy
on October 22, 2007 - 11:19am
Linux news

"Sysrq-p is pretty useless unless you can force the keyboard interrupt and the spinning process onto the same CPU," noted Chuck Ebbert during a discussion centered around debugging tasks stuck in a running state. The <Alt><SysRq><p> key combination is a debugging aid: it dumps the registers and flags of the CPU that handles the keypress interrupt to the console, and only of that CPU. UltraSPARC maintainer David Miller replied, "yes, I find this a painful limitation too," adding:

"Sparc64 used to dump the registers on all active cpus for show_regs() via a cross-call, and this was incredibly useful. But I disabled that as soon as I started playing with Niagara because at 32 cpus and larger the output is just too voluminous to be useful."

David then suggested, "what might be appropriate is just to get a one-line program counter dump on every cpu via some new sysrq keystroke." Chuck noted that similar functionality is provided by a patch in the -mm kernel, "IIRC -mm had something like this but it was buggy because we were sending IPIs to each processor asking them to print their state. Maybe it would work if we had a way of making them dump their state to a memory location and then collected and printed it from the CPU that's handling the sysrq."

Quote: Security People Are Insane

Submitted by Jeremy
on October 22, 2007 - 8:37am

"The fact is, security people *are* insane. You just argue all the time, instead of doing anything productive. So please don't include me in the Cc on your insane arguments - instead do something productive and I'm interested."

Caution and Latency

Submitted by Jeremy
on October 22, 2007 - 5:28am
Linux news

"With latencytop, I noticed that the (in memory) atime updates during a kernel build had latencies of 600 msec or longer; this is obviously not so nice behavior. Other EXT3 journal related operations had similar or even longer latencies," Arjan van de Ven reported, describing a "mass priority inversion" caused by, "an interaction between EXT3 and CFQ in that CFQ tries to be fair to everyone, including kjournald. However, in reality, kjournald is 'special' in that it does a lot of journal work". Finally, he offered a tiny patch to resolve the issue, "the patch below makes kjournald of the IOPRIO_CLASS_RT priority to break this priority inversion behavior. With this patch, the latencies for atime updates (and similar operation) go down by a factor of 3x to 4x !"

Andrew Morton took a cautious stance, "seems a pretty fundamental change which could do with some careful benchmarking, methinks. See, your patch amounts to 'do more seeks to improve one test case'. Surely other testcases will worsen. What are they?" CFQ author Jens Axboe agreed, "It should not be merged as-is, instead I'll provide a function to do this." Ingo Molnar wasn't convinced, "atime update latencies went down by a factor of 3x-4x ... but what bothers me even more is the large picture. Linux's development is still fundamentally skewed towards bandwidth (which goes up with hardware advances anyway), while the focus on latencies is very lacking (which users do care about much more and which usually does _not_ improve with improved hardware), so i cannot see why we shouldn't apply this." He added, "if bandwidth hurts anywhere, it will be pointed out and fixed, we've got like tons of bandwidth benchmarks and it's _easy_ to fix bandwidth problems. But _finally_ we now have desktop latency tools, hard numbers and patches that fix them, but what do we do ... we put up extra roadblocks??" Andrew calmly replied, "I think the situation is that we've asked for some additional what-can-be-hurt-by-this testing. Yes, we could sling it out there and wait for the reports. But often that's a pretty painful process and regressions can be discovered too late for us to do anything about them."

Quote: Design First

Submitted by Jeremy
on October 22, 2007 - 2:07am

"It wouldn't be efficient for you to implement something new, only to have it criticized again. I'd suggest that you come up with a concrete design, describe to us what you propose to do and let's take it from there."

Checkpatch --strict Mode

Submitted by Jeremy
on October 21, 2007 - 11:31pm
Linux news

"[The] latest checkpatch.pl works really well on sched.c," commented Ingo Molnar, noting considerable improvements since the last release of the script. Andy Whitcroft recently released version 0.11 of the script, "this version brings a more cautious checkpatch.pl by default. The more subjective checks are only applied with the --strict option. It also brings the usual slew of corrections for false positives."

Ingo noted one remaining false positive running the script on sched.c, "I think it has been pointed out numerous times that it is perfectly fine to use curly braces for multi-line single-statement blocks. It's perfectly legitimate, in fact more robust. So if checkpatch.pl wants to make any noise about such constructs it should warn about the _lack_ of curly braces in every multi-line condition block _except_ the only safe single-line statement." Andy agreed, fixing the false positive and adding, "Indeed. We should probably do more on the indentation checks in general," continuing on to discuss some additional common coding mistakes that the next version of the perl script is likely to detect.

Quote: Unpredictable...

Submitted by Jeremy
on October 21, 2007 - 9:15pm

"Sometimes I'm tardy and miss things for weeks and need prodding, and sometimes I pull almost before you've sent the 'please pull' message. I'm unpredictable. Or keeping you on your toes. Or incompetent. Pick whatever suits your mood ;)"

Cleaning Up irq Handlers

Submitted by Jeremy
on October 21, 2007 - 6:47pm
Linux news

Jeff Garzik posted a series of nine patches to the lkml titled "remove [the] 'irq' argument from all irq handlers", explaining, "the overwhelming majority of drivers do not ever bother with the 'irq' argument that is passed to each driver's irq handler. Of the minority of drivers that do use the arg, the majority of those have the irq number stored in their private-info structure somewhere." He noted that he had no intention of pushing the patches upstream anytime soon.

Feedback was entirely positive, with Thomas Gleixner suggesting, "Full ACK. We should do this right at the edge of -rc1. And let's do this right now in .24 and not drag it out for no good reason." Ingo Molnar concurred, "full ACK on the concept from me too. Please go ahead! :)" Eric Biederman noted that there was still work to be done, "the practical question is how do we make this change without breaking the drivers that use their irq argument." Jeff agreed, explaining why the code won't be pushed upstream during -rc1, "I am finding a ton of bugs in each get_irqfunc_irq() driver, so I would rather patiently sift through them, and push fixes and cleanups upstream. Once that effort is done, everything should be in the 'trivial' pile and not have the logic that you are worried about (and thus there would be no need to add an additional branch to the irq handling path)."

Exact Kernel Names

Submitted by Jeremy
on October 21, 2007 - 3:33am
Linux news

When asked how to best refer to kernels between official releases and release candidates, Linus Torvalds pointed to his automated git snapshots. "I still call them 'nightly snapshots', but they do in fact happen twice a day if there have been changes, so that's not technically correct," he noted. The latest snapshot is 2.6.23-git15, "this is an exact name, because you can go to kernel.org and look up the exact commit ID that was used to generate it (there's an 'ID' file associated with each snapshot there)." For git users, he suggested using the "git describe" command to get the git name, with the current head being named v2.6.23-6562-g8add244. He went on to explain that the name "tells you three things: (a) it's based on 2.6.23 (b) there's been 6562 commits since 2.6.23 and (c) the top-of-tree abbreviated commit is '8add244'."

When asked about the previously discussed usage of "-rc0" and other similar proposed naming conventions, Linus replied:

"Please don't use those names. They don't actually tell anything about where in the cycle it is, and as you can see above, there's been 6500+ commits since 2.6.23, so saying '2.6.23-rc0' or similar really isn't very helpful if anybody actually cares about just where in the release cycle you are."

Balancing Real Time Threads

Submitted by Jeremy
on October 20, 2007 - 2:42pm
Linux news

"Currently in mainline the balancing of multiple RT threads is quite broken. That is to say that a high priority thread that is scheduled on a CPU with a higher priority thread, may need to unnecessarily wait while it can easily run on another CPU that's running a lower priority thread," began Steven Rostedt, describing his patchset to introduce improved real time task balancing. He explained:

"Balancing (or migrating) tasks in general is an art. Lots of considerations must be taken into account. Cache lines, NUMA and more. This is true with general processes which expect high through put and migration can be done in batch. But when it comes to RT tasks, we really need to put them off to a CPU that they can run on as soon as possible. Even if it means a bit of cache line flushing. Right now an RT task can wait several milliseconds before it gets scheduled to run. And perhaps even longer. The migration thread is not fast enough to take care of RT tasks."

Steven described his test cases and numerous issues he noticed with the current balancing code, noting, "to solve this issue, I've changed the RT task balancing from a passive method (migration thread) to an active method. This new method is to actively push or pull RT tasks when they are woken up or scheduled."

Maintaining Out of Tree Drivers

Submitted by Jeremy
on October 20, 2007 - 5:16am
Linux news

"I'm trying to keep some external drivers up to date with the kernel, and the first two weeks after the release is the worst time for me. There is no way to distinguish the current git kernel from the latest release. It's only after rc1 is released that I can use the preprocessor to check LINUX_VERSION_CODE," explained Pavel Roskin, describing the ongoing effort to keep the out-of-tree MadWifi driver in sync with the latest released kernel. Rik van Riel suggested:

"Consider this an incentive to submit your code for inclusion in the upstream kernel. Having all the common drivers integrated in the mainline kernel makes it much easier for users to use all their hardware, external drivers are not just a pain for the developers."

Pavel acknowledged, "the incentive has already worked for MadWifi, which has landed in the wireless-2.6 repository under the name 'ath5k'. Still, there is a lot of work to do, and some features won't appear in the kernel driver soon, partly because they rely on the chipset features that still need to be reverse engineered." In response to Pavel's original question, Dave Jones noted that Fedora kernels treat the development between a major release and the first release candidate as "rc0".
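The two items above are really the same question seen from two sides: Linus' "git describe" names carry the base tag, the commit count and the top commit, and the window Pavel cannot see with LINUX_VERSION_CODE is exactly the one where that count is non-zero but no -rc tag exists yet. The script below, an added example that is not from the page, the kernel tree or Fedora, splits a describe name into those three facts and labels the release-cycle phase; the function names and phase labels are made up here.

```sh
#!/bin/sh
# Added example: interpret `git describe` output from a kernel checkout.
# Run `git describe` yourself and pass its output; the names used below
# are the ones quoted in the article plus constructed variants.

# Print "TAG COUNT HASH" for a describe name such as v2.6.23-6562-g8add244.
# Parsing from the right end keeps tags that contain '-' (v2.6.24-rc1) intact.
describe_parse() {
    desc=$1
    case "$desc" in
        *-*-g[0-9a-f]*)
            hash=${desc##*-g}      # text after the last "-g": abbreviated commit
            rest=${desc%-g*}       # everything before "-gHASH"
            count=${rest##*-}      # commits since the tag
            tag=${rest%-*}         # the tag itself
            ;;
        *)                         # exactly on a tag: git describe prints no suffix
            tag=$desc; count=0; hash=""
            ;;
    esac
    printf '%s %s %s\n' "$tag" "$count" "$hash"
}

# Label where in the cycle the tree sits. "pre-rc1" is the window between a
# release and -rc1 (Fedora's "rc0"), where LINUX_VERSION_CODE still reports
# the previous release and so cannot tell the two apart.
describe_phase() {
    set -- $(describe_parse "$1")  # $1=tag $2=count $3=hash (hash may be empty)
    case "$1" in
        *-rc*)
            if [ "$2" -eq 0 ]; then echo "release-candidate"; else echo "post-rc"; fi ;;
        *)
            if [ "$2" -eq 0 ]; then echo "release"; else echo "pre-rc1"; fi ;;
    esac
}

# The head Linus described in the article: based on 2.6.23, 6562 commits on,
# top commit 8add244, and therefore in the pre-rc1 window.
describe_parse v2.6.23-6562-g8add244   # v2.6.23 6562 8add244
describe_phase v2.6.23-6562-g8add244   # pre-rc1
```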

Linux Security Modules Sans Modules

Submitted by Jeremy
on October 19, 2007 - 6:32pm
Linux news

In a brief follow up to the earlier pluggable security discussion, Thomas Fricaccia reflected on the implications for the various security frameworks, "I noticed James Morris' proposal to eliminate the LSM in favor of ordaining SELinux as THE security framework forever and amen, followed by the definitive decision by Linus that LSM would remain." He then commented on a recent merged patch preventing the loading of security modules into a running kernel, "but then I noticed that, while the LSM would remain in existence, it was being closed to out-of-tree security frameworks. Yikes! Since then, I've been following the rush to put SMACK, TOMOYO and AppArmor 'in-tree'." Linus Torvalds replied:

"Yeah, it did come up. Andrew, when he sent it on to me, said that the SuSE people were ok with it (AppArmor), but I'm with you - I applied it, but I'm also perfectly willing to unapply it if there actually are valid out-of-tree users that people push for not merging. So I don't think this is settled in any way - please keep discussing, and bringing it up. I'm definitely not in the camp that thinks that LSM needs to be 'controlled', but on the other hand, I'm also not going to undo that commit unless there are good real arguments for undoing it (not just theoretical ones).

"For example, I do kind of see the point that a 'real' security model might want to be compiled-in, and not something you override from a module. Of course, I'm personally trying to not use any modules at all, so I'm just odd and contrary, so whatever.. Real usage scenarios with LSM modules, please speak up!"

Signaling When Out of Memory

Submitted by Jeremy
on October 19, 2007 - 6:29am
Linux news

Former 2.4 Linux kernel maintainer Marcelo Tosatti resurrected a discussion on adding support for out-of-memory notifications to the Linux kernel. He explained, "AIX contains the SIGDANGER signal to notify applications to free up some unused cached memory," noting, "there have been a few discussions on implementing such an idea on Linux, but nothing concrete has been achieved." In a request for discussion, Marcelo added, "on the kernel side Rik suggested two notification points: 'about to swap' (for desktop scenarios) and 'about to OOM' (for embedded-like scenarios)." Rik van Riel explained:

"The first threshold - 'we are about to swap' - means the application frees memory that it can. Eg. free()d memory that glibc has not yet given back to the kernel, or JVM running the garbage collector, or ...

"The second threshold - 'we are out of memory' - means that the first approach has failed and the system needs to do something else. On an embedded system, I would expect some application to exit or maybe restart itself."

Distributed Storage Failure Recovery

Submitted by Jeremy
on October 18, 2007 - 2:29pm
Linux news

Evgeniy Polyakov announced a new version of his distributed storage subsystem, "this release includes [a] mirroring algorithm extension, which allows [the subsystem] to store [the] 'age' of the given node on the underlying media." He went on to explain why this was useful:

"In this case, if [a] failed node gets new media, which does not contain [the] correct 'age' (unique id assigned to the whole storage during initialization time), the whole node will be marked as dirty and eventually resynced.

"This allows [it] to have [a] completely transparent failure recovery - [the] failed node can be just turned off, its hardware fixed and then turned on. DST core will detect [the] connection reset and automatically reconnect when [the] node is ready and resync if needed without any special administrator's steps."