The kernel newbies community often gets inquiries from CS students who
need a project for their studies and would like to do something with
the Linux kernel, but would also like their code to be useful to the
community afterwards.

In order to make it easier for them, I am trying to put together a
page with projects that:
- Are self contained enough that the students can implement the
  project by themselves, since that is often a university requirement.
- Are self contained enough that Linux could merge the code (maybe
  with additional changes) after the student has been working on it
  for a few months.
- Are large enough to qualify as a student project; luckily there is
  flexibility here since we get inquiries for anything from 6 week
  projects to 6 month projects.

If you have ideas on what projects would be useful, please add them
to this page (or email me):

http://kernelnewbies.org/KernelProjects

thanks,

Rik
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
-
Thanks very much, Rik. I need this eagerly. I want to find a kernel project that can both be my graduation thesis and contribute to the Linux kernel community. I read that page and think your project--Swapout Clustering is interesting for me. Is it alright for me to work on it? And can you give some help? Thanks! -- May the Source Be With You. -
On Mon, 15 Oct 2007 18:40:34 +0800 You would be the third student to take on that project simultaneously. That is not a problem for me (on the contrary, it increases the chances of one codebase being likeable to Linus), but it does decrease the chances of your patch being the one to make it upstream. Still, it should be a fun project to implement and benchmark, so go ahead if you want. -
Hi Rik.

In the kernel build area a few possible projects exist.

Increase speed for a build with no updates
==========================================
On a reasonably fast machine with a decent config it takes roughly
10 seconds to do a make where nothing is updated.
Generating one single Makefile is assumed to speed things up and
will in addition allow a simpler syntax than what is used today
for some of the uglier constructs.

Contact: Sam Ravnborg <sam@ravnborg.org>
Difficulty: 5
Language: Perl or C

Increase speed for a build which updates a single file
======================================================
We often edit a single file and then do a build.
And the result is that we spend 80% of the time linking the kernel.
So an obvious improvement for the kernel community would be to
improve the speed of the linker (and decrease its memory footprint).

Contact: ?
Difficulty: ?
Language: C

Update menuconfig to a modern ncurses look&feel
===============================================
htop, aptitude, tig and other ncurses based programs have a more
modern and effective look&feel than the current menuconfig.
Rip out all the lxdialog stuff and replace it with an ncurses based
frontend that looks better and has more functionality.

Contact: Sam Ravnborg <sam@ravnborg.org>
Difficulty: 5
Language: C

They are independent but challenging and would be very much appreciated
by the kernel community.
I could come up with more projects but these are the ones that are
most straightforward to start with.

Sam
-
On Mon, 15 Oct 2007 16:23:52 +0200 Thank you Sam, I have added your project ideas to the page. -
Isn't make -j 2 or more implemented by running multiple make in sub-dirs ? Parallel make is more and more used even on cheap hardware. -- Phe -
Even now, make -j8 really pays off on bigiron AMD. -
make -j works fine with a single Makefile, if that's the question. Xav -
The kernel build system supports parallel make and I guess all kernel developers use it. People tell me that a 32 way machine is quite good for kernel compilation. The bottleneck is that we spawn so many make instances, and each has to read all the same makefiles, so in total we stat a zillion files for a simple kernel build. With a single Makefile we can run a single make instance where we read all files only once and stat the same file only once. Sam -
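To put rough numbers on Sam's argument, here is a toy model (not a measurement) of how many makefile reads a recursive build does compared with a single make instance. The function names and the example figures are made up for illustration; real kbuild behaviour is more involved.

```python
def recursive_reads(num_dirs, shared_files):
    """Recursive build: one make instance per directory, and each
    instance reads its own Makefile plus every shared include file."""
    return num_dirs * (1 + shared_files)


def single_instance_reads(num_dirs, shared_files):
    """Single make instance: every directory Makefile and every shared
    include file is read exactly once."""
    return num_dirs + shared_files


if __name__ == "__main__":
    # Made-up figures, not taken from a real kernel tree.
    dirs, shared = 1000, 5
    print("recursive:", recursive_reads(dirs, shared))
    print("single instance:", single_instance_reads(dirs, shared))
```

The same reasoning applies to stat() calls on source files that several sub-makes depend on.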
Maybe this:
Allow removal of select from Kconfig files
Difficulty: 4
Many config options depend on other options in unrelated submenus. As a
result, people have complained about not being able to select the
desired option because finding all dependencies is too complicated.
Select solves this problem and creates a near-identical new one. Now it
is just as hard to turn some options _off_ as it was before to turn
others _on_.
The solution would be to have smarter tools that give the user
information roughly like this:
[ ] CONFIG_FOO
If you enable this option, you will also enable CONFIG_BAR.
Or :
[x] CONFIG_BAR
If you disable this option you will also disable CONFIG_FOO
and CONFIG_FOO2.
Difficulty is somewhat increased by the number of tools that require
such functionality. Support for xconfig and menuconfig appears to have
priority as those users have a harder time grepping the kernel.
Jörn
--
There is no worse hell than that provided by the regrets
for wasted opportunities.
-- Andre-Louis Moreau in Scaramouche
-
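The hints Jörn describes boil down to walking the select graph in both directions. Below is a minimal sketch of that idea; it does not parse real Kconfig files. The SELECTS table is a hand-written assumption that mirrors the CONFIG_FOO / CONFIG_FOO2 / CONFIG_BAR example from the mail, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical select relations: option -> options it selects.
SELECTS = {
    "CONFIG_FOO": ["CONFIG_BAR"],
    "CONFIG_FOO2": ["CONFIG_BAR"],
}


def closure(start, edges):
    """Return every option reachable from start by following edges
    transitively (start itself is excluded)."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt != start and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def also_enabled(option, selects=SELECTS):
    """Options that get switched on when option is enabled."""
    return closure(option, selects)


def also_disabled(option, selects=SELECTS):
    """Options that must be switched off before option can be disabled,
    i.e. everything that selects it, directly or indirectly."""
    selected_by = defaultdict(list)
    for source, targets in selects.items():
        for target in targets:
            selected_by[target].append(source)
    return closure(option, selected_by)


if __name__ == "__main__":
    print("enable CONFIG_FOO ->", sorted(also_enabled("CONFIG_FOO")))
    print("disable CONFIG_BAR ->", sorted(also_disabled("CONFIG_BAR")))
```

A real tool would also have to take "depends on" expressions into account, which this sketch ignores.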
I'm also quite interested in what compsci students can do for the kernel project. I'm currently doing a little embedded development and research at school, but I and a few others would jump at the chance to work on the kernel (besides finding duplicate problems that the x86 merge is already taking care of, of course. ;) Also (as an aside), we're looking at redoing our operating systems curriculum out here at school...anyone aware of (relatively good) OS curricula? (time scope: one semester.) regards/thanks, -- Doug Whitesell CSU Channel Islands - Computer Science "Unprecedented performance: nothing we had has ever worked like this before..." -
How about this in the Device Mapper raid-1/mirror code? /* FIXME: add read balancing */ That comment has been in there for many releases. I've wanted read balancing for several servers and had all sorts of ideas about it, like adding functions to the underlying device queues to return a "queuing cost" to determine which is the best queue to add the read request. I think that could work better for queues like CFQ than the MD closest-head. An implementation would also need to be benchmarked against the MD raid-1. Along with the time to submit it to LKML, get it reviewed and polish it up, it might make a good student project. -- Zan Lynx <zlynx@acm.org>
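As a userspace illustration of the "queuing cost" idea, the sketch below picks the mirror whose queue reports the lowest cost for a read. Everything here is hypothetical: the Mirror fields, the cost formula and the per_request weight are invented for the example and are not taken from the Device Mapper or MD code.

```python
from dataclasses import dataclass


@dataclass
class Mirror:
    name: str
    pending_requests: int  # requests already sitting in the device queue
    head_sector: int       # last sector the device is known to have served


def queuing_cost(mirror, sector, per_request=1000):
    """Hypothetical cost: queue depth weighted by per_request, plus the
    distance from the last served sector to the requested one."""
    return mirror.pending_requests * per_request + abs(mirror.head_sector - sector)


def pick_mirror(mirrors, sector):
    """Send the read to the mirror with the lowest queuing cost."""
    return min(mirrors, key=lambda m: queuing_cost(m, sector))


if __name__ == "__main__":
    mirrors = [
        Mirror("a", pending_requests=4, head_sector=100),
        Mirror("b", pending_requests=0, head_sector=2000),
    ]
    # Mirror "a" has the closer head, but its queue is busier.
    print(pick_mirror(mirrors, sector=120).name)
```

A pure closest-head policy would pick "a" here; a cost that also looks at queue depth picks "b". Which weighting works best is exactly what the benchmarking against MD raid-1 would have to show.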
On Mon, 15 Oct 2007 11:10:32 -0600 I've written down the basic description: http://kernelnewbies.org/KernelProjects/Raid1ReadBalancing Could you add any ideas that you have to the page? It is a wiki, so anybody can edit the site (after creating an account). -
another couple of raid enhancements would be:

1. teach the system that a raid456 stripe is handled most efficiently
   if treated as a single block of data.

   by this I mean that if you read one block from the stripe the system
   reads the entire stripe, so it should take this into account when
   doing read-ahead and not always throw away most of the data it read
   because it's outside the current readahead window (if nothing else,
   look at putting it on the tail of the LRU list instead of just
   forgetting it)

   if you write one block of the stripe the system must read the stripe,
   then update two blocks of the stripe (the data block and the parity
   block), but if you are going to write the entire stripe out you can
   ignore whatever's there and just calculate the parity block from the
   data you are writing. this should make writing to a raid456 stripe as
   fast as writing to a raid0 stripe (well, almost, you have one more
   block to write).

2. not directly a kernel project: create userspace tools that make
   managing raid and partitioning on linux as easy as the zfs tools

3. there is currently the ability to grow a raid56 array by adding a
   disk, but there is not the ability to take a raid5 array, add a disk
   and make the result a raid6 array.

David Lang
-
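The difference between the two write paths in point 1 can be shown with plain XOR parity. This is a sketch for the single-parity (raid5) case only; raid6 adds a second syndrome that is not modelled here, and the function names are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def full_stripe_parity(data_blocks):
    """Full-stripe write: parity comes from the new data alone, so
    nothing has to be read from disk first."""
    return xor_blocks(data_blocks)


def rmw_parity(old_parity, old_block, new_block):
    """Partial write (read-modify-write): the old data block and the old
    parity must be read before the new parity can be computed."""
    return xor_blocks([old_parity, old_block, new_block])


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = full_stripe_parity(stripe)

    new_block = b"\xaa\xbb"
    updated = [stripe[0], new_block, stripe[2]]

    # Both paths must agree on the resulting parity block.
    assert rmw_parity(parity, stripe[1], new_block) == full_stripe_parity(updated)
    print(parity.hex())
```

The partial-write path costs two reads and two writes for one block of payload, while the full-stripe path needs no reads at all, which is where the "almost as fast as raid0" estimate comes from.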
Is there already a make config option that will do a good job at setting a default .config file based on what is already running on a system? I get tired of trimming down my .config for my laptop build so it takes less than 30min to build a kernel. Bonus credit for additional "expert" options (like those powertop puts out) for target uses: laptop, HPC, home file share, embedded targets.... Oh, and let's make the expert configs easily extensible. -
I have discussed this briefly with Kay Sievers. What udev can provide is the list of modules needed, so what the kernel needs to provide is a simple module to CONFIG option(s) converter + a base config to start out with. Nothing particularly difficult, but it needs a few days of work to do. Sam -
could you explain better what you need? I think I already have such tools ;-) ciao cate -
base function: Starting from a stock distro (FC, Ubuntu, OpenSuSE...), put down a kernel.org tree and automatically create a .config with all the drivers needed for the platform I'm building on. expert configs for different applications: laptop battery, virtualization, HPC, tiny, multi-media, testing --mgross -
Too easy. Since opensuse's udev loads most of the modules for your hardware, all that would be needed is to transform the lsmod list of modules plus the static options in /proc/config.gz (stuff like psmouse) back into kconfig options ;-) -
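A rough sketch of the lsmod half of that idea: read the module names out of lsmod output and turn them into a .config fragment. The module-to-option table is passed in by the caller and is hand-written here purely for illustration; building that table for the whole tree, and resolving the dependencies between options, is the part this sketch leaves out.

```python
SAMPLE_LSMOD = """\
Module                  Size  Used by
psmouse                36016  0
e1000                 118784  0
mystery_mod             4096  0
"""


def loaded_modules(lsmod_text):
    """Return module names from lsmod output, skipping the header line."""
    lines = lsmod_text.strip().splitlines()
    return [line.split()[0] for line in lines[1:] if line.strip()]


def config_fragment(modules, module_to_config):
    """Map modules to 'CONFIG_X=m' lines. Modules without a known option
    are returned separately instead of being guessed at."""
    lines = set()
    unknown = []
    for module in modules:
        option = module_to_config.get(module)
        if option is None:
            unknown.append(module)
        else:
            lines.add(option + "=m")
    return sorted(lines), unknown


if __name__ == "__main__":
    # Hand-written table for the example; a real tool would derive it
    # from the kernel source tree.
    table = {"psmouse": "CONFIG_MOUSE_PS2", "e1000": "CONFIG_E1000"}
    fragment, unknown = config_fragment(loaded_modules(SAMPLE_LSMOD), table)
    print("\n".join(fragment))
    print("no mapping for:", unknown)
```

Options that are built in rather than modular would come from /proc/config.gz, as the mail says, and are not handled here.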
On Tue, 16 Oct 2007 22:09:04 +0200 (CEST) Well, at that point it does not know whether or not you occasionally plug in an ipod or a digital camera. Going back from the lsmod output to all the right CONFIG options is also not as trivial as it sounds, due to all the dependencies there are. This project sounds like it could be a great undergraduate project, maybe built on top of Ketchup to automatically fetch, configure, compile and install a working kernel :) Are there any volunteers to write down the project description on the kernelnewbies.org wiki? -
Which is why building an allmod kernel (or what the distros do) is IMO the better solution. -
if all you want is a config that will work you are right. however if you want a good base for an optimized, minimal kernel it's not much help (other than possibly as a stepping stone to then examine all the modules that were loaded and document which ones are needed) David Lang -
you can ask the user to plug in all the different devices that they want to use when doing the config scan. bonus points if you have both the ability to go from nothing to a config _and_ take an existing config and add any additional drivers needed for the current hardware -
As part of the Linux Kernel Driver DataBase, yesterday I also "solved" this problem: from a module name, I can obtain the related kernel configuration item. You can see the result at http://cateee.net/lkddb (grep '^drv module' drivers-db). I count 2570 such items. But I have problems in a few cases: sometimes one module name maps to more than one CONFIG_ option. Normally this happens with modules in the same directory, as a support module or as a parent module. I don't see a method to distinguish the right (minimal) configuration. One solution would be to remove some dependencies from the Makefiles, and to check and, where necessary, create such dependencies in Kconfig. But this requires a kernel modification. Or do you think there is a better (non-invasive) method? ciao cate -
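The ambiguity cate describes is easy to reproduce with a small parser for kbuild-style "obj-$(CONFIG_X) += name.o" lines. This is only a sketch and not the lkddb script: it ignores line continuations, composite objects and subdirectory entries, and the option names in the sample Makefile are made up.

```python
import re
from collections import defaultdict

# Matches lines such as: obj-$(CONFIG_FOO) += foo.o bar.o
OBJ_LINE = re.compile(r"^obj-\$\((CONFIG_\w+)\)\s*[+:]?=\s*(.+)$")

SAMPLE_MAKEFILE = """\
obj-$(CONFIG_FOO) += foo.o
obj-$(CONFIG_FOO_EXTRA) += foo.o helper.o
obj-$(CONFIG_BAR) += bar.o
"""


def module_configs(makefile_text):
    """Return a dict mapping object/module name to the set of CONFIG_
    options that cause it to be built."""
    result = defaultdict(set)
    for line in makefile_text.splitlines():
        match = OBJ_LINE.match(line.strip())
        if not match:
            continue
        config, objects = match.groups()
        for obj in objects.split():
            if obj.endswith(".o"):
                result[obj[:-2]].add(config)
    return result


def ambiguous_modules(mapping):
    """Modules for which more than one option is a candidate."""
    return sorted(name for name, configs in mapping.items() if len(configs) > 1)


if __name__ == "__main__":
    mapping = module_configs(SAMPLE_MAKEFILE)
    for name in sorted(mapping):
        print(name, sorted(mapping[name]))
    print("ambiguous:", ambiguous_modules(mapping))
```

For the ambiguous entries the Makefile alone does not say which option is the minimal one, which is why the mail suggests moving that information into Kconfig.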
but then you miss the USB devices that you eventually plug in. Anyway, I am attaching a python script that will create the "mod" file. Call it with one argument: the kernel source directory. The second file, "mod", is the output: it lists modules with proper dependencies. BTW I'm restoring the autoconfiguration (but more hackerish than the old tante versions ;-) ) ciao cate
Ehh? You do it once, then leave it aside or in /proc/config.gz, on new kernel copy it back, "make oldconfig", answer several questions and here -
Ah yes, but then you buy a new system to which the old config does not apply. Folkert van Heusden -- www.vanheusden.com/multitail - win a vlaai from multivlaai! make sure that multitail gets included in Fedora Core, AIX, Solaris or HP/UX and win a vlaai of your choice ---------------------------------------------------------------------- Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com -
yeah I know that. It's a lot more than a few questions, and as we are talking about a linear search for a fully tweaked .config where each pass takes 30 min to know if things work, this isn't how I want to spend my time. -
another config thing that would be nice would be to take something like Rob Landley's miniconfig tool and make it work well enough to be integrated (it creates a version of .config that only contains the things that need to be set, not everything that's at a default that doesn't make any difference) David Lang -
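The core idea, keeping only the settings that differ from the defaults, can be sketched as a comparison of two .config files. Note that this is a simplification for illustration and not a description of how Rob Landley's miniconfig tool works; in particular it assumes you already have a file with the default values and it knows nothing about dependencies between options.

```python
NOT_SET_SUFFIX = " is not set"


def parse_config(text):
    """Parse .config text into a dict. Lines of the form
    '# CONFIG_X is not set' are recorded as CONFIG_X = 'n'."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET_SUFFIX):
            settings[line[2:-len(NOT_SET_SUFFIX)]] = "n"
        elif line and not line.startswith("#"):
            key, _, value = line.partition("=")
            settings[key] = value
    return settings


def mini_config(full, defaults):
    """Keep only the entries of full whose value differs from the
    default; options missing from defaults count as 'n'."""
    return {key: value for key, value in full.items()
            if defaults.get(key, "n") != value}


if __name__ == "__main__":
    full = parse_config("CONFIG_A=y\nCONFIG_B=m\n# CONFIG_C is not set\n")
    defaults = parse_config("CONFIG_A=y\n# CONFIG_B is not set\nCONFIG_C=y\n")
    for key, value in sorted(mini_config(full, defaults).items()):
        print(key + "=" + value)
```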
Hello, I read the messages about the company list and now this CS projects list, and I was wondering if there is any similar list of labs/universities that host PhD projects related to the Linux kernel. I am thinking about switching from physics to CS and it would be really cool to work with the kernel. Thanks in advance, Guilherme -
You might take a look at proceedings for conferences with recent linux-related papers (linuxsymposium.org, usenix.org, linux.org.au, ?) and look for urls and presenters with .edu addresses. --b. -
How about a static code tool that will check for initialization races? Yesterday I found a lurking bug in some of my code that wouldn't have been exposed had I not tripped over it. I wrote some infrastructure code that initializes its lists and notification trees in late_init. Then I found out that there was a client of my infrastructure calling my register API at core_init time. It didn't crash or fail noticeably, but it wasn't correct; it only appeared to work because at that time I was using a static array. When I changed my code to use an array of pointers instead it went boom! (FWIW I've fixed this issue for now...) It made me feel uneasy how that issue got by unnoticed and I worry that there could be more like it. What I have in mind is a tool to scan the code for boot-up init calls and check for any callers that enter a module before the module is fully initialized. --mgross -
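Once the call information has been extracted, the check itself is a simple ordering comparison. The sketch below covers only that last step, under the assumption that some other pass has already worked out which initcall level each API becomes ready at and at which level each caller uses it; extracting that from the source is the hard part of the project. The API and caller names are hypothetical.

```python
# Initcall levels in the order the kernel runs them (subset).
LEVELS = ["core", "postcore", "arch", "subsys", "fs", "device", "late"]
ORDER = {name: index for index, name in enumerate(LEVELS)}


def find_init_races(ready_at, calls):
    """ready_at: dict mapping API name to the initcall level at which its
    provider finishes initializing.
    calls: list of (caller, level, api) tuples.
    Returns the calls made at a level that runs before the provider's."""
    return [
        (caller, level, api)
        for caller, level, api in calls
        if ORDER[level] < ORDER[ready_at[api]]
    ]


if __name__ == "__main__":
    # Hypothetical names, modelled on the situation in the mail.
    ready_at = {"infra_register": "late"}
    calls = [
        ("early_client", "core", "infra_register"),
        ("late_client", "late", "infra_register"),
    ]
    for caller, level, api in find_init_races(ready_at, calls):
        print(caller, "calls", api, "at", level, "but it is ready at", ready_at[api])
```

Calls made at the same level as the provider are not flagged here, although their relative order within a level depends on link order and could also be worth reporting.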
On 10/15/2007 8:01 AM, Rik van Riel wrote:
> The kernel newbies community often gets inquiries from CS students who
> need a project for their studies and would like to do something with
> the Linux kernel, but would also like their code to be useful to the
> community afterwards.
>
> In order to make it easier for them, I am trying to put together a
> page with projects that:
> - Are self contained enough that the students can implement the
>   project by themselves, since that is often a university requirement.
> - Are self contained enough that Linux could merge the code (maybe
>   with additional changes) after the student has been working on it
>   for a few months.
> - Are large enough to qualify as a student project, luckily there is
>   flexibility here since we get inquiries for anything from 6 week
>   projects to 6 month projects.
>
> If you have ideas on what projects would be useful, please add them
> to this page (or email me):
>
> http://kernelnewbies.org/KernelProjects

Well, I know something that might be interesting for kernel newbies
including students. So let me share it with you.

It's an Ubuntu 7.04 based LiveCD with the TOMOYO Linux kernel.

Directions:
1. visit the following URL and save the ISO image
   http://tomoyo.sourceforge.jp/wiki-e/?TomoyoLive
2. burn a CD/DVD and boot from the disc
   (or start up a VM from the downloaded image)
3. open the "TOMOYO Linux Policy Editor" icon on the gnome desktop
4. browse "domains" with the cursor keys
   you can see how processes were created (great experience)
5. choose a domain and press the return key
   you can see the behavior of the selected "domain" (ACL mode)
6. press return to go back to step 5 (domain transition mode)
   (repeat 5-7 as you like, type 'r' to refresh the screen)
7. enter q to quit the editpolicy program

As it's a LiveCD like KNOPPIX, hard disks will not be affected unless
you mount them and operate on them intentionally. I mean, it's safe to
play with.

What makes all of these possible is "TOMOYO ...
Hard stuff:

* network character device -- similar to nbd, but for char
  devices. either figure out how to forward ioctls(), or implement
  usb-over-network, or...

* openMosix -- they seem to have a userspace solution, but not GPLed.

* compression for ext4. It's about time someone did it right. Special
  bonus if you can do it in a way that it does not slow down. If cpu
  is free, compress, if it is busy, just write it straight to disk.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
So if I decide that the cpu is busy (because something is asking me to write the cpu is clearly doing something and hence busy), then I can skip compression and just write to disk. So by that definition ext4 already does compression. What a simple project. :) Did you mean it ought to come back and do the compression later? Is it possible that for some data compressing it and writing will take less time than not compressing it and writing it to disk? -- Len Sorensen -
Yes. Typically for all zeros. It will be similar for highly-compressible
data (pictures, timetables, ....)

root@amd:/data/tmp# time ( cat /dev/zero | head -c 100000000 > delme; sync )
0.04user 0.48system 6.52 (0m6.521s) elapsed 7.97%CPU
root@amd:/data/tmp# time ( cat /dev/zero | head -c 100000000 > delme; sync )
0.05user 0.61system 6.33 (0m6.333s) elapsed 10.42%CPU
root@amd:/data/tmp# time ( cat /dev/zero | head -c 100000000 | gzip - > delme; sync )
1.57user 0.32system 1.74 (0m1.749s) elapsed 100.00%CPU
root@amd:/data/tmp# time ( cat /dev/zero | head -c 100000000 | gzip - > delme; sync )
1.61user 0.18system 1.65 (0m1.652s) elapsed 100.00%CPU

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
I don't have one. I graduated 7 years ago. I was just pointing holes If it doesn't it seems the compression feature is going to be rather unpredictable and my optimization would be perfectly within spec and That would make it tricky to say if you should ever skip compression due to cpu load. There is a chance cpu load would be better off by doing -- Len Sorensen -
And most executables. There's a reason why my vmlinux files are 11M and my

IBM's AIX supported file system compression on the JFS filesystem years ago. I was able to get up to 30% throughput increases by converting the /usr filesystem to compressed - because even a 33 MHz Power chipset could read in 5 512-byte blocks and decompress them to the original 4K faster than the disk could read in 8 512-byte blocks.

Oh, and it worked for compression on r/w workloads as well - that was one of the ways to get an RS6K model 250 (which was a PowerPC601 chipset, a dead heat with a Mac 6600 (same chipset, same clock)) to handle a million e-mail msgs/day - even /var/spool/mqueue worked better.

Given that today there's an even *bigger* disparity in CPU speed versus disk speed, I'd be surprised if it doesn't help today too.

As a first try, you might consider compressing each 4K filesystem block in-place, and only writing as many sectors as the compressed data takes (with the obvious fix for the pathological "grows with compression" case of "just write it without"). Probably even more wins can be found if you find a way to store the compressed chunks in a way that minimizes seeks, but that's a filesystem design issue and probably a too-large project. (It's easy to do the stupid way - just store the whole file as compressed - the tough part is doing it and not making lseek() *too* painful. Trying to figure out where in a .gz file byte 65536... ouch. ;)
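The "first try" from the mail, compressing each 4K block on its own and falling back to the raw block when compression does not save a sector, can be sketched in userspace as follows. zlib is used only as a stand-in compressor; the mail does not say which algorithm AIX used, and the function name is made up for the example.

```python
import zlib

BLOCK_SIZE = 4096   # filesystem block size used in the mail
SECTOR_SIZE = 512   # disk sector size used in the mail


def sectors_to_write(block):
    """Return (sector_count, compressed) for a single 4K block.
    The block is stored compressed only if that saves at least one
    sector; otherwise it is written as-is."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("expected exactly one 4K block")
    packed = zlib.compress(block)
    packed_sectors = -(-len(packed) // SECTOR_SIZE)  # ceiling division
    raw_sectors = BLOCK_SIZE // SECTOR_SIZE
    if packed_sectors < raw_sectors:
        return packed_sectors, True
    return raw_sectors, False


if __name__ == "__main__":
    import os
    print("zeros: ", sectors_to_write(bytes(BLOCK_SIZE)))
    print("random:", sectors_to_write(os.urandom(BLOCK_SIZE)))
```

Because every block is compressed independently, finding the block that holds a given file offset stays a simple lookup, which avoids the lseek() problem of whole-file compression mentioned above.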
On Fri, 02 Nov 2007 23:08:23 -0400 The problem is that disk seek times have not gotten much faster over the years, while disk throughput rates have skyrocketed. Transferring a little less data is not going to help you when 80% of your disk time is spent seeking, not reading or writing. -
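Rik's point can be put into numbers with an Amdahl-style estimate: only the transfer part of the disk time shrinks when the data is compressed. The 80% seek share comes from the mail; the 2:1 compression ratio in the example is an assumption chosen for illustration, and the model ignores CPU time spent compressing.

```python
def overall_speedup(seek_fraction, transfer_ratio):
    """Estimated speedup of total disk time when the amount of data
    transferred is multiplied by transfer_ratio (0.5 means half the
    bytes) and the seek time stays unchanged."""
    if not 0.0 <= seek_fraction <= 1.0:
        raise ValueError("seek_fraction must be between 0 and 1")
    if transfer_ratio <= 0:
        raise ValueError("transfer_ratio must be positive")
    new_time = seek_fraction + (1.0 - seek_fraction) * transfer_ratio
    return 1.0 / new_time


if __name__ == "__main__":
    # 80% of the time spent seeking, data compressed to half its size.
    print(round(overall_speedup(0.8, 0.5), 3))
    # The same compression on a workload with no seeks at all.
    print(round(overall_speedup(0.0, 0.5), 3))
```

Under these assumptions halving the data buys about 11% on the seek-bound workload, compared with a factor of two when there are no seeks.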
This sounds like flash based media are an ideal candidate for compression. No seek times to speak of, transfer rates that are lower than those of disks and limited capacity. I believe JFFS2 (a flash filesystem) already does compression though. -
however, if you can manage to avoid seeks by packing more data onto each track (or each stripe of a raid array) you could probably see a significant win. that's something for aspiring (and experienced) filesystem designers to struggle with for a while (especially trying to figure out what the size of a track or stripe is for the optimal layout) David Lang -
