From: torvalds@klaava.Helsinki.FI (Linus Benedict Torvalds)
Date: 28 Aug 92 23:05:26 GMT

> Hmm. I'd like to hear more about the problem - especially if you can
> pinpoint it more closely (ie having used 0.97, 0.97.pl1 and now pl2) to
> a specific patch. As most people said 0.97.pl1 was fast, I'm assuming
> it's specific to patch2, but I'd like to have some confirmation before I
> start looking into the problem.

Whether or not a particular version is fast probably depends on how much
memory you have. I ran a controlled series of tests with 0.96c, 0.97,
0.97pl1, and 0.97pl2 on my 40 MHz 386 machine (with 16 meg of memory),
and I noticed no appreciable difference in times:

Time to compile the stock 0.97 kernel after doing a "make clean" (min:sec)

    Ver.       Runs
    0.96c      9:35 (*)   9:33   9:34
    0.97      10:21 (*)   9:41
    0.97pl1   10:10 (*)   9:45   9:32
    0.97pl2   10:36 (*)   9:25   9:30  10:11 (*)   9:41

All of these times were measured by doing

    (date;make;date) >& MAKELOG

and then taking the difference between the first and second time. The
only processes running on the machine other than the compile were the X
server and a single xterm. The (*) times are the first compile after a
reboot; they are higher because the buffer cache hasn't been primed yet.

So at least if you have a lot of memory, there is no appreciable
difference between 0.96c, 0.97, 0.97pl1, and 0.97pl2. If I had to guess,
I would guess that the problem shows up on machines with less memory
(say, 4 or 8 megabytes), and I would further guess that it might be
related to the buffer changes. It could very well be that the people who
said that 0.97pl1 was fast were running with a lot of memory.

> If it's patch2, the problem is probably the changed mm code: having
> different page tables for each process might be costlier than I thought.
> The old (pre-0.97.pl2) mm was very simple and efficient - TLB flushes
> happened reasonably seldom.
> With the new mm, the TLB gets flushed at every task-switch (not due to
> any explicit flushing code, but just because that's how the 386 does
> things when tasks have different cr3's).

I don't think the TLB flush would be much of a problem. Consider: there
are 32 entries in the TLB, and if you reference a page which is not in
the TLB, you pay a penalty of between 0 and 5 cycles. So the maximum
penalty you incur by flushing the TLB is 5 x 32, or 160 cycles. If you
further assume the worst case, that you are switching contexts on every
tick of the 100Hz clock, then you will be flushing the TLB 100 times a
second, for a penalty of 16,000 cycles/second. On a 16MHz machine there
are 16 x 10**6 cycles/second, so the worst-case extra time incurred by
flushing the TLB is (16 x 10**3) / (16 x 10**6) == 10**-3, an overhead of
0.1%. On a 40MHz machine, this overhead drops to 0.04%.

Now, these figures do assume that the page tables/directories haven't
been paged out to disk. Each process must now have at least one page
directory and two page tables (one for low memory and one for the stack
segment in high memory). If you assume a 2 meg system has 8-9 processes
running, then at least 24 4k pages (96k), or about 10% of its user
memory, are being used to hold page tables/directories. This has two
effects. The first is to increase memory usage, which may increase
thrashing. The second is that if these pages get swapped out, the kernel
will have to bring them back in the moment that process starts executing
again, since the TLB will be empty.

> I can optimize things a bit - it's reasonably easy to fake away some of
> the TLB flushes by simply forcing the idle task to always use the same
> cr3 as the last task did (as the idle task runs only in kernel memory,
> and kernel memory is the same for all processes).
> So, I'd be interested to hear if this simple patch speeds Linux up at
> all:

Given my back-of-the-envelope calculations above, I doubt this patch
speeds up Linux by any appreciable amount, and any improvement will
probably be eaten up by the time spent on the extra check in the
scheduler. But this is only a theoretical guess; someone should probably
gather experimental evidence to make sure.

- Ted
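A few lines of Python can re-check the worst-case TLB arithmetic and the page-table count in the reply above. This is an illustrative sketch, not code from the thread.

The constants are the figures quoted in the post:

- 32 TLB entries
- a 5-cycle worst-case miss penalty
- 100 task switches per second
- three 4k pages per process

Charging every flushed entry the full 5 cycles is the same worst-case assumption the reply makes. The function names `tlb_overhead` and `page_table_kb` are hypothetical.

```python
# Figures quoted in the post; they are its assumptions, not measured 386 timings.
TLB_ENTRIES = 32          # entries in the 386 TLB
MISS_PENALTY = 5          # worst-case extra cycles per TLB miss
SWITCHES_PER_SEC = 100    # one task switch per tick of the 100Hz clock
PAGE_KB = 4               # page size in kilobytes
PAGES_PER_PROCESS = 3     # one page directory plus two page tables (a minimum)


def tlb_overhead(clock_hz: float) -> float:
    """Worst-case fraction of CPU time spent refilling a fully flushed TLB."""
    cycles_per_flush = TLB_ENTRIES * MISS_PENALTY            # 160 cycles
    cycles_per_sec = cycles_per_flush * SWITCHES_PER_SEC     # 16,000 cycles/s
    return cycles_per_sec / clock_hz


def page_table_kb(nprocs: int) -> int:
    """Minimum kilobytes held by page directories and tables for nprocs processes."""
    return nprocs * PAGES_PER_PROCESS * PAGE_KB


if __name__ == "__main__":
    for mhz in (16, 40):
        print(f"{mhz} MHz: {tlb_overhead(mhz * 1e6):.2%} worst-case TLB overhead")
    for n in (8, 9):
        print(f"{n} processes: {n * PAGES_PER_PROCESS} pages, "
              f"{page_table_kb(n)}k of page tables")
```

Running it prints 0.10% for 16 MHz and 0.04% for 40 MHz, matching the post. It also reports 24 to 27 pages (96k to 108k) for 8 or 9 processes. The page-table figure is a lower bound, since each process needs at least those three pages.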
