Re: A little coding style nugget of joy

Previous thread: Re: [newbie:] Bonnie++2 hangs recent 2.6 kernels? Bash keeps looping in waitpid(), eating 100% CPU by Frantisek Rysanek on Wednesday, September 19, 2007 - 7:58 am. (1 message)

Next thread: [PATCH] [UFS] fs/ufs/super.c misreads the file system state by Leonid Kalev on Wednesday, September 19, 2007 - 9:31 am. (2 messages)
From: Matt LaPlante
Date: Wednesday, September 19, 2007 - 9:34 am

Since everyone loves random statistics, here are a few gems to give you a break from your busy day:

Number of lines in the 2.6.22 Linux kernel source that end in one or more trailing whitespace characters: 135209
Bytes saved by removing said whitespace: 151809
Lines in the (unified) diff: 455437
Size of the diff: 15M
People brave enough to submit the patch: ~0
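
For anyone wanting to reproduce numbers like these, here is a minimal sketch of how the first two figures could be counted. It is not the script used for the post: the function names are hypothetical, and it assumes "trailing whitespace" means spaces and tabs before a newline, which may differ from the original definition.

```python
from pathlib import Path


def trailing_whitespace_stats(data):
    """Return (lines_with_trailing_ws, bytes_saved) for one file's bytes.

    Assumption: only spaces and tabs count as trailing whitespace.
    """
    lines_affected = 0
    bytes_saved = 0
    for line in data.split(b"\n"):
        stripped = line.rstrip(b" \t")
        if len(stripped) != len(line):
            lines_affected += 1
            bytes_saved += len(line) - len(stripped)
    return lines_affected, bytes_saved


def scan_tree(root):
    """Sum the per-file statistics over every regular file under root."""
    total_lines = 0
    total_bytes = 0
    for path in Path(root).rglob("*"):
        if path.is_file():
            lines, saved = trailing_whitespace_stats(path.read_bytes())
            total_lines += lines
            total_bytes += saved
    return total_lines, total_bytes
```

Calling `scan_tree` on an unpacked source tree would give the two totals; binary files are not filtered out in this sketch.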

Take care. :)

-
Matt
-

From: Andi Kleen
Date: Wednesday, September 19, 2007 - 10:13 am

You don't actually save anything on disk on most file systems
(essentially everything except reiserfs on current Linux),
because every file's allocation is rounded up to the block size
(normally 4K).

Same in page cache.
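
A small sketch of the rounding described above, assuming a simple file system that allocates whole 4K blocks with no tail packing or inline data. The helper name `allocated_bytes` is made up for illustration.

```python
def allocated_bytes(size, block_size=4096):
    """Bytes occupied on disk when the size is rounded up to whole blocks."""
    blocks = -(-size // block_size)  # ceiling division on integers
    return blocks * block_size


# Removing 16 bytes from a 5000-byte file changes nothing on disk ...
unchanged = allocated_bytes(5000) - allocated_bytes(5000 - 16)

# ... but removing 16 bytes from a 4100-byte file frees a whole block.
freed = allocated_bytes(4100) - allocated_bytes(4100 - 16)
```

Under these assumptions `unchanged` is 0 and `freed` is 4096, so whether anything is saved depends on how close the file is to a block boundary.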


Many kernel maintainers automatically remove trailing whitespace on any new
lines these days. So as the kernel keeps changing, it should eventually all
disappear, except in essentially dead code.

-Andi
-

From: Andy Lutomirski
Date: Wednesday, September 19, 2007 - 2:22 pm

This is a terrible assumption in general (i.e. if filesize % blocksize 
is close to uniformly distributed).  If you remove one byte and the data 
is stored with blocksize B, then you either save zero bytes with 
probability 1-1/B or you save B bytes with probability 1/B.  The 
expected number of bytes saved is B*1/B=1.  Since expectation is linear, 
if you remove x bytes, the expected number of bytes saved is x (even if 
there is more than one byte removed per file).
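
The expectation argument above can be checked by brute force. The sketch below assumes whole-block allocation and that filesize % blocksize is exactly uniform, and averages the on-disk saving over one full cycle of residues. Function names are hypothetical.

```python
from fractions import Fraction


def allocated_bytes(size, block_size):
    """Size rounded up to a whole number of blocks."""
    return -(-size // block_size) * block_size


def expected_savings(removed, block_size=4096):
    """Average bytes freed on disk when `removed` bytes are deleted,
    taken over block_size consecutive file sizes (one of each residue)."""
    total = 0
    for size in range(removed, removed + block_size):
        total += allocated_bytes(size, block_size) - allocated_bytes(
            size - removed, block_size
        )
    return Fraction(total, block_size)
```

For instance, `expected_savings(1)` and `expected_savings(16)` come out to exactly 1 and 16, matching the claim that the expected saving equals the number of bytes removed. Real trees need not have uniform residues, so this only checks the idealized argument.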

In my tree, about half of the files have size >= 4k, so the assumption 
is probably not _that_ far off the mark.

Alternatively, there are an average of about 16 bytes removed per file, 
and there are 11 which are <= 16 bytes short of a 4k boundary, so it's 

That's true.

--Andy
-

From: Andi Kleen
Date: Wednesday, September 19, 2007 - 2:30 pm

You didn't calculate the probability of actually saving a full block 
or not (that's the only thing that matters). I assumed it's relatively
small and can be ignored in practice since the amount of end white
space is negligible compared to total file size.

-Andi
-

From: Andrew Lutomirski
Date: Wednesday, September 19, 2007 - 2:39 pm

Sure I did.  It's roughly 1/B per byte removed ( = 1/4096 ).
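
As a rough check of that figure, here is a sketch that counts, over one full cycle of file sizes, how often removing a given number of bytes frees at least one block. It assumes uniform residues and whole-block allocation; `prob_block_freed` is a made-up name.

```python
from fractions import Fraction


def prob_block_freed(removed, block_size=4096):
    """Fraction of file sizes (one per residue) for which deleting
    `removed` bytes reduces the number of allocated blocks."""
    freed = 0
    for size in range(removed, removed + block_size):
        blocks_before = -(-size // block_size)
        blocks_after = -(-(size - removed) // block_size)
        if blocks_after < blocks_before:
            freed += 1
    return Fraction(freed, block_size)
```

With one byte removed this gives 1/4096, and with the roughly 16 bytes per file mentioned earlier in the thread it gives 16/4096 = 1/256.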

--Andy
-

From: Pádraig Brady
Date: Thursday, September 20, 2007 - 2:20 am

It's gradually getting better so:
http://lwn.net/2001/1129/a/whitespace.php3
-

From: Robert P. J. Day
Date: Thursday, September 20, 2007 - 3:11 am

On Thu, 20 Sep 2007, P
-

From: Scott Preece
Date: Thursday, September 20, 2007 - 7:04 am

---

I think you're on to something here. If we stored the files with all
the non-meaningful whitespace (including non-meaningful newlines)
removed, not only would we save disk space, but we would also
eliminate significant amounts of developer time and LKML bandwidth
currently expended on arguing about formatting. Everybody could just
run things through indent with whatever formatting they preferred.
Might make diffs ugly, though...

scott
-- 
scott preece
-
