This is a terrible assumption in general (i.e. if filesize % blocksize
is close to uniformly distributed). If you remove one byte and the data
is stored with blocksize B, then you either save zero bytes with
probability 1-1/B or you save B bytes with probability 1/B. The
expected number of bytes saved is 0*(1-1/B) + B*(1/B) = 1. Since expectation is linear,
if you remove x bytes, the expected number of bytes saved is x (even if
there is more than one byte removed per file).
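The linearity argument above can be checked with a small exact computation. This is a minimal sketch that assumes a simplified model. Every file occupies ceil(size/B) whole blocks, with no tail packing or metadata. File sizes are taken over one full period of B consecutive sizes, so filesize % blocksize is exactly uniform. The helper names `allocated` and `expected_saving` are hypothetical and were made up for this illustration.

```python
from fractions import Fraction


def allocated(size: int, block: int) -> int:
    """Bytes on disk under the assumed model: size rounded up to whole blocks."""
    return -(-size // block) * block  # ceiling division, then scale to bytes


def expected_saving(removed: int, block: int, base: int = 0) -> Fraction:
    """Average bytes freed by removing `removed` bytes from a file.

    The average is exact: it runs over `block` consecutive file sizes, so
    size % block takes every residue exactly once (the uniform assumption).
    `base` only shifts which period is used; the result does not depend on it.
    """
    start = base + removed  # keep the post-removal size non-negative
    total = sum(
        allocated(s, block) - allocated(s - removed, block)
        for s in range(start, start + block)
    )
    return Fraction(total, block)


if __name__ == "__main__":
    for x in (1, 16, 5000):
        print(x, expected_saving(x, 4096))
```

With B = 4096, removing one byte frees a whole block for exactly one of the 4096 residues and nothing for the rest. The average nevertheless comes out to exactly x bytes for every x, including removals larger than a block, which is the linearity-of-expectation point. `Fraction` keeps the average exact, so no floating-point rounding obscures the result.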
In my tree, about half of the files have size >= 4k, so the assumption
is probably not _that_ far off the mark.
Alternatively, there are an average of about 16 bytes removed per file,
and there are 11 which are <= 16 bytes short of a 4k boundary, so it's
That's true.
--Andy