Re: [PATCH] init: bzip2 or lzma -compressed kernels and initrds

Previous thread: [PATCH] documentation: clean up extras and omissions in dontdiff by Alain Knaff on Saturday, September 6, 2008 - 2:22 pm. (1 message)

Next thread: RE: [PATCH] Ghost EDD devices in /sys again by H. Peter Anvin on Saturday, September 6, 2008 - 3:08 pm. (3 messages)
From: Alain Knaff
Date: Saturday, September 6, 2008 - 2:19 pm

From: Alain Knaff <alain@knaff.lu>

This patch, based on an idea by Christian Ludwig, includes support for
compressing the kernel with bzip2 or lzma rather than gzip. Both
compressors give smaller sizes than gzip.  Moreover, lzma's
decompresses faster than gzip.

It also supports ramdisks and initramfs compressed using these two
compressors.

The functionality has been successfully used for a couple of years by
the udpcast project

This version applies to kernel 2.6.26.3

Signed-off-by: Alain Knaff <alain@knaff.lu>

---

diff -purN -X linux-2.6.26.3udpcast/Documentation/dontdiff linux-2.6.26.3/arch/x86/boot/compressed/head_32.S linux-2.6.26.3udpcast/arch/x86/boot/compressed/head_32.S
--- linux-2.6.26.3/arch/x86/boot/compressed/head_32.S	2008-08-20 20:11:37.000000000 +0200
+++ linux-2.6.26.3udpcast/arch/x86/boot/compressed/head_32.S	2008-09-06 12:23:22.000000000 +0200
@@ -75,7 +75,11 @@ startup_32:
 
 	/* Replace the compressed data size with the uncompressed size */
 	subl input_len(%ebp), %ebx
+#ifdef CONFIG_KERNEL_LZMA
+	movl output_len_lzma(%ebp), %eax
+#else
 	movl output_len(%ebp), %eax
+#endif
 	addl %eax, %ebx
 	/* Add 8 bytes for every 32K input block */
 	shrl $12, %eax
@@ -135,7 +139,11 @@ relocated:
 /*
  * Do the decompression, and jump to the new kernel..
  */
+#ifdef CONFIG_KERNEL_LZMA
+	movl output_len_lzma(%ebx), %eax
+#else
 	movl output_len(%ebx), %eax
+#endif
 	pushl %eax
 	pushl %ebp	# output address
 	movl input_len(%ebx), %eax
diff -purN -X linux-2.6.26.3udpcast/Documentation/dontdiff linux-2.6.26.3/arch/x86/boot/compressed/head_64.S linux-2.6.26.3udpcast/arch/x86/boot/compressed/head_64.S
--- linux-2.6.26.3/arch/x86/boot/compressed/head_64.S	2008-08-20 20:11:37.000000000 +0200
+++ linux-2.6.26.3udpcast/arch/x86/boot/compressed/head_64.S	2008-09-06 21:27:59.000000000 +0200
@@ -89,7 +89,11 @@ startup_32:
 
 	/* Replace the compressed data size with the uncompressed size */
 	subl	input_len(%ebp), %ebx
+#ifdef ...
From: Leon Woestenberg
Date: Saturday, September 6, 2008 - 3:29 pm

Hello,

a small remark on the non-code parts:


This seems contradictionary information.

However, I welcome more compression options in kernel and filesystem
land, so I'm very interested in this patch.
Recently, on the filesystem side there seems to be some effort to
modularize the decompressors, instead of the use of #ifdef's.

The other architectures (especially used in embedded) need to hook in
on this, getting rid of the many out-of-tree patches for kernel/fs
decompression.

Regards,
-- 
Leon
--

From: Alain Knaff
Date: Saturday, September 6, 2008 - 3:59 pm

Oops, sorry for that. Actually the Kconfig text is correct. On
decompression, Lzma is faster than bzip2 but slower than gzip:

Compressor	Compression	Decompression	Size
gzip -9		1,01s		0,11s		833069
lzma -9		3,43s		0,24s		705125
bzip2 -9	2,88s		0,38s		777706

On compression, lzma is actually slowest of the 3, but that should be of
little concern, as this happens only once, whereas decompression happens
many times (on each boot).

So, overall lzma looks like the best (acceptable decompression speed,
best decompression ratio). I only included Bzip2 because it's much


Unfortunately, I didn't have any such machine available for testing, so
I just for Intel 32/64.

However, the changes in the Assembly part (head_32.S and head_64.S) are
trivial, so should be easy to port. The only change to these files is
the offset where the uncompressed size of the file may be found (4 bytes
from the end for gzip, and 5 from the start for lzma).

misc.c, where the bulk of the "architecture-specific" change is, is
actually not that architecture-specific, and could maybe be moved to a
common part? Diff'ing the boot/compressed/misc.c's of various
architectures shows (at first glance) mostly version differences: some
architectures get some changes/enhancements earlier than others. It's as
if the various architectures were stuck at some different points in the
past as compared to Intel...

most of the other files are architecture-independant anyways (the stuff
in lib/ and init/)

Alain
--

From: Rob Landley
Date: Saturday, September 6, 2008 - 11:17 pm

I vaguely recall that lzma requires more memory to decompress than bzip2 does, 
although I don't remember the details.  I know that bzip2 takes around 4 megs 
(although you need space for the decompressed kernel on _top_ of that, so you 
should be able to do it in about 8 megs total).  gunzip uses a 64k sliding 
window plus dictionary and the whole mess should fit in about 1/4 of a meg.

Actually, from what I've seen the main reason lzma doesn't get used for 
tarballs a lot is that whoever originally created it didn't include a 
fingerprint.  You can go "file tar.gz" or "file tar.bz2" and it can figure 
out by looking at the contents of the file what it _is_, but last I checked 
there's no obvious way to tell an lzma file from the output of /dev/urandom.  
This causes all sorts of small but annoying problems, and discourages its use 
a bit...

(Also, the decompressor's not so bad but the compressor was _painfully_ slow, 
last I checked.  It's been a couple years since I was really paying 

Eh, anybody can mess with arm, powerpc, and mips.

1) Install qemu 0.9.1.

2) Go to http://landley.net/code/firmware/downloads

In the binaries/system-image directory are tarballs of prebuilt systems with 
zImage files of kernels qemu can boot, and ext2 image files of uClibc/busybox 
root filesystems.  There are also ./run-emulator.sh scripts that will boot 
the above under qemu 0.9.1 (giving you a shell prompt).

If you'd like to replace the zImage files with ones you build yourself, you'll 
need to either grab the source tarballs at the top level and run the whole 
build yourself (try "./build.sh armv4l" for example; you can run it with no 
arguments to see available targets) or grab the appropriate tarball out of 
binaries/cross-compiler/host-x86_64, extract it, add its "bin" subdirectory 
to your $PATH, and then build the kernel via something like:

  make CROSS_COMPILE=armv4l- ARCH=arm

With the appropriate .config of course (look at the defconfig files under 
arch/arm/config or ...
From: H. Peter Anvin
Date: Monday, September 8, 2008 - 1:35 am

Both 7zip and LZMA-Utils have serious file format problems.  The author 
of LZMA-Utils is working on a new format, which is likely to be widely 
adopted once it materializes.

	-hpa
--

From: Jörn
Date: Monday, September 8, 2008 - 6:14 am

Less, actually.  Iirc gzip takes about 280k for compression and
somewhere below 100k for decompression with the kernel runtime zlib.
The various copies that unpack kernels at boottime may be worse - they
are certainly rather old copies of zlib and haven't seen much
maintenance since.

Jörn

-- 
Don't worry about people stealing your ideas. If your ideas are any good,
you'll have to ram them down people's throats.
-- Howard Aiken quoted by Ken Iverson quoted by Jim Horning quoted by
   Raph Levien, 1979
--

From: H. Peter Anvin
Date: Saturday, September 6, 2008 - 8:18 pm

Could you produce a version against the -tip tree, either the master 
branch or the x86/setup branch?

	-hpa
--

From: Steven Noonan
Date: Saturday, September 6, 2008 - 9:35 pm

I second this.

I tested the existing patch, and it works great when applied to
2.6.26. It'd be great to see a version of this either go into -mm or
mainline sometime soon.

- Steven
--

From: H. Peter Anvin
Date: Saturday, September 6, 2008 - 9:46 pm

I should warn that I looked over the patch, and it looks quite messy. 
It may need structural cleanup before integration.

	-hpa
--

From: Alain Knaff
Date: Sunday, September 7, 2008 - 12:40 am

Could you elaborate please?

Alain

--

From: H. Peter Anvin
Date: Sunday, September 7, 2008 - 9:30 am

At a brief glance, there is an awful lot of #ifdefs in that code, a lot 
of which look unnecessary.  #ifdefs make the code hard to read, and 
usually signify that additional abstractions are called for.

	-hpa
--

From: Alain Knaff
Date: Sunday, September 7, 2008 - 12:39 am

Thanks for the note... but where/how can I download the -tip tree from?

Thanks

Alain

--

From: Alain Knaff
Date: Sunday, September 7, 2008 - 2:17 am

Thanks, this works

Alain

--

From: Willy Tarreau
Date: Saturday, September 6, 2008 - 10:48 pm

Hi Alain,


So it must have evolved a lot, because last time I considered it
(about 2 years ago), it was the opposite, it needed about 2 minutes
to decompress a kernel on a small machine (650 MHz). It was a shame

You should not change the messages above, it does not make sense.
What struck me is "unlzmaing" which is almost an unparsable word.
If you really want to indicate the algo, better add it in parenthesis :

Does it mean it will always need at least 4 MB for decompressing ? If
so, you should explicitly state it in the config help so that persons
running embedded systems (the more tempted ones by LZMA/BZIP2) do not

Are you sure you will never run into trouble with such defines ? I don't
see the reason for the #ifndef, maybe you'd prefer #undef to force their

I'm wondering if it's really worth having all those #ifdef/#endif.
After all, if you leave it to the caller to check ztype depending
on what is enabled, it would even be better, because the correct
image type will have been identified and indicated before being
refused in case of mismatch. This is helpful for people who will
use more than one format (eg: all those who will slowly migrate

Just a suggestion : create an array filled with pointers to the
various decompression methods, or NULL when not defined. Here,
you'd just do :
     if (crd_loader[ztype]) {
        if (cdr_loader[ztype](in_fd, out_fd) == 0)
           goto successful_load;
     } else {
        printk(KERN_ERR "No decompressor for image type %d\n", ztype);

This becomes a bit more horrible... It would be nice if you could get






It would be the only command in the build process that requires perl. While
not a disaster, it's a bit annoying to introduce a new build dependency.

You may want to experiment with command-line tools such as stat and printf,
I have one horrible version working at the command line below, I have
"makefilified" it but not tested it :

size_append = stat -c "%s"
cmd_bzip2 = (bzip2 -9 < $< ; x=$(size_append) ...
From: Alain Knaff
Date: Sunday, September 7, 2008 - 1:59 am

See my other mail. This was a typo, lzma decompresses faster than
_bzip_, but not _gzip_.  Sorry for the confusion. Nowadays lzma is about

It does make sense for the clumsy who forget to actually copy their
kernel to where it will be booted from. These messages make it obvious


It's complicated... Lzma's memory needs depend on the file being
decompressed. The first byte tells how much is needed. Here is the
relevant info from decompress_unlzma.c:

...
#define LZMA_BASE_SIZE 1846
#define LZMA_LIT_SIZE 768
...
	uint16_t *p;
...
	mi = header.pos / 9;
	lc = header.pos % 9;
	pb = mi / 5;
	lp = mi % 5;
...
	num_probs = LZMA_BASE_SIZE + (LZMA_LIT_SIZE << (lc + lp));
	p = large_malloc(num_probs * sizeof(*p));

Where header.pos is the first byte of the file.

In most cases of kernel compression that I saw, even with lzma -9, that
first byte was 0x5d = 93.
So lc = 3 and lp = 0.
Meaning a malloc of (1846 + 768 << 3) * 2 = 15980 which would even fit
into the default heap of 16384 bytes.

However, in a theoretical worst case, this could balloon up to lc=8 lp=4:

(1846 + 768 << 12) * 2 = 6295148, i.e. more than 6MB.

bzip2, on the other hand, needs a constant space of a little bit more
than 40000 bytes.

The best move would probably be sure to move the heap at end of memory
(rather than in the middle of bss segment where it is now) and not
specify any size at all.


... but now that I think about it, I've actually got 0x5d hardwired as a
magic number for the initrd (see below), and in all the years since I
use this in udpcast, it never happened that it was anything else (or
else it would have failed to boot, as initrd would not have recognized
the magic number). So 15980 bytes should be ok in almost all cases => no


This is done to make the file work in both contexts:
 - initrd/initramfs where everything is linked, so functions should be
extern
 - boot/compressed/misc.c where everything (including code) is
#include'd where functions should be static.

... but ...
From: Rob Landley
Date: Sunday, September 14, 2008 - 6:37 pm

Last I checked it was more.  (I very vaguely recall somebody saying 16 megs 
working space back when this was first submitted to busybox, but that was a 
few years ago...)

A quick Google found a page that benchmarks them.  Apparently it depends 
heavily on which compression option you use:

http://tukaani.org/lzma/benchmarks

Something compressed with lzma -6 takes 5 megabytes to decompress, -7 takes 9 
megs, -8 takes 17 megs, and -9 takes 33.  (Plus your source and target 
buffers for in-memory compression.)

So decompressing anything more than an "allnoconfig" kernel with lzma -6 
probably wouldn't quite fit in 8 megs (800k source data, 2.5 megs destination 
data, 5 megs working space... boing.).  You'd need to bump up to 12 or so.  

While I despise requiring perl to build the kernel, I'd like to point out that 
H. Peter Anvin introduced Perl as a build requirement for 2.6.25:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=bdc807...

Before that perl was only required for optional things like bloat-o-meter, not 
to actually _build_ the kernel.

My argument about it with him and Sam Ravnborg back in February was at:
http://lkml.org/lkml/2008/2/15/541

And their position was "perl is inevitable, go ahead and add python too if you 
like".  Apparently your build environment can have an infinite number of 
requirements, including things like perl where the implementation is the 
spec.  (There is no perl spec, there's only a perl implementation.  Python at 
least has a java implementation...)

My updated patch to remove the dependency from 2.6.26 was here:
http://www.mail-archive.com/linux-embedded%40vger.kernel.org/msg00748.html

Rob
--

From: Frans Meulenbroeks
Date: Monday, September 15, 2008 - 5:46 am

[...]

Apologies if I'm sidetracking the discussion, but I'd like to coin a remark.

For kernel/ramfsimage etc the best choice is the one that has the
fastest decompression (info on tukaani.org says gzip).
Rationale: as it uncompresses faster the system will boot faster.

Of course this only holds if the background memory can hold that
image. For disk based systems, I assume this is not a problem at all,
but for embedded systems with all software in flash a higher
compression ration (e.g. lzma) can just make the difference between
fit and not fit (so in those cases lzma could just make your day).

Side note: although I think the conclusion at the tukaani website
holds, the data themselves are questionable.
I guess this is done on the internal hard disk of the laptop (this is
not specified). It would be better to do this on a ramfs to avoid
effects from data still being in the buffer cache (or not yet it).

Also the actual time in the tests is spent on three things: read from
disk, decompress, write to disk. (i'll only talk about decompress
here, guess an additional second or so to compress is not that
important).
You can argue that the latter is a constant as the same amount of data
is written, but the first one (the read time) depends on the actual
amount of data and the transfer rate of the device.
In case of slower devices it could well be that higher compression
yields a smaller image. If the reduction in read time is bigger than
the additional cost for the slower decompress, the net effect still
would be a win when it comes to boot time.

and finally: I've seen substantial timing differences when comparing
algorithms on different architectures (arm/mips/x86), so processor
might also make a difference on what is best. (and so will the
compiler).

FM
--

From: Bill Davidsen
Date: Monday, September 15, 2008 - 10:13 am

Given the larger memory needed to decompress, it becomes a very interesting 
calculation in really small memory machines.

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
--

From: Steven Noonan
Date: Monday, September 15, 2008 - 10:28 am

It all really depends on what you're prioritizing. If your priority is
speed and low RAM usage, you'd want to go with gzip. If your priority
is low disk usage (for instance, if you're a kernel developer with
dozens of kernels on /boot) and speed/RAM usage are less important,
LZMA is a good choice. It's just a matter of priority in non-embedded
machines. And in embedded machines, you just need to be -really-
careful about the RAM usage. LZMA is pretty flexible, though, so you
can customize the settings to get it to fit whatever memory profile
you need to.
--

From: Leon Woestenberg
Date: Friday, September 26, 2008 - 11:53 am

Hello all,


Exactly.

Whenever LZMA comes up people start discussing the differences and
then discuss what is THE best.
So far, IMHO there is NONE. Selecting the best filesystem is about
making trade-offs depending on what you need.

For example, the systems I use squashfs for, are intended to be booted
only *once* (or incidently after a firmware upgrade). I do not care
about the memory it takes, or if takes twice as long. Also, I need the
rootfs to be read-only. Distributing over any media and uploading
Linux/FPGA firmware over very slow device backplanes is my main
concern.
Squashfs w/ LZMA works best for me, because smallest. Kudos.

Linux, THE best if you need choice.

Regards,
-- 
Leon
--

Previous thread: [PATCH] documentation: clean up extras and omissions in dontdiff by Alain Knaff on Saturday, September 6, 2008 - 2:22 pm. (1 message)

Next thread: RE: [PATCH] Ghost EDD devices in /sys again by H. Peter Anvin on Saturday, September 6, 2008 - 3:08 pm. (3 messages)