I looked at git profiles yesterday, and some of them are pretty scary. We
spend about 50% of the time under some loads in just zlib uncompression,
and when I actually looked closer at the zlib sources I can kind of
understand why. That thing is horrid.

The sad part is that it looks like it should be quite possible to make
zlib simply just perform better. The profiles seem to say that a lot of
the cost is literally in the "inflate()" state machine code (and by that I
mean *not* the code itself, but literally in the indirect jump generated
by the case-statement).

Now, on any high-performance CPU, doing state-machines by having

        for (;;)
                switch (data->state) {
                ...
                        data->state = NEW_STATE;
                        continue;
                }

(which is what zlib seems to be doing) is just about the worst possible
way to code things.

Now, it's possible that I'm just wrong, but the instruction-level profile
really did pinpoint the "look up state branch pointer and jump to it" as
one of the hottest parts of that function. Which is just *evil*. You can
most likely use direct jumps within the loop (zero cost at all on most OoO
CPUs) most of the time, and the entry condition is likely quite
predictable too, so a lot of that overhead seems to be just sad and
unnecessary.
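
To make the difference concrete, here is a minimal sketch of the same toy
state machine written both ways. This is not zlib code: the record format (a
length byte followed by that many literal bytes, with a zero length ending
the stream) and the names decode_switch, decode_goto and decoders_agree are
made up for illustration. A resumable decoder like inflate() would still need
one dispatch on entry to get back to the saved state, which this sketch
leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy format: [len][len literal bytes]..., len == 0 ends it. */
enum state { HEAD, COPY, DONE };

/* Style 1: every transition stores a state and re-dispatches via switch,
 * which compilers commonly turn into an indirect jump through a table. */
static size_t decode_switch(const unsigned char *in, size_t n,
                            unsigned char *out)
{
        enum state state = HEAD;
        size_t i = 0, o = 0, left = 0;

        assert(in && out);
        for (;;) {
                switch (state) {
                case HEAD:
                        if (i >= n || in[i] == 0) {
                                state = DONE;
                                continue;
                        }
                        left = in[i++];
                        state = COPY;
                        continue;
                case COPY:
                        if (left == 0) {
                                state = HEAD;
                                continue;
                        }
                        if (i >= n) {   /* truncated input */
                                state = DONE;
                                continue;
                        }
                        out[o++] = in[i++];
                        left--;
                        continue;
                case DONE:
                        return o;
                }
        }
}

/* Style 2: the same transitions as direct jumps. Each goto has a fixed
 * target, so no state variable is looked up on the way. */
static size_t decode_goto(const unsigned char *in, size_t n,
                          unsigned char *out)
{
        size_t i = 0, o = 0, left = 0;

        assert(in && out);
head:
        if (i >= n || in[i] == 0)
                goto done;
        left = in[i++];
copy:
        if (left == 0)
                goto head;
        if (i >= n)             /* truncated input */
                goto done;
        out[o++] = in[i++];
        left--;
        goto copy;
done:
        return o;
}

/* Returns 1 if both decoders produce identical output for one sample. */
static int decoders_agree(void)
{
        static const unsigned char in[] = { 3, 'a', 'b', 'c', 2, 'd', 'e', 0 };
        unsigned char a[16] = { 0 }, b[16] = { 0 };
        size_t na = decode_switch(in, sizeof(in), a);
        size_t nb = decode_goto(in, sizeof(in), b);

        return na == nb && memcmp(a, b, na) == 0;
}
```

Both functions decode the sample to the same five bytes. Whether the second
form is actually faster depends on the compiler and the CPU, so it would
need to be measured rather than assumed.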

Now, I'm just wondering if anybody knows if there are better zlib
implementations out there? This really looks like it could be a noticeable
performance issue, but I'm lazy and would be much happier to hear that
somebody has already played with optimizing zlib. Especially since I'm not
100% sure it's really going to be noticeable.

                Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
