That reminds me to finally implement nicer (git-describe like)
[proposed] snapshot filenames. For example, for a snapshot of the state
given by some tag (a snapshot of a tagged release [1]), don't use the
generic
<project basename>-<40-chars sha1>.<suffix>
git-b2a42f55bc419352b848751b0763b0a2d1198479.tar.gz
but
<project basename>-<tag name>.<suffix>
git-v1.5.5.3.tar.gz
(well, currently tags don't have a 'snapshot' link, but this is easily
fixed). What do you think about (ab)using the 'fp' (file_parent)
parameter to pass the proposed snapshot file name?
[1] Would it be a good feature to add support for limiting snapshots
to tagged releases only? (I guess this would be more important once
gitweb caching gets implemented.)
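To make the naming proposal above concrete, here is a minimal sketch in
Python (gitweb itself is Perl; this only illustrates the rule). The
function name and its arguments are made up for the example and are not
part of gitweb.

```python
import re


def snapshot_filename(project_path, sha1, suffix, tag=None):
    """Build a proposed snapshot file name (hypothetical helper).

    project_path -- project path as seen by the web interface, e.g. "git.git"
    sha1         -- full 40-character object name of the snapshot
    suffix       -- archive suffix without the leading dot, e.g. "tar.gz"
    tag          -- tag name, if the snapshot is of a tagged release
    """
    # Project basename: last path component, without a trailing ".git".
    basename = project_path.rstrip("/").split("/")[-1]
    basename = re.sub(r"\.git$", "", basename)

    # Prefer the tag name; fall back to the generic sha1 form.
    ref = tag if tag else sha1

    # Assumption: anything outside a conservative character set is
    # replaced, so the name is safe to offer as a download file name.
    ref = re.sub(r"[^A-Za-z0-9._-]", "_", ref)

    return f"{basename}-{ref}.{suffix}"
```

With tag="v1.5.5.3" this gives the tag-based name from the example
above; without a tag it falls back to the sha1-based one.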
Well, only in a sense that with front-end caching (the one to choose if
CPU matters most) this can be done "for free", without incurring extra
CPU, at the cost of a little more disk space.
Of course, if most clients understand (accept) Content-Encoding
(compressed transfer), you can store compressed output, at a small
CPU cost to decompress it for clients that do not accept it; this way
a front-end cache can have a size comparable to a [parsed] data cache.
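A rough sketch of that idea, in Python for illustration only: the cache
keeps gzip-compressed output, sends it as-is to clients that list gzip
in Accept-Encoding, and decompresses it for the rest. All function names
here are hypothetical, and the Accept-Encoding check is deliberately
simplified (it ignores the "*" wildcard).

```python
import gzip


def cache_entry(page_bytes):
    """Compress generated output once, at the time it is cached."""
    return gzip.compress(page_bytes)


def accepts_gzip(accept_encoding):
    """Simplified Accept-Encoding check: is gzip listed with q > 0?"""
    for item in accept_encoding.split(","):
        parts = item.strip().split(";")
        if parts[0].strip().lower() != "gzip":
            continue
        q = 1.0
        for param in parts[1:]:
            param = param.strip()
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        return q > 0
    return False


def respond_from_cache(cached_gzip, accept_encoding):
    """Return (headers, body) for a cache hit."""
    if accepts_gzip(accept_encoding):
        # Cheap path: send the stored bytes unchanged.
        headers = {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
        return headers, cached_gzip
    # Fallback: pay a little CPU to decompress for this client.
    return {"Vary": "Accept-Encoding"}, gzip.decompress(cached_gzip)
```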
What I'd like to see at some point is an estimate of how much time it
would take to implement data caching almost from scratch (a bit of
code in repo's gitweb), compared to merging in kernel.org's gitweb
caching code...
As Lars Hjemli wrote in the "[RFC/PATCH] gitweb: Paginate project list"
thread (unfortunately not all articles made it to the git mailing list)
http://thread.gmane.org/gmane.comp.version-control.git/81838/focus=81875
<quote>
In cgit I've chosen "projectlist in a single file" and "cache html
output". This makes it cheap (in terms of cpu and io) to both generate
and serve the cached page (and the cache works for all pages).
</quote>
<quote>
While I agree that caching search result output almost never makes
sense, I think it's more important that cache hits requires minimal
processing. This is why I've chosen to cache the final result instead
of an intermediate state, but both solutions obviously got some pros
and cons.
</quote>
caching final output is important if you want to minimize processing
(CPU time). I'd say it also matters if you want to implement Range: for
resumable downloads (snapshots), because otherwise I think it would be
quite hard to do reasonably (with caching only data).
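To illustrate why a cached final output makes Range: easy, here is a
minimal Python sketch that serves a byte range straight from a cached
snapshot file. The function names and the returned (status, headers,
body) triple are assumptions made for the example. It handles a single
range only, and as a simplification it answers 416 for both malformed
and unsatisfiable ranges.

```python
import os
import re


def parse_range(header, size):
    """Parse a single 'bytes=first-last' range for a body of given size.

    Returns (start, end), both inclusive, or None if unusable.
    """
    m = re.fullmatch(r"bytes=(\d*)-(\d*)", header.strip())
    if not m:
        return None
    first, last = m.groups()
    if first == "" and last == "":
        return None
    if first == "":
        # Suffix form "bytes=-N": the last N bytes of the file.
        n = int(last)
        if n == 0:
            return None
        start, end = max(size - n, 0), size - 1
    else:
        start = int(first)
        end = min(int(last), size - 1) if last else size - 1
    if start > end or start >= size:
        return None
    return start, end


def serve_cached(path, range_header=None):
    """Return (status, headers, body) for a cached output file."""
    size = os.path.getsize(path)
    if range_header is None:
        with open(path, "rb") as f:
            return 200, {"Accept-Ranges": "bytes"}, f.read()
    rng = parse_range(range_header, size)
    if rng is None:
        return 416, {"Content-Range": f"bytes */{size}"}, b""
    start, end = rng
    with open(path, "rb") as f:
        f.seek(start)  # no regeneration needed, just seek into the file
        body = f.read(end - start + 1)
    return 206, {"Content-Range": f"bytes {start}-{end}/{size}"}, body
```

With only data cached, the archive would have to be regenerated
(byte-for-byte identically) before the requested slice could be cut
out of it.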
So I guess the best solution would be a mixed one: use an output cache
for large or CPU-intensive pages, and use data caching to limit cache
size and for maximum flexibility (relative dates, sorting by columns
[although that would be best solved using some DHTML/JavaScript],
paginated output, project search, etc.).
By the way, we have your (Petr 'Pasky' Baudis, based on repo.or.cz)
and John 'Warthog9' Hawley's (based on kernel.org) statements that
gitweb performance is I/O bound, but I don't remember any hard data.
I wrongly said that one can use the 'fio' tool to check I/O
performance; that is not true: this tool can be used to test a
_filesystem_ by generating a specified pattern of I/O load. I don't
know of any tool which would allow one to measure whether I/O is the
bottleneck for a given application; 'iogrind' measures cold cache
start, and I think it requires a compiled program, as it uses Valgrind.
You can measure CPU load, response time and memory usage using 'time'
(running gitweb as a script), ApacheBench and top; you can measure
latency using LatencyTOP.
Is there some iotop tool, and can it be used to measure performance
bottlenecks of web scripts (web applications)?
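As a very rough stand-in for the 'time' measurement mentioned above,
the Python sketch below runs a command and compares wall-clock time
with the CPU time charged to the child process. The function name is
made up, and the "waiting" figure is only a hint: it covers any time
the process was not on the CPU (disk I/O, but also scheduling delays
and sleeps), so it cannot prove that a script is I/O bound.

```python
import os
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd and return wall time, child CPU time and the difference."""
    before = os.times()
    t0 = time.perf_counter()
    subprocess.run(cmd, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=False)
    wall = time.perf_counter() - t0
    after = os.times()

    # CPU time (user + system) accumulated by waited-for child processes.
    cpu = ((after.children_user - before.children_user)
           + (after.children_system - before.children_system))

    # Time not spent on the CPU; a large share suggests waiting,
    # possibly on I/O.
    return {"wall": wall, "cpu": cpu, "waiting": max(wall - cpu, 0.0)}


if __name__ == "__main__":
    # Placeholder command; substitute whatever runs gitweb as a script.
    result = measure([sys.executable, "-c", "pass"])
    print(result)
```

Comparing a run with a cold disk cache against a repeated (warm) run of
the same command would give a better idea of how much of the waiting is
really disk I/O.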
--
Jakub Narebski
Poland