Evening everyone,
This is the latest incarnation of gitweb w/ caching. This is finally at the point where it should probably start either being considered for inclusion or mainline, or I need to accept that this will never get in and more perminantely fork (as is the case with Fedora where this is going in as gitweb-caching as a parrallel rpm package).
That said this brings the base up to mainline (again), it updates a number of elements in the caching engine, and this is a much cleaner break-out of the tree vs. what I am currently developing against.
New things known to work:
- Better breakout
- You can actually disable the cache now
- John 'Warthog9' Hawley
John 'Warthog9' Hawley (6):
GITWEB - Load Checking
GITWEB - Missmatching git w/ gitweb
GITWEB - Add git:// link to summary pages
GITWEB - Makefile changes
GITWEB - File based caching layer
GITWEB - Separate defaults from main file
.gitignore | 1 +
Makefile | 15 +-
gitweb/Makefile | 14 +
gitweb/cache.pm | 293 +++++++
gitweb/gitweb.css | 6 +
gitweb/gitweb.perl | 1821 ++++++++++++++++++++-----------------------
gitweb/gitweb_defaults.perl | 468 +++++++++++
7 files changed, 1651 insertions(+), 967 deletions(-)
create mode 100644 gitweb/Makefile
create mode 100644 gitweb/cache.pm
create mode 100644 gitweb/gitweb_defaults.perl
This adds $missmatch_git so that gitweb can run with a miss-matched git install. Gitweb, generally, runs fine on a very broad range of git versions, but it's not always practicle or useful to upgrade it every time you upgrade git. This allows the administrator to realize they are miss-matched, and should they be so inclined, disable the check entirely and run in a miss-matched fasion. This is more here to give an obvious warning as to whats going on vs. silently failing. Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net> --- gitweb/gitweb.perl | 22 ++++++++++++++++++++++ 1 files changed, 22 insertions(+), 0 deletions(-)
This adds a git:// link to the summary pages should a common $gitlinkurl be defined (default is nothing defined, thus nothing shown) This does make the assumption that the git trees share a common path, and nothing to date is known to actually make use of the link Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net> --- gitweb/gitweb.perl | 8 ++++++++ 1 files changed, 8 insertions(+), 0 deletions(-)
This adjust the makefiles so that you can do such things as make gitweb from the top level make tree, or if your in the gitweb directory itself typing make will call back up to the main Makefile and build gitweb Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net> --- Makefile | 4 +++- gitweb/Makefile | 14 ++++++++++++++ 2 files changed, 17 insertions(+), 1 deletions(-) create mode 100644 gitweb/Makefile
This is an attempt to break out the default values & associated documentation from the main gitweb file so that it's easier to browse / read and understand without the associated code involved. This helps by making defaults self contained with their documentation making it easier for someone to read through things and find what they want This is also a not-so-subtle start of trying to break up gitweb into separate files for easier maintainability, having everything in a single file is just a mess and makes the whole thing more complicated than it needs to be. This is a bit of a baby step towards breaking it up for easier maintenance. Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net> --- .gitignore | 1 + Makefile | 15 +- gitweb/Makefile | 2 +- gitweb/gitweb.perl | 515 +++++-------------------------------------- gitweb/gitweb_defaults.perl | 468 +++++++++++++++++++++++++++++++++++++++ 5 files changed, 537 insertions(+), 464 deletions(-) create mode 100644 gitweb/gitweb_defaults.perl
The question is if easier maintenance and development by spliting Hmmm... gitweb/gitweb_defaults.perl as source file, and gitweb/gitweb_defaults.pl as generated file? Wouldn't it be better to go with the convention used elsewhere in gitweb and use gitweb/gitweb_defaults.perl.in or gitweb/gitweb_defaults.pl.in as Why the slightly inconsistent style ("%.cgi,%perl" vs "%.pl, %perl")? Also wouldn't all replacements be in the new gitweb_defaults file, so there would be no need then to do replacements for gitweb.cgi? Why this block is required? Why not have variables defined (using "our") in gitweb_defaults file? Why not "our $base_url = $my_url;"? -- Jakub Narebski Poland ShadeHawk on #git --
This would just get dropped into the same location that gitweb.cgi exists in, there is no real difference in installation, and thus I can't I think you got confused, the committed file is .perl the generated file Cruft, thought I had deleted and excluded it, won't be there in next Considering that the defaults is more of an include vs. a cgi it probably shouldn't share the standard expected executable suffix, thus I used .pl. Could just as easily change it to .pm, or something else but I think it would make the most sense to leave things we are expecting the webserver to directly execute as .cgi, and includes as a different Not all replacements are done in one or the other, and since it's basically a NOP to perform the full set of replacements on both files Wanted to make sure things were properly defined, if in an unexpected same reason as the other 'our' includes above, though why this ended up as a separate patch vs. the rest of the file I don't know. - John 'Warthog9' Hawley --
To be more exact you have to know that you have to drop _generated files_,
which means (for this version of patch) gitweb.cgi and gitweb_defaults.pl
(or whatever the generated file with config variables would be named).
ATTENTION!
Your changes would make stop working all gitweb tests. Currently they
do some magic with generated gitweb config file "$(pwd)/gitweb_config.perl"
set via GITWEB_CONFIG configuration variable to be able to run
_unprocessed_ gitweb/gitweb.perl (without any substitutions). This
allow to run tests on "live" version of gitweb.
After your changes it would be no longer possible, at least not if we
want to be sure that we test the same version of gitweb as gitweb_defaults.
It would probably mean that we need to move to testing built version,
Maybe I wasn't entirely clean. I meant that the committed source file
should perhaps have *.in extension to denote that it is to be processed
via variable substitution, so it would be
committed file: gitweb/gitweb_defaults.pl.in
generated file: gitweb/gitweb_defaults.pl
I was not asking about that, but about
+ $(patsubst %.cgi,%.perl,$(patsubst %.pl, %.perl, $@)) >$@+ && \
vs
+ $(patsubst %.cgi,%.perl,$(patsubst %.pl,%.perl,$@)) >$@+ && \
But after thinking about it a bit, and consulting make documentation
(in particular definition of $@ variable) this rule shouldn't work at all.
`$@'
The file name of the target of the rule. If the target is an
archive member, then `$@' is the name of the archive file. In a
pattern rule that has multiple targets (*note Introduction to
Pattern Rules: Pattern Intro.), `$@' is the name of whichever
target caused the rule's commands to be run.
What we need is to run pattern substitution for _two_ files, perhaps
using the for loop.
Also I think the order of substitutions would be better to be reversed:
$(patsubst %.pl,%.perl,$(patsubst %.cgi,%.perl,$FILE)) >$FILE+
O.K.
But Makefile would be (slightly) ...You didn't have to shout. Any progress on this front? Not that I am anxious to queue new topics to 'next' right now (we are frozen for 1.6.6), but I think having what is proven to work well at a real site like k.org is much better than waiting for an unproven reimplementation using somebody else's framework only for your theoretical cleanliness. John has better things to do than doing such a rewrite himself, and even if you helped the process by producing a competing caching scheme based on existing web caching engines, the aggregated result (not just the web caching engine you base your work on) needs to get a similar field exposure to prove itself that it can scale to the load k.org sees, which would be quite a lot of work, no? --
Sadly, no. Busy weekend and a need to get some of the kernel.org servers upgraded has taken some precedence. I should be circling back --
So should I wait for reroll with proposals for improvements (modified patches)? -- Jakub Narebski Poland --
I'd probably wait, though it's starting to look like if I get to gitweb today it will be this evening as I ventured off into getting the last 6 of the kernel.org servers upgraded. Either way I will have a new patch series and some changes in my own git tree shortly. - John 'Warthog9' Hawley --
I'm not against (well, not much against) custom caching that kernel.org uses, but I am against large change to gitweb code currently accompanying caching, namely gather then output solution, which would negatively affect performance when caching is turned off. Also I'd like to have caching code (the one that didn't made it to git mailing list for some reason, probably vger anti-SPAM filter) cleaned up for submission: remove commented-out code, reduce code duplication, separate dealing with orthogonal issues (cache itself, adaptivity of cache, background generation and 'in progress' info, generating key for cache (and improve key generation to include path_info / use %input_params)), follow the same style that gitweb itself uses. As for the "[PATCH 6/6] GITWEB - Separate defaults from main file" patch, it would require modifying gitweb tests to use generated gitweb/gitweb.cgi rather than source gitweb/gitweb.perl. As for having caching code tested by git.kernel.org: IIRC there was issue with it not having cache expiration thus gathering GB of cached data. -- Jakub Narebski Poland --
"John 'Warthog9' Hawley" <warthog9@kernel.org> writes: Below are _proposed_ changes to make commit message easier to read, in Add "gitweb" target to main Makefile so you would be able to simply instead of requiring to spell it in full IMPORTANT! A note about this change: I think it would be better to move creating gitweb.cgi (and optionally gitweb.min.js) to gitweb/Makefile, and make main Makefile call gitweb/Makefile, and not vice versa like in your solution. Why it is here, and not with the .PHONY block at line 1924 of Makefile? It would be nice to have comment supporting this choice in -- Jakub Narebski Poland ShadeHawk on #git --
It's quite possible, and I'm fine with doing that. If no one has any objections I can re-work those with the understanding that the build process for gitweb shift to the gitweb/ directory instead of the main There are 6 other instances of .PHONY in the makefile, having the .PHONY localized seemed to make it the most obvious since it was right next to Preparation for a later change. The change could happen all at the same From this makefile I wanted to explicitly call up to the main makefile no matter what, the main makefile doesn't consider the targets .PHONY --
In my opinion it would be better solution because it would reduce size of main (master) Makefile, and not be much larger than this solution. git-gui/, Documentation/, perl/ all have their own makefiles, which do I was thinking here about this large block of .PHONY declarations, Please at least describe this change in commit message. But I think it could be moved to other patch, or put in separate patch. What is the reason of this phony .PHONY? If gitweb.cgi is newer than gitweb.perl (and other sources), then without .PHONY it wouldn't be regenerated. With .PHONY it would call master Makefile... which would notice that gitweb.cgi is newer than gitweb.perl and do not regenerate. So what this .PHONY does is unnecessary call make on master Makefile... But I guess this issue would be moot if it was the other way around, i.e. master Makefile calling gitweb/Makefile. -- Jakub Narebski Poland --
From: John 'Warthog9' Hawley <warthog9@kernel.org>
This commit adjust the main Makefile so you can simply run
make gitweb
which in turn calls gitweb/Makefile. This means that in order to
generate gitweb, you can simply run 'make' from gitweb subdirectory:
cd gitweb
make
Targets gitweb/gitweb.cgi and (dependent on JSMIN being defined)
gitweb/gitweb.min.js in main Makefile are preserved for backward
compatibility.
Signed-off-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This implements separate Makefile for gitweb, with main Makefile calling
it, like for gitk-git/, git-gui/, Documentation/, t/, and templates/
directories.
It is marked as RFC because I don't feel that my make-fu is strong enough
to be sure that there are no errors / mistakes. Very slightly tested.
Makefile | 64 +++++----------------------
gitweb/Makefile | 129 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 141 insertions(+), 52 deletions(-)
create mode 100644 gitweb/Makefile
diff --git a/Makefile b/Makefile
index 4a1e5bc..50d815e 100644
--- a/Makefile
+++ b/Makefile
@@ -280,29 +280,6 @@ pathsep = :
# JavaScript minifier invocation that can function as filter
JSMIN =
-# default configuration for gitweb
-GITWEB_CONFIG = gitweb_config.perl
-GITWEB_CONFIG_SYSTEM = /etc/gitweb.conf
-GITWEB_HOME_LINK_STR = projects
-GITWEB_SITENAME =
-GITWEB_PROJECTROOT = /pub/git
-GITWEB_PROJECT_MAXDEPTH = 2007
-GITWEB_EXPORT_OK =
-GITWEB_STRICT_EXPORT =
-GITWEB_BASE_URL =
-GITWEB_LIST =
-GITWEB_HOMETEXT = indextext.html
-GITWEB_CSS = gitweb.css
-GITWEB_LOGO = git-logo.png
-GITWEB_FAVICON = git-favicon.png
-ifdef JSMIN
-GITWEB_JS = gitweb.min.js
-else
-GITWEB_JS = gitweb.js
-endif
-GITWEB_SITE_HEADER =
-GITWEB_SITE_FOOTER =
-
export prefix bindir sharedir sysconfdir
CC = gcc
@@ -1509,6 +1486,11 @@ $(patsubst %.perl,%,$(SCRIPT_PERL)): % : %.perl
chmod +x $@+ && \
mv $@+ ...Hi, Nice. I forgot to mention in my comments to 2/6 that you seem to wrap after more than 80 characters. However, I have no idea what the suggested line width is for gitweb. Again, this could be done by having the variable defined as undef. Maybe it would be even nicer if the administrator could specify the protocol, e.g. when they do not want/cannot allow git:// but only http:// access to the repositories? Ciao, Dscho --
The problem I had and have with this patch is the duplication of data:
$gitlinkurl contains subset of information in @git_base_url_list,
which in turn is filled from GITWEB_BASE_URL build config variable.
I can understand that for performance reason you don't want to check
$projectroot/$project/cloneurl nor gitweb.url config variable for
each and every displayed project; if the link to repository (for git)
cannot be derived from project path (repository path), then simply
s/shwon/shown/
I'd say that 'Full URL is "$gitlinkurl/$project"' instead of last
sentence in above comment.
Why not
our $gitlinkurl_base = "++GITWEB_BASE_URL++";
Does it even pass tests?
$cgi->a({-href => href(project=>$pr->{'path'}, action=>"log")}, "log") . " | " .
$cgi->a({-href => href(project=>$pr->{'path'}, action=>"tree")}, "tree") .
($pr->{'forks'} ? " | " . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"forks")}, "forks") : '') .
+ ($gitlinkurl_base ?
+ " | " . $cgi->a({-href=>"$gitlinkurl_base/$pr->{'path'}", "git") : '') .
"</td>\n" .
"</tr>\n";
}
Changes made:
* Instead of using separate if conditional statement and print
statement (note that you forgot to change '.' to ';' to end
statement) use ternary conditional operator "?:"
* Make $gitlinkurl_base include "git://" protocol specifier
* Do not create "git" link if $gitlinkurl_base is false, which means
undef, empty string '' and 0 (but 0 is not very likely to be base
for "git" link).
* Do not use esc_html on fragment of URL. The CGI.pm should escape
attributes itself. If it was HTTP link, one should perhaps esc_url
on whole link, but esc_html is for escaping HTML.
--
Jakub Narebski
Poland
ShadeHawk on #git
--
From: John 'Warthog9' Hawley <warthog9@kernel.org> This adds a "git" link for each project in the project list page, should a common $gitlinkurl_base be defined and not empty. The full URL of each link is composed of $gitlinkurl_base and project name. It is intended for git:// links, and in fact GITWEB_BASE_URL build variable is used as its default value only if it starts with git:// This does make the assumption that the git repositories share a common path. Nothing to date is known to actually make use of introduced link. Created "git" link follows rel=vcs-* microformat specification: http://kitenet.net/~joey/rfc/rel-vcs/ Signed-off-by: John 'Warthog9' Hawley <warthog9@kernel.org> Signed-off-by: Jakub Narebski <jnareb@gmail.com> --- I think it might be good idea... but for the fact "Nothing to date is known to actually make use of introduced link". What's its intended use? Differences to original version by John 'Warthog9' Hawley (J.H.): * It doesn't cause syntax error ;-) * Escaping of attribute value is left to CGI.pm * $gitlinkurl got renamed to $gitlinkurl_base, now includes git:// prefix, and defaults to GITWEB_BASE_URL if it begins with git:// * Added description to gitweb/README * Uses rel=vcs-* microformat by Joey Hess I assume that nobody sane would define $gitlinkurl_base to "0"... gitweb/README | 4 ++++ gitweb/gitweb.perl | 8 ++++++++ 2 files changed, 12 insertions(+), 0 deletions(-) diff --git a/gitweb/README b/gitweb/README index 608b0f8..36fb059 100644 --- a/gitweb/README +++ b/gitweb/README @@ -71,6 +71,7 @@ You can specify the following configuration variables when building GIT: * GITWEB_BASE_URL Git base URLs used for URL to where fetch project from, i.e. full URL is "$git_base_url/$project". Shown on projects summary page. + If it begins with "git://" it is also used for $gitlinkurl_base, see below. Repository URL for project can be also configured per repository; this takes precedence over URLs ...
First, why one would want to require that gitweb version (version at
the time of build) and runtime git version (version of git used to run
commands) match?
Second, it is mismatch, not missmatch (one 's', not double 's').
Third, in my opinion it would be better to name variable in question
e.g. $versions_must_match and also flip its meaning (true means check
First, 'undef' is false, so it could have been written as
+our $missmatch_git;
Or if you prefer explicit false-ish value as default, 0 would be I
think better than empty string '':
+our $missmatch_git = 0;
Second, there is question whether default should be to allow
mismatched versions (current behaviour, more lenient...) or deny (or
warn about) mismatched version, i.e. should it be $versions_must_match
Could you please clean up language in above comment? It is very
convoluted. Please also limit line width of above comment to 76 / 80
Style
+if (!$allow_versions_mismatch &&
+ $git_version ne $version) {
Do not compare $missmatch_git / $allow_versions_mismatch against '':
Shouldn't this be "500 Internal Server Error" or something (using the
--
Jakub Narebski
Poland
ShadeHawk on #git
--
From: John 'Warthog9' Hawley <warthog9@kernel.org>
This adds $git_versions_must_match variable, which is set to true
value checks that we are running on the same version of git that we
shipped with, and if not throw '500 Internal Server Error' error.
What is checked is the version of gitweb (embedded in building
gitweb.cgi), against version of runtime git binary used.
Gitweb can usually run with a mismatched git install. This is more
here to give an obvious warning as to whats going on vs. silently
failing.
By default this feature is turned off.
Signed-off-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
I don't quite see the reason behind such option, and I think that
error (instead of for example warning) on version mismatch is too much.
This is an RFC because formatting of error page is a bit rough, and
(ab)uses exist CSS classes instead of creating new classnames for
semantic markup.
Differences from original version, by J.H.:
* Changed name and flipped meaning of config variable, from
$missmatch_git to $git_versions_must_match
* $git_versions_must_match is boolean flag - do not compare with an
empty string.
* Changed error message a bit, fixed style, added entry in README
gitweb/README | 3 +++
gitweb/gitweb.perl | 33 +++++++++++++++++++++++++++++++++
2 files changed, 36 insertions(+), 0 deletions(-)
diff --git a/gitweb/README b/gitweb/README
index 6c2c8e1..608b0f8 100644
--- a/gitweb/README
+++ b/gitweb/README
@@ -233,6 +233,9 @@ not include variables usually directly set during build):
If server load exceed this value then return "503 Service Unavaliable" error.
Server load is taken to be 0 if gitweb cannot determine its value. Set it to
undefined value to turn it off. The default is 300.
+ * $git_versions_must_match
+ If set, gitweb fails with 500 Internal Server Error if the version of gitweb
+ doesn't match version of git binary. The default is false.
...Hi, I'm not a native English speaker and all, but I thought it was spelt 'mismatch', i.e. with only one 's'. Maybe even name it 'allow_different_git_version' or 'no_strict_git_version'. A few comments on the patch: the style of the if() statement disagrees with the other ones; please use the same style. Also, as with 1/6, turning off the feature might be better done by setting it to undef. Finally, would it not be nicer if the warning really was only a warning, i.e. that the script would try to continue after giving the users a pretty warning header? Ciao, Dscho --
This changes the behavior, slightly, of gitweb so that it verifies that the box isn't inundated with before attempting to serve gitweb. If the box is overloaded, it basically returns a 503 server unavailable until the load falls below the defined threshold. This helps dramatically if you have a box that's I/O bound, reaches a certain load and you don't want gitweb, the I/O hog that it is, increasing the pain the server is already undergoing. adds $maxload configuration variable. Default is a load of 300, which for most cases should never be hit. Please note this makes the assumption that /proc/loadavg exists as there is no good way to read load averages on a great number of platforms [READ: Windows], or that it's reasonably accurate. Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net> --- gitweb/gitweb.perl | 24 ++++++++++++++++++++++++ 1 files changed, 24 insertions(+), 0 deletions(-)
Heya, On Fri, Dec 11, 2009 at 00:45, John 'Warthog9' Hawley Also, what happened to including patches inline for ease of review? -- Cheers, Sverre Rabbelier --
Your patch doesn't allow for *turning off* this feature. Reasonable solution would be to use 'undef' or negative number to turn off this What about MacOS X, or FreeBSD, or OpenSolaris? You should mention that it is intended that if gitweb cannot read load average (for example /proc/loadavg does not exist), then the feature Why signoff is different from author (warthog9@kernel.org)? Why this I'd probably say: +# Used to set the maximum load that we will still respond to gitweb queries. +# If server load exceed this value then return "503 server busy" error, Why not use one of existing CPAN modules: Sys::Info::Device::CPU, BSD::getloadavg, Sys::CpuLoad? Style: + open (my $load, '<', '/proc/loadavg') or return 0; and of course no "my $load" at beginning. Also perhaps $fh, or Why not use die_error subroutine? Is it to have generate absolutely minimal load, and that is why you do not use die_error(), or even $cgi->header()? Wouldn't a better solution be to use here-doc syntax? + print <<'EOF'; +Content-Type: text/plain; charset=utf-8 +Status: 503 Excessive load on server + +The load average on the server is too high +EOF -- Jakub Narebski Poland ShadeHawk on #git --
I would prefer to hear something along the lines of...
I like this. Here is a follow-up patch you can squash in to
support other platforms.
I gave the patches a cursory look (I somehow didn't see 5/6, though) and
they all looked decently done, except that some of the lines were
excessively long.
--
Well there's the opposite argument that setting the number arbitrarily high, 4096 for instance would also in essence negate this (though I'll admit I've reached and exceeded those numbers before) That said I agree, being able to turn this off needs to be added and My bad, did the patches up on my laptop but had to send them out from kernel.org, thus the miss-match: I.E. user error. Here's the fundamental problem: Sys:Info:Device:CPU Windows: Using this method under Windows is not recommended since, the WMI interface will possibly take at least 2 seconds to complete the request. BSD::getloadavg While this more or less supports anything with a libc getloadavg (and thus might be the best one I've seen, I'll admit I didn't notice this one when I looked years ago) getting it to work on windows looks, exciting. Sys::CpuLoad: http://cpansearch.perl.org/src/CLINTDW/Sys-CpuLoad-0.03/README Specifically: - Currently FreeBSD and OpenBSD are supported. - Wanted: HPUX 11.11 ... - Todo: Win32 support So this doesn't really buy me anything but, maybe, BSD support. So at the end of the day, none of those really gets me a "useful" cross platform load checker (though like I said BSD::getloadavg looks to be the best of the ones you mentioned) and more or less Windows is going to lose this as a usable feature no matter what. I think I'd almost rather set this up so that if it can't get something useful (I.E. /proc/loadavg is missing) it just skips past it as if the load was 0. I might try out the BSD::getloadavg but I want to take a look and see if that's easily installed or not, if it's not it might be difficult to justify that as a dependency. It was intended to be the most minimal possible, mainly get in, get out. Also not sure the die_error existed in gitweb when this was originally written. Probably worth switching to it now since it's there either way, and I don't think using it would add enough overhead to ...
Thanks; all sounded a reasonable response to the review. Are you re-rolling the series anytime soon (I am asking because then I'd rather not to queue this round especially because I didn't see 5/6). --
I'll probably have some changes up and about tomorrow, and it's a little troubling that 5/6 didn't come through for you 6 at least made it to marc.info: http://marc.info/?l=git&m=126048884825985&w=2 and 5 seems to have been eaten by a grue somewhere. It was a *big* patch mainly because all the caching flips over in a single go. If you want I can privately bounce 5 & 6 to you so you have a complete tree right now? - John 'Warthog9' Hawley --
Not to reply to myself but this might also be helpful: http://git.kernel.org/?p=git/warthog9/gitweb.git;a=shortlog;h=refs/heads/gitweb-ml-v2 - John 'Warthog9' Hawley --
Thanks, but not interested, in the sense that it wouldn't make much sense to me to have a version tonight that is known to go stale within a few days. I only pick up and queue patches to 'pu' to save me from later trouble of finding them from the mailing list backlog, and not to actively review and engage in the discussion to polish them right now. We are already deep in pre-release freeze and my attention is not currently on anything that won't go in to the upcoming release. I want to have a solid 1.6.6 before the holidays as a present to all ;-). --
Simplest solution would be to used 'undef' (undefined value) for "turned off", i.e.: I think it would be better to write in commit message that because finding load average is OS dependent, there is provided (sample) solution which uses /proc/loadavg and works (at least) on Linux. And that for platforms which do not have /proc/loadavg the feature is simply turned off (by the See above proposal. This information should be present in commit message, After thinking about this a bit, now I don't think that it is terribly important. You *might* describe alternate approaches (roads not taken) in commit message, but requiring /proc/loadavg for the feature to work Well, if you are not worring excessively about overhead, then I think using die_error would be the best solution, as it would preserve look of gitweb. It would require extending die_error by 503 response, or rather %http_responses hash and comment above die_error. Also I think that Status: should be before Content-Type: header (but probably it is not required by the standard). -- Jakub Narebski Poland --
From: John 'Warthog9' Hawley <warthog9@kernel.org>
This changes slightly the behavior of gitweb, so that it verifies
that the box isn't inundated with before attempting to serve gitweb.
If the box is overloaded, it basically returns a 503 Server Unavailable
until the load falls below the defined threshold. This helps dramatically
if you have a box that's I/O bound, reaches a certain load and you
don't want gitweb, the I/O hog that it is, increasing the pain the
server is already undergoing.
This behavior is controlled by $maxload configuration variable.
Default is a load of 300, which for most cases should never be hit.
Unset it (set it to undefined value, i.e. undef) to turn off checking.
Currently it requires that '/proc/loadavg' file exists, otherwise the
load check is bypassed (load is taken to be 0). So platforms that do
not implement '/proc/loadavg' currently cannot use this feature.
Signed-off-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This is my take on this patch, with all my concerns taken into
consideration... well, all except describing alterante approaches
to straight using /proc/loadavg.
Differences to original version by John 'Warthog9' Hawley (J.H.):
* Slightly improved wording in commit message and in comments
* $maxload described in gitweb/README, in "Gitweb config file variables"
section
* You can use '$maxload = undef;' to turn off load checking
* Error page for too high load is generated using die_error, which had
to be extended to handle 503 Service Unavailable HTTP error code
gitweb/README | 7 ++++++-
gitweb/gitweb.perl | 39 +++++++++++++++++++++++++++++++++++----
2 files changed, 41 insertions(+), 5 deletions(-)
diff --git a/gitweb/README b/gitweb/README
index e34ee79..6c2c8e1 100644
--- a/gitweb/README
+++ b/gitweb/README
@@ -174,7 +174,7 @@ not include variables usually directly set during build):
Base URL for relative URLs in pages generated by ...What about systems not having /proc/loadavg
--
Architecte Informatique chez Blueline/Gulfsat:
Administration Systeme, Recherche & Developpement
+261 34 29 155 34 / +261 33 11 207 36
--
Heya, On Fri, Dec 11, 2009 at 00:45, John 'Warthog9' Hawley I'd prefer not to be shouted at, how about s/GITWEB/gitweb: /g ? :) -- Cheers, Sverre Rabbelier --
This patch didn't made it to git mailing list. I suspect that you ran afoul vger anti-SPAM filter. Does this "File based caching layer" have anything common with GSoC -- Jakub Narebski Poland ShadeHawk on #git --
First, why do you reinvent the wheel instead of using one of existing caching interfaces like CHI or Cache::Cache (perhaps creating a custom backend or middle layer which incorporates required features, like being load-aware)? This way changing from file-based cache to e.g. mmap based one or to memcached would be very simple. And you would avoid pitfals in doing your own cache management. perl-Cache-Cache should be available package in extras repositories. If module is no available this would simply mean no caching, like in many (or not so many) other cases with optional features in gitweb. Second, if you can't use CGI::Cache directly, you can always steal the idea from it, then the change to gitweb itself would be minimal: "Internally, the CGI::Cache module ties the output file descriptor (usually STDOUT) to an internal variable to which all output is saved." P.S. I'll postpone critique of the patch itself for now. The above issues are much more important. -- Jakub Narebski Poland --
Well for starters this isn't exactly a reinvention of the wheel, and this isn't something "new" per-se. This code has been actively running on git.kernel.org for something like 3 - 4 years so there's something to be said for the devil we know and understand. As well using the other caching strategies involves adding dramatically more complex interactions with caching layer. The caching layer is actually quite specific to how git + gitweb works and solves more than just "caching" on the surface. Specifically it solves the stampeding herd problem which would have to be solved either way even if I didn't implement my own caching, and since I had to do that caching was barely a step beyond True but these are *VERY* different caching strategies than the one I've got here, yes it's using files as a backend but it's doing so with specific goals in mind. As I've said I plan to integrate Lea's memcached based caching into this in the future and that has different advantages and disadvantages. At the end of the day the "normal" caching engines aren't as efficient as mine and there is the case the very high performance sites are going to have to investigate a number of different solutions to see what works best for them. Mine is also *dramatically* simpler to setup as well, There's pitfalls if I do it myself, or I use one of the other "common" perl modules. I did it this way years ago, I've maintained it and it works pretty well. I won't admit that it's the smartest caching engine on the planet, far from it, but it has evolved specifically for gitweb and that itself saves me a lot of pitfalls from cache engine + gitweb Yes, but as can be seen from how you enable various other caching engines the setup of those is non-trivial, this is and either way caching *HAS* to be explicitly turned on by the admin/user since they are going to have to do *some* configuration, or at least be aware that I thought about that 3 years ago, and decided it wasn't a ...
It _might_ be caused by the fact that you used attachement. But it might
The question would be then whether it makes sense to have two caches at
different levels in the stack (see also discussion below about Lea
Well, if it is not reinventing the wheel, then it is yak shaving (yet
another ...) ;-)
The fact that the code was used and tested at one single installation
doesn't mean that it doesn't have warts that could be avoided by using
I am hoping that if it was done right, by using CHI or Cache::Cache
caching interface, then choosing alternate caching engine, or adding
extra level of cache would be simple and decoupled from issues specific
The idea is for gitweb/cache.pm (or gitweb/Gitweb-Cache.pm, or
gitweb/Gitweb/Cache.pm) is to encapsulate issues specific to gitweb,
like generating cache key, or printing "Generating...", etc.
Perhaps also the idea of filling cache in background (but see discussion
CHI tries to solve cache miss stampedes via expires_variance mechanism.
There is Cache::Adaptive (and its subclass Cache::Adaptive::ByLoad)
which does adaptive lifetime control (it accepts any Cache::Cache
compatible cache, so I think it also accepts CHI compatible cache).
Errr... besides using Cache::Cache compatible cache (see!!!), for example
Cache::Memcached, Lea Wiemann's gitweb caching did caching at entirely
different level than original kernel.org's gitweb.
The stack for gitweb looks somewhat like this:
git commands output open my $fd, '-|, git_cmd(), ...
|
v
parsed output data parse_ls_tree_line, parse_commit, ...
|
v
generated HTML etc. print ...
:
V
caching optional
reverse proxy Varnish, Squid
If I understand correctly Lea Wiemann code cache git command output.
The fork of gitweb used at repo.or.cz does caching of parsed data at
least for most intensive projects list page. This patch was about caching
generated output. ...