Re: [PATCH 4/6] GITWEB - Makefile changes

Previous thread: [PATCH] git gui: Use git diff --submodule when available by Jens Lehmann on Thursday, December 10, 2009 - 3:44 pm. (2 messages)

Next thread: How to selectively recreate merge state? by Jay Soffian on Thursday, December 10, 2009 - 4:56 pm. (18 messages)
From: John 'Warthog9' Hawley
Date: Thursday, December 10, 2009 - 4:45 pm

Evening everyone,

This is the latest incarnation of gitweb w/ caching.  This is finally at the point where it should probably start either being considered for inclusion or mainline, or I need to accept that this will never get in and more perminantely fork (as is the case with Fedora where this is going in as gitweb-caching as a parrallel rpm package).

That said this brings the base up to mainline (again), it updates a number of elements in the caching engine, and this is a much cleaner break-out of the tree vs. what I am currently developing against.

New things known to work:
    - Better breakout
    - You can actually disable the cache now

- John 'Warthog9' Hawley 

John 'Warthog9' Hawley (6):
  GITWEB - Load Checking
  GITWEB - Missmatching git w/ gitweb
  GITWEB - Add git:// link to summary pages
  GITWEB - Makefile changes
  GITWEB - File based caching layer
  GITWEB - Separate defaults from main file

 .gitignore                  |    1 +
 Makefile                    |   15 +-
 gitweb/Makefile             |   14 +
 gitweb/cache.pm             |  293 +++++++
 gitweb/gitweb.css           |    6 +
 gitweb/gitweb.perl          | 1821 ++++++++++++++++++++-----------------------
 gitweb/gitweb_defaults.perl |  468 +++++++++++
 7 files changed, 1651 insertions(+), 967 deletions(-)
 create mode 100644 gitweb/Makefile
 create mode 100644 gitweb/cache.pm
 create mode 100644 gitweb/gitweb_defaults.perl
From: John 'Warthog9' Hawley
Date: Thursday, December 10, 2009 - 4:45 pm

This adds $missmatch_git so that gitweb can run with a miss-matched
git install.  Gitweb, generally, runs fine on a very broad range of
git versions, but it's not always practicle or useful to upgrade it
every time you upgrade git.

This allows the administrator to realize they are miss-matched, and
should they be so inclined, disable the check entirely and run in
a miss-matched fasion.

This is more here to give an obvious warning as to whats going on
vs. silently failing.

Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
---
 gitweb/gitweb.perl |   22 ++++++++++++++++++++++
 1 files changed, 22 insertions(+), 0 deletions(-)

From: John 'Warthog9' Hawley
Date: Thursday, December 10, 2009 - 4:45 pm

This adds a git:// link to the summary pages should a common
$gitlinkurl be defined (default is nothing defined, thus nothing
shown)

This does make the assumption that the git trees share a common
path, and nothing to date is known to actually make use of the link

Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
---
 gitweb/gitweb.perl |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

From: John 'Warthog9' Hawley
Date: Thursday, December 10, 2009 - 4:45 pm

This adjust the makefiles so that you can do such things as

	make gitweb

from the top level make tree, or if your in the gitweb directory
itself typing

	make

will call back up to the main Makefile and build gitweb

Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
---
 Makefile        |    4 +++-
 gitweb/Makefile |   14 ++++++++++++++
 2 files changed, 17 insertions(+), 1 deletions(-)
 create mode 100644 gitweb/Makefile

From: John 'Warthog9' Hawley
Date: Thursday, December 10, 2009 - 4:45 pm

This is an attempt to break out the default values & associated
documentation from the main gitweb file so that it's easier to
browse / read and understand without the associated code involved.

This helps by making defaults self contained with their documentation
making it easier for someone to read through things and find what
they want

This is also a not-so-subtle start of trying to break up gitweb into
separate files for easier maintainability, having everything in a
single file is just a mess and makes the whole thing more complicated
than it needs to be.  This is a bit of a baby step towards breaking it
up for easier maintenance.

Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
---
 .gitignore                  |    1 +
 Makefile                    |   15 +-
 gitweb/Makefile             |    2 +-
 gitweb/gitweb.perl          |  515 +++++--------------------------------------
 gitweb/gitweb_defaults.perl |  468 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 537 insertions(+), 464 deletions(-)
 create mode 100644 gitweb/gitweb_defaults.perl

From: Jakub Narebski
Date: Friday, December 11, 2009 - 8:46 am

The question is if easier maintenance and development by spliting


Hmmm... gitweb/gitweb_defaults.perl as source file, and
gitweb/gitweb_defaults.pl as generated file?  Wouldn't it be better to
go with the convention used elsewhere in gitweb and use
gitweb/gitweb_defaults.perl.in or gitweb/gitweb_defaults.pl.in as


Why the slightly inconsistent style ("%.cgi,%perl" vs "%.pl, %perl")?


Also wouldn't all replacements be in the new gitweb_defaults file, so
there would be no need then to do replacements for gitweb.cgi?



Why this block is required?  Why not have variables defined (using
"our") in gitweb_defaults file?


Why not "our $base_url = $my_url;"?


-- 
Jakub Narebski
Poland
ShadeHawk on #git
--

From: J.H.
Date: Friday, December 11, 2009 - 8:58 am

This would just get dropped into the same location that gitweb.cgi 
exists in, there is no real difference in installation, and thus I can't 

I think you got confused, the committed file is .perl the generated file 

Cruft, thought I had deleted and excluded it, won't be there in next 

Considering that the defaults is more of an include vs. a cgi it 
probably shouldn't share the standard expected executable suffix, thus I 
used .pl.  Could just as easily change it to .pm, or something else but 
I think it would make the most sense to leave things we are expecting 
the webserver to directly execute as .cgi, and includes as a different 

Not all replacements are done in one or the other, and since it's 
basically a NOP to perform the full set of replacements on both files 


Wanted to make sure things were properly defined, if in an unexpected 

same reason as the other 'our' includes above, though why this ended up 
as a separate patch vs. the rest of the file I don't know.

- John 'Warthog9' Hawley
--

From: Jakub Narebski
Date: Friday, December 11, 2009 - 3:53 pm

To be more exact you have to know that you have to drop _generated files_,
which means (for this version of patch) gitweb.cgi and gitweb_defaults.pl
(or whatever the generated file with config variables would be named).


ATTENTION!

Your changes would make stop working all gitweb tests.  Currently they
do some magic with generated gitweb config file "$(pwd)/gitweb_config.perl"
set via GITWEB_CONFIG configuration variable to be able to run
_unprocessed_ gitweb/gitweb.perl (without any substitutions).  This
allow to run tests on "live" version of gitweb.

After your changes it would be no longer possible, at least not if we
want to be sure that we test the same version of gitweb as gitweb_defaults.

It would probably mean that we need to move to testing built version,

Maybe I wasn't entirely clean.  I meant that the committed source file
should perhaps have *.in extension to denote that it is to be processed
via variable substitution, so it would be

  committed file: gitweb/gitweb_defaults.pl.in
  generated file: gitweb/gitweb_defaults.pl
 

I was not asking about that, but about

+	    $(patsubst %.cgi,%.perl,$(patsubst %.pl, %.perl, $@)) >$@+ && \

vs

+	    $(patsubst %.cgi,%.perl,$(patsubst %.pl,%.perl,$@)) >$@+ && \

But after thinking about it a bit, and consulting make documentation
(in particular definition of $@ variable) this rule shouldn't work at all.

`$@'
     The file name of the target of the rule.  If the target is an
     archive member, then `$@' is the name of the archive file.  In a
     pattern rule that has multiple targets (*note Introduction to
     Pattern Rules: Pattern Intro.), `$@' is the name of whichever
     target caused the rule's commands to be run.
 
What we need is to run pattern substitution for _two_ files, perhaps
using the for loop.

Also I think the order of substitutions would be better to be reversed:

    $(patsubst %.pl,%.perl,$(patsubst %.cgi,%.perl,$FILE)) >$FILE+


O.K.

But Makefile would be (slightly) ...
From: Junio C Hamano
Date: Tuesday, December 15, 2009 - 6:22 pm

You didn't have to shout.

Any progress on this front?

Not that I am anxious to queue new topics to 'next' right now (we are
frozen for 1.6.6), but I think having what is proven to work well at a
real site like k.org is much better than waiting for an unproven
reimplementation using somebody else's framework only for your theoretical
cleanliness.  John has better things to do than doing such a rewrite
himself, and even if you helped the process by producing a competing
caching scheme based on existing web caching engines, the aggregated
result (not just the web caching engine you base your work on) needs to
get a similar field exposure to prove itself that it can scale to the load
k.org sees, which would be quite a lot of work, no?

--

From: J.H.
Date: Tuesday, December 15, 2009 - 7:00 pm

Sadly, no.  Busy weekend and a need to get some of the kernel.org 
servers upgraded has taken some precedence.  I should be circling back 
--

From: Jakub Narebski
Date: Wednesday, December 16, 2009 - 12:52 pm

So should I wait for reroll with proposals for improvements (modified 
patches)?

-- 
Jakub Narebski
Poland
--

From: J.H.
Date: Wednesday, December 16, 2009 - 1:04 pm

I'd probably wait, though it's starting to look like if I get to gitweb 
today it will be this evening as I ventured off into getting the last 6 
of the kernel.org servers upgraded.  Either way I will have a new patch 
series and some changes in my own git tree shortly.

- John 'Warthog9' Hawley

--

From: Jakub Narebski
Date: Tuesday, December 15, 2009 - 7:22 pm

I'm not against (well, not much against) custom caching that kernel.org
uses, but I am against large change to gitweb code currently accompanying
caching, namely gather then output solution, which would negatively 
affect performance when caching is turned off.

Also I'd like to have caching code (the one that didn't made it to git
mailing list for some reason, probably vger anti-SPAM filter) cleaned up
for submission: remove commented-out code, reduce code duplication, 
separate dealing with orthogonal issues (cache itself, adaptivity of cache,
background generation and 'in progress' info, generating key for cache
(and improve key generation to include path_info / use %input_params)),
follow the same style that gitweb itself uses.

As for the "[PATCH 6/6] GITWEB - Separate defaults from main file" patch,
it would require modifying gitweb tests to use generated gitweb/gitweb.cgi
rather than source gitweb/gitweb.perl.


As for having caching code tested by git.kernel.org: IIRC there was issue
with it not having cache expiration thus gathering GB of cached data.
-- 
Jakub Narebski
Poland
--

From: Jakub Narebski
Date: Friday, December 11, 2009 - 7:28 am

"John 'Warthog9' Hawley" <warthog9@kernel.org> writes:

Below are _proposed_ changes to make commit message easier to read, in

Add "gitweb" target to main Makefile so you would be able to simply

instead of requiring to spell it in full





IMPORTANT!

A note about this change: I think it would be better to move creating
gitweb.cgi (and optionally gitweb.min.js) to gitweb/Makefile, and make
main Makefile call gitweb/Makefile, and not vice versa like in your
solution.


Why it is here, and not with the .PHONY block at line 1924 of
Makefile?  It would be nice to have comment supporting this choice in




-- 
Jakub Narebski
Poland
ShadeHawk on #git
--

From: J.H.
Date: Friday, December 11, 2009 - 9:22 am

It's quite possible, and I'm fine with doing that.  If no one has any 
objections I can re-work those with the understanding that the build 
process for gitweb shift to the gitweb/ directory instead of the main 

There are 6 other instances of .PHONY in the makefile, having the .PHONY 
localized seemed to make it the most obvious since it was right next to 

Preparation for a later change.  The change could happen all at the same 

 From this makefile I wanted to explicitly call up to the main makefile 
no matter what, the main makefile doesn't consider the targets .PHONY 



--

From: Jakub Narebski
Date: Friday, December 11, 2009 - 9:41 am

In my opinion it would be better solution because it would reduce size
of main (master) Makefile, and not be much larger than this solution.
 
git-gui/, Documentation/, perl/ all have their own makefiles, which do

I was thinking here about this large block of .PHONY declarations,

Please at least describe this change in commit message.

But I think it could be moved to other patch, or put in separate patch.


What is the reason of this phony .PHONY?  If gitweb.cgi is newer than
gitweb.perl (and other sources), then without .PHONY it wouldn't be 
regenerated.  With .PHONY it would call master Makefile... which would
notice that gitweb.cgi is newer than gitweb.perl and do not regenerate.
 
So what this .PHONY does is unnecessary call make on master Makefile...



But I guess this issue would be moot if it was the other way around,
i.e. master Makefile calling gitweb/Makefile.

-- 
Jakub Narebski
Poland
--

From: Jakub Narebski
Date: Saturday, December 19, 2009 - 6:32 am

From: John 'Warthog9' Hawley <warthog9@kernel.org>

This commit adjust the main Makefile so you can simply run

     make gitweb

which in turn calls gitweb/Makefile.  This means that in order to
generate gitweb, you can simply run 'make' from gitweb subdirectory:

     cd gitweb
     make

Targets gitweb/gitweb.cgi and (dependent on JSMIN being defined)
gitweb/gitweb.min.js in main Makefile are preserved for backward
compatibility.

Signed-off-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This implements separate Makefile for gitweb, with main Makefile calling
it, like for gitk-git/, git-gui/, Documentation/, t/, and templates/
directories.

It is marked as RFC because I don't feel that my make-fu is strong enough
to be sure that there are no errors / mistakes.  Very slightly tested.

 Makefile        |   64 +++++----------------------
 gitweb/Makefile |  129 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 141 insertions(+), 52 deletions(-)
 create mode 100644 gitweb/Makefile

diff --git a/Makefile b/Makefile
index 4a1e5bc..50d815e 100644
--- a/Makefile
+++ b/Makefile
@@ -280,29 +280,6 @@ pathsep = :
 # JavaScript minifier invocation that can function as filter
 JSMIN =
 
-# default configuration for gitweb
-GITWEB_CONFIG = gitweb_config.perl
-GITWEB_CONFIG_SYSTEM = /etc/gitweb.conf
-GITWEB_HOME_LINK_STR = projects
-GITWEB_SITENAME =
-GITWEB_PROJECTROOT = /pub/git
-GITWEB_PROJECT_MAXDEPTH = 2007
-GITWEB_EXPORT_OK =
-GITWEB_STRICT_EXPORT =
-GITWEB_BASE_URL =
-GITWEB_LIST =
-GITWEB_HOMETEXT = indextext.html
-GITWEB_CSS = gitweb.css
-GITWEB_LOGO = git-logo.png
-GITWEB_FAVICON = git-favicon.png
-ifdef JSMIN
-GITWEB_JS = gitweb.min.js
-else
-GITWEB_JS = gitweb.js
-endif
-GITWEB_SITE_HEADER =
-GITWEB_SITE_FOOTER =
-
 export prefix bindir sharedir sysconfdir
 
 CC = gcc
@@ -1509,6 +1486,11 @@ $(patsubst %.perl,%,$(SCRIPT_PERL)): % : %.perl
 	chmod +x $@+ && \
 	mv $@+ ...
From: Johannes Schindelin
Date: Friday, December 11, 2009 - 5:52 am

Hi,


Nice.

I forgot to mention in my comments to 2/6 that you seem to wrap after more 
than 80 characters.  However, I have no idea what the suggested line width 
is for gitweb.

Again, this could be done by having the variable defined as undef.

Maybe it would be even nicer if the administrator could specify the 
protocol, e.g. when they do not want/cannot allow git:// but only http:// 
access to the repositories?

Ciao,
Dscho

--

From: Jakub Narebski
Date: Friday, December 11, 2009 - 6:44 am

The problem I had and have with this patch is the duplication of data:
$gitlinkurl contains subset of information in @git_base_url_list,
which in turn is filled from GITWEB_BASE_URL build config variable.

I can understand that for performance reason you don't want to check
$projectroot/$project/cloneurl nor gitweb.url config variable for
each and every displayed project; if the link to repository (for git)
cannot be derived from project path (repository path), then simply

s/shwon/shown/

I'd say that 'Full URL is "$gitlinkurl/$project"' instead of last
sentence in above comment.


Why not

our $gitlinkurl_base = "++GITWEB_BASE_URL++";


Does it even pass tests?

 		      $cgi->a({-href => href(project=>$pr->{'path'}, action=>"log")}, "log") . " | " .
 		      $cgi->a({-href => href(project=>$pr->{'path'}, action=>"tree")}, "tree") .
 		      ($pr->{'forks'} ? " | " . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"forks")}, "forks") : '') .
+     		      ($gitlinkurl_base ? 
+		       " | " . $cgi->a({-href=>"$gitlinkurl_base/$pr->{'path'}", "git") : '') .
 		      "</td>\n" .
 		      "</tr>\n";
 	}

Changes made: 
* Instead of using separate if conditional statement and print
  statement (note that you forgot to change '.' to ';' to end
  statement) use ternary conditional operator "?:"
* Make $gitlinkurl_base include "git://" protocol specifier
* Do not create "git" link if $gitlinkurl_base is false, which means
  undef, empty string '' and 0 (but 0 is not very likely to be base
  for "git" link).
* Do not use esc_html on fragment of URL.  The CGI.pm should escape
  attributes itself.  If it was HTTP link, one should perhaps esc_url
  on whole link, but esc_html is for escaping HTML.

-- 
Jakub Narebski
Poland
ShadeHawk on #git
--

From: Jakub Narebski
Date: Friday, December 18, 2009 - 2:02 pm

From: John 'Warthog9' Hawley <warthog9@kernel.org>

This adds a "git" link for each project in the project list page,
should a common $gitlinkurl_base be defined and not empty.  The full
URL of each link is composed of $gitlinkurl_base and project name.
It is intended for git:// links, and in fact GITWEB_BASE_URL build
variable is used as its default value only if it starts with git://

This does make the assumption that the git repositories share a common
path.  Nothing to date is known to actually make use of introduced
link.

Created "git" link follows rel=vcs-* microformat specification:
  http://kitenet.net/~joey/rfc/rel-vcs/

Signed-off-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
I think it might be good idea... but for the fact "Nothing to date is
known to actually make use of introduced link".  What's its intended
use?

Differences to original version by John 'Warthog9' Hawley (J.H.):
* It doesn't cause syntax error ;-)
* Escaping of attribute value is left to CGI.pm
* $gitlinkurl got renamed to $gitlinkurl_base, now includes git://
  prefix, and defaults to GITWEB_BASE_URL if it begins with git://
* Added description to gitweb/README
* Uses rel=vcs-* microformat by Joey Hess

I assume that nobody sane would define $gitlinkurl_base to "0"...

 gitweb/README      |    4 ++++
 gitweb/gitweb.perl |    8 ++++++++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/gitweb/README b/gitweb/README
index 608b0f8..36fb059 100644
--- a/gitweb/README
+++ b/gitweb/README
@@ -71,6 +71,7 @@ You can specify the following configuration variables when building GIT:
  * GITWEB_BASE_URL
    Git base URLs used for URL to where fetch project from, i.e. full
    URL is "$git_base_url/$project".  Shown on projects summary page.
+   If it begins with "git://" it is also used for $gitlinkurl_base, see below.
    Repository URL for project can be also configured per repository; this
    takes precedence over URLs ...
From: Jakub Narebski
Date: Friday, December 11, 2009 - 3:52 am

First, why one would want to require that gitweb version (version at
the time of build) and runtime git version (version of git used to run
commands) match?

Second, it is mismatch, not missmatch (one 's', not double 's').

Third, in my opinion it would be better to name variable in question
e.g. $versions_must_match and also flip its meaning (true means check


First, 'undef' is false, so it could have been written as

+our $missmatch_git;

Or if you prefer explicit false-ish value as default, 0 would be I
think better than empty string '':

+our $missmatch_git = 0;


Second, there is question whether default should be to allow
mismatched versions (current behaviour, more lenient...) or deny (or
warn about) mismatched version, i.e. should it be $versions_must_match

Could you please clean up language in above comment?  It is very
convoluted.  Please also limit line width of above comment to 76 / 80

Style

+if (!$allow_versions_mismatch &&
+    $git_version ne $version) {

Do not compare $missmatch_git / $allow_versions_mismatch against '':

Shouldn't this be "500 Internal Server Error" or something (using the


-- 
Jakub Narebski
Poland
ShadeHawk on #git
--

From: Jakub Narebski
Date: Friday, December 18, 2009 - 12:18 pm

From: John 'Warthog9' Hawley <warthog9@kernel.org>

This adds $git_versions_must_match variable, which is set to true
value checks that we are running on the same version of git that we
shipped with, and if not throw '500 Internal Server Error' error.
What is checked is the version of gitweb (embedded in building
gitweb.cgi), against version of runtime git binary used.

Gitweb can usually run with a mismatched git install.  This is more
here to give an obvious warning as to whats going on vs. silently
failing.

By default this feature is turned off.

Signed-off-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
I don't quite see the reason behind such option, and I think that
error (instead of for example warning) on version mismatch is too much.

This is an RFC because formatting of error page is a bit rough, and
(ab)uses exist CSS classes instead of creating new classnames for
semantic markup.

Differences from original version, by J.H.:
* Changed name and flipped meaning of config variable, from
  $missmatch_git to $git_versions_must_match
* $git_versions_must_match is boolean flag - do not compare with an
  empty string.
* Changed error message a bit, fixed style, added entry in README

 gitweb/README      |    3 +++
 gitweb/gitweb.perl |   33 +++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/gitweb/README b/gitweb/README
index 6c2c8e1..608b0f8 100644
--- a/gitweb/README
+++ b/gitweb/README
@@ -233,6 +233,9 @@ not include variables usually directly set during build):
    If server load exceed this value then return "503 Service Unavaliable" error.
    Server load is taken to be 0 if gitweb cannot determine its value.  Set it to
    undefined value to turn it off.  The default is 300.
+ * $git_versions_must_match
+   If set, gitweb fails with 500 Internal Server Error if the version of gitweb
+   doesn't match version of git binary.  The default is false.
 
 
 ...
From: Johannes Schindelin
Date: Friday, December 11, 2009 - 5:49 am

Hi,


I'm not a native English speaker and all, but I thought it was spelt 
'mismatch', i.e. with only one 's'.  Maybe even name it 
'allow_different_git_version' or 'no_strict_git_version'.

A few comments on the patch: the style of the if() statement disagrees 
with the other ones; please use the same style.

Also, as with 1/6, turning off the feature might be better done by setting 
it to undef.

Finally, would it not be nicer if the warning really was only a warning, 
i.e. that the script would try to continue after giving the users a pretty 
warning header?

Ciao,
Dscho
--

From: John 'Warthog9' Hawley
Date: Thursday, December 10, 2009 - 4:45 pm

This changes the behavior, slightly, of gitweb so that it verifies
that the box isn't inundated with before attempting to serve gitweb.
If the box is overloaded, it basically returns a 503 server unavailable
until the load falls below the defined threshold.  This helps dramatically
if you have a box that's I/O bound, reaches a certain load and you
don't want gitweb, the I/O hog that it is, increasing the pain the
server is already undergoing.

adds $maxload configuration variable.  Default is a load of 300,
which for most cases should never be hit.

Please note this makes the assumption that /proc/loadavg exists
as there is no good way to read load averages on a great number of
platforms [READ: Windows], or that it's reasonably accurate.

Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
---
 gitweb/gitweb.perl |   24 ++++++++++++++++++++++++
 1 files changed, 24 insertions(+), 0 deletions(-)

From: Sverre Rabbelier
Date: Thursday, December 10, 2009 - 4:54 pm

Heya,

On Fri, Dec 11, 2009 at 00:45, John 'Warthog9' Hawley

Also, what happened to including patches inline for ease of review?

-- 
Cheers,

Sverre Rabbelier
--

From: Jakub Narebski
Date: Thursday, December 10, 2009 - 5:52 pm

Your patch doesn't allow for *turning off* this feature.  Reasonable
solution would be to use 'undef' or negative number to turn off this

What about MacOS X, or FreeBSD, or OpenSolaris?

You should mention that it is intended that if gitweb cannot read load
average (for example /proc/loadavg does not exist), then the feature

Why signoff is different from author (warthog9@kernel.org)?  Why this


I'd probably say:

+# Used to set the maximum load that we will still respond to gitweb queries.
+# If server load exceed this value then return "503 server busy" error,

Why not use one of existing CPAN modules: Sys::Info::Device::CPU,
BSD::getloadavg, Sys::CpuLoad?

Style:

+    open (my $load, '<', '/proc/loadavg') or return 0;

and of course no "my $load" at beginning.  Also perhaps $fh, or

Why not use die_error subroutine?  Is it to have generate absolutely
minimal load, and that is why you do not use die_error(), or even
$cgi->header()?

Wouldn't a better solution be to use here-doc syntax?

+    print <<'EOF';
+Content-Type: text/plain; charset=utf-8
+Status: 503 Excessive load on server
+
+The load average on the server is too high
+EOF

-- 
Jakub Narebski
Poland
ShadeHawk on #git
--

From: Junio C Hamano
Date: Thursday, December 10, 2009 - 6:10 pm

I would prefer to hear something along the lines of...

    I like this.  Here is a follow-up patch you can squash in to
    support other platforms.

I gave the patches a cursory look (I somehow didn't see 5/6, though) and
they all looked decently done, except that some of the lines were
excessively long.
--

From: J.H.
Date: Thursday, December 10, 2009 - 7:19 pm

Well there's the opposite argument that setting the number arbitrarily 
high, 4096 for instance would also in essence negate this (though I'll 
admit I've reached and exceeded those numbers before)

That said I agree, being able to turn this off needs to be added and 



My bad, did the patches up on my laptop but had to send them out from 
kernel.org, thus the miss-match: I.E. user error.


Here's the fundamental problem:

Sys:Info:Device:CPU
	Windows:
		Using this method under Windows is not recommended
		since, the WMI interface will possibly take at least 2
		seconds to complete the request.

BSD::getloadavg
	While this more or less supports anything with a libc getloadavg
	(and thus might be the best one I've seen, I'll admit I didn't
	notice this one when I looked years ago) getting it to work on
	windows looks, exciting.

Sys::CpuLoad:
	http://cpansearch.perl.org/src/CLINTDW/Sys-CpuLoad-0.03/README
	Specifically:
		- Currently FreeBSD and OpenBSD are supported.
		- Wanted: HPUX 11.11 ...
		- Todo: Win32 support

	So this doesn't really buy me anything but, maybe, BSD support.
	
So at the end of the day, none of those really gets me a "useful" cross 
platform load checker (though like I said BSD::getloadavg looks to be 
the best of the ones you mentioned) and more or less Windows is going to 
lose this as a usable feature no matter what.

I think I'd almost rather set this up so that if it can't get something 
useful (I.E. /proc/loadavg is missing) it just skips past it as if the 
load was 0.

I might try out the BSD::getloadavg but I want to take a look and see if 
that's easily installed or not, if it's not it might be difficult to 
justify that as a dependency.


It was intended to be the most minimal possible, mainly get in, get out. 
  Also not sure the die_error existed in gitweb when this was originally 
written.  Probably worth switching to it now since it's there either 
way, and I don't think using it would add enough overhead to ...
From: Junio C Hamano
Date: Thursday, December 10, 2009 - 7:50 pm

Thanks; all sounded a reasonable response to the review.  Are you
re-rolling the series anytime soon (I am asking because then I'd rather
not to queue this round especially because I didn't see 5/6).
--

From: J.H.
Date: Thursday, December 10, 2009 - 7:58 pm

I'll probably have some changes up and about tomorrow, and it's a little 
troubling that 5/6 didn't come through for you

6 at least made it to marc.info: 
http://marc.info/?l=git&m=126048884825985&w=2

and 5 seems to have been eaten by a grue somewhere.  It was a *big* 
patch mainly because all the caching flips over in a single go.  If you 
want I can privately bounce 5 & 6 to you so you have a complete tree 
right now?

- John 'Warthog9' Hawley
--

From: J.H.
Date: Thursday, December 10, 2009 - 8:07 pm

Not to reply to myself but this might also be helpful:

http://git.kernel.org/?p=git/warthog9/gitweb.git;a=shortlog;h=refs/heads/gitweb-ml-v2

- John 'Warthog9' Hawley
--

From: Junio C Hamano
Date: Thursday, December 10, 2009 - 8:09 pm

Thanks, but not interested, in the sense that it wouldn't make much sense
to me to have a version tonight that is known to go stale within a few
days.  I only pick up and queue patches to 'pu' to save me from later
trouble of finding them from the mailing list backlog, and not to actively
review and engage in the discussion to polish them right now.  We are
already deep in pre-release freeze and my attention is not currently on
anything that won't go in to the upcoming release.

I want to have a solid 1.6.6 before the holidays as a present to all ;-).
--

From: Jakub Narebski
Date: Friday, December 11, 2009 - 3:09 am

Simplest solution would be to used 'undef' (undefined value) for 
"turned off", i.e.:


I think it would be better to write in commit message that because finding
load average is OS dependent, there is provided (sample) solution which
uses /proc/loadavg and works (at least) on Linux.  And that for platforms
which do not have /proc/loadavg the feature is simply turned off (by the

See above proposal.  This information should be present in commit message,

After thinking about this a bit, now I don't think that it is terribly
important.  You *might* describe alternate approaches (roads not taken)
in commit message, but requiring /proc/loadavg for the feature to work

Well, if you are not worring excessively about overhead, then I think
using die_error would be the best solution, as it would preserve look
of gitweb.  It would require extending die_error by 503 response, or
rather %http_responses hash and comment above die_error.

Also I think that Status: should be before Content-Type: header (but
probably it is not required by the standard).

-- 
Jakub Narebski
Poland
--

From: Jakub Narebski
Date: Friday, December 18, 2009 - 9:36 am

From: John 'Warthog9' Hawley <warthog9@kernel.org>

This changes slightly the behavior of gitweb, so that it verifies
that the box isn't inundated with before attempting to serve gitweb.
If the box is overloaded, it basically returns a 503 Server Unavailable
until the load falls below the defined threshold.  This helps dramatically
if you have a box that's I/O bound, reaches a certain load and you
don't want gitweb, the I/O hog that it is, increasing the pain the
server is already undergoing.

This behavior is controlled by $maxload configuration variable.
Default is a load of 300, which for most cases should never be hit.
Unset it (set it to undefined value, i.e. undef) to turn off checking.

Currently it requires that '/proc/loadavg' file exists, otherwise the
load check is bypassed (load is taken to be 0).  So platforms that do
not implement '/proc/loadavg' currently cannot use this feature.

Signed-off-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This is my take on this patch, with all my concerns taken into 
consideration... well, all except describing alterante approaches
to straight using /proc/loadavg.

Differences to original version by John 'Warthog9' Hawley (J.H.):
* Slightly improved wording in commit message and in comments
* $maxload described in gitweb/README, in "Gitweb config file variables"
  section
* You can use '$maxload = undef;' to turn off load checking
* Error page for too high load is generated using die_error, which had
  to be extended to handle 503 Service Unavailable HTTP error code

 gitweb/README      |    7 ++++++-
 gitweb/gitweb.perl |   39 +++++++++++++++++++++++++++++++++++----
 2 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/gitweb/README b/gitweb/README
index e34ee79..6c2c8e1 100644
--- a/gitweb/README
+++ b/gitweb/README
@@ -174,7 +174,7 @@ not include variables usually directly set during build):
    Base URL for relative URLs in pages generated by ...
From: Mihamina Rakotomandimby
Date: Friday, December 11, 2009 - 6:53 am

What about systems not having /proc/loadavg

-- 
       Architecte Informatique chez Blueline/Gulfsat:
    Administration Systeme, Recherche & Developpement
                +261 34 29 155 34 / +261 33 11 207 36
--

From: Sverre Rabbelier
Date: Thursday, December 10, 2009 - 4:53 pm

Heya,

On Fri, Dec 11, 2009 at 00:45, John 'Warthog9' Hawley

I'd prefer not to be shouted at, how about s/GITWEB/gitweb: /g ? :)

-- 
Cheers,

Sverre Rabbelier
--

From: Jakub Narebski
Date: Friday, December 11, 2009 - 8:51 am

This patch didn't made it to git mailing list.  I suspect that you ran
afoul vger anti-SPAM filter.

Does this "File based caching layer" have anything common with GSoC

-- 
Jakub Narebski
Poland
ShadeHawk on #git
--

From: Jakub Narebski
Date: Friday, December 11, 2009 - 11:01 am

First, why do you reinvent the wheel instead of using one of existing
caching interfaces like CHI or Cache::Cache (perhaps creating a custom
backend or middle layer which incorporates required features, like being
load-aware)?  This way changing from file-based cache to e.g. mmap based
one or to memcached would be very simple.  And you would avoid pitfals
in doing your own cache management.  perl-Cache-Cache should be available
package in extras repositories.

If module is no available this would simply mean no caching, like in many
(or not so many) other cases with optional features in gitweb.


Second, if you can't use CGI::Cache directly, you can always steal the
idea from it, then the change to gitweb itself would be minimal:

  "Internally, the CGI::Cache module ties the output file descriptor
  (usually STDOUT) to an internal variable to which all output is saved."


P.S. I'll postpone critique of the patch itself for now.  The above issues
are much more important.

-- 
Jakub Narebski
Poland
--

From: J.H.
Date: Friday, December 11, 2009 - 11:26 am

Well for starters this isn't exactly a reinvention of the wheel, and 
this isn't something "new" per-se.  This code has been actively running 
on git.kernel.org for something like 3 - 4 years so there's something to 
be said for the devil we know and understand.  As well using the other 
caching strategies involves adding dramatically more complex 
interactions with caching layer.  The caching layer is actually quite 
specific to how git + gitweb works and solves more than just "caching" 
on the surface.  Specifically it solves the stampeding herd problem 
which would have to be solved either way even if I didn't implement my 
own caching, and since I had to do that caching was barely a step beyond 

True but these are *VERY* different caching strategies than the one I've 
got here, yes it's using files as a backend but it's doing so with 
specific goals in mind.  As I've said I plan to integrate Lea's 
memcached based caching into this in the future and that has different 
advantages and disadvantages.

At the end of the day the "normal" caching engines aren't as efficient 
as mine and there is the case the very high performance sites are going 
to have to investigate a number of different solutions to see what works 
best for them.  Mine is also *dramatically* simpler to setup as well, 

There's pitfalls if I do it myself, or I use one of the other "common" 
perl modules.  I did it this way years ago, I've maintained it and it 
works pretty well.  I won't admit that it's the smartest caching engine 
on the planet, far from it, but it has evolved specifically for gitweb 
and that itself saves me a lot of pitfalls from cache engine + gitweb 

Yes, but as can be seen from how you enable various other caching 
engines the setup of those is non-trivial, this is and either way 
caching *HAS* to be explicitly turned on by the admin/user since they 
are going to have to do *some* configuration, or at least be aware that 

I thought about that 3 years ago, and decided it wasn't a ...
From: Jakub Narebski
Date: Friday, December 11, 2009 - 6:37 pm

It _might_ be caused by the fact that you used attachement.  But it might

The question would be then whether it makes sense to have two caches at
different levels in the stack (see also discussion below about Lea 

Well, if it is not reinventing the wheel, then it is yak shaving (yet
another ...) ;-)

The fact that the code was used and tested at one single installation
doesn't mean that it doesn't have warts that could be avoided by using

I am hoping that if it was done right, by using CHI or Cache::Cache
caching interface, then choosing alternate caching engine, or adding
extra level of cache would be simple and decoupled from issues specific

The idea is for gitweb/cache.pm (or gitweb/Gitweb-Cache.pm, or 
gitweb/Gitweb/Cache.pm) is to encapsulate issues specific to gitweb,
like generating cache key, or printing "Generating...", etc.

Perhaps also the idea of filling cache in background (but see discussion

CHI tries to solve cache miss stampedes via expires_variance mechanism.
There is Cache::Adaptive (and its subclass Cache::Adaptive::ByLoad)
which does adaptive lifetime control (it accepts any Cache::Cache 
compatible cache, so I think it also accepts CHI compatible cache).

Errr... besides using Cache::Cache compatible cache (see!!!), for example
Cache::Memcached, Lea Wiemann's gitweb caching did caching at entirely
different level than original kernel.org's gitweb.

The stack for gitweb looks somewhat like this:

  git commands output       open my $fd, '-|, git_cmd(), ...
          |
          v
  parsed output data        parse_ls_tree_line, parse_commit, ...
          |
          v
  generated HTML etc.       print ...
          :
          V
       caching              optional
    reverse proxy           Varnish, Squid

If I understand correctly Lea Wiemann code cache git command output.
The fork of gitweb used at repo.or.cz does caching of parsed data at
least for most intensive projects list page.  This patch was about caching
generated output.  ...
Previous thread: [PATCH] git gui: Use git diff --submodule when available by Jens Lehmann on Thursday, December 10, 2009 - 3:44 pm. (2 messages)

Next thread: How to selectively recreate merge state? by Jay Soffian on Thursday, December 10, 2009 - 4:56 pm. (18 messages)