This is preliminary (proof of concept) version of shortened series
intended as replacement (rewrite) of "Gitweb caching v8" series from
John 'Warthog9' Hawley (J.H.).
This series shows how one can manage exception handling using
die_error like die, even in the presence of output caching. The
output caching engine has an option that allows to turn off (default)
or on caching of error pages.
This series is unfinished; it does not include adaptive cache
lifetime, nor support for other caching engines than the one provided
(like Cache::Cache or CHI), nor does it support background cache
generation or progress info indicator.
This is just intended as proof of concept.
---
Jakub Narebski (9):
gitweb: Add optional output caching
gitweb/lib - Cache captured output (using compute_fh)
gitweb/lib - Very simple file based cache
gitweb/lib - Simple output capture by redirecting STDOUT to file
t/test-lib.sh: Export also GIT_BUILD_DIR in test_external
gitweb: Prepare for splitting gitweb
gitweb: Introduce %actions_info, gathering information about actions
gitweb: use eval + die for error (exception) handling
gitweb: Go to DONE_REQUEST rather than DONE_GITWEB in die_error
gitweb/Makefile | 22 +
gitweb/README | 46 ++
gitweb/gitweb.perl | 280 +++++++++++++--
gitweb/lib/GitwebCache/CacheOutput.pm | 84 ++++
gitweb/lib/GitwebCache/Capture/ToFile.pm | 109 ++++++
gitweb/lib/GitwebCache/FileCacheWithLocking.pm | 452 ++++++++++++++++++++++++
t/gitweb-lib.sh | 11 +
t/t9500-gitweb-standalone-no-errors.sh | 20 +
t/t9501-gitweb-standalone-http-status.sh | 13 +
t/t9502-gitweb-standalone-parse-output.sh | 33 ++
t/t9510-gitweb-capture-interface.sh | 34 ++
t/t9510/test_capture_interface.pl | 132 +++++++
...End the request after die_error finishes, rather than exiting gitweb
instance (perhaps wrapped like in ModPerl::Registry or gitweb.psgi
case).
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
gitweb/gitweb.perl | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 4779618..724287b 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1169,6 +1169,7 @@ sub run {
run_request();
+ DONE_REQUEST:
$post_dispatch_hook->()
if $post_dispatch_hook;
$first_request = 0;
@@ -3767,7 +3768,7 @@ EOF
print "</div>\n";
git_footer_html();
- goto DONE_GITWEB
+ goto DONE_REQUEST
unless ($opts{'-error_handler'});
}
--
[side note: the "@@ EOF" line above would say "@@ sub die_error {" if
This seems to remove the last user of the DONE_GITWEB label. Why not
delete the label, too?
When die_error is called by CGI::Carp (via handle_errors_html), it
does not rearm the error handler afaict. Previously that did not
matter because die_error kills gitweb; now should it be set up
again?
die_error gets called when server load is too high; I wonder whether
it is right to go back for another request in that case.
A broken per-request (or other) configuration could potentially leave
a gitweb process in a broken state, and until now the state would be
reset on the first error. I wonder if escape valve would be needed
--- e.g., does the CGI harness take care of starting a new gitweb
process after every couple hundred requests or so?
Aside from those (minor) worries, this patch seems like a good idea.
--
Hmmm, I thought that git has Perl-specific diff driver (xfuncname), but
I see that it doesn't. The default funcname works quite well for Perl
code... with exception of here-documents (or rather their ending).
Well, actually this patch is in this series only for the label ;-)
Anyway, I can simply drop this patch, and have next one in series
(adding exception-based error handling, making die_error work like
Thanks, I missed this (but after examining it turns out to be a
non-issue). That will teach me to leave code outside of run()
subroutine; one of reasons behind creating c2394fe (gitweb: Put all
per-connection code in run() subroutine, 2010-05-07) was to clarify
code flow.
A note: using set_message inside handle_errors_html was necessary
because if there was a fatal error in die_error, then
handle_errors_html would be called recursively - this was fixed in
CGI.pm 3.45, but we cannot rely on this; we cannot rely on having new
enough version of CGI::Carp that supports set_die_handler either.
But actually handle_errors_html gets called only from fatalsToBrowser,
which in turn gets called from CGI::Carp::die... which ends calling
CODE::die (aka realdie), which ends CGI process anyway.
That is why die_error ends with
goto DONE_GITWEB
unless ($opts{'-error_handler'});
i.e. it doesn't goto DONE_GITWEB nor DONE_REQUEST if called from
If client (web browser) are requesting connection, we have to tell it
something anyway. Note that each request might serve different client.
But when the die_error(503, "The load average on the server is too
'die $@ if $@' would call CORE::die, which means it would end gitweb
process.
For CGI server it doesn't matter anyway, as for each request the process
is respawned anyway (together with respawning Perl interpreter), and I
think that ModPerl::Registry and FastCGI servers monitor process that it
Thanks a lot for your comments.
--
Jakub Narebski
Poland
--
The default function name discovery already works quite well for Perl
code... with the exception of here-documents (or rather their ending).
sub foo {
print <<END
here-document
END
return 1;
}
The default funcname pattern treats the unindented END line as a
function declaration and puts it in the @@ line of diff and "grep
--show-function" output.
With a little knowledge of perl syntax, we can do better. You can
try it out by adding "*.perl diff=perl" to the gitattributes file.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Maybe something like this?
Documentation/gitattributes.txt | 2 ++
t/t4018-diff-funcname.sh | 2 +-
userdiff.c | 15 +++++++++++++++
3 files changed, 18 insertions(+), 1 deletions(-)
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 5a7f936..e59b878 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -494,6 +494,8 @@ patterns are available:
- `pascal` suitable for source code in the Pascal/Delphi language.
+- `perl` suitable for source code in the Perl language.
+
- `php` suitable for source code in the PHP language.
- `python` suitable for source code in the Python language.
diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index 0a61b57..3646930 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -32,7 +32,7 @@ EOF
sed 's/beer\\/beer,\\/' < Beer.java > Beer-correct.java
-builtin_patterns="bibtex cpp csharp fortran html java objc pascal php python ruby tex"
+builtin_patterns="bibtex cpp csharp fortran html java objc pascal perl php python ruby tex"
for p in $builtin_patterns
do
test_expect_success "builtin $p pattern compiles" '
diff --git a/userdiff.c b/userdiff.c
index 2d54536..fc2afe3 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -61,6 +61,21 @@ PATTERNS("pascal",
"|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+"
"|<>|<=|>=|:=|\\.\\."
...Hmm. Should it be |([\x80-\xff]+[\x00-\x7f]) then, to match exactly one multibyte UTF-8 character? --
Yes, that's what I was thinking of. v2 will be a two-part series
starting with that.
BTW, the perl token matcher is pretty half-hearted. In part this is
because "only perl can parse perl" [1] terrifies me and in part it is
because I am too lazy to write down the state machine implied by
PPI/Token/*.pm.
If some tokenization wizard would like to work on it, something like
the following might produce more pleasant word diffs:
"[%&$][[:space:]]*[0-9]+" /* $1 */
"|[%&$][[:space:]]*([[:alpha:]_']|::)([[:alnum:]_']|::)*" /* $var1 */
"|[%&$][[:space:]]*\\$([[:alnum:]_]|::)([[:alnum:]_']|::)*" /* $$var1 */
"|[%&$][[:space:]]*\\$\\{" /* $${ introducing complicated expression */
"|[%&$][[:space:]]*\\$\\$" /* $$$ introducing complicated expression */
"|[%&$][[:space:]]*[^[:alnum:]_:'^$]" /* $! */
"|[%&$][[:space:]]*\\^[][A-Z\\^_?]" /* $^A */
"|[%&$][[:space:]]*\\{\\^[][A-Z\\^_?]\\}" /* ${^A} */
"|[%&$][[:space:]]*\\{\\^[][A-Z\\^_?][[:alnum:]_]*\\}" /* ${^Foo} */
/* ${var} */
"|[%&$][[:space:]]*\\{[[:space:]]*([[:alpha:]_']|::)[[:alnum:]_:]*[[:space:]]\\}"
"|[%&$][[:space:]]*\\{" /* ${ introducing complicated expression */
...
though it is an unmaintainable mess. :)
[1] perl::toke.c and http://www.perlmonks.org/?node_id=44722
--
Thanks a lot.
Besides here-doc, there are some tricky things that such code should
be aware about.
1. BEGIN {
...
}
and similar code blocks (END, CHECK, INIT, ...) which I think should
be marked as 'BEGIN' in diff chunk.
2. sub foo {
FOO: while (1) {
...
}
}
which should be marked with 'sub foo {', I think
3. =head1 NAME
Git - Perl interface to the Git version control system
=cut
i.e. POD... which I don't know what to do about.
I have not checked what your code does wrt those.
--
Jakub Narebski
Poland
--
I do not think Jonathan's patterns would be fooled by this; it wants to
catch only "package <anything>;" and "sub <anything> {".
Jonathan's pattern set allows them to be indented, and followed by some
garbage at the end., which we might want to tighten. How many people
start 'package' and the outermost 'sub' indented?
userdiff.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/userdiff.c b/userdiff.c
index fc2afe3..79569c4 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -62,8 +62,10 @@ PATTERNS("pascal",
"|<>|<=|>=|:=|\\.\\."
"|[^[:space:]]|[\x80-\xff]+"),
PATTERNS("perl",
- "^[ \t]*package .*;\n"
- "^[ \t]*sub .* \\{",
+ "^package .*;\n"
+ "^sub .* \\{\n"
+ "^[A-Z]+ \\{\n" /* BEGIN, END, ... */
+ "^=head[0-9] ", /* POD */
/* -- */
"[[:alpha:]_'][[:alnum:]_']*"
"|0[xb]?[0-9a-fA-F_]*"
--
Note that in future Perl 5.14 there would be 'package NAME {' form,
so perhaps it would be better to future-proof and use
Using "sub foo {" is just a recommended programming convention (like e.g.
GNU convention or K&R convention for C code). I think it would be better
to relax it a bit, either
+ "^sub "
or
--
Jakub Narebski
Poland
--
FWIW, I sometimes do:
{
my $static;
sub foo {
... use $static ...
}
}
so perhaps allowing whitespace in front of the keyword is worthwhile
there. I have never indented "package", though.
-Peff
--
I like the current order (first the brief patch to change the
semantics, then the more ambitious change to an eval {} based error
Right, I should have thought a few seconds more. Respawning
Sorry, that was unclear of me. I meant that buggy configuration could
leave a gitweb process in buggy but alive state and frequent failing
requests might be a way to notice that. Contrived example (just to
illustrate what I mean):
our $version .= ".custom";
if (length $version >= 1000) { # untested, buggy code goes here.
@diff_opts = ("--nonsense");
}
I think I was not right to worry about this, either. It is better to
make such unusual and buggy configurations as noticeable as possible
so they can be fixed.
Thanks for a thorough explanation. For what it's worth, with or
without removal of the DONE_GITWEB: label,
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
[1] I can imagine scenarios in which exiting gitweb would help
alleviate the load, involving:
- large memory footprint for each gitweb process forcing the system
into swapping (e.g., from a memory leak), or
- FastCGI-like server noticing the load and choosing to decrease the
number of gitweb instances.
In the usual case, presumably gitweb memory footprint is small and
FastCGI-like servers limit the number of gitweb instances to a modest
fixed number.
--
I assume that CGI / FastCGI / mod_perl (+ ModPerl::Registry) web server
would know how to regulate number of workers according to the server
I'm sorry I haven't made myself clear.
What I meant here is that gitweb includes the following code
if (-e $GITWEB_CONFIG) {
do $GITWEB_CONFIG;
die $@ if $@;
}
which means that CGI::Carp::die is called, which might call
handle_errors_html, and which ends in CORE::die, which ends gitweb
process. So if there is no way for broken configuration to leave
gitweb in a rboken state _at this point in series_.
Thank you for thinking about this, because it could cause problems
(could because I have not checked if it does or if it doesn't) in the
following patch, when gitweb uses eval / die for error handling.
Then it might happen when $per_request_config is false or CODE that
instead of trying to reread broken config on subsequent requests, we
will run with broken config. It depends if "die"-ing in
evaluate_gitweb_config would prevent setting $first_request to false.
Thanks.
--
Jakub Narebski
Poland
--
In gitweb code it is assumed that calling die_error() subroutine would end request, just like running "die" would. Up till now it was done by having die_error() jump to DONE_REQUEST (earlier DONE_GITWEB), or in earlier version just 'exit' (for mod_perl via ModPerl::Registry it ends request instead of exiting worker). Instead of using 'goto DONE_REQUEST' for longjmp-like nonlocal jump, or using 'exit', gitweb uses now native for Perl exception handlingin the form of eval / die pair ("eval BLOCK" to trap exceptions, "die LIST" to raise/throw them). Up till now the "goto DONE_REQUEST" solution was enough, but with the coming output caching support and it adding modular structure to gitweb, it would be difficult to continue to keep using this solution (e.g. interaction with capturing output). Because gitweb now traps all exceptions occuring run_request(), the handle_errors_html handler (set via set_message from CGI::Carp) is not needed; gitweb can call die_error in -error_handler mode itself. This has the advantage that we can now set correct HTTP header (handler from CGI::Carp::set_message was run after HTTP headers were already sent). Gitweb assumes here that exceptions thrown by Perl would be simple strings; die_error() throws hash reference (if not for minimal extrenal dependencies, it would be probable object of Class::Exception or Throwable class thrown). Note: in newer versions of CGI::Carp there is set_die_handler(), where handler have to set HTTP headers to the browser itself, but we cannot rely on new enough CGI::Carp version to have been installed. Also set_die_handler interferes with fatalsToBrowser. Signed-off-by: Jakub Narebski <jnareb@gmail.com> --- gitweb/gitweb.perl | 26 ++++++++------------------ 1 files changed, 8 insertions(+), 18 deletions(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 724287b..c7a1892 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -12,7 +12,7 @@ use strict; use warnings; use CGI ...
The !ref($@) seems overzealous, which is why I am wondering if it Is the DONE_REQUEST label still needed? Thanks, I am happy to see the semantics becoming less thorny. Jonathan --
First, 'gitweb: Prepare for splitting gitweb' commit is only later in series... ;-) but that of course is not a serious issue. Second, more important is that I'd rather gitweb doesn't go "reinvent the wheel" route. I'd rather (re)use Exception::Class (like e.g. SVN::Web does it) if we go the OO exception handling route. But if we are going to use Exception::Class, then we can also use You meant Scalar::Util::blessed here, isn't it? Fortunately Scalar::Util is core Perl module. By 'overzealous' do you mean here possibility of catching what we shouldn't, i.e. non-gitweb error (not thrown by die_error)? We can narrow it to "ref($@) eq 'HASH'", but I don't think it would be ever Now I should check if this doesn't affect gitweb performance too badly. IIRC I have chosen 'goto DONE_GITWEB' because I didn't know about ModPerl::Registry redefining 'exit' (why it was done), and because of some microbenchmark showing that it performs better than die/eval (why this specific solution)... But I think that the performance hit would be negligible in practice; making gitweb more maintainable is I think worth the cost. -- Jakub Narebski Poland --
Unify error handling by treating errors from Perl (and thrown early in
process using 'die STRING'), and errors from gitweb in the same way.
This means that in both cases error page is generated after an error
is caught in run() subroutine.
die_error() subroutine is now split into three: gen_error() which
massages parameters (escaping HTML, turning HTTP status number into
full HTTP status code), die_error() which uses gen_error() and just
throws an error (and does not generate an error page), and
send_error() which catually generate error page based on provided
error / exception.
Sidenote: probably in the future instead of using simple hash for
throwing gitweb exception, gitweb would use some custom error class,
e.g. derivative of Exception::Class (like SVN::Web does it).
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This is sent early to facilitate early comments. It passes test suite,
but it was not extensively tested.
Now die_error() functions mode like 'die'...
gitweb/gitweb.perl | 47 ++++++++++++++++++++++++++++++++++++-----------
1 files changed, 36 insertions(+), 11 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index c7a1892..5854f73 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1153,9 +1154,13 @@ sub run {
if $pre_dispatch_hook;
eval { run_request() };
- if (defined $@ && !ref($@)) {
+ my $error = $@;
+ if ($error) {
# some Perl error, but not one thrown by die_error
- die_error(undef, undef, $@, -error_handler => 1);
+ $error = gen_error(undef, undef, $error)
+ unless ref($error);
+
+ send_error($error);
}
DONE_REQUEST:
@@ -3730,11 +3735,14 @@ sub git_footer_html {
# an unknown error occurred (e.g. the git binary died unexpectedly).
# 503: The server is currently unavailable (because it is overloaded,
# or down for maintenance). Generally, this is a temporary state.
-sub die_error {
+
+# gen_error() generates error object from parameters
+# ...Currently it only contains information about output format, and is not
used anywhere. It will be used to check whether current action
produces HTML output, and therefore is displaying HTML-based progress
info about (re)generating cache makes sense.
It can contain information about allowed extra options, whether to
display link to feed (Atom or RSS), etc. in easier and faster way than
listing all matching or all non-matching actions at appropriate place.
Currently not used; will be used in next commit, to check if action
produces HTML output and therefore we can use HTML-specific progress
indicator.
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
gitweb/gitweb.perl | 57 ++++++++++++++++++++++++++++++++++++++++++++++++----
1 files changed, 53 insertions(+), 4 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index c7a1892..e50654b 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -749,6 +749,54 @@ our %allowed_options = (
"--no-merges" => [ qw(rss atom log shortlog history) ],
);
+# action => {
+# # what is the output format (content-type) of action
+# 'output_format' => ('html' | 'text' | 'feed' | 'binary' | undef),
+# # does action require $project parameter to work
+# 'needs_project' => (boolean | undef),
+# # log-like action, can start with arbitrary ref or revision
+# 'log_like' => (boolean | undef),
+# # has no specific feed, or should lik to OPML / generic project feed
+# 'no_feed' => (boolean | undef),
+# # allowed options to be passed ussing 'opt' parameter
+# 'allowed_options' => { 'option_1' => 1 [, ... ] },
+# }
+our %actions_info = ();
+sub evaluate_actions_info {
+ our %actions_info;
+ our (%actions);
+
+ # unless explicitely stated otherwise, default output format is html
+ # most actions needs $project parameter
+ foreach my $action (keys %actions) {
+ $actions_info{$action}{'output_format'} = 'html';
+ $actions_info{$action}{'needs_project'} = 1;
+ }
+ # list all exceptions; undef ...Add GitwebCache::Capture::ToFile package, which captures output by redirecting STDOUT to given file (specified by filename, or given opened filehandle), earlier saving original STDOUT to restore it when finished capturing. GitwebCache::Capture::ToFile preserves PerlIO layers, both those set before started capturing output, and those set during capture. No care was taken to handle the following special cases (prior to starting capture): closed STDOUT, STDOUT reopened to scalar reference, tied STDOUT. You shouldn't modify STDOUT during capture. Includes separate tests for capturing output in t9510/test_capture_interface.pl which is run as external test from t9510-gitweb-capture-interface.sh. It tests capturing of utf8 data This patch was based on "gitweb: add output buffering and associated functions" patch by John 'Warthog9' Hawley (J.H.) in "Gitweb caching v7" series, and on code of Capture::Tiny by David Golden (Apache License 2.0). Based-on-work-by: John 'Warthog9' Hawley <warthog9@kernel.org> Signed-off-by: Jakub Narebski <jnareb@gmail.com> --- gitweb/lib/GitwebCache/Capture/ToFile.pm | 109 +++++++++++++++++++++++++ t/t9510-gitweb-capture-interface.sh | 34 ++++++++ t/t9510/test_capture_interface.pl | 132 ++++++++++++++++++++++++++++++ 3 files changed, 275 insertions(+), 0 deletions(-) create mode 100644 gitweb/lib/GitwebCache/Capture/ToFile.pm create mode 100755 t/t9510-gitweb-capture-interface.sh create mode 100755 t/t9510/test_capture_interface.pl diff --git a/gitweb/lib/GitwebCache/Capture/ToFile.pm b/gitweb/lib/GitwebCache/Capture/ToFile.pm new file mode 100644 index 0000000..d2dbf0f --- /dev/null +++ b/gitweb/lib/GitwebCache/Capture/ToFile.pm @@ -0,0 +1,109 @@ +# gitweb - simple web interface to track changes in git repositories +# +# (C) 2010, Jakub Narebski <jnareb@gmail.com> +# +# This program is licensed under the GPLv2 + +# +# Simple output capturing via redirecting STDOUT to given file. +# + +# This is the ...
Micronit: if the license of Capture::Tiny were relevant then we would be in trouble, I think. (Apache-2.0 and GPLv2 aren't compatible licenses.) Luckily looks trivial enough. Maybe either avoiding mention of the license or clarifying that that is not intended to be the sole license for the stripped-down code would help? --
Damn, I have thought that Apache-2.0 and GPLv2 are compatibile. This is the only reason that I explicitely mentioned the license (that and it is not usual "licensed like Perl", i.e. dual Artistic Perl License / GPL You are right. I have done similar thing for PerlIO::Util based capture, though I didn't know about the 'binmode($fh, join(":", ":raw", @unique));' trick. So I think we would be in the clear by changing the comment to read: +# see also _relayer in Capture::Tiny by David Golden or something like that. Or we can try to change gitweb license to GPLv3 / AGPLv3, which is compatibile (one way only) with Apache-2.0... just kidding :-) -- Jakub Narebski Poland --
This way we can use it in test scripts written in other languages
(e.g. in Perl), and not rely on "$TEST_DIRECTORY/.."
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
t/test-lib.sh | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 48fa516..c077fa4 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -552,9 +552,9 @@ test_external () {
# Announce the script to reduce confusion about the
# test output that follows.
say_color "" "# run $test_count: $descr ($*)"
- # Export TEST_DIRECTORY, TRASH_DIRECTORY and GIT_TEST_LONG
+ # Export TEST_DIRECTORY, GIT_BUILD_DIR, TRASH_DIRECTORY and GIT_TEST_LONG
# to be able to use them in script
- export TEST_DIRECTORY TRASH_DIRECTORY GIT_TEST_LONG
+ export TEST_DIRECTORY GIT_BUILD_DIR TRASH_DIRECTORY GIT_TEST_LONG
# Run command; redirect its stderr to &4 as in
# test_run_, but keep its stdout on our stdout even in
# non-verbose mode.
--
Prepare gitweb for having been split into modules that are to be installed alongside gitweb in 'lib/' subdirectory, by adding use lib __DIR__.'/lib'; to gitweb.perl (to main gitweb script), and preparing for putting modules (relative path) in $(GITWEB_MODULES) in gitweb/Makefile. This preparatory work allows to add new module to gitweb by simply adding GITWEB_MODULES += <module> to gitweb/Makefile (assuming that the module is in 'gitweb/lib/' directory). While at it pass GITWEBLIBDIR in addition to GITWEB_TEST_INSTALLED to allow testing installed version of gitweb and installed version of modules (for future tests which would check individual (sub)modules). Using __DIR__ from Dir::Self module (not in core, that's why currently gitweb includes excerpt of code from Dir::Self defining __DIR__) was chosen over using FindBin-based solution (in core since perl 5.00307, while gitweb itself requires at least perl 5.8.0) because FindBin uses BEGIN block, which is a problem under mod_perl and other persistent Perl environments (thought there are workarounds). At Pavan Kumar Sankara suggestion gitweb/Makefile uses install [OPTION]... SOURCE... DIRECTORY format (2nd format) with single SOURCE rather than install [OPTION]... SOURCE DEST format (1st format) because of security reasons (race conditions). Modern GNU install has `-T' / `--no-target-directory' option, but we cannot rely that the $(INSTALL) we are using supports this option. The install-modules target in gitweb/Makefile uses shell 'for' loop, instead of make's $(foreach) function, to avoid possible problem with generating a command line that exceeded the maximum argument list length. Signed-off-by: Jakub Narebski <jnareb@gmail.com> --- gitweb/Makefile | 17 +++++++++++++++-- gitweb/gitweb.perl | 8 ++++++++ 2 files changed, 23 insertions(+), 2 deletions(-) diff --git a/gitweb/Makefile b/gitweb/Makefile index 0a6ac00..e6029e1 100644 --- a/gitweb/Makefile +++ ...
Spelled out, this means modules would typically go in /usr/share/gitweb/lib Is that the right place? I suspect something like /usr/lib/gitweb/ could make sense in some installations for two reasons: - even braindamaged webserver configurations would not serve lib/ as static files in that case; - if some modules are implemented in C for speed, they would need to go in /usr/lib anyway to follow usual filesystem conventions. This explanation and the code below leave me nervous that the answer might be "no". ;-) --
Yes, it's true. It is mainly to support situation where one can install
files in (subdirectory of) cgi-bin, but nowehere else. That is why the
default is to install modules alongside with gitweb.
The additional advantage is that t/gitweb-lib.sh used by gitweb tests
can very simply test source version of gitweb, with gitweb finding
source version of modules. But it is not a very large obstacle to
Actually it doesn't matter what web server does with those files when
accessed directly, except for the client (user) confusion if he/she
goes where not invited. Modules are used by Perl (by gitweb), not by
Ugh, XS! I sincerely hope that when there would be decision to implement
some features in C for speed, we would be able to use Perl version of
ctypes for C-to-Perl interface, not XS.
Anyway most probable to be implemented in C would be Git.pm, or rather
Perl interface to libgit2. It is probable that at some point gitweb
would be converted to use Git.pm or its successor. But I guess that
Git Perl module would be installed somewhere in PERL5LIB, so it would
I have thought that I did provide 'gitweblibdir' as configurable knob,
but I see that in the version I have send I don't do this:
# Shell quote;
bindir_SQ = $(subst ','\'',$(bindir))#'
gitwebdir_SQ = $(subst ','\'',$(gitwebdir))#'
gitwebstaticdir_SQ = $(subst ','\'',$(gitwebdir)/static)#'
gitweblibdir_SQ = $(subst ','\'',$(gitwebdir)/lib)#'
But if we are to allow custom gitweblibdir, we would have to change the
way gitweb is to find its modules. One solution would be inetad of
current
# __DIR__ is taken from Dir::Self __DIR__ fragment
sub __DIR__ () {
File::Spec->rel2abs(join '', (File::Spec->splitpath(__FILE__))[0, 1]);
}
use lib __DIR__ . '/lib';
use simply
use lib $ENV{GITWEBLIBDIR} || "++GITWEBLIBDIR++";
Of course both gitweb/Makefile and t/gitweb-lib.sh would have to be
updated: gitweb/Makefile to include replacement rule for '++GITWEBLIBDIR++'
in GITWEB_REPLACE, and ...This is first step towards implementing file based output (response) caching layer that is used on such large sites as kernel.org. This patch introduces GitwebCaching::SimpleFileCache package, which follows Cache::Cache / CHI interface, although do not implement it fully. The intent of following established convention for cache interface is to be able to replace our simple file based cache, e.g. by the one using memcached. The data is stored in the cache as-is, without adding metadata (like expiration date), and without serialization (which means that one can store only scalar data). At this point there is no support for expiring cache entries. The code of GitwebCaching::SimpleFileCache package in gitweb/lib was heavily based on file-based cache in Cache::Cache package, i.e. on Cache::FileCache, Cache::FileBackend and Cache::BaseCache, and on file-based cache in CHI, i.e. on CHI::Driver::File and CHI::Driver (including implementing atomic write, something that original patch lacks). It tries to follow more modern CHI architecture, but without requiring Moose. It is much simplified compared to both interfaces and their file-based drivers. This patch does not yet enable output caching in gitweb (it doesn't have all required features yet); on the other hand it includes tests of cache Perl API in t/t9503-gitweb-caching-interface.sh. Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org> Signed-off-by: Jakub Narebski <jnareb@gmail.com> --- gitweb/lib/GitwebCache/FileCacheWithLocking.pm | 452 ++++++++++++++++++++++++ t/t9511-gitweb-caching-interface.sh | 34 ++ t/t9511/test_cache_interface.pl | 381 ++++++++++++++++++++ 3 files changed, 867 insertions(+), 0 deletions(-) create mode 100644 gitweb/lib/GitwebCache/FileCacheWithLocking.pm create mode 100755 t/t9511-gitweb-caching-interface.sh create mode 100755 t/t9511/test_cache_interface.pl diff --git a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm ...
Add GitwebCache::CacheOutput package, which introduces cache_output subroutine. If data for given key is present in cache, then cache_output gets data from cache and prints it. If data is not present in cache, then cache_output runs provided subroutine (code reference), captures its output, saves this output in cache, and prints it. It requires that provided $cache supports ->capture_fh method, like GitwebCache::FileCacheWithLocking introduced in earlier commit, and that provided $capture supports capturing to file or filehandle via ->capture($code, $file) method, like GitwebCache::Capture::ToFile introduced in some earlier commit. Exceptions in $code should be thrown using 'die' (Perl exception mechanism); one can choose whether error output (output printed when exception is raised, before raising it) should be saved to cache or not. By default error output is not cached. Gitweb would use cache_output to get page from cache, or to generate page and save it to cache. The die_error subroutine throws exception, which will be caught and by default rethrown; error pages would not be cached. It is assumed that data is saved to cache _converted_, and should therefore be read from cache and printed to STDOUT in ':raw' (binary) mode. Add t9512/test_cache_output.pl test, run as external test in t9512-gitweb-cache. It checks that cache_output behaves correctly, namely that it saves and restores action output in cache, and that it prints generated output or cached output, depending on whether there exist data in cache. Signed-off-by: Jakub Narebski <jnareb@gmail.com> --- gitweb/lib/GitwebCache/CacheOutput.pm | 84 ++++++++++++++++ t/t9512-gitweb-cache-output-interface.sh | 34 ++++++ t/t9512/test_cache_output.pl | 162 ++++++++++++++++++++++++++++++ 3 files changed, 280 insertions(+), 0 deletions(-) create mode 100644 gitweb/lib/GitwebCache/CacheOutput.pm create mode 100755 t/t9512-gitweb-cache-output-interface.sh create mode 100755 ...
This commit actually adds output caching to gitweb, as we have now minimal features required for it in GitwebCache::FileCacheWithLocking (a 'dumb' but fast file-based cache engine). To enable cache you need (at least) set $caching_enabled to true in gitweb config, and copy required modules alongside generated gitweb.cgi - this is described in more detail in the new "Gitweb caching" section in gitweb/README. "make install-gitweb" would install all modules alongside gitweb itself. Capturing and caching is designed in such way that there is no behaviour change if $caching_enabled is false. If caching is not enabled, then capturing is also turned off. Enabling caching causes the following additional changes to gitweb output: * Disables content-type negotiation (choosing between 'text/html' mimetype and 'application/xhtml+xml') when caching, as there is no content-type negotiation done when retrieving page from cache. Use lowest common denominator of 'text/html' mimetype which can be used by all browsers. This may change in the future. * Disable optional timing info (how much time it took to generate the original page, and how many git commands it took), and in its place show unconditionally when page was originally generated (in GMT / UTC timezone). * Disable 'blame_incremental' view, as it doesn't make sense without printing data as soon as it is generated (which would require tee-ing when capturing output for caching)... and it doesn't work currently anyway. Alternate solution would be to run 'blame_incremental' view with caching disabled. Add basic tests of caching support to t9500-gitweb-standalone-no-errors test: set $caching_enabled to true and check for errors for first time run (generating cache) and second time run (retrieving from cache) for a single view - summary view for a project. Check in the t9501-gitweb-standalone-http-status test that gitweb at least correctly handles "404 Not Found" error pages also in the case when gitweb ...
This commit removes asymmetry in serving stale data (if stale data exists)
when regenerating cache in GitwebCache::FileCacheWithLocking. The process
that acquired exclusive (writers) lock, and is therefore selected to
be the one that (re)generates data to fill the cache, can now generate
data in background, while serving stale data.
Those background processes are daemonized, i.e. detached from the main
process (the one returning data or stale data). Otherwise there might be a
problem when gitweb is running as (part of) long-lived process, for example
from mod_perl or from FastCGI: it would leave unreaped children as zombies
(entries in process table). We don't want to wait for background process,
and we can't set $SIG{CHLD} to 'IGNORE' in gitweb to automatically reap
child processes, because this interferes with using
open my $fd, '-|', git_cmd(), 'param', ...
or die_error(...)
# read from <$fd>
close $fd
or die_error(...)
In the above code "close" for magic "-|" open calls waitpid... and we
would would die with "No child processes". Removing 'or die' would
possibly remove ability to react to other errors.
This feature can be enabled or disabled on demand via 'background_cache'
cache parameter. It is turned on by default.
When there is no stale version suitable to serve the client, currently
we have to wait for the data to be generated in full before showing it.
Add to GitwebCache::FileCacheWithLocking, via 'generating_info' callback,
the ability to show user some activity indicator / progress bar, to
show that we are working on generating data.
Note that without generating data in background, process generating
data wouldn't print progress info, because 'generating_info' can exit
(and in the case of gitweb's git_generating_data_html does exit).
We don't need to daemonize background process in this case, where
there is no stale data to serve, but progress info is on. This is
because we have to wait for the background process to ...Instead of having gitweb use progress info indicator / throbber to notify user that data is being generated by current process, gitweb can now (provided that PerlIO::tee from PerlIO::Util is available) send page to web browser while simultaneously saving it to cache (print and capture, i.e. tee), thus having incremental generating of page serve as a progress indicator. To do this, the GitwebCache::Capture::ToFile module acquired ->tee() subroutine, similar to ->capture(), but it prints while capturing output. The ->tee() method (and its worker methods ->tee_start() and ->tee_end()) are available only if PerlIO::tee from PerlIO::Util distribution is present. Tests checking if this feature works as expected were added to t9510 test. An alternative would be to provide two versions of GitwebCache::Capture::ToFile. Note also that PerlIO::tee is not strictly necessary, as Capture::Tiny shows, but in most generic case (like done in Capture::Tiny) one needs separate process functioning as multplexer. Because tee-ing (printing while capturing) can function as a kind of progress indicator only for process generating the data for cache entry, and not for the processes waiting for data to be generated, therefore 'generating_info' got splitinto 'get_progress_info' and 'set_progress_info'. You can set now in GitwebCache::FIleCacheWithLocking those two separately. You are expected to unset 'set_progress_info' when using tee-ing capturing engine. Some tests added to t9511 with tee-like situation. As a proof of concept gitweb now uses two slightly different versions of "Generating..." page; if you worry about interaction between progress indicator and non-cacheable error pages, you can set 'set_progress_info' separately to undef. The cache_output subroutine from GitwebCache::CacheOutput got updated to use ->tee() subroutine if $capture supports it. If ->tee() is used, then of course generated data doesn't need to and shouldn't be printed; also cache_output unsets ...
In general, and particularly for the large sites that caching is targeted at, teeing is a really bad idea. I've mentioned this several times before, and the progress indicator is a *MUCH* better idea. I'm not sure how many times I can say that, even if this was added it would have the potential to exacerbate disk thrashing and overall make things a lot more complex. 1) Errors may still be generated in flight as the cache is being generated. It would be better to let the cache run with a progress indicator and should an error occur, display the error instead of giving any output that may have been generated (and thus likely a broken page). 2) Having multiple clients all waiting on the same page (in particular the index page) can lead to invalid output. In particular if you are teeing the output a reading client now must come in, read the current contents of the file (as written), then pick up on the the tee after that. It's actually possible for the reading client to miss data as it may be in flight to be written and the client is switching from reading the file to reading the tee. I don't see anything in your code to handle that kind of switch over. 3) This makes no allowance for the file to be generated completely in the background while serving stale data in the interim. Keep in mind that it can (as Fedora has experienced) take *HOURS* to generate the index page, teeing that output just means brokenness and isn't useful. It's much better to have a simple, lightweight waiting message get displayed while things happen. When they are done, output the completed page to all waiting clients. - John 'Warthog9' Hawley P.S. I'm back to work full-time on Wednesday, which I'll be catching up on gitweb and trying to make forward progress on my gitweb code again. --
It might be true that tee-ing is bad for very large sites, as it increases load a bit in those (I think) extremly rare cases where clients concurrently access the very same error page. But it might be a solution for those in between cases. I think that incrementally generated page is better progress indicator than just "Generating..." page. Anyway this proof of concept patch is to show how such thing should be implemented. I don't think that it makes things a lot more complex; in this rewrite everything is quite well modularized, encapsulated, and isolated. But the main intent behind this patch was to avoid bad interaction between 'progress info' indicator (in the process that is generating page, see On the contrary, with tee-ing (and zero size sanity check) you would be able to see pages even if there are errors saving cache entry. Though this wouldn't help very large sites which cannot function without caching, it could be useful for smaller sites. Err... could you explain what do you mean by "client is switching from reading the file to reading the tee"? Hmmm... I thought that the code is clear. Generating data, whether it is captured to be displayed later, or tee-ed i.e. printed and captured to cache, is inside critical section, protected by exclusive lock. Only after cache entry is written (in full), the lock is released, and clients waiting for data can access it; they use shared (readers) lock for sync. Note that in my rewrite (and I think also in _some_ cases in your version) files are written atomically, by writing to temporary file then renaming It does make allowance. cache_output from GitwebCache::CacheOutput uses capturing and not tee-ing if we are in background process. When there is stale data to serve, cache entry is (re)generated in background in detached process. Moreover by default cache_output has safety in that error pages generated by such detached process are cached. Note also that in my rewrite you can simply (by ...
I was not sure how Perl reacts to ENOSPC (No space left on device),
which I think it is only error that can be generated in flight as
cache is being generated (or as gitweb output is printed i.e. sent
to browser and captured/tee-ed i.e. saved to cache entry file), so
I have checked this (using loopback to create small filesystem).
The outcomes one worry about are the following:
* Perl dies during printing - this leads to broken page send to
browser, and no cache entry generated
* Perl prints output without dying at all; the page send to browser
via tee-in is all right, but cache entry is truncated which results
in broken page shown to other clients.
But what actually happens is actually different, and quite safe:
* Perl prints output without dying, and dies on closing cache entry
file with ENOSPC. This means that client generating data gets correct
output, and cache entry is not generated. Other clients with my code
try their hand at generation and also get correct page, but not save
it to cache.
This means that no error page about problems with cache is shown, which
is bad. On the other hand, at least for smaller sites, gitweb keeps
working as if without cache for newer entries.
Note that observed behaviour might depend on operating system / filesystem
Errr... now after rereading your email I see that caching error pages
has one problem: errors that come from the caching engine or capturing
Sent as
[RFC PATCH v7 2.5/9] gitweb: Make die_error just die, and use send_error
to create error pages
Message-ID: <201101040135.08638.jnareb@gmail.com>
http://permalink.gmane.org/gmane.comp.version-control.git/164466
--
Jakub Narebski
Poland
--
