Re: [PATCH] git-daemon virtual hosting implementation.

Previous thread: Proposal for new git Merge Strategy by Jon Loeliger on Wednesday, August 23, 2006 - 11:40 am. (3 messages)

Next thread: Re: Proposal for new git Merge Strategy by Jakub Narebski on Wednesday, August 23, 2006 - 12:02 pm. (1 message)
From: Pierre Habouzit
Date: Wednesday, August 23, 2006 - 11:52 am

just add the hostname in the path when using --base-path and --user-path.
this should be enough for most needs.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
---
 Here is a proposal for daemon side virtualhosting support.

 This is quite simple, and just uses the hostname in the path where it looks
 for the git repository. Only works in conjuction of --base-path or
 --user-path btw.

 Documentation/git-daemon.txt |   12 +++++++
 daemon.c                     |   73 +++++++++++++++++++++++++++++++++++-------
 2 files changed, 73 insertions(+), 12 deletions(-)

diff --git a/Documentation/git-daemon.txt b/Documentation/git-daemon.txt
index 0f7d274..ab5b232 100644
--- a/Documentation/git-daemon.txt
+++ b/Documentation/git-daemon.txt
@@ -79,6 +79,18 @@ OPTIONS
 	taken as a request to access `path/foo` repository in
 	the home directory of user `alice`.
 
+--use-vhosts::
+	Use git daemon cheap virtual hosting scheme.
++
+Used with --base-path=/foo, it will search the repositories in /foo/HOSTNAME
+instead of /foo - if you try to pull git://git.example.com/hello.git, it will
+search into /foo/git.example.com/hello.git.
++
+When using --user-path the HOSTNAME is appended to the path as well, so for
+--user-path=some/path, request to git://git.example.com/~alice/foo is taken as
+a request to access some/path/git.example.com/foo in the home directory of
+user `alice`.
+
 --verbose::
 	Log details about the incoming connections and requested files.
 
diff --git a/daemon.c b/daemon.c
index 012936f..146482c 100644
--- a/daemon.c
+++ b/daemon.c
@@ -17,7 +17,7 @@ static int reuseaddr;
 
 static const char daemon_usage[] =
 "git-daemon [--verbose] [--syslog] [--inetd | --port=n] [--export-all]\n"
-"           [--timeout=n] [--init-timeout=n] [--strict-paths]\n"
+"           [--timeout=n] [--init-timeout=n] [--use-vhosts] [--strict-paths]\n"
 "           [--base-path=path] [--user-path | --user-path=path]\n"
 "           [--reuseaddr] [--detach] [--pid-file=file] ...
From: Junio C Hamano
Date: Wednesday, August 23, 2006 - 1:11 pm

This part is objectionable -- older clients do not give "host=".
I think the plan, when virtual hosting was proposed and we added
this to the client side first, was to treat older clients as if
they specified the "primary" host.  So we would need some
mechanism to say where the repositories of the "primary" host

I am not sure if the interaction between user-path and vhost
should go like this, but I do not think of a good alternative to
suggest right now.  Your code allows ~user/host1 and ~user/host2
to host different set of repositories, but I suspect if somebody
is setting up a virtual hosting of two hosts, he might want to
have two distinct set of users (iow to pretend that ~user exist
only on host1 but not on host2).  I dunno.


-

From: Pierre Habouzit
Date: Wednesday, August 23, 2006 - 1:56 pm

fair enough, so I suppose there is many choices:
 * replace --use-vhosts with --use-vhosts=3D${default hostname}
   but I do not like it very much
 * use `hostname -f` as the default hostname (not very nice either)
 * use the magic _default_ (=E0 la apache) hostname ?

But just note that if a git repository is accessible on an host that is=20
not the default, and only on that one, an older client won't be able to=20
clone the repository anyway, because he will be redirected to the=20
default virtual host. Meaning, that someone that uses virtual hosts=20
will break older clients anyway.

what would be nicer would be a way to gracefully reject older clients in=20
that case with an understandable error message, so that the user knows=20
why it does not work. I don't know PROTO_GIT very well yet,so I don't=20

Another option would be not to support virtual hosts, but instead=20
superseed the --base-path and --user-path with some --base-path-fmt=20
and --user-path-fmt where the user can specify how to build the path=20
with simple sprintf-like formats. One could e.g. support:
 * %% obviously ;
 * %h that will be replaced with the hostname
 * %u (only for --user-path-fmt)
 * %p (asked path)
 * ...

I think that's more clever, and allow more flexible use of the virtual=20
hosting code. It e.g. allow to use the virtual host scheme for the=20
`base-path` repos and to disallow it for the users.

=2D-*-path and --*-path-fmt are obviously mutually exclusive.

What do you think ?

=2D-=20
=B7O=B7  Pierre Habouzit
=B7=B7O                                                madcoder@debian.org
OOO                                                http://www.madism.org
From: Jon Loeliger
Date: Thursday, August 24, 2006 - 1:15 pm

And this is exactly what I have implemented and

I kinda like it... :-)

jdl


-

From: Junio C Hamano
Date: Thursday, August 24, 2006 - 1:35 pm

Where is the patch ;-)?

-

From: Jon Loeliger
Date: Thursday, August 24, 2006 - 1:34 pm

Heh.

Well, I rewrote the interpolation code.  As you pointed
out earlier, it was crap.  I can send that in; no problem.

There was some outstanding debate if this was actually
the right way to go about things.  Specifically, the
problem of canonical hostname.  A proposal was to use
peer IP addresses instead, and I just haven't gotten
around to messing with that yet.  So that is why I
hadn't sent it in yet.  Sorry.

But, I'll rebase it and send it in this evening.

jdl


-

From: Pierre Habouzit
Date: Wednesday, August 23, 2006 - 4:32 pm

Allow a form of virtualhosting, when %h format is used.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
---

    This is intended to be a more flexible solution, that also gives virtual
    hosting as a bonus. I still see no way to deal with older clients when
    virtual hosting is used by the admin though, having a "default" hostname
    won't solve anything at all anyway.

 Documentation/git-daemon.txt |   42 ++++++++++
 daemon.c                     |  170 +++++++++++++++++++++++++++++++++++++++---
 2 files changed, 201 insertions(+), 11 deletions(-)

diff --git a/Documentation/git-daemon.txt b/Documentation/git-daemon.txt
index 0f7d274..0aa34d4 100644
--- a/Documentation/git-daemon.txt
+++ b/Documentation/git-daemon.txt
@@ -11,6 +11,7 @@ SYNOPSIS
 'git-daemon' [--verbose] [--syslog] [--inetd | --port=n] [--export-all]
              [--timeout=n] [--init-timeout=n] [--strict-paths]
              [--base-path=path] [--user-path | --user-path=path]
+             [--base-path-fmt=pathfmt] [--user-path-fmt=pathfmt]
 	     [--reuseaddr] [--detach] [--pid-file=file] [directory...]
 
 DESCRIPTION
@@ -45,6 +46,10 @@ OPTIONS
 	'git://example.com/hello.git', `git-daemon` will interpret the path
 	as '/srv/git/hello.git'.
 
+--base-path-fmt=pathfmt::
+	Works like --base-path, but uses a format instead. See Path
+	Formats section below.
+
 --export-all::
 	Allow pulling from all directories that look like GIT repositories
 	(have the 'objects' and 'refs' subdirectories), even if they
@@ -79,6 +84,11 @@ OPTIONS
 	taken as a request to access `path/foo` repository in
 	the home directory of user `alice`.
 
+--user-path-fmt=pathfmt::
+	Allow ~user notation to be used in requests. The path used to look
+	for the git repository is a format string, see Path Formats
+	section below.
+
 --verbose::
 	Log details about the incoming connections and requested files.
 
@@ -98,6 +108,38 @@ OPTIONS
 	--strict-paths is specified this will also include subdirectories
 	of each ...
From: Junio C Hamano
Date: Wednesday, August 23, 2006 - 5:17 pm

I mildly disagree about the last sentence.  Enabling virtual
hosting does not have to mean all virtual hosts are treated
equal.  It is conceivable that a site hosts the primary,
"collection of public repositories everybody would want to go
to" set, with supplemental ones for specific audiences that are
done via virtual hosting.  General public who would want to
access the primary one can come with older clients that way, and
only the narrower audiences have to be told to upgrade.

The client-side host= support was done post 1.4.0-rc1 timeframe
so we have to be nicer to 1.3 based people, at least give them a
way to slurp newer version with their client ;-).

Haven't looked at the rest of the patch yet.  Will comment
later.

-

From: Pierre Habouzit
Date: Thursday, August 24, 2006 - 12:50 am

hmm, yes, that's indeed fair. Well, adding a --default-hostname that is=20
used if no host=3D is passed is completely obvious and straightforward=20
and solves that issue.

I also spotted a bug in the git_path_fmt function, I test 'pos' at the=20
end of the loop, but the overflow test is not done when there is no=20
format involved, wich could /theorically/ lead to buffer overflow. So=20
please disregard that issue, I will at least provide a patch that fixes=20
np, TIA

=2D-=20
=B7O=B7  Pierre Habouzit
=B7=B7O                                                madcoder@debian.org
OOO                                                http://www.madism.org
From: Junio C Hamano
Date: Saturday, August 26, 2006 - 11:12 pm

Nicely done, almost.

Having to have the distinction between %p and %P formats feels

I prefer these to be of type "static int".

Although I am not an authority of variable naming, these sound
funny to me.  "is_XXX()" as a function name feels natural,
"is_XXX" as a variable name does not --- it is not clear what
the predicate is talking about.

Maybe "use_fmt_for_base_path" is easier to understand?  I dunno.
Or "user_path_is_fmt"?  That's more logical but still somewhat
feels funny.


-

From: Pierre Habouzit
Date: Sunday, August 27, 2006 - 3:28 am

agreed.

There is also a second patch that never made it to the list that fixes:
 * some indentation problems due to a bad vimrc
 * --default-hostname switch (to handle virtual hosts even with older
   clients)
 * possible overflow in the formatting method.

I'll recompute a new patch that superseeds that one, and merge your=20
comments and my never sent patch too.

=2D-=20
=B7O=B7  Pierre Habouzit
=B7=B7O                                                madcoder@debian.org
OOO                                                http://www.madism.org
From: Junio C Hamano
Date: Sunday, August 27, 2006 - 3:52 am

I have to admit that I kinda liked JDL's simpler one first (and
it has been in production use for some time).  We'll see.

About vger potentially throwing things away, I use this script
(called "taboo.perl") to check my messages before sending them
out.

Obviously the taboo-word list itself is not attached here, but
the actual script should have a copy of it after the __DATA__
marker.

-- >8 --

#!/usr/bin/perl -w

my $tmpl = '	if (%%PATTERN%%) {
		print "$lineno ${_}matches %%QPATTERN%%\n";
		return;
	}
';
my $stmt = "";
my $in_header = 1;

while (<DATA>) {
	if (/^\$global_taboo_body =/) {
		$in_header = 0;
	}
	next if (/^\043/ || /^\$/ || /^END$/ || /^\s*$/);
	chomp;
	my $p = $_;
	if ($in_header) {
		$p = '/^[-\w_]*:/ && ' . $p;
	}
	my $q = quotemeta($p);
	my $stmt1 = $tmpl;
	$stmt1 =~ s|%%PATTERN%%|$p|g;
	$stmt1 =~ s|%%QPATTERN%%|$q|g;
	$stmt .= $stmt1;
}
close DATA;

$stmt = 'sub check {
	my ($line, $lineno) = @_;
' . $stmt . '
}
';
eval $stmt;
while (<>) {
	check($_, $.);
}

my $how_to_update_this_script = <<'EOF' ;
	( sed -e '/^__DATA__$$/q' taboo.perl && \
	  wget -q -O - http://vger.kernel.org/majordomo-taboos.txt ) \
		>taboo.perl+
	if diff -u taboo.perl taboo.perl+; \
	then \
		rm -f taboo.perl+; \
		echo >&2 No changes.; \
	else \
		mv taboo.perl+ taboo.perl; \
		chmod +x taboo.perl; \
	fi
EOF

__DATA__

-

From: Pierre Habouzit
Date: Sunday, August 27, 2006 - 4:40 am

that was not it, I was biten (again) by git-send-mail that uses strftime=20
(localized) to generate rfc822 dates, making them unparseable :|

I've resent the aggregated patch, that should work right now.
=2D-=20
=B7O=B7  Pierre Habouzit
=B7=B7O                                                madcoder@debian.org
OOO                                                http://www.madism.org
From: Randal L. Schwartz
Date: Sunday, August 27, 2006 - 9:06 am

>>>>> "Junio" == Junio C Hamano <junkio@cox.net> writes:

Junio> About vger potentially throwing things away, I use this script
Junio> (called "taboo.perl") to check my messages before sending them
Junio> out.

Junio> Obviously the taboo-word list itself is not attached here, but
Junio> the actual script should have a copy of it after the __DATA__
Junio> marker.

With "Inline::Files" from the CPAN, you could have a switch where the
script updates itself with the new list:

    use Inline::Files;
    if (@ARGV == 1 and $ARGV[0] eq "-update") {
      use LWP::Simple;
      my $list = get "http://example.com/foo/bar.txt";
      open DATA, ">$DATA" or die "cannot write myself: $!";
      print DATA $list;
      close DATA;
      exit 0;
    }

    ... rest of your program here ...
    ... read using <DATA> as before ...

    __DATA__
    the list will magically go here.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
-

Previous thread: Proposal for new git Merge Strategy by Jon Loeliger on Wednesday, August 23, 2006 - 11:40 am. (3 messages)

Next thread: Re: Proposal for new git Merge Strategy by Jakub Narebski on Wednesday, August 23, 2006 - 12:02 pm. (1 message)