Re: [RFC] Configuring (future) committags support in gitweb

Previous thread: Init on push by Robin Rosenberg on Saturday, November 8, 2008 - 12:08 pm. (4 messages)

Next thread: Re: Init on push by Jakub Narebski on Saturday, November 8, 2008 - 4:06 pm. (7 messages)
To: <git@...>
Cc: Francis Galiegue <fg@...>
Date: Saturday, November 8, 2008 - 3:07 pm

Francis Galiegue <fg@one2team.net> writes

Here below there is proposal how the committags support could look like
for gitweb _user_, which means how to configure gitweb to use (or do not
use) committags, how to configure committags, and how to define new
committags.

Committags are "tags" in commit messages, expanded when rendering commit
message, like gitweb now does for (shortened) SHA-1, converting them to
'object' view link. It should be done in a way to make it easy
configurable, preferably having to configure only variable part, and not
having to write whole replacement rule.

Possible committags include: _BUG(n)_, bug _#n_, _FEATURE(n),
Message-Id, plain text URL e.g. _http://repo.or.cz_, spam protecting
of email addresses, "rich text formatting" like *bold* and _underline_,
syntax highlighting of signoff lines.

I think it would be good idea to use repository config file for
setting-up repository-specific committags, and use whatever Perl
structure for global configuration. The config language can be
borrowed from "drivers" in gitattributes (`diff' and `merge' drivers).

So the example configuration could look like this:

[gitweb]
committags = sha1 signoff bugzilla

[committag "bugzilla"]
match = "\\b(?:#?)(\\d+)\\b"
link = "http://your.bugzilla.fqdn.here/show_bug.cgi?id=$1"

where 'sha1' and 'signoff' are built-in committags, committags are
applied in the order they are put in gitweb.committags; possible actions
for committag driver include:
* link: replace $match by '_<a href="$link">_$match_</a>_'
* html: replace $match by '_$html_'
* text: replace $match by '$text'
where '_a_' means that 'a' is treated as HTML, and is not expanded
further, and 'b' means that it can be further expanded by later
committags, and finally is HTML-escaped (esc_html).

What do you think about this?
--
Jakub Narebski
Poland
--

To: Jakub Narebski <jnareb@...>
Cc: <git@...>
Date: Saturday, November 8, 2008 - 4:02 pm

Your proposal goes much further than my initial question, but I thought I'd

I don't understand what the "signoff" builtin is : is that a link to see only
commits "Signed-off-by:" a particular person?

If so, might I suggest that an "alt" tells "Only show commits signed off by
this person"?

And also, what about the sha1 builtin? AFAIK, a SHA1 can point to a commit, a
tree, and others... In fact, it points to any of these right now, but how
would you tell apart these different SHA1s in a commit message? The only
obvious use I see for it is the builtin "Revert ..." commit message, that the
commiter _can_ override...

Or would that be:

my $sha1_re = qr/[a-z[0-9]{40}/;

/(?:(?i:commit\s+))?\b($sha1_re)\b/ => [link to commit $1]
/(?:(?i:tree\s+))\b($sha1_re)\b)/ => [link to tree $1]
/(?:(?:tag\s+))\b($sha1_re)\b)/ => [link to tag $1]

Finally, is there any reason to think that a sha1 or signoff committag will

What use do you see for the html match? Just asking...

And I don't see what you '_a_' and '_b_' are about...

--
fge
--

To: Francis Galiegue <fg@...>
Cc: <git@...>
Date: Saturday, November 8, 2008 - 6:35 pm

Like in example with 'link' rule, not having to write whole
<a href="http://example.com/bugzilla.php?id=$1">$&</a>

Committags doesn't need to be replaced by links. In this case I meant
here using 'signoff' class for Signed-off-by: (and the like) lines, by

SHA1 (or shortened SHA1 from 8 charasters to 40 characters, or to be
even more exact something that looks like SHA1) is replaced by link
to 'object' view, which in turn finds type of object and _redirect_
to proper view, be it 'commit' (most frequent), 'tag', 'blob' or 'tree'.

We could have used instead gitweb link with 'h' (hash) parameter, but
without 'a' (action) parameter, which currently finds type of object

One might not want to link SHA1, for example if there are lots of false
positives because of commit message conventions or something, or refine
'signoff' committag to use different styles for different types of
signoff: Signed-off-by, Acked-by, Tested-by, other. Having explicit
'signoff' committag allows us also to put some committags _after_ it,
for example SPAM-protection of emails, or add some committag before

For example 'signoff' committag... well, it is not exactly pure "html"
but rather something like template.

[committag "signoff"]
match = "(?i)^ *(signed[ \\-]off[ \\-]by[ :]|acked[ \\-]by[ :]|cc[ :])"
templ = "{<span class=\"signoff\">}$1{</span>}"

Or simpler

[committag "signoff"]
match = "(?i)^ *(signed[ \\-]off[ \\-]by[ :]|acked[ \\-]by[ :]|cc[ :])"

For example in link match, the text of the link can be further refined
by committags later in sequence.

--
Jakub Narebski
Poland
--

To: Jakub Narebski <jnareb@...>
Cc: Francis Galiegue <fg@...>, Petr Baudis <pasky@...>, git@vger.kernel.org <git@...>
Date: Tuesday, February 17, 2009 - 11:32 am

I'm interested in cross-linking bug references in commit messages to a
bug tracking system. I started tinkering a couple weeks ago and am
finally understanding that committags encompass this functionality.
(From the subject line I first understood "tags" to mean git tags rather
than commit message munging.)

Is the committags idea still under active development?

I'd be happy to share what I have, which is for bug linking only, but
very little code would apply to the more general concept of committags.
Here are some ideas that might apply...

Two regexes would make it easier to configure a driver without needing
look-ahead and look-behind assertions. For example, if you want to
match non-negative integers but only in the context of a Resolves-bug
header:

Resolves-bug: 1234, 1235

With two regexes you can write:

/^Resolves-bug: \d+(, \d+)*/
/\d+/

But with only one, you'd have to write:

/(?<=^Resolves-bug: (\d+, )*)(\d+)/

The need for a lookbehind assertion means I need to stop at perlreref to
lookup syntax. Hrm... and with testing I see that it's worse than that:

$ perl -wpe 's/(?<=^Resolves-bug: (\d+, )*)(\d+)/[$2]/g'
Variable length lookbehind not implemented in regex; marked by <--
HERE in m/(?<=^Resolves-bug: (\d+, )*)(\d+) <-- HERE / at -e line 1.

I guess it can't be done even with the look-behind assertion.

I got the two-regex idea from a spec I ran across while evaluating
Subversion:

http://guest:@tortoisesvn.tigris.org/svn/tortoisesvn/trunk/doc/issuetrac...

If there is interest in BTS integration beyond gitweb, for example in
git-gui, gitk, or the Windows UIs, then perhaps this spec would be worth
considering. It covers more than just hyperlinking. It also considers
issues like how to draw the form field for a bug ID as part of a commit
message form, how to validate that form field, and then how to munge the
log message to include the entered ID. Some details, like using a
newline to separat...

To: Marcel M. Cary <marcel@...>
Cc: Francis Galiegue <fg@...>, Petr Baudis <pasky@...>, git@vger.kernel.org <git@...>
Date: Tuesday, February 17, 2009 - 11:38 pm

Well, it is in my todo list, rather further on...

You don't need multiple regexps for that, and in above example it is

In current proposal the order of running committags drivers is

... but I guess that at first attempt we could support non-overlapping
committags only, i.e. replacement is always as whole not passed to
later committags.

Still there is a problem how to specify which parts of replacement for
committags have to be HTML escaped, and which are HTML and should not
be (and which are attributes, and have to be escaped too).

Well, I think if it would be supported, it would be a very special

Also 'v1.5.4' etc linking to tag; both would be a good idea. At this
point I think we have already list of all references (for ref markers)
so it wouldn't require additional call to git command.

P.S. I understand that this post is an exception (send after long, long
time), but please do not toppost in replies. It goes against natural
reading order.

--
Jakub Narebski
Poland
--

To: Jakub Narebski <jnareb@...>
Cc: Francis Galiegue <fg@...>, Petr Baudis <pasky@...>, git@vger.kernel.org <git@...>
Date: Thursday, February 19, 2009 - 1:08 pm

Heh, I'm not sure. It's like a filter in the unix pipeline sense, but
"commit message filter" sounds to me like some messages might be
rejected. Most of the drivers markup static text with HTML tags, but
not all of them. Maybe "commit message embellishment".

Perhaps a more important question is: will people find the feature once
it's implemented? I think that won't be a problem provided that it's

Is any code for it published in a repository anywhere? I see a branch
jn/gitweb-committag merged into master that looks relevant, but it only

I'm not sure what exactly you propose. In the second example in the
bugtraq spec, there are two regexes. Maybe you mean something like
this, but it breaks with three bugs:

$ perl -MData::Dumper -wne '
m/^Resolves-bug: (\d+)(?:, (\d+))*/;
print Dumper([$1, $2, $3, $4]);
'
Resolves-bug: 123
$VAR1 = [
'123',
undef,
undef,
undef
];
Resolves-bug: 123, 124
$VAR1 = [
'123',
'124',
undef,
undef
];
Resolves-bug: 123, 124, 125
$VAR1 = [
'123',
'125',
undef,
undef
];

Maybe something like this? But it's limited to an arbitrary number of
bug matches. Maybe it's good enough for pratical purposes, but it's
prone to unexpected breakage when the user exceeds the threshold of, in
this case, four bugs.

/^Resolves-bug: (\d+)(?:, (\d+))?(?:, (\d+))?(?:, (\d+))?/

Marcel

--

To: <git@...>
Cc: <giuseppe.bilotta@...>, <jnareb@...>, Marcel M. Cary <marcel@...>, <fg@...>, <pasky@...>
Date: Tuesday, February 17, 2009 - 11:00 pm

When a feature like "blame" is permitted to be overridden in the
repository configuration but it is not actually set in the
repository, a warning is emitted due to the undefined value
of the repository configuration, even though it's a perfectly
normal condition.

The warning is grounds for test failure in the gitweb test script,
so it causes some new feature tests of mine to fail.

This patch prevents warning and adds a test case to exercise it.

Signed-off-by: Marcel M. Cary <marcel@oak.homeunix.org>
---

Here's a small patch I put together while tinkering with bug hyperlinking.
Does this look reasonable?

Marcel

gitweb/gitweb.perl | 8 +++++---
t/t9500-gitweb-standalone-no-errors.sh | 5 +++++
2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 7c48181..653f0be 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -402,13 +402,13 @@ sub feature_bool {
my $key = shift;
my ($val) = git_get_project_config($key, '--bool');

- if ($val eq 'true') {
+ if (!defined $val) {
+ return ($_[0]);
+ } elsif ($val eq 'true') {
return (1);
} elsif ($val eq 'false') {
return (0);
}
-
- return ($_[0]);
}

sub feature_snapshot {
@@ -1978,6 +1978,8 @@ sub git_get_project_config {
$config_file = "$git_dir/config";
}

+ return undef if (!defined $config{"gitweb.$key"});
+
# ensure given type
if (!defined $type) {
return $config{"gitweb.$key"};
diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
index 7c6f70b..559045e 100755
--- a/t/t9500-gitweb-standalone-no-errors.sh
+++ b/t/t9500-gitweb-standalone-no-errors.sh
@@ -662,6 +662,11 @@ cat >>gitweb_config.perl <<EOF
EOF

test_expect_success \
+ 'config override: tree view, features not overridden in repo config' \
+ 'gitweb_run "p=.git;a=tree"'
+test_debug 'cat gitweb.log'
+
+test_expect_success \
'config override: tree view, feature...

To: Marcel M. Cary <marcel@...>
Cc: <giuseppe.bilotta@...>, <jnareb@...>, <fg@...>, <pasky@...>, <git@...>
Date: Wednesday, February 18, 2009 - 4:40 am

To my cursory look, it doesn't, and it is not entirely your fault, but if
we applied this patch, it would not improve things very much. It just

I think the warning you are talking about is to compare $val with 'true'
with 'eq' operator when $val could be undef. The check to see if $val is
undefined ato avoid that 'eq' comparison is fine, and the intent to return
false is also good, but I think feature_bool is meant to say "yes" or
"no", and existing code for 'false' is returning (0). I'd rather see your
new codepath normalize incoming undef the same way string 'false' is
normalized to return (0). Granted, the caller should be prepared to take
the answer as boolean and treat the usual Perl false values (numeric zero,
a string with single "0", or an undef) without barfing, so returning (undef)
from here ought to be safe (otherwise the callers are broken), but I'd
rather see this function play safe.

I think this change is missing a lot of necessary fixes associated with
it. Have you actually audited all the callers of this function you are
modifying? For example, feature_bool does this:

sub feature_bool {
my $key = shift;
my ($val) = git_get_project_config($key, '--bool');

if ($val eq 'true') {
return (1);
} elsif ($val eq 'false') {
...

With your above change, I think a missing configuration variable will
stuff undef in $val, and trigger the same "$val eq 'true'" comparison
warning here.

Granted, without your change the existing code triggers the same error in
another way, by calling config_to_bool sub with undef here:

# ensure given type
if (!defined $type) {
return $config{"gitweb.$key"};
} elsif ($type eq 'bool') {
# backward compatibility: 'git config --bool' returns true/false
return config_to_bool($config{"gitweb.$key"}) ? 'true' : 'false';

and config_to_bool sub is written in the same carelessness like so:

sub config_to_bool {
...

To: Junio C Hamano <gitster@...>
Cc: Giuseppe Bilotta <giuseppe.bilotta@...>, Marcel M. Cary <marcel@...>, Francis Galiegue <fg@...>, Petr Baudis <pasky@...>, <git@...>
Date: Wednesday, February 18, 2009 - 9:09 am

Fixed up patch at the bottom.

Somewhat.

Actually git_get_project_config($key, '--bool') can return only three
values:
* 'true' if gitweb.$key config variable is 'true', 'yes', 1, or (after
fixes in the fixed up patch at the bottom) is undefined, i.e.
[gitweb]
blame
case
* 'false' if gitweb.$key exists and has other value (that includes
empty string value: "[section] val" is git-config --bool true, while
"[section] val = " is --bool false).
* undef if gitweb.$key does not exist in the config;
earlier version which used "git config --bool <variable>" returned
empty string ('') here.

We want to fall back to %feature value (i.e. do not override) if
variable is not set in config. Variable not set was '', and now is undef,

Well, I think we can now get rid of backwards compatibility (which is
not complete anyway: '' for not existent variable for old version, undef
for new version) with the old version which ran git-config once for each
variable, and do not go through 'true'/'false' to imitate calling
"git config --bool", which has to be converted back to Perl boolean

It should be !exists, and not !defined here, see my fixed up patch

Right (but not complete).

We do check !defined $val, but too late: _after_ trying to strip leading
and trailing whitespace. When going through various versions of

Actually after the patch below I think that git_get_project_config
returns empty list () in the list (array) context now if variable
does not exist in the config thanks to "return ;" magic. And empty
list evaluates to false in scalar context: "if (@val)" would be false
if variable does not exist in the config... in which case we would not
override 'default' value in %feature.

Alternate solution would be to use

sub feature_patches {
my $val = git_get_project_config('patches', '--int');

if (defined $val) {
return ($val);
}

...

To: Marcel M. Cary <marcel@...>
Cc: <jnareb@...>, <fg@...>, <pasky@...>, <git@...>
Date: Wednesday, February 18, 2009 - 3:41 am

I'm no Perl expert, so I have no idea: how do non-bool config checks
(which expect arrays) cope with an undef? Also, you may want to add a
non-bool override test in the test suite.

--
Giuseppe "Oblomov" Bilotta
--

To: <git@...>
Cc: <giuseppe.bilotta@...>, <jnareb@...>, Marcel M. Cary <marcel@...>, <fg@...>, <pasky@...>
Date: Tuesday, February 17, 2009 - 11:00 pm

The current implementation only hyperlinks the first hash on
a given line of the commit message. It seems sensible to
highlight all of them if there are multiple, and it seems
plausible that there would be multiple even with a tidy line
length limit, because they can be abbreviated as short as 8
characters.

Benchmark:

I wanted to make sure that using the 'e' switch to the Perl regex
wasn't going to kill performance, since this is called once per commit
message line displayed.

In all three A/B scenarios I tried, the A and B yielded the same
results within 2%, where A is the version of code before this patch
and B is the version after.

1: View a commit message containing the last 1000 commit hashes
2: View a commit message containing 1000 lines of 40 dots to avoid
hyperlinking at the same message length
3: View a short merge commit message with a few lines of text and
no hashes

All were run in CGI mode on my sub-production hardware on a recent
clone of git.git. Numbers are the average of 10 reqests per second
with the first request discarded, since I expect this change to affect
primarily CPU usage. Measured with ApacheBench.

Note that the web page rendered was the same; while the new code
supports multiple hashes per line, there was at most one per line.

The primary purpose of scenarios 2 and 3 were to verify that the
addition of 1000 commit messages had an impact on how much of the time
was spent rendering commit messages. They were all within 2% of 0.80
requests per second (much faster).

So I think the patch has no noticeable effect on performance.

Signed-off-by: Marcel M. Cary <marcel@oak.homeunix.org>
---

And here's another.

Marcel

gitweb/gitweb.perl | 12 +++++-------
1 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 653f0be..51b7f56 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1384,13 +1384,11 @@ sub format_log_line_html {
my $line = shift;

$...

To: Marcel M. Cary <marcel@...>
Cc: <giuseppe.bilotta@...>, <fg@...>, <pasky@...>, <git@...>
Date: Wednesday, February 18, 2009 - 5:55 pm

That is a good catch. Code simply was not modified since we required

I don't think we should worry about that; after all esc_path and
unescape subroutines also use 'e' switch to Perl regexp.

So the benchmark is nice addition, but I don't think it is really
necessary, especially that the change results in shorter and easier
(I think) to maintain code.

Do I understand correctly that those patches are not related at all
semantically or textually, only in that you have them one after other
(and blob sha-1 in the index line reflects state after former), isn't

Almost correct... but for this unnecessary 'return' statement.

P.S. Why bare emails (without user names), e.g. "pasky@suse.cz"
and not "Petr Baudis <pasky@suse.cz>"? Just curious...

--
Jakub Narebski
Poland
--

To: Jakub Narebski <jnareb@...>
Cc: giuseppe.bilotta@gmail.com <giuseppe.bilotta@...>, Marcel M. Cary <marcel@...>, fg@one2team.net <fg@...>, pasky@suse.cz <pasky@...>, git@vger.kernel.org <git@...>
Date: Tuesday, February 24, 2009 - 12:33 pm

Yes, I think they were independent. They are related only in that they
are small tweaks to gitweb and products of my bug hyperlinking patch
(not submitted). I submitted them as a series mainly to group them.
Are you suggesting that I reserve threading patches together for when
one builds upon the previous one?

Marcel
--

To: Jakub Narebski <jnareb@...>
Cc: giuseppe.bilotta@gmail.com <giuseppe.bilotta@...>, Marcel M. Cary <marcel@...>, fg@one2team.net <fg@...>, pasky@suse.cz <pasky@...>, git@vger.kernel.org <git@...>
Date: Tuesday, February 24, 2009 - 11:38 am

Thanks for the two patch tweaks.

I've been using "git send-email" for patches, and have Thunderbird as my
MUA otherwise. (I'd use (al)pine if I could make it work with
Exchange/NTLM at work, but that's another story...) I've been
transfering recipients (--to and --cc) from Thunderbird to the
commandline with copy/paste.

In Thurderbird, copying an email address from a message only gets you
the user@domain part, not the "Full Name" <user@domain>. To get the
"Full Name" <user@domain> I would have to View Message Source and
pickout the CC line, which is marginally harder.

And on the commandline, instead of just pasting an email as a shell
word, I'd have to add single quotes (I think) to keep the whole "Full
Name" <user@domain> as one word and quote the shell meta characters.

Neither piece is all that onerous, I guess. Sounds like you would see
some value in the full names. Maybe I'll try including them on my next
patch. Looks like --cc-cmd or sendemail.aliasesfile might make it
easier, but I'd have to set them up.

Marcel
--

To: Marcel M. Cary <marcel@...>
Cc: <giuseppe.bilotta@...>, <fg@...>, <pasky@...>, <git@...>
Date: Tuesday, February 24, 2009 - 11:58 am

[...]

Well, 'technical reasons with copy'n'paste' would be enough for me.

P.S. But '"pasky@suse.cz" <pasky@suse.cz>' looks very silly...
--
Jakub Narebski
Poland
--

To: Jakub Narebski <jnareb@...>
Cc: <git@...>
Date: Saturday, November 8, 2008 - 7:27 pm

OK, you lost me somewhat.

What I understand is that right now, the SHA1 links are "pre-processed" by
gitweb so that the 'a' parameter is correct, right?

Out of curiosity, I just went to the kernel git repository (I don't know the
git version that git.kernel.org uses offhand) and altered the 'a' parameter
to something which is not even an 'a' command at all: 500...

However, if I try a VALID 'a' command with an "irrelevant" 'h' parameter, it
acts quite funny: it just looks like it wants to try the closest match, but
takes some time figuring it out... Sometimes to something relevant, sometimes
to nothing really relevant. See for instance [1], in which 'a' was
originally "commit".

Ow.

[1]

Hmmm, so this means you'd want to make styles customizable somewhat (signoff).
In fact, what you really want is span for CSS! Then why not, just, making a
document to say "This is what you can do with CSS for gitweb", and say "these
are the available CSS tags", and then be done with it?

I mean, when comes the day that someone will WANT other spans to be defined,

I still don't get it. Can you give an example?

[personal thoughts: it would be really, really nice if, somewhat, gitweb.perl
were splitted somewhat into different modules, and ideally use more
of "what's out there on CPAN". I'm convinced that some CPAN modules would be
of GREAT help to gitweb, as well as I'm convinced that not many people out
there use Windows to run gitweb anyway :p]

--
fge
--

To: Francis Galiegue <fg@...>
Cc: <git@...>
Date: Saturday, November 8, 2008 - 8:25 pm

Not necessary. Please remember that you can configure gitweb to either
use _alternate_ stylesheet (instead of provided gitweb.css), or use
_additional_ stylesheet (for example gitweb-commit.css in addition to

No, they are 'post-processed': finding correct action is left to
_after_ you click on the link (it is more natural, and helps
performance).

If you don't know action for given SHA-1 you can use either

http://example.com/gitweb.cgi?a=object;h=deadbeef

which finds correct type (for example 'commit') and redirects using
HTTP 302 Found redirection to

http://example.com/gitweb.cgi?a=commit;h=deadbeef

This way is used bu SHA-1 committag in commit messages.

Alternate solution (meant for bugtrackers), done by another author,
is to simply skip action ('a') parameter:

http://example.com/gitweb.cgi?h=deadbeef

and then gitweb finds type of object and act accordingly (without
redirect to correct view).

Errr... I don't understand.

The examples perhaps are not the best. One would be for example to use
different class (different style) for Signed-off-by signoff (which
denotes signing Certificate of Origin), and the rest of (informative)
signoff:

[gitweb]
committags = sha1 signoff_signed signoff

Another example would be to add SPAM-protection of emails, for example
replacing 'user@example.com' by 'user AT example DOT com', or something
more advanced. This have to be used _after_ signoff, because otherwise
regexp could have difficulties matching mangled email

[gitweb]

For example signoff line

Signed-off-by: A U Thor <author@example.com>

would be replaced by

{<span class="signoff">}Signed-off-by: A U Thor <author@example.com>{</a>}

where parts inside {...} is HTML code, and should be not expanded
further, and parts outside it could be expanded further by following
lower priority committags (like anti-SPAM for emails), and have to be
finally HTML escaped (like '<' and '>' in email in signoff).

...

Previous thread: Init on push by Robin Rosenberg on Saturday, November 8, 2008 - 12:08 pm. (4 messages)

Next thread: Re: Init on push by Jakub Narebski on Saturday, November 8, 2008 - 4:06 pm. (7 messages)