Re: [RFC] Git config file reader in Perl (WIP)

Previous thread: [PATCH] Document the -n option for git-log by Josef 'Jeff' Sipek on Sunday, January 14, 2007 - 5:25 pm. (2 messages)

Next thread: [ANNOUNCE] Guilt v0.18 by Josef Sipek on Sunday, January 14, 2007 - 6:02 pm. (1 message)
From: Jakub Narebski
Date: Sunday, January 14, 2007 - 5:44 pm

To make gitweb faster I thought about adding to it, or to Git.pm,
simple nonvalidation config file reader. Nonvalidating means that
it would accept some input which git-repo-config considers invalid.

Some of the trouble is caused because of coner cases like this 
example

[section "sub ; # sect \" ion\\"] ; "]
	key = a " b ; " ; " c ; "

which is valid, but strange.

I'm not proficent in Perl, so help is appreciated.

-- >8 --
#!/usr/bin/perl

use strict;
use warnings;

use Text::Balanced qw(extract_delimited);


sub read_config {
	my $configfile = shift;
	my $section = shift;
	my %config;

	open my $fd, $configfile
		or die "Cannot open $configfile: $!";

	my $sectfull;
	while (my $line = <$fd>) {
		chomp $line;

		if ($line =~ m/^\s*\[\s*([^][:space:]]*)\s*\](.*)$/) {
			# section without subsection

			my $sect = lc($1);

			$sectfull = $sect;

		} elsif ($line =~ m/\s*\[([^][:space:]]*)\s"((?:\\.|[^"])*)"\](.*)$/) {
			# section with subsection

			my $sect = lc($1);
			my $subsect = $2;
			$subsect =~ s/\\(.)/$1/g;

			$sectfull = "$sect.$subsect";

		} elsif ($line =~ m/\s*(\w+)\s*=\s*(.*?)\s*$/) {
			# variable assignment

			my $key = lc($1);
			my $rhs = $2;

			my $value = '';
			my ($next, $remainder, $prefix) = qw();
		DELIM: {
				do {
					($next, $remainder, $prefix) =
						extract_delimited($rhs, '"', qr/(?:\\.|[^"])*/);

					if ($prefix =~ s/\s*[;#].*$//) {
						# comment in unquoted part
						$value .= $prefix;
						last DELIM;
					} else {
						$value .= $prefix if $prefix;
						if ($next && $next =~ s/^"(.*)"$/$1/) {
							$value .= $next;
						}
					}

					$rhs = $remainder;
				} while ($rhs && $next);
			} # DELIM:

			if ($remainder) {
				$remainder =~ s/\s*[;#].*$//;
				$value .= $remainder;
			}

			$value =~ s/\\(.)/$1/g;

			if (exists $config{"$sectfull.$key"}) {
				push @{$config{"$sectfull.$key"}}, $value;
			} else {
				$config{"$sectfull.$key"} = [ $value ];
			}

		} ...
From: Eric Wong
Date: Monday, January 15, 2007 - 12:08 am

How about something like git-for-each-ref that dumps the entire output
of a config file into an eval()-able string?  That way we don't have to
deal with corner-cases and subtle differences between C and Perl
implementations.

-- 
Eric Wong
-

From: Jakub Narebski
Date: Monday, January 15, 2007 - 2:03 am

The idea is (at least for gitweb) to avoid cost of fork. And I think
if the format gets documented properly, there should be no differences
in config file parsing.

Please remember also that is first draft of git config file reader
in perl; an alpha version.
-- 
Jakub Narebski
Poland
-

From: Eric Wong
Date: Monday, January 15, 2007 - 2:56 am

If the Perl output is redirected to a file (say .git/config.perl) and
only regenerated when .git/config changes, `do(".git/config.perl")' will
likely be faster since all the parsing will be done by Perl itself.

-- 
Eric Wong
-

From: Shawn O. Pearce
Date: Monday, January 15, 2007 - 3:01 am

So long as its automatic in gitweb.cgi and based on the stat
attributes of .git/config, OK.  But my database background tells
me two copies of the same thing is fishy...

-- 
Shawn.
-

From: Jakub Narebski
Date: Monday, January 15, 2007 - 3:32 am

Would you write "git repo-config --perl", then? ;-)

Besides, I'd rather avoid the need for /tmp/gitweb, and I think usually
gitweb do not have (and should not have) write access to repository.

-- 
Jakub Narebski
Poland
-

From: Eric Wong
Date: Monday, January 15, 2007 - 4:26 am

The below patch should be a start (only tested on my fairly standard

Good point.  Having to maintain a .git/config.perl in the repository
would be a pain from an administrative standpoint; but on the other hand
.git/config is not often regenerated.

I don't think giving gitweb write access to a repo is a good idea;
either.  Perhaps it would be updated via hook like the HTTP stuff.
IMHO, there is nothing wrong with gitweb writing to /tmp; however.

diff --git a/builtin-repo-config.c b/builtin-repo-config.c
index 9063311..a9ef358 100644
--- a/builtin-repo-config.c
+++ b/builtin-repo-config.c
@@ -1,5 +1,6 @@
 #include "builtin.h"
 #include "cache.h"
+#include "quote.h"
 
 static const char git_config_set_usage[] =
 "git-repo-config [ --global ] [ --bool | --int ] [--get | --get-all | --get-regexp | --replace-all | --add | --unset | --unset-all] name [value [value_regex]] | --rename-section old_name new_name | --list";
@@ -13,6 +14,7 @@ static int do_all;
 static int do_not_match;
 static int seen;
 static enum { T_RAW, T_INT, T_BOOL } type = T_RAW;
+static char *last_key;
 
 static int show_all_config(const char *key_, const char *value_)
 {
@@ -23,6 +25,30 @@ static int show_all_config(const char *key_, const char *value_)
 	return 0;
 }
 
+static int show_perl_config(const char *key_, const char *value_)
+{
+	if (last_key) {
+		if (strcmp(last_key, key_)) {
+			free(last_key);
+			last_key = xstrdup(key_);
+			fputs("\t],\n\t", stdout);
+			perl_quote_print(stdout, key_);
+			fputs(" => [\n", stdout);
+		}
+	} else {
+		last_key = xstrdup(key_);
+		fputc('\t', stdout);
+		perl_quote_print(stdout, key_);
+		fputs(" => [\n", stdout);
+	}
+	if (value_) {
+		fputs("\t\t", stdout);
+		perl_quote_print(stdout, value_);
+		fputs(",\n", stdout);
+	}
+	return 0;
+}
+
 static int show_config(const char* key_, const char* value_)
 {
 	char value[256];
@@ -138,6 +164,17 @@ int cmd_repo_config(int argc, const char **argv, const char *prefix)
 			type = ...
From: Johannes Schindelin
Date: Monday, January 15, 2007 - 5:15 am

Hi,


A bit shorter (and gets the booleans right, plus being even easier 
towards --python extension):

---

 builtin-repo-config.c |   19 +++++++++++++++++--
 1 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/builtin-repo-config.c b/builtin-repo-config.c
index 9063311..8ebf436 100644
--- a/builtin-repo-config.c
+++ b/builtin-repo-config.c
@@ -1,5 +1,6 @@
 #include "builtin.h"
 #include "cache.h"
+#include "quote.h"
 
 static const char git_config_set_usage[] =
 "git-repo-config [ --global ] [ --bool | --int ] [--get | --get-all | --get-regexp | --replace-all | --add | --unset | --unset-all] name [value [value_regex]] | --rename-section old_name new_name | --list";
@@ -12,11 +13,18 @@ static int use_key_regexp;
 static int do_all;
 static int do_not_match;
 static int seen;
+static const char *perl_prefix = NULL;
 static enum { T_RAW, T_INT, T_BOOL } type = T_RAW;
 
 static int show_all_config(const char *key_, const char *value_)
 {
-	if (value_)
+	if (perl_prefix) {
+		printf("%s", perl_prefix);
+		perl_quote_print(stdout, key_);
+		printf(" => ");
+		perl_quote_print(stdout, value_ ? value_ : "true");
+		perl_prefix = ",\n\t";
+	} else if (value_)
 		printf("%s=%s\n", key_, value_);
 	else
 		printf("%s\n", key_);
@@ -138,7 +146,14 @@ int cmd_repo_config(int argc, const char **argv, const char *prefix)
 			type = T_BOOL;
 		else if (!strcmp(argv[1], "--list") || !strcmp(argv[1], "-l"))
 			return git_config(show_all_config);
-		else if (!strcmp(argv[1], "--global")) {
+		else if (!strcmp(argv[1], "--perl")) {
+			int ret;
+			perl_prefix = "\n\t";
+			printf("%%git_config = (");
+			ret = git_config(show_all_config);
+			printf("\n);\n");
+			return ret;
+		} else if (!strcmp(argv[1], "--global")) {
 			char *home = getenv("HOME");
 			if (home) {
 				char *user_config = xstrdup(mkpath("%s/.gitconfig", home));
-

From: Nikolai Weibull
Date: Monday, January 15, 2007 - 8:34 am

If we're going down this slippery slope, why not just give up and add
a --xml switch instead?  Readable by all and a lot more flexible than
--perl, --python, --ruby, --tcl, --sh, --c++, --fortran, --lisp,
--html, --that-next-silver-bullet-language-that-hasnt-been-invented-yet-but-will-need-its-own-switch-once-it-has-been.

That said, parsing the config file as-is can't be so difficult that we
need to export it to separate files with a different syntax, now can
it?

  nikolai
-

From: Johannes Schindelin
Date: Monday, January 15, 2007 - 8:44 am

Hi,



The point is having one parser to rule them all, and avoid having 
different parsers, all with their own set of shortcomings.

Ciao,
Dscho

-

From: Nikolai Weibull
Date: Monday, January 15, 2007 - 9:22 am

As far as I can tell, comparing fork() vs. reading a dump with eval
vs. XML isn't meaningful - parsing a 20-line XML file can hardly be
much more (if it even is more) expensive than evaling a file of the

So then you must agree that having one export format makes a lot of
sense, for the same reasons.  Not that I think that an export format
makes sense in the first place.

  nikolai
-

From: Jakub Narebski
Date: Monday, January 15, 2007 - 9:00 am

Parsing the config file is not _that_ difficult (the first post in this 
thread had config reader in Perl), but it is not that easy: case 
(in)setiviness, quoting, escaping, comments, removing leading and 
trailing whitespace when not quoted...

P.S. I'd rather have an additional implementation (in Perl) conforming 
to yet to be written git ini-like config file specs, to find places 
where canonic parser, git-repo-config, doesn't conform to the specs.
-- 
Jakub Narebski
Poland
-

From: Junio C Hamano
Date: Tuesday, January 16, 2007 - 3:45 am

Perhaps all except humans.

At least YAML, please...

-

From: Johannes Schindelin
Date: Tuesday, January 16, 2007 - 4:12 am

Hi,


I am _strongly_ opposed to all that rubbish. _If_ we want to use 
repo-config to preformat the config variables, we should either

1) just use "git repo-config -l" and STFU, or
2) introduce something like "--dump" which Eric implemented.

Everything else is just _complicating_ matters, and for _what_? _Nothing_ 
at all. If we use repo-config for that task, it should cater for parsing 
by _script languages_, not _users_.

I work with XML everyday. It has its uses. But this here problem is _not_ 
one of them. How silly would that be: we parse an easy-to-read format, 
munge the easy-to-handle internal data format into another "easy-to-read" 
format which is then parsed by a script language into an easy-to-handle 
internal data format? No. NO.

Ciao,
Dscho

P.S.: The more I think about it, we should just use the output of 
"repo-config -l".
-

From: Jakub Narebski
Date: Tuesday, January 16, 2007 - 7:14 am

It wouldn't work. Subsection and value are (almost) free form, and
they can contain '=' in them.


It would be probably best to introduce --dump which would output
config file _as if_ it was written by git-repo-config (probably
without comments).

All values would be in separate lines, there would be only one
section header per each section, values would be quoted if needed
(if they contain comment delimiter, '=' or whitespace).

E.g.

  [section "subsection"] key=a " b; " c; " d; "

would get rewritten as

  [section "subsection"]
  	key = "a  b;  c"

-- 
Jakub Narebski
Poland
-

From: Nikolai Weibull
Date: Tuesday, January 16, 2007 - 3:17 pm

Sadly, yes.  It would be very nice if -l gave unambiguous output for

I don't know if it was clear from my first mail, but I wasn't
suggesting --xml as a serious alternative.  My point was that if we're
going to go through all the fuss of adding all these switches for
outputting the configuration file in some fixed format, why not go
with one that at least is universal in some sense (not necessarily
XML).  And, as Johannes already pointed out, it's very disturbing
having to dump a configuration file so that it is more easily read by
other programs.  That would suggest that the ini-based format for
git's configuration file is suboptimal.

Of course, once git is librified (which is still a long-term goal,
right?), languages could create bindings to the git library, which
would provide access functions to the configuration file.  Then we
would truly have that one parser to rule them all.

  nikolai
-

From: Jakub Narebski
Date: Tuesday, January 16, 2007 - 3:37 pm

No, ini-based, or rather ini-like format for git configuration
is nice, but I think git is too forgiving in accepting input.
Examples: section header and key/value pair in the same line,
allowing multiple quotes in in value part.

Well, the idea I had was to have --dump switch to git-repo-config
to dump init file as if it was created by git-repo-config invocations,
without any hand editing (canonical format).

-- 
Jakub Narebski
Poland
-

From: Johannes Schindelin
Date: Tuesday, January 16, 2007 - 3:56 pm

Hi,




My point still stands: if you already parse the user-friendly format, why 
not dump a parse friendly format? If it weren't for those darn non-alnums 
in the keys, out put of "git repo-config -l" would be perfectly 
acceptable.

So, how about a "git repo-config --dump" which outputs a stream of NUL 
separated keys and values? This should be really easy to "parse", and 
there are no ambiguities: No key or value can contain a NUL.

Ciao,
Dscho

-

From: Jakub Narebski
Date: Tuesday, January 16, 2007 - 4:24 pm

Dnia wtorek 16. stycznia 2007 23:56, Johannes Schindelin napisa
From: Johannes Schindelin
Date: Wednesday, January 17, 2007 - 1:51 am

Hi,


No it would not:

	[someSection]

but is equivalent to

	[section]
		noval = true


Yes, this is not a boolean. The difference is that the callback function 
is called with NULL in the former case, and with "" in the latter.

Hth,
Dscho

-

From: Jakub Narebski
Date: Wednesday, January 17, 2007 - 2:48 am

$ fatal: bad config file line <nn> in <config>

The same with quoted:

 	[someSection]
 		qthisKey = "has\na\nvalue\with\nseveral\nnewlines"

There is no escaping besides escaping " and escape character
i.e. escaping \ in git config. Se "\n" would work as well as NUL.

But only for "git repo-config --bool --get section.noval" output.
Semantically equivalent to "true".


-- 
Jakub Narebski
Poland
-

From: Johannes Schindelin
Date: Wednesday, January 17, 2007 - 3:44 am

Hi,



So you want "git-repo-config --dump" to output something which has to be 
scanned for escaping sequences?

If you call

	$ git repo-config -l

you will _no longer_ see "\n"s, but rather newlines.

I don't know why you insist on newlines, when a NUL makes perfect sense: 
Take everything until the next NUL. This is the key. Then take everything 

Yes, it returns "", but this is _wrong_. A single "[section] noval" _only_ 
makes sense as a boolean. The information lies in its _presence_, which is 
as good as saying "true".

Ciao,
Dscho

-

From: Jakub Narebski
Date: Wednesday, January 17, 2007 - 5:11 am

With "\n" as separator you can simply rrturn NUL in the noval case.

-- 
Jakub Narebski
Poland
-

From: Johannes Schindelin
Date: Wednesday, January 17, 2007 - 5:37 am

Hi,


I just tried this:

	$ cat > .git/config << EOF
	[section] key = "Hello\nWorld"
	EOF
	$ git-repo-config -l
	section.key=Hello
	World


Which would buy you what exactly? You can tell that the user did not say 
"noval = true", but "noval". Great. But the _effect_ should be the same!

Anyway, I realize you don't like my solution, so I will just shut up.

Ciao,
Dscho

-

From: Jakub Narebski
Date: Wednesday, January 17, 2007 - 7:00 am

Sorry, my mistake. I haven't noticed that your previous example
the error was "\w", not embedded newlines, and that embedded newlines

I like your solution.

The only ambiguity is how to deal with '[section] noval' case. You
propose to treat it as if it was '[section] noval = true' for --dump,
and not as if it was '[section] noval = ' or '[section] noval = ""'.
Good. But this _has_ to be explained in documentation.

That said, I still think that having alternate parser for a format
is a good idea. Otherwise it is not a format, but "something that
parser parses".

BTW. it looks like C escape sequences are parsed, but not octal
escape sequences, nor no-op escaping other character.

-- 
Jakub Narebski
Poland
-

From: Jakub Narebski
Date: Friday, January 19, 2007 - 5:10 am

From a bit of testing, as documentation of config file format
is woefully incomplete, (yes, I know I should use the source)
_some_ of C escape sequences aka. character escape codes (CEC)
are parsed:
 - "\t", tab character (HT, TAB)
 - "\n", newline (NL)
 - "\b", backspace (BS)
It's a bit strange that "\b" is parsed...

Other CEC cause "fatal: bad config file line 41 <nn> in <file>"
error:
 - "\r", return (CR)
 - "\f", form feed (FF)
 - "\a", alarm (bell) (BEL)
 - "\e", escape (ESC)
 - "\v", vertical tab (VT)
I think supporting "\e" could be useful, but also dangerous.

Octal char sequences (like \040 or \0 for NUL) are not supported
and result in "bad config file" error.

Literal/unknown escape sequences like "\w" or "\." also result
in "bad config file" error, with the obvious exception of escaping
quote character i.e. \" and escaping escape character i.e. \\

Values of configuration variables can span multiple lines by escaping
newline, i.e. putting \ as the last character. Nice. (This doesn't work
for comments, of course). This doesn't work for section headers nor key 
names.

-- 
Jakub Narebski
Poland
-

From: Jakub Narebski
Date: Friday, January 19, 2007 - 5:25 am

From config.c:parse_value:73

		if (c == '\\') {
			c = get_next_char();
			switch (c) {
			case '\n':
				continue;
			case 't':
				c = '\t';
				break;
			case 'b':
				c = '\b';
				break;
			case 'n':
				c = '\n';
				break;
			/* Some characters escape as themselves */
			case '\\': case '"':
				break;
			/* Reject unknown escape sequences */
			default:
				return NULL;
			}

-- 
Jakub Narebski
Poland
-

From: Johannes Schindelin
Date: Friday, January 19, 2007 - 6:20 am

Hi,


No, you should not just use the source. You should use the source _and_ 
complete the documentation.

Ciao,
Dscho

-

From: Jakub Narebski
Date: Friday, January 19, 2007 - 3:44 pm

Something like the patch below? Untested! ("make doc" up to 
git-repo-config.txt compiles, though).

I'm not sure how to tell that you can have [section] if you have
[section "subsection"], but you don't need to. And I probably forgot
to add some information.

And I'm not sure if some behavior should not be changed, for example
allowing _any_ line to be continued with `\`, or that other character
escape sequences and perhaps also octal character sequences should be
allowed (either that or `\b` should not be parsed).

I can send proper patch if requested, but I'd rather above issues
were resolved first.

diff --git a/Documentation/config.txt b/Documentation/config.txt
index da7fde5..9544308 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -14,14 +14,53 @@ dot-separated segment and the section name is everything before the last
 dot. The variable names are case-insensitive and only alphanumeric
 characters are allowed. Some variables may appear multiple times.
 
+Syntax
+~~~~~~
+
 The syntax is fairly flexible and permissive; whitespaces are mostly
-ignored. The '#' and ';' characters begin comments to the end of line,
-blank lines are ignored, lines containing strings enclosed in square
-brackets start sections and all the other lines are recognized
-as setting variables, in the form 'name = value'. If there is no equal
-sign on the line, the entire line is taken as 'name' and the variable
-is recognized as boolean "true". String values may be entirely or partially
-enclosed in double quotes; some variables may require special value format.
+ignored.  The '#' and ';' characters begin comments to the end of line,
+blank lines are ignored.
+
+The file consists of sections and variables.  A section begins with
+the name of the section in square brackets and continues until the next
+section begins.  Section names are not case sensitive.  Each variable
+must belong to some section, which means that there must be section
+header before first setting ...
From: Johannes Schindelin
Date: Friday, January 19, 2007 - 5:08 pm

Hi,



I wonder if we should also mention the (case insensitive) alternative 
"[section.subsection]", to give a better idea to people why we actually 

They cannot contain anything else than alphanumeric characters, in 

Maybe give the example of "remote.<name>.fetch" to explain why? Or maybe 


I think you can safely skip that; it should be evident that the format of 
the variables depends on the purpose.

Ciao,
Dscho

-

From: Jakub Narebski
Date: Friday, January 19, 2007 - 5:59 pm

I'm not sure. "section.subsection" can be treated as section with that
name, strange because it has '.' in it, and not subsection "subsection"
of section "section". git-repo-config writes section with subsection

Contrary to for example also ini-like smb.conf:
  Leading, trailing and _internal_ whitespace in section and parameter



This was in the original, and I think it is better left (at least for now).

I wonder if to write about --bool and --int formats...
-- 
Jakub Narebski
Poland
-

From: Junio C Hamano
Date: Friday, January 19, 2007 - 5:19 pm

I do not think we need continuation line for [] headers, if that
is what you mean.  I offhand do not think anybody would scream
if we start parsing full c-quote.

One thing that left me puzzled after reading the description was
what a user can do with "subsection".  It is unclear from the
description if [section "sub.section"], [section"sub.sec=ti.on"]
or worse yet, [section "sub\nsection with an embbedded LF"] are
allowed.  The rest seemed sane.

I think the current repo-config handles sane cases alright, but
it is still fragile in error cases.  For example:

	$ git repo-config 'foo.bar=bzz
          baz.boo' foobar

does not currently barf, but results in a corrupted config file.

	$ git repo-config 'foo.bar=bzz
          baz.boo'
	fatal: bad config file line 56 in .git/config

        $ sed -ne '55,$p' .git/config
        [foo "bar=bzz
        baz"]
                boo = foobar

The only way we use the subsection names for (almost arbitrary)
end user string is to store branch names, so LF is not an issue
and we could forbid it (if need arises loosening the restriction
while updating the code to behave sanely is easier than leaving
it open without properly checking).

-

From: Johannes Schindelin
Date: Friday, January 19, 2007 - 6:25 pm

This will no longer work:

$ git repo-config 'key.with
newline' some-value

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
---
	On Fri, 19 Jan 2007, Junio C Hamano wrote:
	
	> I think the current repo-config handles sane cases alright, but
	> it is still fragile in error cases.  For example:
	> 
	> 	$ git repo-config 'foo.bar=bzz
	>           baz.boo' foobar
	> 
	> does not currently barf, but results in a corrupted config file.

	Now it barfs.

 config.c               |    5 +++++
 t/t1300-repo-config.sh |    6 ++++++
 2 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/config.c b/config.c
index b6082f5..c08c668 100644
--- a/config.c
+++ b/config.c
@@ -661,6 +661,11 @@ int git_config_set_multivar(const char* key, const char* value,
 				goto out_free;
 			}
 			c = tolower(c);
+		} else if (c == '\n') {
+			fprintf(stderr, "invalid key (newline): %s\n", key);
+			free(store.key);
+			ret = 1;
+			goto out_free;
 		}
 		store.key[i] = c;
 	}
diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh
index 60acdd3..eb7455b 100755
--- a/t/t1300-repo-config.sh
+++ b/t/t1300-repo-config.sh
@@ -418,5 +418,11 @@ EOF
 
 test_expect_success 'quoting' 'cmp .git/config expect'
 
+test_expect_failure 'key with newline' 'git repo-config key.with\\\
+newline 123'
+
+test_expect_success 'value with newline' 'git repo-config key.sub value.with\\\
+newline'
+
 test_done
 
-- 
1.5.0.rc1.g5a400-dirty

-

From: Junio C Hamano
Date: Friday, January 19, 2007 - 6:40 pm

Thanks.

-

From: Alex Riesen
Date: Monday, January 22, 2007 - 8:06 am

BTW, why config.c never uses error() or warn()?
-

From: Johannes Schindelin
Date: Monday, January 22, 2007 - 8:21 am

Hi,


Mainly because error() was meant to be used as "return error("blabla");", 
and we tried to discern different failures by different return values.

But yeah, I think it would be acceptable to use error() instead of 
fprintf() even then.

BTW IMHO we will probably never libify git; too many too complicated cases 
exist already.

Ciao,
Dscho

-

From: Alex Riesen
Date: Monday, January 22, 2007 - 8:33 am

We probably don't need it libified completely. Just reading and writing of
the object database and tags/references and very simple revision walking
will be a very good start and possibly enough for anything which _uses_ the
database, like importers and exporters.from other VCS or just programs which
need a versioned storage. Comparing, archeology and maintenance tasks are
more often done manually and can wait (they do now, don't they?) until the
rest of diff, patch and revision walking machinery libified properly.
-

From: Johannes Schindelin
Date: Monday, January 22, 2007 - 8:44 am

Hi,


... and even these use xmalloc(), which die()s on out-of-memory. Note that 
it would be a really horrible work to libify that, since you basically 
have to insert gazillions of free() calls at the right point, which we 
don't have to, since exit() cleans up after us anyway.

Ciao,
Dscho
-

From: Alex Riesen
Date: Monday, January 22, 2007 - 9:09 am

Didn't say it'd be simple. Just not impossible.
-

From: Johannes Schindelin
Date: Tuesday, January 23, 2007 - 4:26 am

Hi,


I thought that was what you meant by "very simple revision walking".

Ciao,
Dscho

-

From: Alex Riesen
Date: Tuesday, January 23, 2007 - 5:47 am

that'd be "just a part of revision walking". Dunno what part yet,
because I still have to come up with convincing example of where
only a libgit could be used: just committers and history browsers
should not have any problems just using plumbing (no, stupid
microsoft's createprocess is not a reason enough).
-

From: Jakub Narebski
Date: Saturday, January 20, 2007 - 7:03 am

Separate part of Documentation/config.txt which deals with git config file
syntax into "Syntax" subsection, and expand it.  Add information about
subsections, boolean values, escaping and escape sequences in string
values, and continuing variable value on the next line.

Add also proxy settings to config file example to show example of
partially enclosed in double quotes string value.

Parts based on comments by Junio C Hamano, Johannes Schindelin,
and the smb.conf(5) man page.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---


It is mentioned above "Syntax" section, but perhaps it should be repeated.
I haven't took a look at code to check what values for section names and

This was in the original, and I think it is better left (at least for now).



Thanks for suggestion. I have used it (although perhaps the preceding

I'm not sure what is allowed in section name, and in subsection name,
so for now I have left it as is. I can amend this commit, or add new
commit explaining this.


BTW. currently one of examples from git-repo-config(1) doesn't work:

  To add a new proxy, without altering any of the existing ones, use
  
  ------------
  % git repo-config core.gitproxy '"proxy" for example.com'
  ------------

I think it would be better instead of adding quotes if needed, just
do _not_ escape quotes. But that leaves the problem what to do if one
puts value with trailing or leading whitespace, or comment character
outside quotes using git-repo-config...


 Documentation/config.txt |   69 +++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 62 insertions(+), 7 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index da7fde5..03133e2 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -14,14 +14,65 @@ dot-separated segment and the section name is everything before the last
 dot. The variable names are case-insensitive and only alphanumeric
 characters are allowed. Some variables may appear ...
From: Jakub Narebski
Date: Monday, January 22, 2007 - 8:25 am

Separate part of Documentation/config.txt which deals with git config file
syntax into "Syntax" subsection, and expand it.  Add information about
subsections, boolean values, escaping and escape sequences in string
values, and continuing variable value on the next line.

Add also proxy settings to config file example to show example of
partially enclosed in double quotes string value.

Parts based on comments by Junio C Hamano, Johannes Schindelin,
config.c, and the smb.conf(5) man page.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---

I hope that this is satisfactory.


I haven't wrote about current limits on the lengths: 256/2 for section
plus subsection name length, 256 for fully qualified variable name,
1024 for value length; I think this does not belong to end user
documentation.

 Documentation/config.txt |   76 +++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 69 insertions(+), 7 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index f1f409d..77a2b16 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -14,14 +14,72 @@ dot-separated segment and the section name is everything before the last
 dot. The variable names are case-insensitive and only alphanumeric
 characters are allowed. Some variables may appear multiple times.
 
+Syntax
+~~~~~~
+
 The syntax is fairly flexible and permissive; whitespaces are mostly
-ignored. The '#' and ';' characters begin comments to the end of line,
-blank lines are ignored, lines containing strings enclosed in square
-brackets start sections and all the other lines are recognized
-as setting variables, in the form 'name = value'. If there is no equal
-sign on the line, the entire line is taken as 'name' and the variable
-is recognized as boolean "true". String values may be entirely or partially
-enclosed in double quotes; some variables may require special value format.
+ignored.  The '#' and ';' characters begin comments to the end of line,
+blank lines ...
From: Jakub Narebski
Date: Wednesday, January 24, 2007 - 7:14 am

Contrary to variable values, in subsection names parsing character
escape codes (besides literal escaping of " as \", and \ as \\)
is not performed; subsection name cannot contain newlines.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 Documentation/config.txt |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 77a2b16..d8244b1 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -37,8 +37,8 @@ in the section header, like in example below:
 
 --------
 
-Subsection names can contain any characters (doublequote '`"`', backslash
-'`\`' and newline have to be entered escaped as '`\"`', '`\\`' and '`\n`',
+Subsection names can contain any characters except newline (doublequote
+'`"`' and backslash have to be escaped as '`\"`' and '`\\`',
 respecitvely) and are case sensitive.  Section header cannot span multiple
 lines.  Variables may belong directly to a section or to a given subsection.
 You can have `[section]` if you have `[section "subsection"]`, but you
-- 
1.4.4.4

-

From: Johannes Schindelin
Date: Tuesday, January 16, 2007 - 3:42 pm

Hi,


I never pointed out such a thing. The configuration file is meant to be 
user-friendly, as the inventor did not mean to have a program like 

Not at all. Again, git's configuration file is meant for human inspection. 
Therefore, an ini-style file is optimal.

And for scripts, we do have git-repo-config. But now some people want to 
be clever and read the whole config file in, to make it easier to have 
gazillions of configuration options for their script.

I was not happy when we introduced more relaxed section titles, and I am 

You are welcome to contribute to that goal! AFAICT nobody is seriously 
working on that.

Ciao,
Dscho

-

From: Nikolai Weibull
Date: Wednesday, January 17, 2007 - 11:08 am

Then what did you mean in your mail from Jan 16, 2007 12:12 PM:

  How silly would that be: we parse an easy-to-read format,
  munge the easy-to-handle internal data format into another "easy-to-read"
  format which is then parsed by a script language into an easy-to-handle
  internal data format? No. NO.


It is suboptimal because it's hard for computers to inspect.  An
optimal format would be accessible by all, whether human, machine,
robot, android, what have you.

I really don't like that people take my statements and somehow contort
them into meaning something else.  You can't take my statement and
then claim that because git's configuration file was meant for human
inspection my /suggestion/ about it being suboptimal is somehow flawed
because I didn't consider the original intent of the file in question.

Agreed, ini-style files are sweet for humans, but that's not what I
was saying.  My claim was that ini-style files aren't necessarily
optimal for parsing by computers.  I think that was made clear.  Also,
I only /suggested/ that it may be /suboptimal/ not that it in fact

I maintain the git completion-definition for Zsh.  Being able to get
at the configuration settings is important, both for the completion of
git-repo-config, but also to provide completions of remote
repositories.  'git-repo-config -l' works fine, but it's not
guaranteed to be unambiguous, as the name of a remote repository can
contain characters like '=', so it can be difficult to decide where to
split the name of the remote repository from the url it represents as

I'm with you on this one.

  nikolai
-

From: Jakub Narebski
Date: Wednesday, January 17, 2007 - 12:22 pm

Could you add it to contrib/completion/ or at least put link
in http://git.or.cz/gitwiki/InterfacesFrontendsAndTools ?

TIA
-- 
Jakub Narebski
Poland
-

From: Nikolai Weibull
Date: Wednesday, January 17, 2007 - 1:01 pm

I did announce it way back when, but since it's included with new
releases of Zsh there's really nothing interesting to add or link to.
It's also in Zsh's CVS repository, but I do my updates here:

  http://git.bitwi.se/?p=dot-home.git;a=tree

although I now realize that I haven't pushed some recent updates to
that repository yet.

  nikolai
-

From: Jakub Narebski
Date: Wednesday, January 17, 2007 - 12:25 pm

More relaxed section (actually _sub_section) titles are not the problem.
The fact that they are case sensitive (where section and key names
are not) can be.

-- 
Jakub Narebski
Poland
-

From: Johannes Schindelin
Date: Wednesday, January 17, 2007 - 5:50 pm

Hi,


It was about the human-readable -> internal -> human-readable -> internal
chain. Those are way too many transformations for little gain.

IMHO modern programs spend 99% of the time are spent transforming data. 

There's a reason we program in C, Java, etc. and not in Assembler. 
Sometimes computers and humans are too different to be able to cater for 
both of them at the same time.

I know, I know. The geek inside me disagrees, too. But my experience 
doesn't.

Ciao,
Dscho

-

From: Eric Wong
Date: Tuesday, January 16, 2007 - 12:09 pm

I agree with these statements.  I actually had YAML in my code
originally, but ripped it out because it would be another round of
parsing for any language using a YAML parser.

I like YAML, but no, this isn't the place for it IMHO.

-- 
Eric Wong
-

From: Eric Wong
Date: Tuesday, January 16, 2007 - 2:51 am

Your version doesn't get arrays right, however.

Here's a Perl/Python/Ruby version below.  It should be extendable for
other languages; feedback and additions appreciated:

Note that usage has been changed to --dump=(perl|python|ruby)

I may add key_suffix to lang_dump just to be consistent with pairings,
but array_start seems to handle all cases of it and it would be
redundant...

--- a/builtin-repo-config.c
+++ b/builtin-repo-config.c
@@ -1,5 +1,6 @@
 #include "builtin.h"
 #include "cache.h"
+#include "quote.h"
 
 static const char git_config_set_usage[] =
 "git-repo-config [ --global ] [ --bool | --int ] [--get | --get-all | --get-regexp | --replace-all | --add | --unset | --unset-all] name [value [value_regex]] | --rename-section old_name new_name | --list";
@@ -14,6 +15,90 @@ static int do_not_match;
 static int seen;
 static enum { T_RAW, T_INT, T_BOOL } type = T_RAW;
 
+struct lang_dump {
+	const char *name;
+	const char *decl_start;
+	const char *decl_end;
+	const char *key_prefix;
+	const char *array_start;
+	const char *array_end;
+	const char *val_prefix;
+	const char *val_suffix;
+	const char *true_val; /* should already be quoted, if needed */
+	void (*quote_key_fn)(FILE *, const char*);
+	void (*quote_val_fn)(FILE *, const char*);
+};
+static char *last_key;
+static struct lang_dump *lang;
+static struct lang_dump lang_dump_defs[] = {
+	{ "perl",
+		"\%git_config = (\n", ");\n",
+		"\t",
+		" => [\n", "\t],\n",
+		"\t\t", ",\n",
+		"'true'",
+		perl_quote_print, perl_quote_print },
+	{ "python",
+		"git_config = {\n", "}\n",
+		"    ",
+		" : [\n", "    ],\n",
+		"        ", ",\n",
+		"True",
+		python_quote_print, python_quote_print },
+	{ "ruby", /* Ruby is very Perl-like */
+		"git_config = {\n", "}\n",
+		"  ",
+		" => [\n", "  ],\n",
+		"    ", ",\n",
+		"true",
+		perl_quote_print, perl_quote_print },
+};
+
+static int show_lang_config(const char *key_, const char *value_)
+{
+	if (last_key) {
+		if (strcmp(last_key, ...
From: Johannes Schindelin
Date: Tuesday, January 16, 2007 - 3:47 am

Hi,


That's right.



I don't understand why you do not consolidate that into using tabs for 

So this makes _all_ config vars arrays? It is consistent, yes... but it is 

IMHO this would be much easier to read using a path_list:

	struct path_list_item *item = path_list_lookup(lang_name, &langs);

	if (item == NULL)
		return -1;



Ciao,
Dscho

-

From: Eric Wong
Date: Tuesday, January 16, 2007 - 12:53 pm

Yes, I was trying to imagine a corner case where quoting
for keys could be different than values.

I know Ruby can use :symbols but those don't support '.' and '-' as
far as I know, Perl doesn't require quoting for keys if they match
^-?\w+.  I guess just using quoted strings as keys is fine enough.

I wanted to make things look familiar to people using those languages.

Most Ruby programmers I've seen use 2-space indents.  I myself have given
into using them when I write Ruby.

Python doesn't seem to care about indentation for data structures, but I
think Python programmers prefer spaces for indentation.  I'm not very
experienced with Python, however.

Tabs are my own personal preference for Perl; but indentation is very
inconsistent in Perl code I've looked at :/.  perlstyle(1) actually
recommends 4 space indents...

On the other hand, we are writing for interpreters and not humans.  So
maybe just using tabs is good enough (some formatting makes debugging

Somewhat ugly, yes, but I think returning everything as an array would
make things easier for code using the data structures.  They could
always just reference the first element if they didn't want the array

Ah, I didn't know about path_list_lookup().  Now that I know about it, I
don't think it's worth it to create the extra data structures around
it.  We can just as easily switch to bsearch(3) when we add more


--- a/builtin-repo-config.c
+++ b/builtin-repo-config.c
@@ -1,5 +1,6 @@
 #include "builtin.h"
 #include "cache.h"
+#include "quote.h"
 
 static const char git_config_set_usage[] =
 "git-repo-config [ --global ] [ --bool | --int ] [--get | --get-all | --get-regexp | --replace-all | --add | --unset | --unset-all] name [value [value_regex]] | --rename-section old_name new_name | --list";
@@ -14,6 +15,89 @@ static int do_not_match;
 static int seen;
 static enum { T_RAW, T_INT, T_BOOL } type = T_RAW;
 
+struct lang_dump {
+	const char *name;
+	const char *decl_start;
+	const char ...
Previous thread: [PATCH] Document the -n option for git-log by Josef 'Jeff' Sipek on Sunday, January 14, 2007 - 5:25 pm. (2 messages)

Next thread: [ANNOUNCE] Guilt v0.18 by Josef Sipek on Sunday, January 14, 2007 - 6:02 pm. (1 message)