To make gitweb faster I thought about adding to it, or to Git.pm,
simple nonvalidation config file reader. Nonvalidating means that
it would accept some input which git-repo-config considers invalid.
Some of the trouble is caused because of coner cases like this
example
[section "sub ; # sect \" ion\\"] ; "]
key = a " b ; " ; " c ; "
which is valid, but strange.
I'm not proficent in Perl, so help is appreciated.
-- >8 --
#!/usr/bin/perl
use strict;
use warnings;
use Text::Balanced qw(extract_delimited);
sub read_config {
my $configfile = shift;
my $section = shift;
my %config;
open my $fd, $configfile
or die "Cannot open $configfile: $!";
my $sectfull;
while (my $line = <$fd>) {
chomp $line;
if ($line =~ m/^\s*\[\s*([^][:space:]]*)\s*\](.*)$/) {
# section without subsection
my $sect = lc($1);
$sectfull = $sect;
} elsif ($line =~ m/\s*\[([^][:space:]]*)\s"((?:\\.|[^"])*)"\](.*)$/) {
# section with subsection
my $sect = lc($1);
my $subsect = $2;
$subsect =~ s/\\(.)/$1/g;
$sectfull = "$sect.$subsect";
} elsif ($line =~ m/\s*(\w+)\s*=\s*(.*?)\s*$/) {
# variable assignment
my $key = lc($1);
my $rhs = $2;
my $value = '';
my ($next, $remainder, $prefix) = qw();
DELIM: {
do {
($next, $remainder, $prefix) =
extract_delimited($rhs, '"', qr/(?:\\.|[^"])*/);
if ($prefix =~ s/\s*[;#].*$//) {
# comment in unquoted part
$value .= $prefix;
last DELIM;
} else {
$value .= $prefix if $prefix;
if ($next && $next =~ s/^"(.*)"$/$1/) {
$value .= $next;
}
}
$rhs = $remainder;
} while ($rhs && $next);
} # DELIM:
if ($remainder) {
$remainder =~ s/\s*[;#].*$//;
$value .= $remainder;
}
$value =~ s/\\(.)/$1/g;
if (exists $config{"$sectfull.$key"}) {
push @{$config{"$sectfull.$key"}}, $value;
} else {
$config{"$sectfull.$key"} = [ $value ];
}
} ...How about something like git-for-each-ref that dumps the entire output of a config file into an eval()-able string? That way we don't have to deal with corner-cases and subtle differences between C and Perl implementations. -- Eric Wong -
The idea is (at least for gitweb) to avoid cost of fork. And I think if the format gets documented properly, there should be no differences in config file parsing. Please remember also that is first draft of git config file reader in perl; an alpha version. -- Jakub Narebski Poland -
If the Perl output is redirected to a file (say .git/config.perl) and only regenerated when .git/config changes, `do(".git/config.perl")' will likely be faster since all the parsing will be done by Perl itself. -- Eric Wong -
So long as its automatic in gitweb.cgi and based on the stat attributes of .git/config, OK. But my database background tells me two copies of the same thing is fishy... -- Shawn. -
Would you write "git repo-config --perl", then? ;-) Besides, I'd rather avoid the need for /tmp/gitweb, and I think usually gitweb do not have (and should not have) write access to repository. -- Jakub Narebski Poland -
The below patch should be a start (only tested on my fairly standard
Good point. Having to maintain a .git/config.perl in the repository
would be a pain from an administrative standpoint; but on the other hand
.git/config is not often regenerated.
I don't think giving gitweb write access to a repo is a good idea;
either. Perhaps it would be updated via hook like the HTTP stuff.
IMHO, there is nothing wrong with gitweb writing to /tmp; however.
diff --git a/builtin-repo-config.c b/builtin-repo-config.c
index 9063311..a9ef358 100644
--- a/builtin-repo-config.c
+++ b/builtin-repo-config.c
@@ -1,5 +1,6 @@
#include "builtin.h"
#include "cache.h"
+#include "quote.h"
static const char git_config_set_usage[] =
"git-repo-config [ --global ] [ --bool | --int ] [--get | --get-all | --get-regexp | --replace-all | --add | --unset | --unset-all] name [value [value_regex]] | --rename-section old_name new_name | --list";
@@ -13,6 +14,7 @@ static int do_all;
static int do_not_match;
static int seen;
static enum { T_RAW, T_INT, T_BOOL } type = T_RAW;
+static char *last_key;
static int show_all_config(const char *key_, const char *value_)
{
@@ -23,6 +25,30 @@ static int show_all_config(const char *key_, const char *value_)
return 0;
}
+static int show_perl_config(const char *key_, const char *value_)
+{
+ if (last_key) {
+ if (strcmp(last_key, key_)) {
+ free(last_key);
+ last_key = xstrdup(key_);
+ fputs("\t],\n\t", stdout);
+ perl_quote_print(stdout, key_);
+ fputs(" => [\n", stdout);
+ }
+ } else {
+ last_key = xstrdup(key_);
+ fputc('\t', stdout);
+ perl_quote_print(stdout, key_);
+ fputs(" => [\n", stdout);
+ }
+ if (value_) {
+ fputs("\t\t", stdout);
+ perl_quote_print(stdout, value_);
+ fputs(",\n", stdout);
+ }
+ return 0;
+}
+
static int show_config(const char* key_, const char* value_)
{
char value[256];
@@ -138,6 +164,17 @@ int cmd_repo_config(int argc, const char **argv, const char *prefix)
type = ...Hi,
A bit shorter (and gets the booleans right, plus being even easier
towards --python extension):
---
builtin-repo-config.c | 19 +++++++++++++++++--
1 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/builtin-repo-config.c b/builtin-repo-config.c
index 9063311..8ebf436 100644
--- a/builtin-repo-config.c
+++ b/builtin-repo-config.c
@@ -1,5 +1,6 @@
#include "builtin.h"
#include "cache.h"
+#include "quote.h"
static const char git_config_set_usage[] =
"git-repo-config [ --global ] [ --bool | --int ] [--get | --get-all | --get-regexp | --replace-all | --add | --unset | --unset-all] name [value [value_regex]] | --rename-section old_name new_name | --list";
@@ -12,11 +13,18 @@ static int use_key_regexp;
static int do_all;
static int do_not_match;
static int seen;
+static const char *perl_prefix = NULL;
static enum { T_RAW, T_INT, T_BOOL } type = T_RAW;
static int show_all_config(const char *key_, const char *value_)
{
- if (value_)
+ if (perl_prefix) {
+ printf("%s", perl_prefix);
+ perl_quote_print(stdout, key_);
+ printf(" => ");
+ perl_quote_print(stdout, value_ ? value_ : "true");
+ perl_prefix = ",\n\t";
+ } else if (value_)
printf("%s=%s\n", key_, value_);
else
printf("%s\n", key_);
@@ -138,7 +146,14 @@ int cmd_repo_config(int argc, const char **argv, const char *prefix)
type = T_BOOL;
else if (!strcmp(argv[1], "--list") || !strcmp(argv[1], "-l"))
return git_config(show_all_config);
- else if (!strcmp(argv[1], "--global")) {
+ else if (!strcmp(argv[1], "--perl")) {
+ int ret;
+ perl_prefix = "\n\t";
+ printf("%%git_config = (");
+ ret = git_config(show_all_config);
+ printf("\n);\n");
+ return ret;
+ } else if (!strcmp(argv[1], "--global")) {
char *home = getenv("HOME");
if (home) {
char *user_config = xstrdup(mkpath("%s/.gitconfig", home));
-
If we're going down this slippery slope, why not just give up and add a --xml switch instead? Readable by all and a lot more flexible than --perl, --python, --ruby, --tcl, --sh, --c++, --fortran, --lisp, --html, --that-next-silver-bullet-language-that-hasnt-been-invented-yet-but-will-need-its-own-switch-once-it-has-been. That said, parsing the config file as-is can't be so difficult that we need to export it to separate files with a different syntax, now can it? nikolai -
Hi, The point is having one parser to rule them all, and avoid having different parsers, all with their own set of shortcomings. Ciao, Dscho -
As far as I can tell, comparing fork() vs. reading a dump with eval vs. XML isn't meaningful - parsing a 20-line XML file can hardly be much more (if it even is more) expensive than evaling a file of the So then you must agree that having one export format makes a lot of sense, for the same reasons. Not that I think that an export format makes sense in the first place. nikolai -
Parsing the config file is not _that_ difficult (the first post in this thread had config reader in Perl), but it is not that easy: case (in)setiviness, quoting, escaping, comments, removing leading and trailing whitespace when not quoted... P.S. I'd rather have an additional implementation (in Perl) conforming to yet to be written git ini-like config file specs, to find places where canonic parser, git-repo-config, doesn't conform to the specs. -- Jakub Narebski Poland -
Perhaps all except humans. At least YAML, please... -
Hi, I am _strongly_ opposed to all that rubbish. _If_ we want to use repo-config to preformat the config variables, we should either 1) just use "git repo-config -l" and STFU, or 2) introduce something like "--dump" which Eric implemented. Everything else is just _complicating_ matters, and for _what_? _Nothing_ at all. If we use repo-config for that task, it should cater for parsing by _script languages_, not _users_. I work with XML everyday. It has its uses. But this here problem is _not_ one of them. How silly would that be: we parse an easy-to-read format, munge the easy-to-handle internal data format into another "easy-to-read" format which is then parsed by a script language into an easy-to-handle internal data format? No. NO. Ciao, Dscho P.S.: The more I think about it, we should just use the output of "repo-config -l". -
It wouldn't work. Subsection and value are (almost) free form, and they can contain '=' in them. It would be probably best to introduce --dump which would output config file _as if_ it was written by git-repo-config (probably without comments). All values would be in separate lines, there would be only one section header per each section, values would be quoted if needed (if they contain comment delimiter, '=' or whitespace). E.g. [section "subsection"] key=a " b; " c; " d; " would get rewritten as [section "subsection"] key = "a b; c" -- Jakub Narebski Poland -
Sadly, yes. It would be very nice if -l gave unambiguous output for I don't know if it was clear from my first mail, but I wasn't suggesting --xml as a serious alternative. My point was that if we're going to go through all the fuss of adding all these switches for outputting the configuration file in some fixed format, why not go with one that at least is universal in some sense (not necessarily XML). And, as Johannes already pointed out, it's very disturbing having to dump a configuration file so that it is more easily read by other programs. That would suggest that the ini-based format for git's configuration file is suboptimal. Of course, once git is librified (which is still a long-term goal, right?), languages could create bindings to the git library, which would provide access functions to the configuration file. Then we would truly have that one parser to rule them all. nikolai -
No, ini-based, or rather ini-like format for git configuration is nice, but I think git is too forgiving in accepting input. Examples: section header and key/value pair in the same line, allowing multiple quotes in in value part. Well, the idea I had was to have --dump switch to git-repo-config to dump init file as if it was created by git-repo-config invocations, without any hand editing (canonical format). -- Jakub Narebski Poland -
Hi, My point still stands: if you already parse the user-friendly format, why not dump a parse friendly format? If it weren't for those darn non-alnums in the keys, out put of "git repo-config -l" would be perfectly acceptable. So, how about a "git repo-config --dump" which outputs a stream of NUL separated keys and values? This should be really easy to "parse", and there are no ambiguities: No key or value can contain a NUL. Ciao, Dscho -
Dnia wtorek 16. stycznia 2007 23:56, Johannes Schindelin napisa
Hi, No it would not: [someSection] but is equivalent to [section] noval = true Yes, this is not a boolean. The difference is that the callback function is called with NULL in the former case, and with "" in the latter. Hth, Dscho -
$ fatal: bad config file line <nn> in <config> The same with quoted: [someSection] qthisKey = "has\na\nvalue\with\nseveral\nnewlines" There is no escaping besides escaping " and escape character i.e. escaping \ in git config. Se "\n" would work as well as NUL. But only for "git repo-config --bool --get section.noval" output. Semantically equivalent to "true". -- Jakub Narebski Poland -
Hi, So you want "git-repo-config --dump" to output something which has to be scanned for escaping sequences? If you call $ git repo-config -l you will _no longer_ see "\n"s, but rather newlines. I don't know why you insist on newlines, when a NUL makes perfect sense: Take everything until the next NUL. This is the key. Then take everything Yes, it returns "", but this is _wrong_. A single "[section] noval" _only_ makes sense as a boolean. The information lies in its _presence_, which is as good as saying "true". Ciao, Dscho -
With "\n" as separator you can simply rrturn NUL in the noval case. -- Jakub Narebski Poland -
Hi, I just tried this: $ cat > .git/config << EOF [section] key = "Hello\nWorld" EOF $ git-repo-config -l section.key=Hello World Which would buy you what exactly? You can tell that the user did not say "noval = true", but "noval". Great. But the _effect_ should be the same! Anyway, I realize you don't like my solution, so I will just shut up. Ciao, Dscho -
Sorry, my mistake. I haven't noticed that your previous example the error was "\w", not embedded newlines, and that embedded newlines I like your solution. The only ambiguity is how to deal with '[section] noval' case. You propose to treat it as if it was '[section] noval = true' for --dump, and not as if it was '[section] noval = ' or '[section] noval = ""'. Good. But this _has_ to be explained in documentation. That said, I still think that having alternate parser for a format is a good idea. Otherwise it is not a format, but "something that parser parses". BTW. it looks like C escape sequences are parsed, but not octal escape sequences, nor no-op escaping other character. -- Jakub Narebski Poland -
From a bit of testing, as documentation of config file format is woefully incomplete, (yes, I know I should use the source) _some_ of C escape sequences aka. character escape codes (CEC) are parsed: - "\t", tab character (HT, TAB) - "\n", newline (NL) - "\b", backspace (BS) It's a bit strange that "\b" is parsed... Other CEC cause "fatal: bad config file line 41 <nn> in <file>" error: - "\r", return (CR) - "\f", form feed (FF) - "\a", alarm (bell) (BEL) - "\e", escape (ESC) - "\v", vertical tab (VT) I think supporting "\e" could be useful, but also dangerous. Octal char sequences (like \040 or \0 for NUL) are not supported and result in "bad config file" error. Literal/unknown escape sequences like "\w" or "\." also result in "bad config file" error, with the obvious exception of escaping quote character i.e. \" and escaping escape character i.e. \\ Values of configuration variables can span multiple lines by escaping newline, i.e. putting \ as the last character. Nice. (This doesn't work for comments, of course). This doesn't work for section headers nor key names. -- Jakub Narebski Poland -
From config.c:parse_value:73
if (c == '\\') {
c = get_next_char();
switch (c) {
case '\n':
continue;
case 't':
c = '\t';
break;
case 'b':
c = '\b';
break;
case 'n':
c = '\n';
break;
/* Some characters escape as themselves */
case '\\': case '"':
break;
/* Reject unknown escape sequences */
default:
return NULL;
}
--
Jakub Narebski
Poland
-
Hi, No, you should not just use the source. You should use the source _and_ complete the documentation. Ciao, Dscho -
Something like the patch below? Untested! ("make doc" up to git-repo-config.txt compiles, though). I'm not sure how to tell that you can have [section] if you have [section "subsection"], but you don't need to. And I probably forgot to add some information. And I'm not sure if some behavior should not be changed, for example allowing _any_ line to be continued with `\`, or that other character escape sequences and perhaps also octal character sequences should be allowed (either that or `\b` should not be parsed). I can send proper patch if requested, but I'd rather above issues were resolved first. diff --git a/Documentation/config.txt b/Documentation/config.txt index da7fde5..9544308 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -14,14 +14,53 @@ dot-separated segment and the section name is everything before the last dot. The variable names are case-insensitive and only alphanumeric characters are allowed. Some variables may appear multiple times. +Syntax +~~~~~~ + The syntax is fairly flexible and permissive; whitespaces are mostly -ignored. The '#' and ';' characters begin comments to the end of line, -blank lines are ignored, lines containing strings enclosed in square -brackets start sections and all the other lines are recognized -as setting variables, in the form 'name = value'. If there is no equal -sign on the line, the entire line is taken as 'name' and the variable -is recognized as boolean "true". String values may be entirely or partially -enclosed in double quotes; some variables may require special value format. +ignored. The '#' and ';' characters begin comments to the end of line, +blank lines are ignored. + +The file consists of sections and variables. A section begins with +the name of the section in square brackets and continues until the next +section begins. Section names are not case sensitive. Each variable +must belong to some section, which means that there must be section +header before first setting ...
Hi, I wonder if we should also mention the (case insensitive) alternative "[section.subsection]", to give a better idea to people why we actually They cannot contain anything else than alphanumeric characters, in Maybe give the example of "remote.<name>.fetch" to explain why? Or maybe I think you can safely skip that; it should be evident that the format of the variables depends on the purpose. Ciao, Dscho -
I'm not sure. "section.subsection" can be treated as section with that name, strange because it has '.' in it, and not subsection "subsection" of section "section". git-repo-config writes section with subsection Contrary to for example also ini-like smb.conf: Leading, trailing and _internal_ whitespace in section and parameter This was in the original, and I think it is better left (at least for now). I wonder if to write about --bool and --int formats... -- Jakub Narebski Poland -
I do not think we need continuation line for [] headers, if that
is what you mean. I offhand do not think anybody would scream
if we start parsing full c-quote.
One thing that left me puzzled after reading the description was
what a user can do with "subsection". It is unclear from the
description if [section "sub.section"], [section"sub.sec=ti.on"]
or worse yet, [section "sub\nsection with an embbedded LF"] are
allowed. The rest seemed sane.
I think the current repo-config handles sane cases alright, but
it is still fragile in error cases. For example:
$ git repo-config 'foo.bar=bzz
baz.boo' foobar
does not currently barf, but results in a corrupted config file.
$ git repo-config 'foo.bar=bzz
baz.boo'
fatal: bad config file line 56 in .git/config
$ sed -ne '55,$p' .git/config
[foo "bar=bzz
baz"]
boo = foobar
The only way we use the subsection names for (almost arbitrary)
end user string is to store branch names, so LF is not an issue
and we could forbid it (if need arises loosening the restriction
while updating the code to behave sanely is easier than leaving
it open without properly checking).
-
This will no longer work:
$ git repo-config 'key.with
newline' some-value
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
---
On Fri, 19 Jan 2007, Junio C Hamano wrote:
> I think the current repo-config handles sane cases alright, but
> it is still fragile in error cases. For example:
>
> $ git repo-config 'foo.bar=bzz
> baz.boo' foobar
>
> does not currently barf, but results in a corrupted config file.
Now it barfs.
config.c | 5 +++++
t/t1300-repo-config.sh | 6 ++++++
2 files changed, 11 insertions(+), 0 deletions(-)
diff --git a/config.c b/config.c
index b6082f5..c08c668 100644
--- a/config.c
+++ b/config.c
@@ -661,6 +661,11 @@ int git_config_set_multivar(const char* key, const char* value,
goto out_free;
}
c = tolower(c);
+ } else if (c == '\n') {
+ fprintf(stderr, "invalid key (newline): %s\n", key);
+ free(store.key);
+ ret = 1;
+ goto out_free;
}
store.key[i] = c;
}
diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh
index 60acdd3..eb7455b 100755
--- a/t/t1300-repo-config.sh
+++ b/t/t1300-repo-config.sh
@@ -418,5 +418,11 @@ EOF
test_expect_success 'quoting' 'cmp .git/config expect'
+test_expect_failure 'key with newline' 'git repo-config key.with\\\
+newline 123'
+
+test_expect_success 'value with newline' 'git repo-config key.sub value.with\\\
+newline'
+
test_done
--
1.5.0.rc1.g5a400-dirty
-
BTW, why config.c never uses error() or warn()? -
Hi, Mainly because error() was meant to be used as "return error("blabla");", and we tried to discern different failures by different return values. But yeah, I think it would be acceptable to use error() instead of fprintf() even then. BTW IMHO we will probably never libify git; too many too complicated cases exist already. Ciao, Dscho -
We probably don't need it libified completely. Just reading and writing of the object database and tags/references and very simple revision walking will be a very good start and possibly enough for anything which _uses_ the database, like importers and exporters.from other VCS or just programs which need a versioned storage. Comparing, archeology and maintenance tasks are more often done manually and can wait (they do now, don't they?) until the rest of diff, patch and revision walking machinery libified properly. -
Hi, ... and even these use xmalloc(), which die()s on out-of-memory. Note that it would be a really horrible work to libify that, since you basically have to insert gazillions of free() calls at the right point, which we don't have to, since exit() cleans up after us anyway. Ciao, Dscho -
Didn't say it'd be simple. Just not impossible. -
Hi, I thought that was what you meant by "very simple revision walking". Ciao, Dscho -
that'd be "just a part of revision walking". Dunno what part yet, because I still have to come up with convincing example of where only a libgit could be used: just committers and history browsers should not have any problems just using plumbing (no, stupid microsoft's createprocess is not a reason enough). -
Separate part of Documentation/config.txt which deals with git config file syntax into "Syntax" subsection, and expand it. Add information about subsections, boolean values, escaping and escape sequences in string values, and continuing variable value on the next line. Add also proxy settings to config file example to show example of partially enclosed in double quotes string value. Parts based on comments by Junio C Hamano, Johannes Schindelin, and the smb.conf(5) man page. Signed-off-by: Jakub Narebski <jnareb@gmail.com> --- It is mentioned above "Syntax" section, but perhaps it should be repeated. I haven't took a look at code to check what values for section names and This was in the original, and I think it is better left (at least for now). Thanks for suggestion. I have used it (although perhaps the preceding I'm not sure what is allowed in section name, and in subsection name, so for now I have left it as is. I can amend this commit, or add new commit explaining this. BTW. currently one of examples from git-repo-config(1) doesn't work: To add a new proxy, without altering any of the existing ones, use ------------ % git repo-config core.gitproxy '"proxy" for example.com' ------------ I think it would be better instead of adding quotes if needed, just do _not_ escape quotes. But that leaves the problem what to do if one puts value with trailing or leading whitespace, or comment character outside quotes using git-repo-config... Documentation/config.txt | 69 +++++++++++++++++++++++++++++++++++++++++---- 1 files changed, 62 insertions(+), 7 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index da7fde5..03133e2 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -14,14 +14,65 @@ dot-separated segment and the section name is everything before the last dot. The variable names are case-insensitive and only alphanumeric characters are allowed. Some variables may appear ...
Separate part of Documentation/config.txt which deals with git config file syntax into "Syntax" subsection, and expand it. Add information about subsections, boolean values, escaping and escape sequences in string values, and continuing variable value on the next line. Add also proxy settings to config file example to show example of partially enclosed in double quotes string value. Parts based on comments by Junio C Hamano, Johannes Schindelin, config.c, and the smb.conf(5) man page. Signed-off-by: Jakub Narebski <jnareb@gmail.com> --- I hope that this is satisfactory. I haven't wrote about current limits on the lengths: 256/2 for section plus subsection name length, 256 for fully qualified variable name, 1024 for value length; I think this does not belong to end user documentation. Documentation/config.txt | 76 +++++++++++++++++++++++++++++++++++++++++---- 1 files changed, 69 insertions(+), 7 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index f1f409d..77a2b16 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -14,14 +14,72 @@ dot-separated segment and the section name is everything before the last dot. The variable names are case-insensitive and only alphanumeric characters are allowed. Some variables may appear multiple times. +Syntax +~~~~~~ + The syntax is fairly flexible and permissive; whitespaces are mostly -ignored. The '#' and ';' characters begin comments to the end of line, -blank lines are ignored, lines containing strings enclosed in square -brackets start sections and all the other lines are recognized -as setting variables, in the form 'name = value'. If there is no equal -sign on the line, the entire line is taken as 'name' and the variable -is recognized as boolean "true". String values may be entirely or partially -enclosed in double quotes; some variables may require special value format. +ignored. The '#' and ';' characters begin comments to the end of line, +blank lines ...
Contrary to variable values, in subsection names parsing character escape codes (besides literal escaping of " as \", and \ as \\) is not performed; subsection name cannot contain newlines. Signed-off-by: Jakub Narebski <jnareb@gmail.com> --- Documentation/config.txt | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 77a2b16..d8244b1 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -37,8 +37,8 @@ in the section header, like in example below: -------- -Subsection names can contain any characters (doublequote '`"`', backslash -'`\`' and newline have to be entered escaped as '`\"`', '`\\`' and '`\n`', +Subsection names can contain any characters except newline (doublequote +'`"`' and backslash have to be escaped as '`\"`' and '`\\`', respecitvely) and are case sensitive. Section header cannot span multiple lines. Variables may belong directly to a section or to a given subsection. You can have `[section]` if you have `[section "subsection"]`, but you -- 1.4.4.4 -
Hi, I never pointed out such a thing. The configuration file is meant to be user-friendly, as the inventor did not mean to have a program like Not at all. Again, git's configuration file is meant for human inspection. Therefore, an ini-style file is optimal. And for scripts, we do have git-repo-config. But now some people want to be clever and read the whole config file in, to make it easier to have gazillions of configuration options for their script. I was not happy when we introduced more relaxed section titles, and I am You are welcome to contribute to that goal! AFAICT nobody is seriously working on that. Ciao, Dscho -
Then what did you mean in your mail from Jan 16, 2007 12:12 PM: How silly would that be: we parse an easy-to-read format, munge the easy-to-handle internal data format into another "easy-to-read" format which is then parsed by a script language into an easy-to-handle internal data format? No. NO. It is suboptimal because it's hard for computers to inspect. An optimal format would be accessible by all, whether human, machine, robot, android, what have you. I really don't like that people take my statements and somehow contort them into meaning something else. You can't take my statement and then claim that because git's configuration file was meant for human inspection my /suggestion/ about it being suboptimal is somehow flawed because I didn't consider the original intent of the file in question. Agreed, ini-style files are sweet for humans, but that's not what I was saying. My claim was that ini-style files aren't necessarily optimal for parsing by computers. I think that was made clear. Also, I only /suggested/ that it may be /suboptimal/ not that it in fact I maintain the git completion-definition for Zsh. Being able to get at the configuration settings is important, both for the completion of git-repo-config, but also to provide completions of remote repositories. 'git-repo-config -l' works fine, but it's not guaranteed to be unambiguous, as the name of a remote repository can contain characters like '=', so it can be difficult to decide where to split the name of the remote repository from the url it represents as I'm with you on this one. nikolai -
Could you add it to contrib/completion/ or at least put link in http://git.or.cz/gitwiki/InterfacesFrontendsAndTools ? TIA -- Jakub Narebski Poland -
I did announce it way back when, but since it's included with new releases of Zsh there's really nothing interesting to add or link to. It's also in Zsh's CVS repository, but I do my updates here: http://git.bitwi.se/?p=dot-home.git;a=tree although I now realize that I haven't pushed some recent updates to that repository yet. nikolai -
More relaxed section (actually _sub_section) titles are not the problem. The fact that they are case sensitive (where section and key names are not) can be. -- Jakub Narebski Poland -
Hi, It was about the human-readable -> internal -> human-readable -> internal chain. Those are way too many transformations for little gain. IMHO modern programs spend 99% of the time are spent transforming data. There's a reason we program in C, Java, etc. and not in Assembler. Sometimes computers and humans are too different to be able to cater for both of them at the same time. I know, I know. The geek inside me disagrees, too. But my experience doesn't. Ciao, Dscho -
I agree with these statements. I actually had YAML in my code originally, but ripped it out because it would be another round of parsing for any language using a YAML parser. I like YAML, but no, this isn't the place for it IMHO. -- Eric Wong -
Your version doesn't get arrays right, however.
Here's a Perl/Python/Ruby version below. It should be extendable for
other languages; feedback and additions appreciated:
Note that usage has been changed to --dump=(perl|python|ruby)
I may add key_suffix to lang_dump just to be consistent with pairings,
but array_start seems to handle all cases of it and it would be
redundant...
--- a/builtin-repo-config.c
+++ b/builtin-repo-config.c
@@ -1,5 +1,6 @@
#include "builtin.h"
#include "cache.h"
+#include "quote.h"
static const char git_config_set_usage[] =
"git-repo-config [ --global ] [ --bool | --int ] [--get | --get-all | --get-regexp | --replace-all | --add | --unset | --unset-all] name [value [value_regex]] | --rename-section old_name new_name | --list";
@@ -14,6 +15,90 @@ static int do_not_match;
static int seen;
static enum { T_RAW, T_INT, T_BOOL } type = T_RAW;
+struct lang_dump {
+ const char *name;
+ const char *decl_start;
+ const char *decl_end;
+ const char *key_prefix;
+ const char *array_start;
+ const char *array_end;
+ const char *val_prefix;
+ const char *val_suffix;
+ const char *true_val; /* should already be quoted, if needed */
+ void (*quote_key_fn)(FILE *, const char*);
+ void (*quote_val_fn)(FILE *, const char*);
+};
+static char *last_key;
+static struct lang_dump *lang;
+static struct lang_dump lang_dump_defs[] = {
+ { "perl",
+ "\%git_config = (\n", ");\n",
+ "\t",
+ " => [\n", "\t],\n",
+ "\t\t", ",\n",
+ "'true'",
+ perl_quote_print, perl_quote_print },
+ { "python",
+ "git_config = {\n", "}\n",
+ " ",
+ " : [\n", " ],\n",
+ " ", ",\n",
+ "True",
+ python_quote_print, python_quote_print },
+ { "ruby", /* Ruby is very Perl-like */
+ "git_config = {\n", "}\n",
+ " ",
+ " => [\n", " ],\n",
+ " ", ",\n",
+ "true",
+ perl_quote_print, perl_quote_print },
+};
+
+static int show_lang_config(const char *key_, const char *value_)
+{
+ if (last_key) {
+ if (strcmp(last_key, ...Hi, That's right. I don't understand why you do not consolidate that into using tabs for So this makes _all_ config vars arrays? It is consistent, yes... but it is IMHO this would be much easier to read using a path_list: struct path_list_item *item = path_list_lookup(lang_name, &langs); if (item == NULL) return -1; Ciao, Dscho -
Yes, I was trying to imagine a corner case where quoting
for keys could be different than values.
I know Ruby can use :symbols but those don't support '.' and '-' as
far as I know, Perl doesn't require quoting for keys if they match
^-?\w+. I guess just using quoted strings as keys is fine enough.
I wanted to make things look familiar to people using those languages.
Most Ruby programmers I've seen use 2-space indents. I myself have given
into using them when I write Ruby.
Python doesn't seem to care about indentation for data structures, but I
think Python programmers prefer spaces for indentation. I'm not very
experienced with Python, however.
Tabs are my own personal preference for Perl; but indentation is very
inconsistent in Perl code I've looked at :/. perlstyle(1) actually
recommends 4 space indents...
On the other hand, we are writing for interpreters and not humans. So
maybe just using tabs is good enough (some formatting makes debugging
Somewhat ugly, yes, but I think returning everything as an array would
make things easier for code using the data structures. They could
always just reference the first element if they didn't want the array
Ah, I didn't know about path_list_lookup(). Now that I know about it, I
don't think it's worth it to create the extra data structures around
it. We can just as easily switch to bsearch(3) when we add more
--- a/builtin-repo-config.c
+++ b/builtin-repo-config.c
@@ -1,5 +1,6 @@
#include "builtin.h"
#include "cache.h"
+#include "quote.h"
static const char git_config_set_usage[] =
"git-repo-config [ --global ] [ --bool | --int ] [--get | --get-all | --get-regexp | --replace-all | --add | --unset | --unset-all] name [value [value_regex]] | --rename-section old_name new_name | --list";
@@ -14,6 +15,89 @@ static int do_not_match;
static int seen;
static enum { T_RAW, T_INT, T_BOOL } type = T_RAW;
+struct lang_dump {
+ const char *name;
+ const char *decl_start;
+ const char ...