Re: [PATCH 1/4] Split up default "core" config parsing into helper routine

Previous thread: RE: cvs2git with modules? by Kelly F. Hickel on Wednesday, June 18, 2008 - 4:24 pm. (1 message)

Next thread: [ANNOUNCE] GIT 1.5.6 by Junio C Hamano on Wednesday, June 18, 2008 - 7:24 pm. (14 messages)
To: Junio C Hamano <gitster@...>, Git Mailing List <git@...>
Cc: Denis Bueno <dbueno@...>
Date: Wednesday, June 18, 2008 - 6:29 pm

So these four patches end up adding support for conditionally enabling
fsync() on loose object creation in the .gitconfig file with something
like

[core]
FsyncObjectFiles = true

which can be useful on filesystems that don't already guarantee data
consistency for other reasons (whether due to ordered writes or due to
full data journalling).

Actually, just the last one adds the fairly trivial feature, the three
first patches are just cleanups of the config parsing that I needed in
order to not gouge my eyes out when looking at the code. The config file
parser is kind of ad-hoc and people have added more and more options to it
without ever trying to clean it up, and I refuse to do that.

The full patch-series is

Split up default "core" config parsing into helper routine
Split up default "user" config parsing into helper routine
Split up default "i18n" and "branch" config parsing into helper
routines
Add config option to enable 'fsync()' of object files

resulting in the following diffstat:

Documentation/config.txt | 8 ++++
cache.h | 1 +
config.c | 82 ++++++++++++++++++++++++++++++++++-----------
environment.c | 1 +
sha1_file.c | 3 +-
5 files changed, 74 insertions(+), 21 deletions(-)

and the patches themselves will follow..

Linus
--

To: Junio C Hamano <gitster@...>, Git Mailing List <git@...>
Cc: Denis Bueno <dbueno@...>
Date: Wednesday, June 18, 2008 - 6:30 pm

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed, 18 Jun 2008 14:37:18 -0700
Subject: [PATCH 1/4] Split up default "core" config parsing into helper routine

It makes the code a bit easier to read, and in theory a bit faster too
(no need to compare all the different "core.*" strings against non-core
config options).

The config system really should get something of a complete overhaul,
but in the absense of that, this at least improves on it a tiny bit.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
config.c | 42 ++++++++++++++++++++++++++----------------
1 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/config.c b/config.c
index c2f2bbb..c3597e0 100644
--- a/config.c
+++ b/config.c
@@ -332,7 +332,7 @@ int git_config_string(const char **dest, const char *var, const char *value)
return 0;
}

-int git_default_config(const char *var, const char *value, void *dummy)
+static int git_default_core_config(const char *var, const char *value)
{
/* This needs a better name */
if (!strcmp(var, "core.filemode")) {
@@ -444,6 +444,31 @@ int git_default_config(const char *var, const char *value, void *dummy)
return 0;
}

+ if (!strcmp(var, "core.pager"))
+ return git_config_string(&pager_program, var, value);
+
+ if (!strcmp(var, "core.editor"))
+ return git_config_string(&editor_program, var, value);
+
+ if (!strcmp(var, "core.excludesfile"))
+ return git_config_string(&excludes_file, var, value);
+
+ if (!strcmp(var, "core.whitespace")) {
+ if (!value)
+ return config_error_nonbool(var);
+ whitespace_rule_cfg = parse_whitespace_rule(value);
+ return 0;
+ }
+
+ /* Add other config variables here and to Documentation/config.txt. */
+ return 0;
+}
+
+int git_default_config(const char *var, const char *value, void *dummy)
+{
+ if (!prefixcmp(var, "core."))
+ return git_default_core_config(var, value);
+
if (!strcmp(var, "user.name")) {
if (!value)
return confi...

To: Linus Torvalds <torvalds@...>
Cc: Denis Bueno <dbueno@...>, Junio C Hamano <gitster@...>, Git Mailing List <git@...>
Date: Wednesday, June 18, 2008 - 6:49 pm

Maybe it would be easier still to read (and unmeasurably more efficient)
to actually do it like:

if (!prefixcmp(var, "core."))
return git_default_core_config(var+5, value);
...
int git_default_core_config(const char *var, const char *value)
{
if (!strcmp(var, "pager"))

I was curious a while ago and instrumented git_config to write the PID
to a tempfile each time it was called. Most git programs parse the
config files (.git/config, ~/.gitconfig, /etc/gitconfig) three times
each, with some doing it as many as five times.

Most of the config functions are simply "if this key, then set this
value". I wonder if it would be simpler to just load the whole thing at
once, using a table similar to parseopt.

Then we could do useful things like say "you specified core.foobar, but
there is no such variable." I know we can't know all values, since some
non-git programs put values in the config, but I don't think it's
unreasonable for us to claim all of core.*, especially if it helps us
catch simple configuration errors.

-Peff
--

To: Jeff King <peff@...>
Cc: Denis Bueno <dbueno@...>, Junio C Hamano <gitster@...>, Git Mailing List <git@...>
Date: Wednesday, June 18, 2008 - 6:58 pm

I considered it, but I think that screws up error reporting (ie if some
value is unparseable, it would then print out the wrong variable name).

It would also have made the patches much less obvious. So it's a "future

Yeah, I know. I love the config file format (quite frankly, anybody who
thinks XML and friends are sane is a total moron and should be shot before
they reproduce), but the whole parsing code was a really quick hack.

I've several times wanted to rewrite it so that it does something smarter
(parse it once, save it in a nice data structure), but let's face it, the

No. We could already do that (just add it to the end of
git_default_core_config - it should be called last even if there was a
chain), but avoid doing that very much on purpose.

Why? Because it's really irritating to have a parser that complains about
newer values (or old deprecated ones) that don't matter for that version
of the program.

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: Denis Bueno <dbueno@...>, Junio C Hamano <gitster@...>, Git Mailing List <git@...>
Date: Wednesday, June 18, 2008 - 7:13 pm

I actually find the hierarchy a bit nonsensical. We talk about
core.pager and branch.master.remote, but the config file doesn't look
like that. So I buy that it's

[core]
pager

but then it seems kind of arbitrary to have

[branch "master"]
remote

instead of

[branch]
master.remote

or

[branch]
[master]
remote

which is of course impossible because we don't have a "close section"
syntax element.

It seems like the '.' implies hierarchy, but the syntax of the file
doesn't really follow it. But that's a minor issue, and really not worth

I think we need to refactor what happens at the beginning of the 'git'
wrapper, and this may be part of it. But as it stands now, the worktree
setup is a minefield of bugs (e.g., calling a command via an alias
versus directly can get you different results for whether you have a
worktree!), and a simple patch I did to make pager use configurable blew
up in my face due to implicit constraints in when we look at the config
file (i.e., it mostly worked, but reading the config file implied doing
a setup_git_directory, which then screwed up a later invocation of
setup_git_directory).

So I think those things need cleaning up, and refactoring the config
reading could probably be part of that. I might try to look at it in the

Even a "check my config for bogosity" program would be nice, and then it
wouldn't bug you all the time. But you _can't_ do it now, because the
callback mechanism only looks at a subset of the values. IOW, there is
no single function that you call that represents every config value;
different programs use different callbacks.

In the case of just core.*, though, you might be able to get away with
it (I don't know how closely we've adhered to "all of core.* is in
git_default_config).

-Peff
--

To: Jeff King <peff@...>
Cc: Denis Bueno <dbueno@...>, Junio C Hamano <gitster@...>, Git Mailing List <git@...>
Date: Wednesday, June 18, 2008 - 7:34 pm

Hierarchy is idiotic in a human-readable file. It's not how people work.

And you're looking at the wrong part. You're looking at the _code_ part,
which is not the primary thing. The primary thing is the config file
syntax. And

[branch "mybranch"]
url = xyz

is a hell of a lot more readable than any alternatives I've ever seen, and
no, there is no hierarchy _anywhere_ there, and shouldn't be.

Forget about "branch.mybranch.url". It has no meaning. It's not what you
are supposed to ever use as a human (it's purely for scripting). It's not
worth even thinking about.

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: Denis Bueno <dbueno@...>, Junio C Hamano <gitster@...>, Git Mailing List <git@...>
Date: Wednesday, June 18, 2008 - 8:08 pm

Have you read "man git-config" lately?

-Peff
--

To: Jeff King <peff@...>
Cc: Denis Bueno <dbueno@...>, Junio C Hamano <gitster@...>, Git Mailing List <git@...>
Date: Wednesday, June 18, 2008 - 8:23 pm

That whole program is there just for scripting. Of _course_ it talks about
the machine format!

Nobody should ever use "git-config" normally. You should fire up your
editor and just edit the damn file in place. But no, we don't have a
man-page for that.

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: Denis Bueno <dbueno@...>, Junio C Hamano <gitster@...>, Git Mailing List <git@...>
Date: Wednesday, June 18, 2008 - 9:23 pm

Forget the command git-config; all of the config _variables_ are
documented in "foo.bar" form. Name one place in all of the documentation
where the user can find out about core.pager, but the "foo.bar" form is
not used.

-Peff
--

To: Linus Torvalds <torvalds@...>
Cc: Denis Bueno <dbueno@...>, Jeff King <peff@...>, Git Mailing List <git@...>
Date: Wednesday, June 18, 2008 - 8:32 pm

I was asked to proofread a git book somebody is starting to write, and the
first thing the book taught was to use "config user.name".

I gave a suggestion that we should not teach config as the first thing but
instead we should show editing the $HOME/.gitconfig file manually, to give
the feeling that what git deals with is approachable, but I realize the
suggestion is not convincing when our own Documentation/gittutorial.txt
does it fairly early in it, too.
--

To: Jeff King <peff@...>
Cc: Denis Bueno <dbueno@...>, Junio C Hamano <gitster@...>, Git Mailing List <git@...>
Date: Wednesday, June 18, 2008 - 8:28 pm

Btw, one reason people end up talking about git-config is that it's easier
to tell *other* people to "run this command" than it is to tell them to
"edit this file so it looks like <xyz>".

And I think that's understandable, but still very sad. That file was
designed to be edited by humans, and it's actually much easier to edit
that way than to do lots of "git config" runs.

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: Denis Bueno <dbueno@...>, Jeff King <peff@...>, Junio C Hamano <gitster@...>, Git Mailing List <git@...>
Date: Wednesday, June 18, 2008 - 10:10 pm

The other factor is that you sometimes want to tell somebody one option
setting, and you're using a medium where there's a comprehension cost to
newlines. So 'remote.origin.url blah' is more clear than '[remote
"origin"]
url = blah' in those contexts. And we don't seem to encourage
'[remote origin] url = blah', even though it turns out to work fine (at
least, I didn't know until I just now tried it that I could suggest it).

-Daniel
*This .sig left intentionally blank*
--

To: Junio C Hamano <gitster@...>, Git Mailing List <git@...>
Cc: Denis Bueno <dbueno@...>
Date: Wednesday, June 18, 2008 - 6:31 pm

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed, 18 Jun 2008 14:40:35 -0700
Subject: [PATCH 2/4] Split up default "user" config parsing into helper routine

This follows the example of the "core" config, and splits out the
default "user" config option parsing into a helper routine.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
config.c | 17 +++++++++++++----
1 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/config.c b/config.c
index c3597e0..ee7642b 100644
--- a/config.c
+++ b/config.c
@@ -464,11 +464,8 @@ static int git_default_core_config(const char *var, const char *value)
return 0;
}

-int git_default_config(const char *var, const char *value, void *dummy)
+static int git_default_user_config(const char *var, const char *value)
{
- if (!prefixcmp(var, "core."))
- return git_default_core_config(var, value);
-
if (!strcmp(var, "user.name")) {
if (!value)
return config_error_nonbool(var);
@@ -487,6 +484,18 @@ int git_default_config(const char *var, const char *value, void *dummy)
return 0;
}

+ /* Add other config variables here and to Documentation/config.txt. */
+ return 0;
+}
+
+int git_default_config(const char *var, const char *value, void *dummy)
+{
+ if (!prefixcmp(var, "core."))
+ return git_default_core_config(var, value);
+
+ if (!prefixcmp(var, "user."))
+ return git_default_user_config(var, value);
+
if (!strcmp(var, "i18n.commitencoding"))
return git_config_string(&git_commit_encoding, var, value);

--
1.5.6.rc3.7.g336d0.dirty

--

To: Junio C Hamano <gitster@...>, Git Mailing List <git@...>
Cc: Denis Bueno <dbueno@...>
Date: Wednesday, June 18, 2008 - 6:31 pm

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed, 18 Jun 2008 15:00:11 -0700
Subject: [PATCH 3/4] Split up default "i18n" and "branch" config parsing into helper routines

.. just to finish it off. We'll leave the pager color config alone,
since it is such an odd-ball special case anyway.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
config.c | 40 +++++++++++++++++++++++++++++-----------
1 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/config.c b/config.c
index ee7642b..9d14a74 100644
--- a/config.c
+++ b/config.c
@@ -488,25 +488,20 @@ static int git_default_user_config(const char *var, const char *value)
return 0;
}

-int git_default_config(const char *var, const char *value, void *dummy)
+static int git_default_i18n_config(const char *var, const char *value)
{
- if (!prefixcmp(var, "core."))
- return git_default_core_config(var, value);
-
- if (!prefixcmp(var, "user."))
- return git_default_user_config(var, value);
-
if (!strcmp(var, "i18n.commitencoding"))
return git_config_string(&git_commit_encoding, var, value);

if (!strcmp(var, "i18n.logoutputencoding"))
return git_config_string(&git_log_output_encoding, var, value);

- if (!strcmp(var, "pager.color") || !strcmp(var, "color.pager")) {
- pager_use_color = git_config_bool(var,value);
- return 0;
- }
+ /* Add other config variables here and to Documentation/config.txt. */
+ return 0;
+}

+static int git_default_branch_config(const char *var, const char *value)
+{
if (!strcmp(var, "branch.autosetupmerge")) {
if (value && !strcasecmp(value, "always")) {
git_branch_track = BRANCH_TRACK_ALWAYS;
@@ -535,6 +530,29 @@ int git_default_config(const char *var, const char *value, void *dummy)
return 0;
}

+int git_default_config(const char *var, const char *value, void *dummy)
+{
+ if (!prefixcmp(var, "core."))
+ return git_default_core_config(var, value);
+
+ if (!prefixcmp(var, "user."))
+ ...

To: Junio C Hamano <gitster@...>, Git Mailing List <git@...>
Cc: Denis Bueno <dbueno@...>
Date: Wednesday, June 18, 2008 - 6:32 pm

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed, 18 Jun 2008 15:18:44 -0700
Subject: [PATCH 4/4] Add config option to enable 'fsync()' of object files

As explained in the documentation[*] this is totally useless on
filesystems that do ordered/journalled data writes, but it can be a
useful safety feature on filesystems like HFS+ that only journal the
metadata, not the actual file contents.

It defaults to off, although we could presumably in theory some day
auto-enable it on a per-filesystem basis.

[*] Yes, I updated the docs for the thing. Hell really _has_ frozen
over, and the four horsemen are probably just beyond the horizon.
EVERYBODY PANIC!

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

Ok, this is the actual real and trivial patch.

Documentation/config.txt | 8 ++++++++
cache.h | 1 +
config.c | 5 +++++
environment.c | 1 +
sha1_file.c | 3 ++-
5 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 5331b45..01689f1 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -372,6 +372,14 @@ core.whitespace::
does not trigger if the character before such a carriage-return
is not a whitespace (not enabled by default).

+core.fsyncobjectfiles::
+ This boolean will enable 'fsync()' when writing object files.
++
+This is a total waste of time and effort on a filesystem that orders
+data writes properly, but can be useful for filesystems that do not use
+journalling (traditional UNIX filesystems) or that only journal metadata
+and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
+
alias.*::
Command aliases for the linkgit:git[1] command wrapper - e.g.
after defining "alias.last = cat-file commit HEAD", the invocation
diff --git a/cache.h b/cache.h
index 81b7e17..01c8502 100644
--- a/cache.h
+++ b/cache.h
@@ -435,6 +4...

Previous thread: RE: cvs2git with modules? by Kelly F. Hickel on Wednesday, June 18, 2008 - 4:24 pm. (1 message)

Next thread: [ANNOUNCE] GIT 1.5.6 by Junio C Hamano on Wednesday, June 18, 2008 - 7:24 pm. (14 messages)