Re: git status --porcelain is a mess that needs fixing

Previous thread: Git and Linux tarball size evolution by Victor Grishchenko on Friday, April 9, 2010 - 9:33 am. (2 messages)

Next thread: git rebase command and docs questions by Eugene Sajine on Friday, April 9, 2010 - 11:49 am. (8 messages)
From: Eric Raymond
Date: Friday, April 9, 2010 - 11:46 am

I'm going to gripe a lot in this mail, possibly verging on flaming.
Therefore I want to start by making clear that I am not here to
complain without pitching in to help fix the problems.  If I can get
responsive answers to my questions, I will take responsibility for
editing them into the relevant git documentation.

Short version: "git status --porcelain" is horribly badly documented
and appears to be seriously maldesigned.  Both these problems need to
be fixed before git causes a lot of unnecessary grief for people
trying to use it.

Here is the entire documentation on this feature in HEAD:

=============================================================================

In short-format, the status of each path is shown as

	XY PATH1 -> PATH2

where `PATH1` is the path in the `HEAD`, and ` -> PATH2` part is
shown only when `PATH1` corresponds to a different path in the
index/worktree (i.e. renamed).

For unmerged entries, `X` shows the status of stage #2 (i.e. ours) and `Y`
shows the status of stage #3 (i.e. theirs).

For entries that do not have conflicts, `X` shows the status of the index,
and `Y` shows the status of the work tree.  For untracked paths, `XY` are
`??`.

    X          Y     Meaning
    -------------------------------------------------
              [MD]   not updated
    M        [ MD]   updated in index
    A        [ MD]   added to index
    D        [ MD]   deleted from index
    R        [ MD]   renamed in index
    C        [ MD]   copied in index
    [MARC]           index and work tree matches
    [ MARC]     M    work tree changed since index
    [ MARC]     D    deleted in work tree
    -------------------------------------------------
    D           D    unmerged, both deleted
    A           U    unmerged, added by us
    U           D    unmerged, deleted by them
    U           A    unmerged, added by them
    D           U    unmerged, deleted by us
    A           A    unmerged, both added
    U           U    unmerged, both ...
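
To make the table above concrete, here is a minimal sketch of a decoder for the two-character `XY` field. The function name `describe` and the dictionaries are illustrative, not part of git. The label used for `UU` ("both modified") is taken from the code comment in the wt-status.c patch later in this thread, since the table row is cut off here. Treat it as a reading aid under those assumptions, not a reference implementation.

```python
# Sketch: decode an XY status pair using the table quoted above.
# All names here are hypothetical helpers, not git APIs.

UNMERGED = {
    "DD": "unmerged, both deleted",
    "AU": "unmerged, added by us",
    "UD": "unmerged, deleted by them",
    "UA": "unmerged, added by them",
    "DU": "unmerged, deleted by us",
    "AA": "unmerged, both added",
    # Label taken from the "both modified" comment in the wt-status.c diff.
    "UU": "unmerged, both modified",
}

INDEX = {
    "M": "updated in index",
    "A": "added to index",
    "D": "deleted from index",
    "R": "renamed in index",
    "C": "copied in index",
}

WORKTREE = {
    "M": "work tree changed since index",
    "D": "deleted in work tree",
}


def describe(xy):
    """Return the list of table meanings that apply to an XY pair."""
    if len(xy) != 2:
        raise ValueError("XY must be exactly two characters: %r" % (xy,))
    if xy == "??":
        return ["untracked"]
    if xy in UNMERGED:
        return [UNMERGED[xy]]
    x, y = xy
    if x not in " MADRC" or y not in " MD" or xy == "  ":
        raise ValueError("unexpected XY pair: %r" % (xy,))
    meanings = ["not updated" if x == " " else INDEX[x]]
    if y == " ":
        # The table lists "index and work tree matches" only for X in [MARC].
        if x in "MARC":
            meanings.append("index and work tree matches")
    else:
        meanings.append(WORKTREE[y])
    return meanings
```

For example, `describe(" M")` yields both "not updated" and "work tree changed since index", which shows that two rows of the table can apply to the same pair.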
From: Junio C Hamano
Date: Friday, April 9, 2010 - 1:30 pm

These all fall within "Patches welcome" category (meaning: I agree the

Is that DD really "illustrative", or did you mean to say "only/sole"?

You should never get "DD" in non-conflicting case.  I think I was fairly
careful not to make them ambiguous when I did that code, but apparently I
wasn't so careful about the documentation.

Thanks for going through this area with fine comb.

diff --git a/Documentation/git-status.txt b/Documentation/git-status.txt
index 1cab91b..313dd04 100644
--- a/Documentation/git-status.txt
+++ b/Documentation/git-status.txt
@@ -86,7 +86,7 @@ and `Y` shows the status of the work tree.  For untracked paths, `XY` are
               [MD]   not updated
     M        [ MD]   updated in index
     A        [ MD]   added to index
-    D        [ MD]   deleted from index
+    D         [ M]    deleted from index
     R        [ MD]   renamed in index
     C        [ MD]   copied in index
     [MARC]           index and work tree matches

--

From: Jeff King
Date: Friday, April 9, 2010 - 9:09 pm

Note that "status --porcelain" is brand new in v1.7.0, so you may be
among the first to be seriously reading the documentation. As Junio
said, I think patches in this area are very welcome.

My answers below are meant to help you understand. I omitted the "...and
yes, this should be documented better" from the end of each, but you can


It's a space. But more importantly, the path columns are actually
C-quoted. E.g.:

  $ perl -e 'open foo, ">", "foo\n"'
  $ git add .
  $ git status --porcelain
  A  "foo\n"

If your parser supports it, it will almost certainly be easier to use
"-z":

  $ git status --porcelain -z | cat -A
  A  foo$
  ^@

Do note that for the 'R'ename status, you will get _two_ NUL-terminated
entries, and they will be in the order of "to\0from\0", whereas the
non-NUL form is "from -> to" (and no, I doubt this is adequately
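
A minimal sketch of a parser for the "-z" output described above may help. It assumes each entry is "XY", one space, then the path, terminated by NUL, and that a rename is followed by a second NUL-terminated token holding the original path ("to\0from\0"). Treating 'C' the same way as 'R' is an assumption that is not stated in this message. The function name `parse_porcelain_z` is hypothetical, and paths are kept as bytes because git does not guarantee any encoding.

```python
# Sketch: parse `git status --porcelain -z` output.
# Assumption (not from the thread): 'C' entries carry two paths like 'R'.

def parse_porcelain_z(data):
    """Return a list of (xy, path, orig_path) tuples; orig_path may be None."""
    tokens = data.split(b"\0")
    if tokens and tokens[-1] == b"":
        tokens.pop()  # drop the empty piece after the final NUL
    entries = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        i += 1
        if len(tok) < 4 or tok[2:3] != b" ":
            raise ValueError("malformed entry: %r" % (tok,))
        xy = tok[:2].decode("ascii")
        path = tok[3:]
        orig_path = None
        if xy[0] in "RC":
            if i >= len(tokens):
                raise ValueError("missing original path for %r" % (tok,))
            orig_path = tokens[i]  # second token is the "from" path
            i += 1
        entries.append((xy, path, orig_path))
    return entries
```

The point of consuming tokens in a loop, rather than mapping over a plain split, is that the number of tokens per record depends on the status flags.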

They are the same as in "git diff --name-status", which in turn has kind

The terms "us / ours" and "them / theirs" are frequently used in the git
documentation.  I'm not sure if they are ever defined rigorously. They
are only meaningful in a merging context, and basically refer to the two
sides of a merge. If I am on branch "master" and do "git merge foo",
then "us" refers to the master branch and the contents of index
stage 2 (bear with me a moment, I'll define that in a second). "Them"
refers to branch "foo" and index stage 3.

Git's "index" is where it keeps uncommitted state about files it tracks
(sort of like CVS/Entries, if that helps, except that git exposes the
concept much more). Most of the time, you use it for building a commit
incrementally. You "git add" files to the index, and then "git commit"
creates a new commit from the contents of your index.

But the index actually has several different slots for each file entry,
which are called stages, and each has a number. "Stage #0" is the
"normal" stage, which you use as described in the last paragraph. During
a merge, entries with conflicts use the other ...
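
The stage slots can be inspected with "git ls-files --stage", which prints one line per index entry in the form "<mode> <object> <stage><TAB><path>". The following is a rough sketch of grouping that output by path. The mapping of stage 2 to "ours" and stage 3 to "theirs" follows the description above; calling stage 1 the "common ancestor" comes from the git documentation rather than from this message. The helper name and the object names in the usage note are made up for illustration, and without "-z" unusual paths would be C-quoted, which this sketch does not undo.

```python
# Sketch: group `git ls-files --stage` output by path and stage.
# STAGE_NAMES[1] is taken from the git docs, not from the message above.

STAGE_NAMES = {0: "normal", 1: "common ancestor", 2: "ours", 3: "theirs"}


def parse_ls_files_stage(text):
    """Map each path to a dict of {stage name: object name}."""
    result = {}
    for line in text.splitlines():
        if not line:
            continue
        meta, path = line.split("\t", 1)   # metadata and path are tab-separated
        mode, obj, stage = meta.split(" ")
        result.setdefault(path, {})[STAGE_NAMES[int(stage)]] = obj
    return result
```

Given three lines for a conflicted "file.txt" at stages 1, 2 and 3 (with placeholder object names "aaa", "bbb", "ccc"), the function returns one entry for that path with keys "common ancestor", "ours" and "theirs".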
From: Jonathan Nieder
Date: Friday, April 9, 2010 - 10:46 pm

"git clean -n -d" may help.

Just my 2¢,
Jonathan
--

From: Jonathan Nieder
Date: Friday, April 9, 2010 - 10:51 pm

err, "git clean -n -d -X".

I am also not sure how stable the "Would remove " output format is,
or how stable we want it to be.  Probably not stable at all, so
sorry about that.

Jonathan
--

From: Jeff King
Date: Friday, April 9, 2010 - 11:03 pm

That's the same information, isn't it? You do "git clean -ndX" to see
_everything_ that is untracked, and "git clean -nd" to see things that
are untracked but not ignored. So I think it is just as painful to use
as ls-files, but as you noted, it is not really plumbing.

-Peff
--

From: Jonathan Nieder
Date: Friday, April 9, 2010 - 11:12 pm

No, the capital X tells clean to only list excluded files.  The
standard use is as a poor man’s “make maintainer-clean”, leaving
unrelated files alone.

I only learned about it just now.  I’m glad I did (I often use the
lowercase version for this because I just didn’t know about -X), but
as you mentioned, it is not so applicable here because not plumbing.
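
For anyone who wants to experiment anyway, here is a small sketch that scrapes the dry-run output. It relies on the "Would remove " prefix, which (as noted earlier in this thread) may not be a stable format, so this is an illustration only. The function names are hypothetical, and paths with unusual characters may come back quoted.

```python
# Sketch: list ignored files by scraping `git clean -n -d -X`.
# Caveat: the "Would remove " text is porcelain output and may change.
import subprocess

PREFIX = "Would remove "


def parse_clean_dry_run(output):
    """Extract paths from the dry-run text, skipping any other lines."""
    return [line[len(PREFIX):] for line in output.splitlines()
            if line.startswith(PREFIX)]


def list_ignored(repo_dir="."):
    """Run the dry run in repo_dir and return the paths it would remove."""
    proc = subprocess.run(
        ["git", "clean", "-n", "-d", "-X"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_clean_dry_run(proc.stdout)
```

Because "-n" makes this a dry run, nothing is deleted; the parsing step is kept separate so it can be checked without a repository.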

Jonathan
--

From: Jeff King
Date: Friday, April 9, 2010 - 11:32 pm

Ah, I read it as "-x" (probably because I had never heard of "-X"
either...).

So yes, it would do the right thing. I still think a --show-ignored
option to git-status would probably be better (in addition to being
sanctioned plumbing, it means we only have to traverse the tree once

The "-X" mode seems much safer to me, as you are less likely to blow
away things you actually wanted to keep while cleaning the tree of
crufty build products. It seems like it should have been the
easier-to-type "-x", but it is far too late for such bikeshedding at
this point.

Thanks for the pointer.

-Peff
--

From: Eric Raymond
Date: Friday, April 9, 2010 - 10:59 pm

They do that quite well.  Thank you.

I've got a couple of other things on my plate, including prepping for 
a GPSD point release early next week, so I can't respond immediately.
Expect a response and some patches Tuesday, Wednesday, or Thursday
of next week.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
--

From: Julian Phillips
Date: Saturday, April 10, 2010 - 6:35 am

Not true.  If the second form was used, then you _can_ split on \0.  It
will tokenise the data for you, and then you consume either two or three
tokens depending on the status flags.  So it would make the parsing
simpler.  But to make it even easier, how about adding a -Z that makes the
output format "XY\0file1\0[file2]\0" (i.e. always three tokens per record,
with the third token being empty if there is no second filename)?  Though
if future expandability was wanted you could end each record with \0\0, and
then parsing would be two stages: split on \0\0 for records, and then
split on \0 for entries?  There is already precedent for the -z option to
change the output format, so a second similar switch should be ok?  Then
the updated documentation could recommend --porcelain -Z for new users
without affecting old ones.
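
To illustrate why the fixed three-token variant is easy to consume, here is a minimal sketch of a parser for it. The format is only a proposal in this thread, so the sketch assumes exactly "XY\0file1\0file2\0" per record with an empty third token when there is no second filename. The function name is hypothetical, and the sketch does not assume which of the two names is the old one.

```python
# Sketch: parse the *proposed* "-Z" format, three NUL-terminated
# tokens per record. This format is hypothetical, taken from the proposal.

def parse_fixed_three(data):
    """Return a list of (xy, file1, file2) tuples; file2 is None if empty."""
    tokens = data.split(b"\0")
    if tokens[-1] != b"":
        raise ValueError("input is not NUL-terminated")
    tokens.pop()  # drop the empty piece after the final NUL
    if len(tokens) % 3 != 0:
        raise ValueError("token count is not a multiple of three")
    return [
        (tokens[i].decode("ascii"), tokens[i + 1], tokens[i + 2] or None)
        for i in range(0, len(tokens), 3)
    ]
```

No look at the status flags is needed to find record boundaries. Note that a record without a second filename ends in two adjacent NULs.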

-- 
Julian
--

From: Eric Raymond
Date: Saturday, April 10, 2010 - 7:43 am

+1

-Z could fix some of the other issues, as well, like use of space
as a flag character.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
--

From: Jon Seymour
Date: Saturday, April 10, 2010 - 7:56 am

On Sat, Apr 10, 2010 at 11:35 PM, Julian Phillips

Surely that won't work - if file2 can be empty, \0[file2]\0 reduces to
\0\0 which would be confused with the \0\0 proposed as a record
separator.

jon.
--

From: Julian Phillips
Date: Saturday, April 10, 2010 - 8:50 am

On Sun, 11 Apr 2010 00:56:47 +1000, Jon Seymour <jon.seymour@gmail.com>

Yes.  But they were alternative suggestions, so if using \0\0 as the
record marker you would omit the second filename when empty as is currently
done.

-- 
Julian
--

From: Jon Seymour
Date: Saturday, April 10, 2010 - 4:33 pm

On Sun, Apr 11, 2010 at 1:50 AM, Julian Phillips

Ah, apologies. I appear to have failed to parse a necessary disjunctive :-)

jon.
--

From: Julian Phillips
Date: Saturday, April 10, 2010 - 12:25 pm

Add a new output format option to git-status that is a more extreme
form of the -z output that places a NUL between all parts of the
record, and always has three entries per record, even when only two
are relevant.  This makes the parsing of --porcelain output much
simpler for the consumer.

Signed-off-by: Julian Phillips <julian@quantumfyre.co.uk>
---


Something like this for the first variant (fixed three entries per record)
perhaps ... (though a proper patch would probably want some tests too)

 builtin/commit.c |    6 ++++--
 wt-status.c      |   19 ++++++++++++++-----
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index c5ab683..acbcefc 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1025,8 +1025,10 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 		OPT_SET_INT(0, "porcelain", &status_format,
 			    "show porcelain output format",
 			    STATUS_FORMAT_PORCELAIN),
-		OPT_BOOLEAN('z', "null", &null_termination,
-			    "terminate entries with NUL"),
+		OPT_SET_INT('z', "null", &null_termination,
+			    "terminate entries with NUL", 1),
+		OPT_SET_INT('Z', "intense-null", &null_termination,
+			    "use NUL for all separators, including absent values", 2),
 		{ OPTION_STRING, 'u', "untracked-files", &untracked_files_arg,
 		  "mode",
 		  "show untracked files, optional modes: all, normal, no. (Default: all)",
diff --git a/wt-status.c b/wt-status.c
index 8ca59a2..9f23ec6 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -663,7 +663,9 @@ static void wt_shortstatus_unmerged(int null_termination, struct string_list_ite
 	case 7: how = "UU"; break; /* both modified */
 	}
 	color_fprintf(s->fp, color(WT_STATUS_UNMERGED, s), "%s", how);
-	if (null_termination) {
+	if (null_termination == 2) {
+		fprintf(stdout, "%c%s%c%c", 0, it->string, 0, 0);
+	} else if (null_termination) {
 		fprintf(stdout, " %s%c", it->string, 0);
 	} else {
 		struct strbuf onebuf = STRBUF_INIT;
@@ -687,14 ...
From: Eric Raymond
Date: Saturday, April 10, 2010 - 12:50 pm

If you're open to changing this to lose the exiguous "-> " and use "-"
instead of " " as a status character, that would make me happy 
and fix the rest of the design problems with the format.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
--

From: Julian Phillips
Date: Saturday, April 10, 2010 - 1:34 pm

If you use "--porcelain -Z" then you don't get the "->", the format is
always XY<NUL><file1><NUL><file2><NUL>, with <file2> being an empty string
if only file1 is relevant.

I didn't use "-" instead of " " as that seemed out of scope for a output
formatting option.  Though I don't personally have an objection to it, I
also don't see a particularly strong need for it as with the -Z format
there is no ambiguity.

If you're talking about the output without -Z, then changing the format
raises compatibility issues, and we're talking about something more like
--porcelain2 or --porcelain=new and I don't know if that would be
considered acceptable.

-- 
Julian

--

From: Eric Raymond
Date: Saturday, April 10, 2010 - 2:12 pm

Good point.  OK, the combination of -Z and a switch to list ignored
files should solve Emacs VC's problem.  

Having some sort of JSON dump might still not be a bad idea.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
--

From: Julian Phillips
Date: Saturday, April 10, 2010 - 4:03 pm

This adds a --json switch to status, which enables a json output
format.  This provides a standard output format that should be easily
parsed by scripts using any of the large number of readily available
json libraries.

Signed-off-by: Julian Phillips <julian@quantumfyre.co.uk>
---

Starter for 10 ...

 builtin/commit.c |   10 ++++
 wt-status.c      |  132 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 wt-status.h      |    1 +
 3 files changed, 143 insertions(+), 0 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index c5ab683..f2b5cfa 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -91,6 +91,7 @@ static enum {
 	STATUS_FORMAT_LONG,
 	STATUS_FORMAT_SHORT,
 	STATUS_FORMAT_PORCELAIN,
+	STATUS_FORMAT_JSON,
 } status_format = STATUS_FORMAT_LONG;
 
 static int opt_parse_m(const struct option *opt, const char *arg, int unset)
@@ -422,6 +423,9 @@ static int run_status(FILE *fp, const char *index_file, const char *prefix, int
 	case STATUS_FORMAT_PORCELAIN:
 		wt_porcelain_print(s, null_termination);
 		break;
+	case STATUS_FORMAT_JSON:
+		wt_json_print(s);
+		break;
 	case STATUS_FORMAT_LONG:
 		wt_status_print(s);
 		break;
@@ -1025,6 +1029,9 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 		OPT_SET_INT(0, "porcelain", &status_format,
 			    "show porcelain output format",
 			    STATUS_FORMAT_PORCELAIN),
+		OPT_SET_INT(0, "json", &status_format,
+			    "show json output format",
+			    STATUS_FORMAT_JSON),
 		OPT_BOOLEAN('z', "null", &null_termination,
 			    "terminate entries with NUL"),
 		{ OPTION_STRING, 'u', "untracked-files", &untracked_files_arg,
@@ -1068,6 +1075,9 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 	case STATUS_FORMAT_PORCELAIN:
 		wt_porcelain_print(&s, null_termination);
 		break;
+	case STATUS_FORMAT_JSON:
+		wt_json_print(&s);
+		break;
 	case STATUS_FORMAT_LONG:
 		s.verbose = verbose;
 		wt_status_print(&s);
diff --git a/wt-status.c b/wt-status.c
index ...