Re: git diff woes

Previous thread: [PATCH] git-clean: Fix error message if clean.requireForce is not set. by Johannes Sixt on Monday, November 12, 2007 - 1:27 am. (5 messages)

Next thread: Re: [PATCH] Simplify strchrnul() compat code by Jakub Narebski on Monday, November 12, 2007 - 3:07 am. (1 message)
From: Andreas Ericsson
Subject: git diff woes
Date: Monday, November 12, 2007 - 2:44 am

I recently ran into an oddity with the excellent git diff output
format. When a function declaration changes in the same patch as
something else in a function, the old declaration is used with the
diff hunk-headers.

Consider this hunk:
---%<---%<---%<---
@@ -583,75 +346,100 @@ double jitter_request(const char *host, int *status){
        if(verbose) printf("%d candiate peers available\n", num_candidates);
        if(verbose && syncsource_found) printf("synchronization source found\n")
        if(! syncsource_found){
-               *status = STATE_UNKNOWN;
+               status = STATE_WARNING;
                if(verbose) printf("warning: no synchronization source found\n")
        }
---%<---%<---%<---

It definitely looks like a bug, but really isn't, since an earlier hunk
(pasted below) changes the declaration. There were several hunks between
these two, so it was far from obvious when I saw it first.

---%<---%<---%<---
@@ -517,19 +276,22 @@ setup_control_request(ntp_control_message *p, uint8_t opco
 }
 
 /* XXX handle responses with the error bit set */
-double jitter_request(const char *host, int *status){
-       int conn=-1, i, npeers=0, num_candidates=0, syncsource_found=0;
-       int run=0, min_peer_sel=PEER_INCLUDED, num_selected=0, num_valid=0;
+int ntp_request(const char *host, double *offset, int *offset_result, double *j
+       int conn=-1, i, npeers=0, num_candidates=0;
+       int min_peer_sel=PEER_INCLUDED;
        int peers_size=0, peer_offset=0;
+       int status;
---%<---%<---%<--- 
 
This makes it impossible to trust the hunk-header info if the declaration
changes. It might be better to not write it out when the header-line is
also part of the patch. That would at least force one to go back and find
the real declaration. Best would probably be to write the new declaration,
but I'm unsure if that could cause some other confusion.

I haven't started looking into it yet, and as I'm sure there are others
who are much more familiar with the xdiff ...
From: Johannes Schindelin
Date: Monday, November 12, 2007 - 3:01 am

Hi,


Huh?  You admit yourself that it is not a bug.  And sure you can trust the 
hunk header.  Like most of the things, the relate to the _original_ 
version, since the diff is meant to be applied as a forward patch.

So for all practical matters, the diff shows the correct thing: "in this 
hunk, which (still) belongs to that function, change this and this."

Of course, that is only the case if you accept that the diff should be 
applied _in total_, not piecewise.  IOW if you are a fan of GNU patch 
which happily clobbers your file until it fails with the last hunk, you 
will not be happy.

Ciao,
Dscho

-

From: Andreas Ericsson
Date: Monday, November 12, 2007 - 3:35 am

In the check_ntpd.c program, there is no bug. I found the git diff output

You're right. GNU patch will apply one hunk and then happily churn on even
if it fails. git-apply will apply all hunks or none, so all hunks can assume
that all previous hunks were successfully applied. So what was your point
again?

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-

From: Johannes Schindelin
Date: Monday, November 12, 2007 - 3:50 am

Hi,


My point was that this diff is not to be read as if the previous hunks had 
been applied.  Just look at the context: it is also the original file.

It seems I am singularly unable to explain plain concepts as this: a diff 
assumes that the file is yet unchanged.

So I'll stop.

Ciao,
Dscho

-

From: Andreas Ericsson
Date: Monday, November 12, 2007 - 4:19 am

The context is ambiguous, as it must be present in both the new and the
old file for it to actually *be* context. Otherwise it would be part of

Sure, but the useraid with writing the apparent function declaration in
the hunk header *will* be confusing if the function declaration changes

Give me something valuable instead, such as your opinion on whether it
would be better to not print the function declaration at all if it will
be changed by applying the same patch, or if one should pick one of the
declarations from old or new and, if so, which one to pick.

I simply refuse to believe that you wouldn't immediately think the hunk
below holds an obvious bug. I thought so because of the helpful function
context git diff prints (which is a helper for human reviewers, and not
something git-apply or GNU patch needs to work), and now I want to do
something about it so others won't have to suffer the same confusion.

@@ -583,75 +346,100 @@ double jitter_request(const char *host, int *status){
       if(verbose) printf("%d candiate peers available\n", num_candidates);
       if(verbose && syncsource_found) printf("synchronization source found\n")
       if(! syncsource_found){
-               *status = STATE_UNKNOWN;
+               status = STATE_WARNING;
               if(verbose) printf("warning: no synchronization source found\n")
       }

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-

From: Junio C Hamano
Date: Monday, November 12, 2007 - 2:30 pm

This is what I get from "GNU diff -pu" which makes me surpried
that anybody finds "git diff" hunk header surprising.  Notice
that hunk at line 84.

--- read-cache.c	2007-11-12 12:08:00.000000000 -0800
+++ read-cache.c+	2007-11-12 12:07:54.000000000 -0800
@@ -60,7 +60,7 @@ static int ce_compare_data(struct cache_
 	return match;
 }
 
-static int ce_compare_link(struct cache_entry *ce, size_t expected_size)
+static int ce_compare_lonk(struct cache_entry *ce, size_t expected_size)
 {
 	int match = -1;
 	char *target;
@@ -84,7 +84,7 @@ static int ce_compare_link(struct cache_
 		match = memcmp(buffer, target, size);
 	free(buffer);
 	free(target);
-	return match;
+	return match + 0;
 }
 
 static int ce_compare_gitlink(struct cache_entry *ce)
-

From: Andreas Ericsson
Date: Monday, November 12, 2007 - 5:03 pm

I notice it, and I don't like it. I guess I'm just used to git being
smarter than their GNU tool equivalents, especially since it only ever
applies patches in full.

I have a patch ready to make it configurable but it lacks doc updates
and tests, so I'll send it tomorrow morning when I've had time to
fiddle a bit with that.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-

From: Johannes Schindelin
Date: Monday, November 12, 2007 - 5:59 pm

Hi,


I still think the existing behaviour is reasonable.  When I read a diff 
(and remember, the hunk headers are _only_ there for the reviewer's 
pleasure), the function names are a hint for _me_ where to look, and which 
is the context, in my existing, _original_ file.

That is, unless I have already applied the patch, and am looking for the 
reverse patch.  And, lo and behold, the reverse patch generated by 
git-diff really shows the now-current function name!

So IMO "fixing" this behaviour would be a regression.

Ciao,
Dscho

-

From: Miles Bader
Date: Monday, November 12, 2007 - 7:53 pm

It's not at all obvious that this behavior is actually wrong -- it seems
perfectly reasonable to use either old or new text for the hunk headers.

It hardly matters really, since that particular output is just "useful
noise" to provide a bit of helpful context for human readers, and humans
(unlike programs) are notoriously good at not being bothered by such
things.  Er, well most humans anyway.

-Miles

-- 
Americans are broad-minded people.  They'll accept the fact that a person can
be an alcoholic, a dope fiend, a wife beater, and even a newspaperman, but if a
man doesn't drive, there is something wrong with him.  -- Art Buchwald
-

From: Andreas Ericsson
Date: Tuesday, November 13, 2007 - 12:40 am

I wouldn't have reacted either, except that this time someone asked me to
review a branch early in the morning because he had introduced a bug in the
process, and the hunk header information made me assume the wrong hunk of
the patch was the culprit.

On the one hand, it wouldn't have been so much of a problem if the developer
in question would have followed my suggestion of committing small and making
sure the commit message describes everything that's done. On the other hand,
a tool fooling a human isn't a good thing either, even if said human is not
really in shape for using said tool.

Granted, the new form can still fool people, but for archeology excursions
I think it's definitely right to use the "new" funcname in the hunk header.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-

From: Andreas Ericsson
Date: Tuesday, November 13, 2007 - 2:15 am

My git hacking has been stalled at the office for now, and I'm swanked at
home since my girlfriend just moved in and brought temporary pandemonium
with her. Here's what I've got now. It passes all tests, but there are no
new ones added. Documentation also needs updating.

Extract with
	sed -n -e /^#CUTSTART/,/^#CUTEND/p -e /^#/d

#CUTSTART---%<---%<---%<---
From: Andreas Ericsson <ae@op5.se>
Date: Tue, 13 Nov 2007 09:47:43 +0100
Subject: [PATCH] diffcore: Allow users to decide what funcname to use

The function name being printed with the header of each
hunk is fetched from the "old" file today. Since git by
default applies patches either in full or not at all it's
arguably more correct to use the function from the "new"
file, at least when manually reviewing commits.

I stumbled upon this hunk when reviewing a series of
commits which caused the resulting code to segfault
under certain circumstances. Several hunks before, the
function declaration was changed and "status" was now
declared as an auto variable of type "int". The hunk
looks obviously bogus, and since I wasn't properly
awake, I reported this hunk to be the bogus one.

  @@ -583,75 +346,100 @@ double jitter_request(int *status){
     context
     context
     if(!syncsource_found){
  -    *status = STATE_UNKNOWN;
  +    status = STATE_WARNING;
       if(verbose) printf("warning: no sync source found\n")
     }

This is what GNU "diff -p" would have reported under the
same circumstances, but GNU diff has no notion of version
control, and as such will not know if it's being used on
content where the patch by definition will apply in full.

Git can be smarter than that, and imo it should. This
patch lets the diffcore grok a new configuration variable,
"diff.funcnames", which can be set to "new", "old", or a
boolean value, which will cause it to be "old" (for 'true')
and 'none' (for 'false').

Signed-off-by: Andreas Ericsson <ae@op5.se>
---
 diff.c           |   20 ++++++++++++++++++--
 ...
Previous thread: [PATCH] git-clean: Fix error message if clean.requireForce is not set. by Johannes Sixt on Monday, November 12, 2007 - 1:27 am. (5 messages)

Next thread: Re: [PATCH] Simplify strchrnul() compat code by Jakub Narebski on Monday, November 12, 2007 - 3:07 am. (1 message)