Re: [PATCH 1/2 v3] Make diffcore_std only can run once before a diff_flush.

From: Bo Yang
Date: Thursday, April 22, 2010 - 7:05 am

I have tried to make --follow support finding copies among unmodified files. The first patch fixes a bug triggered by the combination of '--follow' and 'git log'.
We use the code:

    else if (--p->one->rename_used > 0)
        p->status = DIFF_STATUS_COPIED;

to detect copies and renames. So, if diffcore_std runs more than once, p->one->rename_used is decremented again and a 'C' pair is degraded to an 'R' pair. This patch fixes that by allowing diffcore_std to run only once before a diff_flush, which seems reasonable for our code.

Bo Yang (2):
  Make diffcore_std only can run once before a diff_flush.
  Make git log --follow find copies among unmodified files.

 Documentation/git-log.txt           |    2 +-
 diff.c                              |   21 ++++++++-----
 diffcore-break.c                    |    6 +--
 diffcore-pickaxe.c                  |    3 +-
 diffcore-rename.c                   |    3 +-
 diffcore.h                          |    6 ++++
 t/t4205-log-follow-harder-copies.sh |   56 +++++++++++++++++++++++++++++++++++
 tree-diff.c                         |    2 +-
 8 files changed, 81 insertions(+), 18 deletions(-)
 create mode 100755 t/t4205-log-follow-harder-copies.sh

--

From: Bo Yang
Date: Thursday, April 22, 2010 - 7:05 am

'git log --follow <path>' does not track copies from unmodified
files; this patch fixes that.

Signed-off-by: Bo Yang <struggleyb.nku@gmail.com>
---
 Documentation/git-log.txt           |    2 +-
 t/t4205-log-follow-harder-copies.sh |   56 +++++++++++++++++++++++++++++++++++
 tree-diff.c                         |    2 +-
 3 files changed, 58 insertions(+), 2 deletions(-)
 create mode 100755 t/t4205-log-follow-harder-copies.sh

diff --git a/Documentation/git-log.txt b/Documentation/git-log.txt
index fb184ba..0727818 100644
--- a/Documentation/git-log.txt
+++ b/Documentation/git-log.txt
@@ -56,7 +56,7 @@ include::diff-options.txt[]
 	commits, and doesn't limit diff for those commits.
 
 --follow::
-	Continue listing the history of a file beyond renames.
+	Continue listing the history of a file beyond renames/copies.
 
 --log-size::
 	Before the log message print out its size in bytes. Intended
diff --git a/t/t4205-log-follow-harder-copies.sh b/t/t4205-log-follow-harder-copies.sh
new file mode 100755
index 0000000..ad29e65
--- /dev/null
+++ b/t/t4205-log-follow-harder-copies.sh
@@ -0,0 +1,56 @@
+#!/bin/sh
+#
+# Copyright (c) 2010 Bo Yang
+#
+
+test_description='Test --follow should always find copies hard in git log.
+
+'
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/diff-lib.sh
+
+echo >path0 'Line 1
+Line 2
+Line 3
+'
+
+test_expect_success \
+    'add a file path0 and commit.' \
+    'git add path0 &&
+     git commit -m "Add path0"'
+
+echo >path0 'New line 1
+New line 2
+New line 3
+'
+test_expect_success \
+    'Change path0.' \
+    'git add path0 &&
+     git commit -m "Change path0"'
+
+cat <path0 >path1
+test_expect_success \
+    'copy path0 to path1.' \
+    'git add path1 &&
+     git commit -m "Copy path1 from path0"'
+
+test_expect_success \
+    'find the copy path0 -> path1 harder' \
+    'git log --follow --name-status --pretty="format:%s"  path1 > current'
+
+cat >expected <<\EOF
+Copy path1 from path0
+C100	path0	path1
+
+Change ...
--

From: Bo Yang
Date: Thursday, April 22, 2010 - 7:05 am

When rename/copy detection is turned on, a second run of
diffcore_std will degrade a 'C' pair to an 'R' pair.

This can happen when we run 'git log --follow' with
harder copy finding. That is, try_to_follow_renames()
runs diffcore_std to find the copies, and then
'git log' issues another diffcore_std, which decrements
'src->rename_used' again and recognizes the copy as a rename.
This is not what we want.

So, I think we really don't need to run diffcore_std more
than once.

Signed-off-by: Bo Yang <struggleyb.nku@gmail.com>
---
 diff.c             |   21 +++++++++++++--------
 diffcore-break.c   |    6 ++----
 diffcore-pickaxe.c |    3 +--
 diffcore-rename.c  |    3 +--
 diffcore.h         |    6 ++++++
 5 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/diff.c b/diff.c
index d0ecbc3..d32fc68 100644
--- a/diff.c
+++ b/diff.c
@@ -2544,6 +2544,7 @@ static void run_checkdiff(struct diff_filepair *p, struct diff_options *o)
 void diff_setup(struct diff_options *options)
 {
 	memset(options, 0, sizeof(*options));
+	memset(&diff_queued_diff, 0, sizeof(diff_queued_diff));
 
 	options->file = stdout;
 
@@ -3462,8 +3463,7 @@ int diff_flush_patch_id(struct diff_options *options, unsigned char *sha1)
 		diff_free_filepair(q->queue[i]);
 
 	free(q->queue);
-	q->queue = NULL;
-	q->nr = q->alloc = 0;
+	DIFF_QUEUE_CLEAR(q);
 
 	return result;
 }
@@ -3591,8 +3591,7 @@ void diff_flush(struct diff_options *options)
 		diff_free_filepair(q->queue[i]);
 free_queue:
 	free(q->queue);
-	q->queue = NULL;
-	q->nr = q->alloc = 0;
+	DIFF_QUEUE_CLEAR(q);
 	if (options->close_file)
 		fclose(options->file);
 
@@ -3614,8 +3613,7 @@ static void diffcore_apply_filter(const char *filter)
 	int i;
 	struct diff_queue_struct *q = &diff_queued_diff;
 	struct diff_queue_struct outq;
-	outq.queue = NULL;
-	outq.nr = outq.alloc = 0;
+	DIFF_QUEUE_CLEAR(&outq);
 
 	if (!filter)
 		return;
@@ -3683,8 +3681,7 @@ static void diffcore_skip_stat_unmatch(struct ...
--

From: Junio C Hamano
Date: Thursday, April 22, 2010 - 1:41 pm

It actually is stronger than that; we should never run it more than once,
and it would be a bug if we did so.  Which codepath tries to call *_std()
twice?

The standard calling sequence is:

 - start from an empty queue.

 - use diff_change() and diff_addremove() to populate the queue.

 - call diffcore_std(). If you need to use a non-standard chain of
   diffcore transformations, you _could_ call the diffcore_* routines that
   diffcore_std() calls, if you choose to, but as you found out, some of
   them are not idempotent operations, and shouldn't be called twice.


Shouldn't this be a BUG() instead?

The trivial rewrite to use this macro is a good idea, but it probably
--

From: Bo Yang
Date: Thursday, April 22, 2010 - 8:55 pm

In the command 'git log --follow ...',
log_tree_diff calls diff_tree_sha1 and then log_tree_diff_flush. When
'--follow' is given, the former function calls
try_to_follow_renames, which calls diffcore_std to detect renames.
Then log_tree_diff_flush calls diffcore_std again,
unconditionally. (I tried to find a condition under which to make that
call, but failed, so I came up with this patch.)

Breakpoint 1, diffcore_std (options=0xbf9cc044) at diff.c:3748
3748		if (diff_queued_diff.run)
(gdb) bt
#0  diffcore_std (options=0xbf9cc044) at diff.c:3748
#1  0x08124206 in try_to_follow_renames (t1=0xbf9cc130, t2=0xbf9cc11c,
base=0x81571c9 "", opt=0xbf9cc468) at tree-diff.c:358
#2  0x08124480 in diff_tree_sha1 (old=0x9c51d8c
"$\033\222T���\a\035\200T����\210;8\235i", new=0x9c51d2c
"\201�\017<�\v��n]\226{�+�\001\003\232\232\230",
    base=0x81571c9 "", opt=0xbf9cc468) at tree-diff.c:418
#3  0x080e660e in log_tree_diff (opt=0xbf9cc220, commit=0x9c51d28,
log=0xbf9cc1ac) at log-tree.c:536
#4  0x080e668f in log_tree_commit (opt=0xbf9cc220, commit=0x9c51d28)
at log-tree.c:560
#5  0x0807faa1 in cmd_log_walk (rev=0xbf9cc220) at builtin/log.c:237
#6  0x080806e2 in cmd_log (argc=5, argv=0xbf9cc788, prefix=0x0) at
builtin/log.c:481
#7  0x0804b8eb in run_builtin (p=0x8161524, argc=5, argv=0xbf9cc788)
at git.c:260
#8  0x0804ba51 in handle_internal_command (argc=5, argv=0xbf9cc788) at git.c:416
#9  0x0804bb2c in run_argv (argcp=0xbf9cc700, argv=0xbf9cc704) at git.c:458
#10 0x0804bcbe in main (argc=5, argv=0xbf9cc788) at git.c:529
(gdb) c
Continuing.

Breakpoint 1, diffcore_std (options=0xbf9cc468) at diff.c:3748
3748		if (diff_queued_diff.run)
(gdb) bt
#0  diffcore_std (options=0xbf9cc468) at diff.c:3748
#1  0x080e6356 in log_tree_diff_flush (opt=0xbf9cc220) at log-tree.c:449
#2  0x080e6619 in log_tree_diff (opt=0xbf9cc220, commit=0x9c51d28,
log=0xbf9cc1ac) at log-tree.c:537
#3  0x080e668f in log_tree_commit (opt=0xbf9cc220, commit=0x9c51d28)
at log-tree.c:560
#4  0x0807faa1 in cmd_log_walk ...
--

From: Bo Yang
Date: Tuesday, April 27, 2010 - 8:37 pm

Hi Junio,

   I have not received any comments on this thread from you, but I
think it is worth a few words. I would like to get this patch series
landed; could you please give some more advice on it?

Regards!
Bo
