Re: warning: too many files, skipping inexact rename detection

Previous thread: [PATCH] rev-parse: fix --verify to error out when passed junk after a good rev by Christian Couder on Saturday, April 26, 2008 - 6:19 am. (2 messages)

Next thread: [PATCH] Improve git bisect error message by Andi Kleen on Saturday, April 26, 2008 - 6:39 am. (2 messages)
From: Andrew Morton
Date: Saturday, April 26, 2008 - 6:32 am

I get the above message all the time when pulling all the git trees.

I'm frightened!

y:/usr/src/git26> git --version
git version 1.5.5.rc1
--

From: Jeff King
Date: Saturday, April 26, 2008 - 6:57 am

Rename detection is O(n^2), so when it looks like it will take a really
long time, we skip it. This has been happening for a while, but 1.5.5
only recently started telling the user (based on some people wondering
why renames weren't found during their enormous merges).

The default rename limit is 100, but you can bump it via the
diff.renamelimit config option. A few tests that I did imply that
200-400 is reasonable for logging, and 800-1000 for a merge:

  http://article.gmane.org/gmane.comp.version-control.git/73519

Are you running into actual problems with rename detection, or is the
message just too scary and confusing?

-Peff
--

From: Andrew Morton
Date: Saturday, April 26, 2008 - 7:06 am

No observed problems, just scared!

I don't use rename detection anyway - I use git to extract plain old diffs
only.

Perhaps the default should be bumped up a bit based on your measurements,
dunno.

--

From: Jeff King
Date: Saturday, April 26, 2008 - 7:52 am

git config --global diff.renamelimit 200


Ah, OK. If you are just doing a fast-forward merge, there is no rename
detection going on as part of the merge. But the diffstat for a large is
probably enough to trigger this behavior. So perhaps we should only

Probably. I'll work up a patch for that, as well as suppressing the
message on diffstat (where you really shouldn't care, and it serves only
to scare users).

-Peff
--

From: Jeff King
Date: Wednesday, April 30, 2008 - 10:21 am

Here's a patch series trying to improve rename limits, based on my
discussion with Andrew and from some previous comments about the
settings[1].

Patch 1 allows separate renamelimit values for diff vs merge. Patch 2
bumps up the default values based on some measurements I did in
February. Patch 3 turns off the mostly cluttering warning message except
for merges.

[1]: http://permalink.gmane.org/gmane.comp.version-control.git/73470

-Peff
--

From: Jeff King
Date: Wednesday, April 30, 2008 - 10:23 am

The point of rename limiting is to bound the amount of time
we spend figuring out inexact renames. Currently we use a
single value, diff.renamelimit, for all situations. However,
it is probably the case that a user is willing to spend more
time finding renames during a merge than they are while
looking at git-log.

This patch provides a way of setting those values separately
(though for backwards compatibility, merge still falls back
on the diff renamelimit).

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/merge-config.txt |    5 +++
 builtin-merge-recursive.c      |   13 +++++--
 t/t6032-merge-large-rename.sh  |   73 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 88 insertions(+), 3 deletions(-)
 create mode 100755 t/t6032-merge-large-rename.sh

diff --git a/Documentation/merge-config.txt b/Documentation/merge-config.txt
index 9719311..48ce747 100644
--- a/Documentation/merge-config.txt
+++ b/Documentation/merge-config.txt
@@ -6,6 +6,11 @@ merge.log::
 	Whether to include summaries of merged commits in newly created
 	merge commit messages. False by default.
 
+merge.renameLimit::
+	The number of files to consider when performing rename detection
+	during a merge; if not specified, defaults to the value of
+	diff.renameLimit.
+
 merge.tool::
 	Controls which merge resolution program is used by
 	linkgit:git-mergetool[1].  Valid built-in values are: "kdiff3",
diff --git a/builtin-merge-recursive.c b/builtin-merge-recursive.c
index 910c0d2..1293e3d 100644
--- a/builtin-merge-recursive.c
+++ b/builtin-merge-recursive.c
@@ -92,7 +92,8 @@ static struct path_list current_directory_set = {NULL, 0, 0, 1};
 
 static int call_depth = 0;
 static int verbosity = 2;
-static int rename_limit = -1;
+static int diff_rename_limit = -1;
+static int merge_rename_limit = -1;
 static int buffer_output = 1;
 static struct strbuf obuf = STRBUF_INIT;
 
@@ -361,7 +362,9 @@ static struct path_list *get_renames(struct tree *tree,
 	diff_setup(&opts);
 ...
From: Jeff King
Date: Wednesday, April 30, 2008 - 11:18 am

Aside: this is based on next due to the reorganization of the merge
options into merge-config.txt.

-Peff
--

From: Junio C Hamano
Date: Saturday, May 3, 2008 - 1:15 pm

Makes one wonder where the magic 100 comes from.  Wouldn't this

	opts.rename_limit = (merge_rename_limit >= 0)
        		  ? merge_rename_limit
                          : diff_rename_limit;

be easier to maintain, with the same semantics?

--

From: Jeff King
Date: Sunday, May 4, 2008 - 12:31 pm

Yes, though I was being a little sneaky in perparation for 2/3, which
changes the defaults differently for diff versus merge. I could maybe
have put that particular change into 2/3, but it looks like you have
applied this to next as-is already.

-Peff
--

From: Jeff King
Date: Wednesday, April 30, 2008 - 10:24 am

The current rename limit default of 100 was arbitrarily
chosen. Testing[1] has shown that on modern hardware, a
limit of 200 adds about a second of computation time, and a
limit of 500 adds about 5 seconds of computation time.

This patch bumps the default limit to 200 for viewing diffs,
and to 500 for performing a merge. The limit for generating
git-status templates is set independently; we bump it up to
200 here, as well, to match the diff limit.

[1]: See <20080211113516.GB6344@coredump.intra.peff.net>

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin-merge-recursive.c |    2 +-
 diff.c                    |    2 +-
 wt-status.c               |    2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin-merge-recursive.c b/builtin-merge-recursive.c
index 1293e3d..3902e91 100644
--- a/builtin-merge-recursive.c
+++ b/builtin-merge-recursive.c
@@ -364,7 +364,7 @@ static struct path_list *get_renames(struct tree *tree,
 	opts.detect_rename = DIFF_DETECT_RENAME;
 	opts.rename_limit = merge_rename_limit >= 0 ? merge_rename_limit :
 			    diff_rename_limit >= 0 ? diff_rename_limit :
-			    100;
+			    500;
 	opts.output_format = DIFF_FORMAT_NO_OUTPUT;
 	if (diff_setup_done(&opts) < 0)
 		die("diff setup failed");
diff --git a/diff.c b/diff.c
index 3632b55..f735519 100644
--- a/diff.c
+++ b/diff.c
@@ -19,7 +19,7 @@
 #endif
 
 static int diff_detect_rename_default;
-static int diff_rename_limit_default = 100;
+static int diff_rename_limit_default = 200;
 int diff_use_color_default = -1;
 static const char *external_diff_cmd_cfg;
 int diff_auto_refresh_index = 1;
diff --git a/wt-status.c b/wt-status.c
index 532b4ea..a44c543 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -206,7 +206,7 @@ static void wt_status_print_updated(struct wt_status *s)
 	rev.diffopt.format_callback = wt_status_print_updated_cb;
 	rev.diffopt.format_callback_data = s;
 	rev.diffopt.detect_rename = 1;
-	rev.diffopt.rename_limit = 100;
+	rev.diffopt.rename_limit = ...
From: Jeff King
Date: Wednesday, April 30, 2008 - 10:25 am

In many cases, the warning ends up as clutter, because the
diff is being done "behind the scenes" from the user (e.g.,
when generating a commit diffstat), and whether we show
renames or not is not particularly interesting to the user.

However, in the case of a merge (which is what motivated the
warning in the first place), it is a useful hint as to why a
merge with renames might have failed.

This patch makes the warning optional based on the code
calling into diffcore. We default to not showing the
warning, but turn it on for merges.

Signed-off-by: Jeff King <peff@peff.net>
---
This neglects the case where the user specifically does a diff asking
for renames, but we turn it off. Maybe when "-M" is specified on the
commandline to git-diff, we should set this option as well.

 builtin-merge-recursive.c |    1 +
 diff.h                    |    1 +
 diffcore-rename.c         |    3 ++-
 3 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/builtin-merge-recursive.c b/builtin-merge-recursive.c
index 3902e91..46e636f 100644
--- a/builtin-merge-recursive.c
+++ b/builtin-merge-recursive.c
@@ -365,6 +365,7 @@ static struct path_list *get_renames(struct tree *tree,
 	opts.rename_limit = merge_rename_limit >= 0 ? merge_rename_limit :
 			    diff_rename_limit >= 0 ? diff_rename_limit :
 			    500;
+	opts.warn_on_too_large_rename = 1;
 	opts.output_format = DIFF_FORMAT_NO_OUTPUT;
 	if (diff_setup_done(&opts) < 0)
 		die("diff setup failed");
diff --git a/diff.h b/diff.h
index f2c7739..8931116 100644
--- a/diff.h
+++ b/diff.h
@@ -83,6 +83,7 @@ struct diff_options {
 	int pickaxe_opts;
 	int rename_score;
 	int rename_limit;
+	int warn_on_too_large_rename;
 	int dirstat_percent;
 	int setup;
 	int abbrev;
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 1369a5e..1b2ebb4 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -492,7 +492,8 @@ void diffcore_rename(struct diff_options *options)
 		rename_limit = 32767;
 	if ((num_create > ...
From: Ramsay Jones
Date: Saturday, May 3, 2008 - 10:34 am

This will also fix the problem I had with gitk on cygwin; namely "gitk --all &" on my
"git" repo pops up an error dialog (see below), after which the diff display disappears.

The error dialog shows:
--- >8 ---
warning: too many files, skipping inexact rename detection
warning: too many files, skipping inexact rename detection
    while executing
"close $bdf"
    (procedure "getblobdiffline" line 89)
    invoked from within
"getblobdiffline file102daa00 1d6aeb410dc19893adbc0209bcf859f35ff1c7d6"
    ("eval" body line 1)
    invoked from within
"eval $script"
    (procedure "dorunq" line 9)
    invoked from within
"dorunq"
    ("after" script)
--- 8< ---

The git command issued by gitk appears to be:
   git diff-tree -r -p -C --no-commit-id -U3 1d6aeb410dc19893adbc0209bcf859f35ff1c7d6

However, if I type the above into bash (and include --no-pager), then the error
message does not appear! Ho hum. Also, the same repo on Linux does not exhibit this
problem at all.

In any event, if the above warning() call is commented out, then gitk is quite happy
on cygwin as well.

All the best,

Ramsay Jones

--

From: Jeff King
Date: Sunday, May 4, 2008 - 12:23 pm

Hrm. Is gitk on cygwin somehow squishing stderr and stdout together? Or
does gitk in general look at what happens on stderr?

Because while I am happy that removing this message fixes your problem,
it is a little disconcerting to think that we can break gitk just by
issuing a warning diagnostic on stderr.

-Peff
--

From: Paul Mackerras
Date: Sunday, May 4, 2008 - 4:28 pm

It's a more general Tcl thing - if you are reading from a process, and
the process writes to stderr, and the script hasn't explicitly
redirected stderr, the Tcl infrastructure assumes that the process is
signalling an error, even if the exit status is 0.  Gitk does redirect
stderr (to stdout) when it does a git reset, but not for other
commands.

At the moment I don't think there is a good way in Tcl to get hold of
the stderr output if a subcommand returns a non-zero exit status, but
ignore it if the exit status is 0, other than by redirecting stderr to
a temporary file, which has its own problems.  Tcl can bundle stderr
in with stdout, or ignore it, or take it as an error indication, or
send it to a file.

So if git commands can avoid writing non-error messages to stderr,
that will make my life easier...

Paul.
--

From: Jeff King
Date: Monday, May 5, 2008 - 6:59 am

In that case, Junio, perhaps we should restrict this particular warning
just to merge.

However, there a number of other warnings that can get printed on
stderr, many of them related to reflogs. So I suspect with some reflog
conditions (like a reflog that only goes back 1 day) you could end up
with a tcl error for "gitk HEAD@{2.days.ago}".

-Peff
--

From: Jeff King
Date: Monday, May 5, 2008 - 10:02 am

Indeed, I tested this:

  $ git clone git://git.kernel.org/pub/scm/git/git.git
  $ cd git
  $ sleep 10
  $ gitk HEAD@{10.seconds.ago} ;# works fine
  $ gitk HEAD@{10.minutes.ago} ;# barfs due to warning

-Peff
--

From: Junio C Hamano
Date: Monday, May 5, 2008 - 12:52 pm

I am not sure if we really want to work around Tcl's braindamage like
that.

There is no stdwarn or stdinfo stream and I think it is a bug on the
receiving end to assume that anything that comes to stderr is an error.
--

From: Paul Mackerras
Date: Monday, May 12, 2008 - 4:15 am

If this is the warning about too many files to do inexact rename
detection, I find that one annoying because I don't care about that
(it's just for the diffstat when doing a pull AFAIK) and I don't know

There are apparently some programs on some platforms for which this
behaviour is necessary...

Paul.
--

From: Ramsay Jones
Date: Tuesday, May 6, 2008 - 3:33 pm

Ah, OK. That explains it.

Actually, I seem to have a vague recollection of having had this
discussion before...

All the Best,

Ramsay Jones


--

From: Ramsay Jones
Date: Tuesday, May 6, 2008 - 3:29 pm

maybe. After further testing, the above "git diff-tree..." does indeed issue the
warning on stderr; on both cygwin and Linux.

[I forgot to type ./git-diff-tree as I had not installed the new git yet, since
it does not pass "make test".  That is a different problem for another day...]

It appears that gitk on Linux is quite happy with that message on stderr, but on

Indeed.

Also note that the warning is issued twice, since gitk issues that same
command twice; viz:

    $ GIT_TRACE=/home/ramsay/git-trace gitk --all & # exit asap
    $ cat /home/ramsay/git-trace
    trace: built-in: git 'config' '--get' 'i18n.commitencoding'
    trace: built-in: git 'rev-parse' '--git-dir'
    trace: built-in: git 'rev-parse' '--no-revs' '--no-flags' '--all'
    trace: built-in: git 'rev-parse' '--is-inside-work-tree'
    trace: built-in: git 'show-ref' '-d'
    trace: built-in: git 'symbolic-ref' 'HEAD'
    trace: built-in: git 'log' '--no-color' '-z' '--pretty=raw' '--topo-order' '--parents' '--boundary' '--all' '--'
    trace: built-in: git 'diff-index' '--cached' 'HEAD'
    trace: built-in: git 'rev-parse' '--git-dir'
    trace: built-in: git 'diff-tree' '-r' '--no-commit-id' '1d6aeb410dc19893adbc0209bcf859f35ff1c7d6'
    trace: built-in: git 'diff-files'
    trace: built-in: git 'diff-tree' '-r' '-p' '-C' '--no-commit-id' '-U3' '1d6aeb410dc19893adbc0209bcf859f35ff1c7d6'
    trace: built-in: git 'diff-tree' '-r' '-p' '-C' '--no-commit-id' '-U3' '1d6aeb410dc19893adbc0209bcf859f35ff1c7d6'
    $

NOTE: I get exactly the same trace on Linux and cygwin.

As a quick-fix, I added a "-l300" parameter to the above git command in ~/bin/gitk.
[diff.renamelimit only affects git-diff]

All the Best,

Ramsay Jones


--

Previous thread: [PATCH] rev-parse: fix --verify to error out when passed junk after a good rev by Christian Couder on Saturday, April 26, 2008 - 6:19 am. (2 messages)

Next thread: [PATCH] Improve git bisect error message by Andi Kleen on Saturday, April 26, 2008 - 6:39 am. (2 messages)