Re: [PATCH 0/2] Submodules: Add the new config option "ignore"

Previous thread: [RFC PATCH 0/2] Allow detached forms (--option arg) for git log options. by Matthieu Moy on Monday, July 26, 2010 - 11:14 am. (8 messages)

Next thread: [PATCH 2/24] Documentation: Add variable-substitution script by Thomas Rast on Monday, July 26, 2010 - 11:48 am. (5 messages)
From: Jens Lehmann
Date: Monday, July 26, 2010 - 11:26 am

These two patches are extremly useful for those users who chose submodules
because they wanted to reduce the time it takes for "git status" and "git
diff" to recurse through the whole work tree by putting sometimes large
amounts of files into submodules, which weren't scanned in the past.

Since 1.7.0 "git status" and "git diff" recurse through submodule work
trees, which has a - sometimes drastic - performance impact when large
submodules are present. Using the "ignore=dirty" config option restores
the former behaviour.

This option can be set in the .gitmodules file or in .git/config, any
settings found in the latter override those from .gitmodules. This
enables the distribution of a default setting for this option by simply
committing a modified .gitmodules file without each developer having to
call "git submodule sync". When using "git status" or "git diff" with
"--ignore-submodules=none" option the default behavior of scanning all
submodules work trees can be restored temporarily.


Comments?


Jens Lehmann (2):
  Submodules: Add the new "ignore" config option for diff and status
  Submodules: Use "ignore" settings from .gitmodules too for diff and
    status

 Documentation/config.txt       |   13 ++++
 Documentation/diff-options.txt |    6 ++-
 Documentation/git-status.txt   |    6 ++-
 Documentation/gitmodules.txt   |   15 ++++
 builtin/commit.c               |    2 +
 builtin/diff-files.c           |    2 +
 builtin/diff-index.c           |    2 +
 builtin/diff-tree.c            |    2 +
 builtin/diff.c                 |    2 +
 diff-lib.c                     |   15 +++--
 diff.c                         |   35 ++++++++--
 diff.h                         |    1 +
 submodule.c                    |   63 ++++++++++++++++-
 submodule.h                    |    4 +
 t/t4027-diff-submodule.sh      |  139 ++++++++++++++++++++++++++++++++++++
 t/t7508-status.sh              |  154 ++++++++++++++++++++++++++++++++++++++-
 wt-status.c                    |    8 ++-
 17 ...
From: Jens Lehmann
Date: Monday, July 26, 2010 - 11:27 am

The new "ignore" config option controls the default behavior for "git
status" and the diff family. It specifies under what circumstances they
consider submodules as modified and can be set separately for each
submodule.

The command line option "--ignore-submodules=" has been extended to accept
the new parameter "none" for both status and diff.

Users that chose submodules to get rid of long work tree scanning times
might want to set the "dirty" option for those submodules. This brings
back the pre 1.7.0 behavior, where submodule work trees were never
scanned for modifications. By using "--ignore-submodules=none" on the
command line the status and diff commands can be told to do a full scan.

This option can be set to the following values (which have the same name
and meaning as for the "--ignore-submodules" option of status and diff):

"all": All changes to the submodule will be ignored.

"dirty": Only differences of the commit recorded in the superproject and
	the submodules HEAD will be considered modifications, all changes
	to the work tree of the submodule will be ignored. When using this
	value, the submodule will not be scanned for work tree changes at
	all, leading to a performance benefit on large submodules.

"untracked": Only untracked files in the submodules work tree are ignored,
	a changed HEAD and/or modified files in the submodule will mark it
	as modified.

"none" (which is the default): Either untracked or modified files in a
	submodules work tree or a difference between the subdmodules HEAD
	and the commit recorded in the superproject will make it show up
	as changed. This value is added as a new parameter for the
	"--ignore-submodules" option of the diff family and "git status"
	so the user can override the settings in the configuration.

Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
---
 Documentation/config.txt       |   10 ++++
 Documentation/diff-options.txt |    6 ++-
 Documentation/git-status.txt   |    6 ++-
 diff-lib.c                    ...
From: Jens Lehmann
Date: Monday, July 26, 2010 - 11:28 am

The .gitmodules file is parsed for "submodule.<name>.ignore" entries
before looking for them in .git/config. Thus settings found in .git/config
will override those from .gitmodules, thereby allowing the local developer
to ignore settings given by the remote side while also letting upstream
set defaults for those users who don't have special needs.

Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
---
 Documentation/config.txt       |    3 ++
 Documentation/diff-options.txt |    2 +-
 Documentation/git-status.txt   |    2 +-
 Documentation/gitmodules.txt   |   15 +++++++++
 builtin/commit.c               |    2 +
 builtin/diff-files.c           |    2 +
 builtin/diff-index.c           |    2 +
 builtin/diff-tree.c            |    2 +
 builtin/diff.c                 |    2 +
 submodule.c                    |   12 +++++++
 submodule.h                    |    1 +
 t/t4027-diff-submodule.sh      |   66 ++++++++++++++++++++++++++++++++++++++++
 t/t7508-status.sh              |   63 +++++++++++++++++++++++++++++++++++++-
 13 files changed, 171 insertions(+), 3 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 783f264..06e70cb 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1756,6 +1756,9 @@ submodule.<submodulename>.ignore::
 	let submodules with modified tracked files in their work tree show up.
 	Using "none" (the default when this option is not set) also shows
 	submodules that have untracked files in their work tree as changed.
+	This setting overrides any setting made in .gitmodules for this submodule,
+	both settings can be overriden on the command line by using the
+	"--ignore-submodule" option.

 tar.umask::
 	This variable can be used to restrict the permission bits of
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 9cf7506..faa467b 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -334,7 +334,7 @@ endif::git-format-patch[]
 	Using "none" ...
From: Junio C Hamano
Date: Wednesday, July 28, 2010 - 2:35 pm

Hmph.

The value of "submodule.<name>.path" does not have to be "<name>".  There
seems to be a bit of confusion here.

I thought that [PATCH 1/2] used the path to the submodule, not its name,
as the key.  Even though the patch to Documentation/config.txt talks about
"submodule.<submodulename>.ignore", what the code actually seems to do in
set-from-submodule-config is to use path.  

But this patch seems to key off of <name>.

I suspect that it would make more sense to use <name> so that once the
user configures ignore in .git/config, it will persist across moving of
the submodule in the superproject tree.

In either case, we would want to be consistent between the two, no?
--

From: Jens Lehmann
Date: Thursday, August 5, 2010 - 3:37 pm

Thanks for clearing up my confusion, this is v2 of this patch using the
path configured for the submodule instead of its name.


Jens Lehmann (2):
  Submodules: Add the new "ignore" config option for diff and status
  Submodules: Use "ignore" settings from .gitmodules too for diff and
    status

 Documentation/config.txt       |   13 +++
 Documentation/diff-options.txt |    6 +-
 Documentation/git-status.txt   |    6 +-
 Documentation/gitmodules.txt   |   15 ++++
 builtin/commit.c               |    2 +
 builtin/diff-files.c           |    2 +
 builtin/diff-index.c           |    2 +
 builtin/diff-tree.c            |    2 +
 builtin/diff.c                 |    2 +
 diff-lib.c                     |   15 +++-
 diff.c                         |   35 +++++++--
 diff.h                         |    1 +
 submodule.c                    |   76 +++++++++++++++++-
 submodule.h                    |    4 +
 t/t4027-diff-submodule.sh      |  152 ++++++++++++++++++++++++++++++++++
 t/t7508-status.sh              |  175 +++++++++++++++++++++++++++++++++++++++-
 wt-status.c                    |    8 ++-
 17 files changed, 496 insertions(+), 20 deletions(-)

-- 
1.7.2.1.54.g6bed1

--

From: Jens Lehmann
Date: Thursday, August 5, 2010 - 3:39 pm

The new "ignore" config option controls the default behavior for "git
status" and the diff family. It specifies under what circumstances they
consider submodules as modified and can be set separately for each
submodule.

The command line option "--ignore-submodules=" has been extended to accept
the new parameter "none" for both status and diff.

Users that chose submodules to get rid of long work tree scanning times
might want to set the "dirty" option for those submodules. This brings
back the pre 1.7.0 behavior, where submodule work trees were never
scanned for modifications. By using "--ignore-submodules=none" on the
command line the status and diff commands can be told to do a full scan.

This option can be set to the following values (which have the same name
and meaning as for the "--ignore-submodules" option of status and diff):

"all": All changes to the submodule will be ignored.

"dirty": Only differences of the commit recorded in the superproject and
	the submodules HEAD will be considered modifications, all changes
	to the work tree of the submodule will be ignored. When using this
	value, the submodule will not be scanned for work tree changes at
	all, leading to a performance benefit on large submodules.

"untracked": Only untracked files in the submodules work tree are ignored,
	a changed HEAD and/or modified files in the submodule will mark it
	as modified.

"none" (which is the default): Either untracked or modified files in a
	submodules work tree or a difference between the subdmodules HEAD
	and the commit recorded in the superproject will make it show up
	as changed. This value is added as a new parameter for the
	"--ignore-submodules" option of the diff family and "git status"
	so the user can override the settings in the configuration.

Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
---
 Documentation/config.txt       |   10 ++++
 Documentation/diff-options.txt |    6 ++-
 Documentation/git-status.txt   |    6 ++-
 diff-lib.c                    ...
From: Jens Lehmann
Date: Thursday, August 5, 2010 - 3:40 pm

The .gitmodules file is parsed for "submodule.<name>.ignore" entries
before looking for them in .git/config. Thus settings found in .git/config
will override those from .gitmodules, thereby allowing the local developer
to ignore settings given by the remote side while also letting upstream
set defaults for those users who don't have special needs.

Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
---
 Documentation/config.txt       |    3 +
 Documentation/diff-options.txt |    2 +-
 Documentation/git-status.txt   |    2 +-
 Documentation/gitmodules.txt   |   15 +++++++
 builtin/commit.c               |    2 +
 builtin/diff-files.c           |    2 +
 builtin/diff-index.c           |    2 +
 builtin/diff-tree.c            |    2 +
 builtin/diff.c                 |    2 +
 submodule.c                    |   19 ++++++++
 submodule.h                    |    1 +
 t/t4027-diff-submodule.sh      |   76 +++++++++++++++++++++++++++++++++
 t/t7508-status.sh              |   91 ++++++++++++++++++++++++++++++++++++----
 13 files changed, 209 insertions(+), 10 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 06faa02..d97b986 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1756,6 +1756,9 @@ submodule.<name>.ignore::
 	let submodules with modified tracked files in their work tree show up.
 	Using "none" (the default when this option is not set) also shows
 	submodules that have untracked files in their work tree as changed.
+	This setting overrides any setting made in .gitmodules for this submodule,
+	both settings can be overriden on the command line by using the
+	"--ignore-submodule" option.

 tar.umask::
 	This variable can be used to restrict the permission bits of
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 9cf7506..faa467b 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -334,7 +334,7 @@ endif::git-format-patch[]
 	Using "none" will consider ...
From: Junio C Hamano
Date: Thursday, August 5, 2010 - 4:27 pm

Hmph, I thought that it would make more sense to use <name> so that once
the user configures ignore in .git/config, it will persist across moving
of the submodule in the superproject tree.  I wonder what others would
think.

Will queue, anyway.

Thanks.
--

From: Jens Lehmann
Date: Thursday, August 5, 2010 - 4:51 pm

I think that is just what this patch does, it uses the
"config_name_for_path" string_list to get the submodules <name> for the
path to then look up what is configured for the "ignore" option under
that <name>. So the "ignore" option should survive renames, no? Or did
I manage to confuse myself again?
--

From: Junio C Hamano
Date: Monday, August 9, 2010 - 9:57 am

Yeah, what you said is all correct.  I just misread "... this patch using
the path ... instead of its name" to mean that the user-level interface is
now keying off of a path, but that is not what you meant.  You instead
meant that the name is used as the key and it used as if it is a path in
the previous round, but now it goes through name-to-path mapping.

Thanks.
--

From: Johannes Schindelin
Date: Thursday, August 5, 2010 - 2:21 am

Hi,


This is incredibly useful stuff. Indeed, the time for one "git status" 
went down from > 50 sec to < 2 sec again in my project. (I have the 
impression that it still does a little too much, but I will have to have 
some time for strace()ing to find out for real.) Therefore, I applied it 
to 4msysgit.git's 'devel' branch.

However, one thing is inconvenient: there are config options per 
submodule, but no single option to just turn the behavior back to before. 
I know the reasoning behind showing dirty submodules, so I am fine with 
that feature being opt-out, but for my real world scenario, it just slows 
me down, so I _need_ to turn it off.

To put my money where my mouth is (since I need it myself, I also pushed 
it to 4msysgit.git's 'devel' branch):

-- snipsnap --
From f7b1e98a0d850e5b1bc56a825fcfe5c87076dfa9 Mon Sep 17 00:00:00 2001
From: Johannes Schindelin <johannes.schindelin@gmx.de>
Date: Thu, 5 Aug 2010 10:49:55 +0200
Subject: [PATCH] Add the 'diff.ignoreSubmodules' config setting

When you have a lot of submodules checked out, the time penalty to check
for dirty submodules can easily imply a multiplication of the total time
by the factor 20. This makes the difference between almost instantaneous
(< 2 seconds) and unbearably slow (> 50 seconds) here, since the disk
caches are constantly overloaded.

To this end, the submodule.*.ignore config option was introduced, but it
is per-submodule.

This commit introduces a global config setting to set a default
(porcelain) value for the --ignore-submodules option, keeping the
default at 'none'. It can be overridden by the submodule.*.ignore
setting and by the --ignore-submodules option.

Incidentally, this commit fixes an issue with the overriding logic:
multiple --ignore-submodules options would not clear the previously
set flags.

While at it, fix a typo in the documentation for submodule.*.ignore.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Documentation/config.txt |    7 ...
From: Junio C Hamano
Date: Thursday, August 5, 2010 - 9:49 am

Nice, at least from a cursory look.  

diff_setup() is shared by everybody both plumbing and Porcelain that use
diff and log, but you placed the smudging of the default based on config
is in diff_ui_config(), so this patch looks safe.

And the default setting can be overriden with --ignore-submodules from the
command line during diff_opt_parse(), which is also safe and sane.

--

From: Jens Lehmann
Date: Thursday, August 5, 2010 - 12:08 pm

Yup, makes sense to me too.

Acked-by: Jens Lehmann <Jens.Lehmann@web.de>
--

From: Jens Lehmann
Date: Thursday, August 5, 2010 - 4:27 pm

Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
---


And here are some test cases for this new option.

 t/t4027-diff-submodule.sh |   10 +++++++++-
 t/t7508-status.sh         |   12 ++++++++++--
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/t/t4027-diff-submodule.sh b/t/t4027-diff-submodule.sh
index 1bc6e77..d99814a 100755
--- a/t/t4027-diff-submodule.sh
+++ b/t/t4027-diff-submodule.sh
@@ -115,6 +115,9 @@ test_expect_success 'git diff HEAD with dirty submodule (work tree, refs match)'
 '

 test_expect_success 'git diff HEAD with dirty submodule (work tree, refs match) [.git/config]' '
+	git config diff.ignoreSubmodules all &&
+	git diff HEAD >actual &&
+	! test -s actual &&
 	git config submodule.subname.ignore none &&
 	git config submodule.subname.path sub &&
 	git diff HEAD >actual &&
@@ -136,10 +139,14 @@ test_expect_success 'git diff HEAD with dirty submodule (work tree, refs match)
 	sed -e "1,/^@@/d" actual >actual.body &&
 	expect_from_to >expect.body $subprev $subprev-dirty &&
 	test_cmp expect.body actual.body &&
-	git config --remove-section submodule.subname
+	git config --remove-section submodule.subname &&
+	git config --unset diff.ignoreSubmodules
 '

 test_expect_success 'git diff HEAD with dirty submodule (work tree, refs match) [.gitmodules]' '
+	git config diff.ignoreSubmodules dirty &&
+	git diff HEAD >actual &&
+	! test -s actual &&
 	git config --add -f .gitmodules submodule.subname.ignore none &&
 	git config --add -f .gitmodules submodule.subname.path sub &&
 	git diff HEAD >actual &&
@@ -166,6 +173,7 @@ test_expect_success 'git diff HEAD with dirty submodule (work tree, refs match)
 	test_cmp expect.body actual.body &&
 	git config --remove-section submodule.subname &&
 	git config --remove-section -f .gitmodules submodule.subname &&
+	git config --unset diff.ignoreSubmodules &&
 	rm .gitmodules
 '

diff --git a/t/t7508-status.sh b/t/t7508-status.sh
index 1aae762..9c14b85 100755
--- a/t/t7508-status.sh
+++ ...
From: Junio C Hamano
Date: Monday, August 9, 2010 - 9:57 am

Thanks, both.  Will queue.
--

Previous thread: [RFC PATCH 0/2] Allow detached forms (--option arg) for git log options. by Matthieu Moy on Monday, July 26, 2010 - 11:14 am. (8 messages)

Next thread: [PATCH 2/24] Documentation: Add variable-substitution script by Thomas Rast on Monday, July 26, 2010 - 11:48 am. (5 messages)