Re: [regression?] "git status -a" reports modified for empty submodule directory

Previous thread: [PATCH] git-svn bug with blank commits and author file by Thomas Guyot-Sionnest on Tuesday, April 22, 2008 - 3:07 am. (2 messages)

Next thread: bash completion only provides revs, not paths by Uwe on Tuesday, April 22, 2008 - 4:21 am. (5 messages)
From: Ping Yin
Date: Tuesday, April 22, 2008 - 4:01 am

# create a super project super
$ mkdir super && cd super && git init
$ touch foo && git add foo && git commit -m "add foo"

# create a sub project sub
$ mkdir sub && cd sub && git init
$ touch bar && git add bar && git commit -m "add bar"

# add sub project to super project
$ cd ..
$ git add sub && git commit -m 'add sub'

# remote contents of subproject
$ rm -rf sub/* sub/.git

# git status -a regression
$ git status
# On branch master
nothing to commit (working directory clean)
$ git status -a
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       deleted:    sub
#


-- 
Ping Yin
--

From: Ping Yin
Date: Tuesday, April 22, 2008 - 4:04 am

-- 
Ping Yin
--

From: Ping Yin
Date: Tuesday, April 29, 2008 - 8:31 am

Another regression following

In the super project super with empty submodule directory sub
$ git diff
diff --git a/sub b/sub
deleted file mode 160000
index f2c0d45..0000000
--- a/sub
+++ /dev/null
@@ -1 +0,0 @@
-Subproject commit f2c0d4509a3178c...


-- 
Ping Yin
--

From: Ping Yin
Date: Tuesday, April 29, 2008 - 9:07 am

For regression about empty submodule directory

* "git status -a" reports modified
* "git diff" generates diff


--

From: Ping Yin
Date: Tuesday, April 29, 2008 - 9:07 am

Signed-off-by: Ping Yin <pkufranky@gmail.com>
---
 t/t4027-diff-submodule.sh |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/t/t4027-diff-submodule.sh b/t/t4027-diff-submodule.sh
index 1fd3fb7..ba6679c 100755
--- a/t/t4027-diff-submodule.sh
+++ b/t/t4027-diff-submodule.sh
@@ -50,4 +50,11 @@ test_expect_success 'git diff-files --raw' '
 	test_cmp expect actual.files
 '
 
+test_expect_success 'git diff (empty submodule dir)' '
+	: >empty &&
+	rm -rf sub/* sub/.git &&
+	git diff > actual.empty &&
+	test_cmp empty actual.empty
+'
+
 test_done
-- 
1.5.5.1.96.gef4a.dirty

--

From: Ping Yin
Date: Tuesday, April 29, 2008 - 9:07 am

Signed-off-by: Ping Yin <pkufranky@gmail.com>
---
 t/t7506-status-submodule.sh |   38 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 38 insertions(+), 0 deletions(-)
 create mode 100755 t/t7506-status-submodule.sh

diff --git a/t/t7506-status-submodule.sh b/t/t7506-status-submodule.sh
new file mode 100755
index 0000000..a75130c
--- /dev/null
+++ b/t/t7506-status-submodule.sh
@@ -0,0 +1,38 @@
+#!/bin/sh
+
+test_description='git-status for submodule'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	test_create_repo sub
+	cd sub &&
+	: >bar &&
+	git add bar &&
+	git commit -m " Add bar" &&
+	cd .. &&
+	git add sub &&
+	git commit -m "Add submodule sub"
+'
+
+test_expect_success 'status clean' '
+	git status |
+	grep "nothing to commit"
+'
+test_expect_success 'status -a clean' '
+	git status -a |
+	grep "nothing to commit"
+'
+test_expect_success 'rm submodule contents' '
+	rm -rf sub/* sub/.git
+'
+test_expect_success 'status clean (empty submodule dir)' '
+	git status |
+	grep "nothing to commit"
+'
+test_expect_success 'status -a clean (empty submodule dir)' '
+	git status -a |
+	grep "nothing to commit"
+'
+
+test_done
-- 
1.5.5.1.96.gef4a.dirty

--

From: Junio C Hamano
Date: Tuesday, April 29, 2008 - 2:40 pm

The repository used to have a subproject and now it doesn't.  Why
shouldn't it report the removal?
--

From: Johannes Sixt
Date: Tuesday, April 29, 2008 - 11:39 pm

Because you are not required to have a subproject checked out?

-- Hannes

--

From: Junio C Hamano
Date: Wednesday, April 30, 2008 - 12:29 am

Yes, but.

This is a policy issue, which is not very well enforced currently.

I have been scratching my head to figure out where the right balance we
should strike at for the past week and a half.

For example, it has long been known ever since submodules were introduced
that if you work inside a sparsely checked out supermodule you have to use
"commit -a" with care, because the command notices missing submodule, and
there is no way for it to differenciate between the case you _want to_
remove it and the case you did not care about it, so you will end up
removing unchecked out submodules.

That quirk was something people could live with while submodule support
was merely a newly invented curiosity.  But I do not think a command at
high level (iow Porcelain) such as commit and status should be left
unaware of the Policy that equate missing submodule and unmodified one
forever.  We should actively enforce the policy, so that unless you
explicitly ask nicely, the command should consider a missing submodule
just as unmodified, e.g. "commit -a" should not remove unchecked out
submodules.

But then you would need a way to ask nicely.  How?  Perhaps using "git rm",
and low level "update-index --remove".  Do we even need "commit -A"?  I
doubt it --- you do not remove submodules every day.

We'd like to keep the lowest-level unaware of the Policy, which means that
"diff-files" and "diff-index" should report unchecked out submodules.
Otherwise script writers will be left with no way to differenciate missing
and removed submodules.

Once we start doing this, I think "git diff" Porcelain should fall into
Policy-aware category.





--

From: Ping Yin
Date: Wednesday, April 30, 2008 - 8:56 am

If i am not misunderstood, there is one way now (although broken now,
it used this kind of behaviour before git-status became built-in):
empty directory to denote that the submodule is not checked out and
unaware, and deleted directory to denote that the submodule is
deleted.

Actually, i don't like the empty directory trick, since it will leave
so many empty directories in a repository with many submodules


Agree this pilicy.

If this change needs a long way. Should we fix the regression first?
Anyway, 'git status' and 'git status -a' should behave the same for
submodules unchecked out. I have tried but i failed. I just found this
regression was introduced on the first day of built-in status

-- 
Ping Yin
--

From: Ping Yin
Date: Friday, May 2, 2008 - 6:35 am

Ok, this is my try to fix the regression.


--

From: Ping Yin
Date: Friday, May 2, 2008 - 6:35 am

Signed-off-by: Ping Yin <pkufranky@gmail.com>
---
 t/t4027-diff-submodule.sh |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/t/t4027-diff-submodule.sh b/t/t4027-diff-submodule.sh
index 1fd3fb7..ba6679c 100755
--- a/t/t4027-diff-submodule.sh
+++ b/t/t4027-diff-submodule.sh
@@ -50,4 +50,11 @@ test_expect_success 'git diff-files --raw' '
 	test_cmp expect actual.files
 '
 
+test_expect_success 'git diff (empty submodule dir)' '
+	: >empty &&
+	rm -rf sub/* sub/.git &&
+	git diff > actual.empty &&
+	test_cmp empty actual.empty
+'
+
 test_done
-- 
1.5.5.1.116.ge4b9c.dirty

--

From: Ping Yin
Date: Friday, May 2, 2008 - 6:35 am

Signed-off-by: Ping Yin <pkufranky@gmail.com>
---
 t/t7506-status-submodule.sh |   38 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 38 insertions(+), 0 deletions(-)
 create mode 100755 t/t7506-status-submodule.sh

diff --git a/t/t7506-status-submodule.sh b/t/t7506-status-submodule.sh
new file mode 100755
index 0000000..a75130c
--- /dev/null
+++ b/t/t7506-status-submodule.sh
@@ -0,0 +1,38 @@
+#!/bin/sh
+
+test_description='git-status for submodule'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	test_create_repo sub
+	cd sub &&
+	: >bar &&
+	git add bar &&
+	git commit -m " Add bar" &&
+	cd .. &&
+	git add sub &&
+	git commit -m "Add submodule sub"
+'
+
+test_expect_success 'status clean' '
+	git status |
+	grep "nothing to commit"
+'
+test_expect_success 'status -a clean' '
+	git status -a |
+	grep "nothing to commit"
+'
+test_expect_success 'rm submodule contents' '
+	rm -rf sub/* sub/.git
+'
+test_expect_success 'status clean (empty submodule dir)' '
+	git status |
+	grep "nothing to commit"
+'
+test_expect_success 'status -a clean (empty submodule dir)' '
+	git status -a |
+	grep "nothing to commit"
+'
+
+test_done
-- 
1.5.5.1.116.ge4b9c.dirty

--

From: Ping Yin
Date: Friday, May 2, 2008 - 6:35 am

This regression is introduced by f58dbf23c3, which calls
check_work_tree_entity in run_diff_files. While check_work_tree_entity
treats submodule not checked out as non stagable which causes that
diff-files shows these submodules as deleted.

check_work_tree_entity considers a worktree entity having two statuses:
stagable and inexistent. Actually, there is a 3rd status: a submodule
entity can be existent but not stagable (for example, empty directory for
non-checked-out submodule)

This patch redesigns the return value of check_work_tree_entity to
consider both the 3 statuses.

Signed-off-by: Ping Yin <pkufranky@gmail.com>
---
 diff-lib.c |   22 +++++++++++++---------
 1 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/diff-lib.c b/diff-lib.c
index cfd629d..72c2a7b 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -337,25 +337,29 @@ int run_diff_files_cmd(struct rev_info *revs, int argc, const char **argv)
 	}
 	return run_diff_files(revs, options);
 }
+
+#define ENT_STAGABLE 1
+#define ENT_INEXISTENT 2
+#define ENT_NOTGITDIR 3		/* Existent but not stagable (not a git dir) */
 /*
- * See if work tree has an entity that can be staged.  Return 0 if so,
- * return 1 if not and return -1 if error.
+ * Check the status of a work tree entity
+ * Return ENT_{STAGABLE,INEXISTENT,NOTGITDIR} or -1 if error
  */
 static int check_work_tree_entity(const struct cache_entry *ce, struct stat *st, char *symcache)
 {
 	if (lstat(ce->name, st) < 0) {
 		if (errno != ENOENT && errno != ENOTDIR)
 			return -1;
-		return 1;
+		return ENT_INEXISTENT;
 	}
 	if (has_symlink_leading_path(ce->name, symcache))
-		return 1;
+		return ENT_INEXISTENT;
 	if (S_ISDIR(st->st_mode)) {
 		unsigned char sub[20];
 		if (resolve_gitlink_ref(ce->name, "HEAD", sub))
-			return 1;
+			return ENT_NOTGITDIR;
 	}
-	return 0;
+	return ENT_STAGABLE;
 }
 
 int run_diff_files(struct rev_info *revs, unsigned int option)
@@ -403,7 +407,7 @@ int run_diff_files(struct rev_info *revs, unsigned int ...
From: Ping Yin
Date: Friday, May 2, 2008 - 6:35 am

For submodules non checked out, ie_match_stat should always return 0.
So in this case avoid calling is_racy_timestamp.

Signed-off-by: Ping Yin <pkufranky@gmail.com>
---
 read-cache.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index a92b25b..8c9c8e2 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -294,7 +294,7 @@ int ie_match_stat(const struct index_state *istate,
 	 * whose mtime are the same as the index file timestamp more
 	 * carefully than others.
 	 */
-	if (!changed && is_racy_timestamp(istate, ce)) {
+	if (!changed && !S_ISGITLINK(ce->ce_mode) && is_racy_timestamp(istate, ce)) {
 		if (assume_racy_is_modified)
 			changed |= DATA_CHANGED;
 		else
-- 
1.5.5.1.116.ge4b9c.dirty

--

From: Junio C Hamano
Date: Friday, May 2, 2008 - 2:57 pm

It is conceptually wrong to have this check in that function, I think.

Look at what ce_match_stat_basic() does.  For S_IFGITLINK entries, we do
not even compare the timestamps, so is_racy_timestamp() check should not
even care and should return Ok for them.

Perhaps this patch would be a better, in that it covers the other caller
on the write_index() callpath as well.

---
diff --git a/read-cache.c b/read-cache.c
index a92b25b..0a0ea3b 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -258,6 +258,7 @@ static int ce_match_stat_basic(struct cache_entry *ce, struct stat *st)
 static int is_racy_timestamp(const struct index_state *istate, struct cache_entry *ce)
 {
 	return (istate->timestamp &&
+		!S_ISGITLINK(ce->ce_mode) &&
 		((unsigned int)istate->timestamp) <= ce->ce_mtime);
 }
 
--

From: Ping Yin
Date: Friday, May 2, 2008 - 4:34 pm

fine



-- 
Ping Yin
--

From: Junio C Hamano
Date: Friday, May 2, 2008 - 3:23 pm

Earier I said we may have to teach the Porcelain layer (status, diff) to
equate a submodule that is not checked out and a submodule that is not
modified while keeping low-level plumbing (diff-files and diff-index)
still aware that the submodule is missing from the work tree, but that was
because I incorrectly thought there are only two cases (either the
submodule is fully checked out or the submodule directory itself does not
even exist) and treating the latter the same as an unmodified case would
mean there won't be an easy way to remove the submodule from the
superproject for Porcelains that are written in terms of diff-files and
diff-index.

But that was a faulty thinking on my part (heh, why didn't anybody correct
me?).  Actually, there are three cases:

 - submodule directory exists and it is a full fledged repository.  It may
   or may not be modified, but we can tell by looking at its .git/HEAD.

 - submodule directory exists but there is nothing there (no "init" nor
   "update" was done, just an empty directory checked out).  This is how a
   superproject with a submodule is checked out by default.

 - submodule directory itself does not even exist.

The second case is "not checked out -- treat me as unmodified", and the
third case is "the user does not want the submodule there", and the latter
is still reported as "removed".  That is exactly what your patch does.

I like it.

By the way, "inexistent" is a word, but somehow it sounds quite awkward.
Perhaps one of NONEXISTENT (more common), REMOVED (run_diff_files() takes
a SILENT_ON_REMOVED option) or or MISSING (update-index --refresh takes an
IGNORE_MISSING option) is better?  I dunno.
--

From: Ping Yin
Date: Friday, May 2, 2008 - 4:55 pm

Actually, i pointed out in early discussion that we use empty
directory (or directory without .git subdirectory) to represent the
3rd case. However, i don't like the trick. And i don't think you are
thinking faultily because I prefer your idea: we treat nonexistent
directory and directory without .git as the same for 'git diff'. "git
diff" will show "no modified" for both case.

One point i don't agree with you is that i think diff-files should
also show "no modified" for both case. By doing this, we can avoid the
empty directory for nonchecked out submodules. For a project with
hundreds of submodules, it's really really annoying to see so many
unused empty directories.

So how we diffrentiate removed and unchecked out? For submodule, we
don't differentiate it. If you want to remove a submodule, use "git

Great, you make the 3 cases more clear.



-- 
Ping Yin
--

From: Ping Yin
Date: Friday, May 2, 2008 - 5:07 pm

Signed-off-by: Ping Yin <pkufranky@gmail.com>

I prefer nonexistent because removed or missing has the meaning that the
user has removed it. However, it may be not this case (althogh it is at
current time).


 diff-lib.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/diff-lib.c b/diff-lib.c
index 72c2a7b..61a1b7c 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -339,7 +339,7 @@ int run_diff_files_cmd(struct rev_info *revs, int argc, const char **argv)
 }
 
 #define ENT_STAGABLE 1
-#define ENT_INEXISTENT 2
+#define ENT_NONEXISTENT 2
 #define ENT_NOTGITDIR 3		/* Existent but not stagable (not a git dir) */
 /*
  * Check the status of a work tree entity
@@ -350,10 +350,10 @@ static int check_work_tree_entity(const struct cache_entry *ce, struct stat *st,
 	if (lstat(ce->name, st) < 0) {
 		if (errno != ENOENT && errno != ENOTDIR)
 			return -1;
-		return ENT_INEXISTENT;
+		return ENT_NONEXISTENT;
 	}
 	if (has_symlink_leading_path(ce->name, symcache))
-		return ENT_INEXISTENT;
+		return ENT_NONEXISTENT;
 	if (S_ISDIR(st->st_mode)) {
 		unsigned char sub[20];
 		if (resolve_gitlink_ref(ce->name, "HEAD", sub))
@@ -407,7 +407,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 			       sizeof(struct combine_diff_parent)*5);
 
 			changed = check_work_tree_entity(ce, &st, symcache);
-			if (changed != ENT_INEXISTENT)
+			if (changed != ENT_NONEXISTENT)
 				dpath->mode = ce_mode_from_stat(ce, st.st_mode);
 			else {
 				if (changed < 0) {
@@ -471,7 +471,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 			continue;
 
 		changed = check_work_tree_entity(ce, &st, symcache);
-		if (changed == ENT_INEXISTENT) {
+		if (changed == ENT_NONEXISTENT) {
 			if (changed < 0) {
 				perror(ce->name);
 				continue;
@@ -531,7 +531,7 @@ static int get_stat_data(struct cache_entry *ce,
 		changed = check_work_tree_entity(ce, &st, cbdata->symcache);
 		if (changed < 0)
 			return -1;
-		else if (changed == ...
From: Johannes Schindelin
Date: Saturday, May 3, 2008 - 5:36 am

Hi,


In the same vein, I had to think about "stagable" for some time.  I think 
the correct term is "stageable", but then, I am not sure if there is no 
better word anyway.

Ciao,
Dscho

--

From: Junio C Hamano
Date: Saturday, May 3, 2008 - 9:59 am

Oh, I did not even mention that because it quite obviously is an typo of a
non-word ;-)
--

From: Johannes Sixt
Date: Tuesday, April 22, 2008 - 5:15 am

This should have reported:

# On branch master
# Changed but not updated:
#   (use "git add/rm <file>..." to update what will be committed)
#
#       deleted:    sub
no changes added to commit (use "git add" and/or "git commit -a")


There's nothing wrong with this.

-- Hannes

--

From: Ping Yin
Date: Tuesday, April 22, 2008 - 5:39 am

It seems that in 1.5.4, both 'git status' and 'git status -a' report
"no changes added to commit". And i think this is the right behaviour.
Because when a super project is cloned, all submodule directories are
empty in the beginning. In this case 'git status' and 'git status -a'
should report " no changes added to commit".



-- 
Ping Yin
--

From: Roman Shaposhnik
Date: Tuesday, April 22, 2008 - 11:00 am

Agreed 100%

Thanks,
Roman.

--

From: Johannes Sixt
Date: Tuesday, April 22, 2008 - 5:46 am

That makes sense.

-- Hannes

--

Previous thread: [PATCH] git-svn bug with blank commits and author file by Thomas Guyot-Sionnest on Tuesday, April 22, 2008 - 3:07 am. (2 messages)

Next thread: bash completion only provides revs, not paths by Uwe on Tuesday, April 22, 2008 - 4:21 am. (5 messages)