Re: [PATCH 0/7] Case-insensitive filesystem support, take 1

Previous thread: hi by Elena on Saturday, March 22, 2008 - 9:30 am. (1 message)

Next thread: Re: [SoC RFC] git statistics - information about commits by Junio C Hamano on Saturday, March 22, 2008 - 12:40 pm. (7 messages)
From: Linus Torvalds
Date: Saturday, March 22, 2008 - 10:21 am

Ok, so I said it wasn't a high priority, but I had already done all the 
really core support for this in that we have the name hashes now that I 
wanted to use for looking up names case-insensitively, so I took it as a 
challenge to do this cleanly. I already knew how I wanted to do it, so how 
hard could it be?

First, a few caveats:

 - I've tested this series, both on a case-sensitive one (using hardlinks 
   to test corner cases) and on a vfat filesystem under Linux (which is 
   case-insensitive and *really* odd wrt case preservation - it remembers 
   the name of removed files, so it preserves case even across removal and 
   re-creation!)

 - HOWEVER. The testing has been very targeted, and I only convered a few 
   cases to really care. Things like case-renaming, for example, will be 
   trivial to do, but I didn't do it. So if you want to do a 

	git mv Abc abc

   on a case-insensitive filesystem, you currently still have to do it as 
   the sequence

	git mv Abc xyz
	git mv xyz abc

   because I did *not* make git-mv know about case-insensitivity.

 - The only two operations that care about case-insensitivity after this 
   series of seven patches are

    (a) "git status" and friends (like "git add") that use the directory 
        traversal code will know to ignore files that have a case- 
        insensitive version in the index. So if you have messed up the 
        case in the working tree (or the filesystem isn't even case- 
        preserving), then "git add" and "git status" won't show the 
        case-different file as being "unknown".

    (b) merging trees (git read-tree) knows about unexpectedly found files
        that are due to case-insensitive filesystems, and knows to ignore 
        them. This means that switching branches where the case of a file 
        changes works, and it means that going a merge across that case 
        also works.

 - I made this all conditional on

	[core]
		ignorecase = true

   because obviously I'm ...
From: Linus Torvalds
Date: Saturday, March 22, 2008 - 10:22 am

From: Linus Torvalds <torvalds@woody.linux-foundation.org>
Date: Fri, 21 Mar 2008 13:14:47 -0700

Instead of wasting space with whole integers for a single bit.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

This is really unrelated to the rest of the series, except for the fact 
that it irritated me when I was thinking about the required changes to 
unpack-trees.

 unpack-trees.h |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/unpack-trees.h b/unpack-trees.h
index 50453ed..ad8cc65 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -9,16 +9,16 @@ typedef int (*merge_fn_t)(struct cache_entry **src,
 		struct unpack_trees_options *options);
 
 struct unpack_trees_options {
-	int reset;
-	int merge;
-	int update;
-	int index_only;
-	int nontrivial_merge;
-	int trivial_merges_only;
-	int verbose_update;
-	int aggressive;
-	int skip_unmerged;
-	int gently;
+	unsigned int reset:1,
+		     merge:1,
+		     update:1,
+		     index_only:1,
+		     nontrivial_merge:1,
+		     trivial_merges_only:1,
+		     verbose_update:1,
+		     aggressive:1,
+		     skip_unmerged:1,
+		     gently:1;
 	const char *prefix;
 	int pos;
 	struct dir_struct *dir;
-- 
1.5.5.rc0.28.g61a0.dirty

--

From: Linus Torvalds
Date: Saturday, March 22, 2008 - 10:25 am

From: Linus Torvalds <torvalds@woody.linux-foundation.org>
Date: Fri, 21 Mar 2008 13:16:24 -0700

It's really totally separate functionality, and if we want to start
doing case-insensitive hash lookups, I'd rather do it when it's
separated out.

It also renames "remove_index_entry()" to "remove_name_hash()", because 
that really describes the thing better. It doesn't actually remove the 
index entry, that's done by "remove_index_entry_at()", which is something 
very different, despite the similarity in names.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

This makes no code changes what-so-ever apart from the movement/renaming, 
just moves things around a bit, and makes a function that used to be 
static (add_name_hash()) be exported because it's now in a different file.

 Makefile            |    1 +
 builtin-read-tree.c |    2 +-
 cache.h             |   31 ++++++++++++----------
 name-hash.c         |   73 +++++++++++++++++++++++++++++++++++++++++++++++++++
 read-cache.c        |   65 ++-------------------------------------------
 5 files changed, 95 insertions(+), 77 deletions(-)
 create mode 100644 name-hash.c

diff --git a/Makefile b/Makefile
index 7c70b00..6d35662 100644
--- a/Makefile
+++ b/Makefile
@@ -421,6 +421,7 @@ LIB_OBJS += log-tree.o
 LIB_OBJS += mailmap.o
 LIB_OBJS += match-trees.o
 LIB_OBJS += merge-file.o
+LIB_OBJS += name-hash.o
 LIB_OBJS += object.o
 LIB_OBJS += pack-check.o
 LIB_OBJS += pack-revindex.o
diff --git a/builtin-read-tree.c b/builtin-read-tree.c
index e9cfd2b..7ac3088 100644
--- a/builtin-read-tree.c
+++ b/builtin-read-tree.c
@@ -40,7 +40,7 @@ static int read_cache_unmerged(void)
 	for (i = 0; i < active_nr; i++) {
 		struct cache_entry *ce = active_cache[i];
 		if (ce_stage(ce)) {
-			remove_index_entry(ce);
+			remove_name_hash(ce);
 			if (last && !strcmp(ce->name, last->name))
 				continue;
 			cache_tree_invalidate_path(active_cache_tree, ce->name);
diff --git a/cache.h b/cache.h
index ...
From: Linus Torvalds
Date: Saturday, March 22, 2008 - 10:28 am

From: Linus Torvalds <torvalds@woody.linux-foundation.org>
Date: Fri, 21 Mar 2008 15:53:00 -0700

This allows verify_absent() in unpack_trees() to use the hash chains
rather than looking it up using the binary search.

Perhaps more imporantly, it's also going to be useful for the next phase, 
where we actually start looking at the cache entry when we do 
case-insensitive lookups and checking the result.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
No real change, except verify_absent() can now use the name hashing rather 
than the binary search. But it's all still very much case-sensitive.

 cache.h        |    2 +-
 name-hash.c    |    6 +++---
 unpack-trees.c |    8 ++++----
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/cache.h b/cache.h
index 2afc788..76d95d2 100644
--- a/cache.h
+++ b/cache.h
@@ -353,7 +353,7 @@ extern int write_index(const struct index_state *, int newfd);
 extern int discard_index(struct index_state *);
 extern int unmerged_index(const struct index_state *);
 extern int verify_path(const char *path);
-extern int index_name_exists(struct index_state *istate, const char *name, int namelen);
+extern struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen);
 extern int index_name_pos(const struct index_state *, const char *name, int namelen);
 #define ADD_CACHE_OK_TO_ADD 1		/* Ok to add */
 #define ADD_CACHE_OK_TO_REPLACE 2	/* Ok to replace file/directory */
diff --git a/name-hash.c b/name-hash.c
index e56eb16..2678148 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -54,7 +54,7 @@ void add_name_hash(struct index_state *istate, struct cache_entry *ce)
 		hash_index_entry(istate, ce);
 }
 
-int index_name_exists(struct index_state *istate, const char *name, int namelen)
+struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen)
 {
 	unsigned int hash = hash_name(name, namelen);
 	struct cache_entry *ce;
@@ -65,9 +65,9 @@ int ...
From: Linus Torvalds
Date: Saturday, March 22, 2008 - 10:30 am

From: Linus Torvalds <torvalds@woody.linux-foundation.org>
Date: Fri, 21 Mar 2008 15:55:19 -0700

Right now nobody uses it, but "index_name_exists()" gets a flag so
you can enable it on a case-by-case basis.

Signed-of-by: Linus Torvalds <torvalds@linux-foundation.org>
---

Oooh.. We actually have some (admittedly stupid) case insensitivity code 
starting to appear. So we now hash the names insensitively, and we have 
the _capability_ to do case-insensitive lookups, but nobody actually uses 
that insensitive lookup capability yet.

But things are now starting to get interesting.

 cache.h        |    4 ++--
 dir.c          |    2 +-
 name-hash.c    |   50 ++++++++++++++++++++++++++++++++++++++++++++++++--
 unpack-trees.c |    2 +-
 4 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/cache.h b/cache.h
index 76d95d2..a9ddaa1 100644
--- a/cache.h
+++ b/cache.h
@@ -264,7 +264,7 @@ static inline void remove_name_hash(struct cache_entry *ce)
 #define refresh_cache(flags) refresh_index(&the_index, (flags), NULL, NULL)
 #define ce_match_stat(ce, st, options) ie_match_stat(&the_index, (ce), (st), (options))
 #define ce_modified(ce, st, options) ie_modified(&the_index, (ce), (st), (options))
-#define cache_name_exists(name, namelen) index_name_exists(&the_index, (name), (namelen))
+#define cache_name_exists(name, namelen, igncase) index_name_exists(&the_index, (name), (namelen), (igncase))
 #endif
 
 enum object_type {
@@ -353,7 +353,7 @@ extern int write_index(const struct index_state *, int newfd);
 extern int discard_index(struct index_state *);
 extern int unmerged_index(const struct index_state *);
 extern int verify_path(const char *path);
-extern struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen);
+extern struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen, int igncase);
 extern int index_name_pos(const struct index_state *, const char *name, int namelen);
 #define ...
From: Linus Torvalds
Date: Saturday, March 22, 2008 - 10:33 am

From: Linus Torvalds <torvalds@woody.linux-foundation.org>
Date: Fri, 21 Mar 2008 16:52:46 -0700

..and start using it for directory entry traversal (ie "git status" will
not consider entries that match an existing entry case-insensitively to
be a new file)

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

Basically all of this patch is just setting up the "ignore_case" variable 
(default to false), but this one-liner in "dir_add_name()" is the actual 
meat of it all:

	-	if (cache_name_exists(pathname, len, 0))
	+	if (cache_name_exists(pathname, len, ignore_case))

because it now means that we will ignore unknown directory entries that 
match already-known files if they match case-insensitively and we have 
core.ignorecase being set.

So this actually starts using the case insensitivity logic, but it's for a 
really quite trivial and not very interesting case. 

 cache.h       |    1 +
 config.c      |    5 +++++
 dir.c         |    2 +-
 environment.c |    1 +
 4 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/cache.h b/cache.h
index a9ddaa1..9bce723 100644
--- a/cache.h
+++ b/cache.h
@@ -407,6 +407,7 @@ extern int delete_ref(const char *, const unsigned char *sha1);
 extern int trust_executable_bit;
 extern int quote_path_fully;
 extern int has_symlinks;
+extern int ignore_case;
 extern int assume_unchanged;
 extern int prefer_symlink_refs;
 extern int log_all_ref_updates;
diff --git a/config.c b/config.c
index 0624494..3d51868 100644
--- a/config.c
+++ b/config.c
@@ -342,6 +342,11 @@ int git_default_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.ignorecase")) {
+		ignore_case = git_config_bool(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.bare")) {
 		is_bare_repository_cfg = git_config_bool(var, value);
 		return 0;
diff --git a/dir.c b/dir.c
index 7362e83..b5bfbca 100644
--- a/dir.c
+++ b/dir.c
@@ -371,7 +371,7 @@ static struct dir_entry *dir_entry_new(const ...
From: Linus Torvalds
Date: Saturday, March 22, 2008 - 10:38 am

From: Linus Torvalds <torvalds@woody.linux-foundation.org>
Date: Sat, 22 Mar 2008 09:35:59 -0700

If we find an unexpected file, see if that filename perhaps exists in a
case-insensitive way in the index, and whether the file matches that. If
so, ignore it as a known pre-existing file of a different name.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

All right, this is *it*. This is the actual core code that does something 
interesting.

I've tried to explain the behaviour in the comment, and let's face it, the 
patch is really really simple (yeah, 26 new lines but they are all really 
trivial and over half of them of them are actually the comments about 
what is going on).

The core of the code itself is just two lines, really, but it's all 
wrapped in a helper function and I tried to make it be really really 
obvious what is going on!

The reason why "src_index" had to become non-const is stupid: it's not 
because we actually do anything that really writes to the index, but the 
lazy index name hashing code means that even just a name lookup will 
possibly create the name hash in the index.

Oh well.

 unpack-trees.c |   26 ++++++++++++++++++++++++++
 unpack-trees.h |    2 +-
 2 files changed, 27 insertions(+), 1 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index bf7d8f6..95d3413 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -521,6 +521,22 @@ static int verify_clean_subdirectory(struct cache_entry *ce, const char *action,
 }
 
 /*
+ * This gets called when there was no index entry for the tree entry 'dst',
+ * but we found a file in the working tree that 'lstat()' said was fine,
+ * and we're on a case-insensitive filesystem.
+ *
+ * See if we can find a case-insensitive match in the index that also
+ * matches the stat information, and assume it's that other file!
+ */
+static int icase_exists(struct unpack_trees_options *o, struct cache_entry *dst, struct stat *st)
+{
+	struct cache_entry *src;
+
+	src = ...
From: Linus Torvalds
Date: Saturday, March 22, 2008 - 10:45 am

From: Linus Torvalds <torvalds@woody.linux-foundation.org>
Date: Sat, 22 Mar 2008 09:48:41 -0700

This is immaterial on sane filesystems, but if you have a broken (aka
case-insensitive) filesystem, and the objective is to remove the file
'abc' and replace it with the file 'Abc', then we must make sure to do
the removal first.

Otherwise, you'd first update the file 'Abc' - which would just
overwrite the file 'abc' due to the broken case-insensitive filesystem -
and then remove file 'abc' - which would now brokenly remove the just
updated file 'Abc' on that broken filesystem.

By doing removals first, this won't happen.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

Ok, this one looks - and is, really - trivial, but it's actually the only 
one in the whole series that I'm even remotely nervous about. First off, 
it actually does what it does regardless of that "core.ignorecase" 
variable, but that wouldn't worry me if it wasn't for the fact that I 
don't remember/understand what the heck that "last_symlink" logic was 
there for.

I *think* the logic is only for removal, and that splitting up the single 
loop to be two loops is totally safe and actually cleans things up, but I 
really want somebody to take a look at this. 

This patch is important, because without it you can't reliably switch 
between branches with case-aliases on a case-insensitive filesystem, and 
strictly speaking I should have put it before the previous patch, but I 
put it last because of this worry of mine. Patches 1-6 I think are totally 
obvious and ready to go at any point. This one I really want Junio to 
double-check.

The patch itself is really really trivial. We used to do both removal and 
file updates in one phase, we just split it into two. So I don't worry 
about the code, I only worry about that "last_symlink" thing.

Anyway, this concludes the series. It's not _complete_ - the 
case-independent compare function is a joke and only gets US-ASCII 
correct, and ...
From: Linus Torvalds
Date: Saturday, March 22, 2008 - 11:06 am

So the whole patch series looks like this:

	 Makefile            |    1 +
	 builtin-read-tree.c |    2 +-
	 cache.h             |   36 +++++++++-------
	 config.c            |    5 ++
	 dir.c               |    2 +-
	 environment.c       |    1 +
	 name-hash.c         |  119 +++++++++++++++++++++++++++++++++++++++++++++++++++
	 read-cache.c        |   65 +--------------------------
	 unpack-trees.c      |   43 ++++++++++++++++---
	 unpack-trees.h      |   22 +++++-----
	 10 files changed, 199 insertions(+), 97 deletions(-)
	 create mode 100644 name-hash.c

and clearly does add more lines than it deletes, but it all really is 
pretty simple, and none of this is rocket science or even very intrusive. 
What took me longest to do was not the actual code itself, but to get 
_just_ the right approach so that the end result would be as simple and 
nonintrusive as possible. That core patch 6/7 was redone at least ten 
times before I was happy with it.

Anyway, perhaps exactly because I tried very hard to make it all make 
sense, I'm actually very very happy with the patch. I suspect it's too 
late for v1.5.5 even if I think all the patches are really simple, but I'm 
hoping it can go into at least "pu" and have people actually *test* it.

Talking about testing, the kind of safety I wanted to get with this patch 
is perhaps best described by the tests I did not on case-insensitive 
filesystems, but on regular *good* filesystems together with setting the 
"core.ignorecase" config variable.

Here's an example of how that patch 6/7 works and tries to be really 
careful even on a case-sensitive filesystem:

	mkdir test-case
	cd test-case
	git init
	git config core.ignorecase true

	echo "File" > File
	git add File
	git commit -m "Create 'File'"

	git checkout -b other
	git rm File
	echo "file" > file
	git add file
	git commit -m "Create 'file'"

	echo "File" > File
	git checkout master

and now it complains about

	error: Untracked working tree file 'File' would ...
From: Linus Torvalds
Date: Saturday, March 22, 2008 - 11:28 am

git-add will want more than this, but this is an example of what we should 
do - if 'ignore_case' is set, we probably should disallow adding the same 
case-insensitive name twice to the index.

This does *not* guarantee that the index never would have aliases when 
core.ignorecase is set, since the index might have been populated from a 
tree that was generated on a sane filesystem, and we still allow that, but 
things like this are probably good things to do for projects that want to 
work case-insensitively.

So even if you have a case-sensitive filesystem, the goal (I think) should 
be that you can set core.ignorecase to true, and that should help you work 
with other people who may be stuck on case-insensitive crud.

Anyway, the reason "git add" didn't actually work with the simple change 
to dir_add_name() is that "git add" doesn't load the index until *after* 
it has done the directory traversal (because it actually *wants* to see 
files that are already in the index). 

Something like this at least disallows the dual add if the case has 
changed.

		Linus

----
 read-cache.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 5dc998d..6aee6e0 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -476,6 +476,13 @@ int add_file_to_index(struct index_state *istate, const char *path, int verbose)
 		return 0;
 	}
 
+	if (ignore_case) {
+		struct cache_entry *alias;
+		alias = index_name_exists(istate, ce->name, ce_namelen(ce), 1);
+		if (alias)
+			die("Will not add file alias '%s' ('%s' already exists in index)", ce->name, alias->name);
+	}
+
 	if (index_path(ce->sha1, path, &st, 1))
 		die("unable to index file %s", path);
 	if (add_index_entry(istate, ce, ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE))
--

From: Linus Torvalds
Date: Saturday, March 22, 2008 - 2:19 pm

From: Linus Torvalds <torvalds@woody.linux-foundation.org>
Date: Sat, 22 Mar 2008 13:19:49 -0700

This simplifies the matching case of "I already have this file and it is 
up-to-date" and makes it do the right thing in the face of 
case-insensitive aliases.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

This is patch 1/2 to make "git add" act better. Discard the previous 
simplistic and overly die()-eager patch that I hadn't signed off on.

 read-cache.c |   12 +++++-------
 1 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 5dc998d..8c57adf 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -431,9 +431,9 @@ static int index_name_pos_also_unmerged(struct index_state *istate,
 
 int add_file_to_index(struct index_state *istate, const char *path, int verbose)
 {
-	int size, namelen, pos;
+	int size, namelen;
 	struct stat st;
-	struct cache_entry *ce;
+	struct cache_entry *ce, *alias;
 	unsigned ce_option = CE_MATCH_IGNORE_VALID|CE_MATCH_RACY_IS_DIRTY;
 
 	if (lstat(path, &st))
@@ -466,13 +466,11 @@ int add_file_to_index(struct index_state *istate, const char *path, int verbose)
 		ce->ce_mode = ce_mode_from_stat(ent, st.st_mode);
 	}
 
-	pos = index_name_pos(istate, ce->name, namelen);
-	if (0 <= pos &&
-	    !ce_stage(istate->cache[pos]) &&
-	    !ie_match_stat(istate, istate->cache[pos], &st, ce_option)) {
+	alias = index_name_exists(istate, ce->name, ce_namelen(ce), ignore_case);
+	if (alias && !ce_stage(alias) && !ie_match_stat(istate, alias, &st, ce_option)) {
 		/* Nothing changed, really */
 		free(ce);
-		ce_mark_uptodate(istate->cache[pos]);
+		ce_mark_uptodate(alias);
 		return 0;
 	}
 
-- 
1.5.5.rc0.31.gdcfd.dirty

--

From: Linus Torvalds
Date: Saturday, March 22, 2008 - 2:22 pm

From: Linus Torvalds <torvalds@woody.linux-foundation.org>
Subject: [PATCH 2/2] Make git-add behave more sensibly in a case-insensitive environment

This expands on the previous patch, and allows "git add" to sanely handle 
a filename that has changed case, keeping the case in the index constant, 
and avoiding aliases.

In particular, if you have an index entry called "File", but the
checked-out tree is case-corrupted and has an entry called "file"
instead, doing a

	git add .

(or naming "file" explicitly) will automatically notice that we have an
alias, and will replace the name "file" with the existing index
capitalization (ie "File").

However, if we actually have *both* a file called "File" and one called
"file", and they don't have the same lstat() information (ie we're on a
case-sensitive filesystem but have the "core.ignorecase" flag set), we
will error out if we try to add them both.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

The previous patch handled the "nothing changed" case, this one actually 
handles the case of the data needing to be updated.

The CE_ADDED flag is an in-memory flag that just protects a single "git 
add" invocation from changing the same alias twice. That would not be 
right, but if you do separate

	git add File
	git add file

commands, the second one will happily update the information that the 
first one added even if it was different - but

	git add File file

would be an error if they don't have the same stat() information.

 cache.h      |    1 +
 read-cache.c |   37 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 1 deletions(-)

diff --git a/cache.h b/cache.h
index 9bce723..81727e4 100644
--- a/cache.h
+++ b/cache.h
@@ -133,6 +133,7 @@ struct cache_entry {
 #define CE_UPDATE    (0x10000)
 #define CE_REMOVE    (0x20000)
 #define CE_UPTODATE  (0x40000)
+#define CE_ADDED     (0x80000)
 
 #define CE_HASHED    (0x100000)
 #define CE_UNHASHED  (0x200000)
diff --git ...
From: Junio C Hamano
Date: Saturday, March 22, 2008 - 11:13 pm

It may well be the case that this lstat() returning success was caused by
a ghost match with a file with different case, and I think it is the right
thing to say "no, it does not exist" if that is the case.

I wonder what happens when the file with the same case does exist that we
are trying to make sure is missing?

As far as I can tell, icase_exists() does not ask "does a file with this
name in different case exist, and a file with this exact case doesn't?"
but asks "does a file with this name, or another name that is different
only in case, exist?".
--

From: Linus Torvalds
Date: Sunday, March 23, 2008 - 8:41 am

Can't happen. This whole code-path only triggers if the entry didn't exist 
in the index when we merge a tree.


Correct. But see the call chain - this thing is only called if index is 
NULL, ie "there was no entry in the index".

So in this case, the other comment (above "icase_exists()") talks about 
that:

	This gets called when there was no index entry for the tree entry 
	'dst', but we found a file in the working tree that 'lstat()' said 
	was fine, [...]

and you can verify that "verify_absent()" only gets called by things where 
"index" was NULL (only three callers, and two of them are expressly inside 
a "if (!old)" case, and the third one is right after a "if (index) return"
statement.

[ There's _one_ special case: the "index" thing may have been NULL not 
  because there was no path in the source index, but because we didn't 
  even look at the index in the first place! So strictly speaking, we 
  should have a test for "o->merge" being set, but afaik that must always 
  be true if we have "o->update" set, and again, this logic only triggers 
  for that case.

  So the only case that doesn't set "o->merge" to get the index is 
  "builtin-read-tree.c" when you do a plain tree-only merge, but that one 
  has

	if ((opts.update||opts.index_only) && !opts.merge)
		usage(read_tree_usage);

  to make sure that you cannot update the working tree without taking the 
  index into account ]

Anyway, I think it's all good. 

			Linus
--

From: Johannes Schindelin
Date: Saturday, March 22, 2008 - 10:36 am

Hi,


As this patch is really unrelated to the series, this comment is really 
unrelated to the content of the patch ;-)

Any point in doing the "From:" and "Date:" inside the mail body?

Ciao,
Dscho
--

From: Linus Torvalds
Date: Saturday, March 22, 2008 - 10:47 am

I encourage people to do that for the kernel ("From:" in particular), 
because while git-am will pick them up from the email headers, that is 
only true if the emails don't get passed around and commented upon by 
others.

Now, in git, the chain-of-command is very short (everybody -> Junio), so 
it really doesn't matter, but in the kernel, when people send out patches 
like this, the patches may be forwarded by others (who add their sign-off 
lines etc), and then it's really good to have that "From:" line in 
particular in the body of the email, because it's more likely that it will 
remain there (otherwise we depend on the person who signs off and forwards 
it to add it!)

			Linus
--

From: Johannes Schindelin
Date: Saturday, March 22, 2008 - 10:57 am

Hi,


Okay.

Thanks,
Dscho
--

From: Steffen Prohaska
Date: Saturday, March 22, 2008 - 3:01 pm

Case insensitive file handling is only activated by
core.ignorecase = true.  Hence, we need to set it to give the tests
in t0050 a chance to succeed.

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
---
 t/t0050-filesystem.sh |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)


With this commit applied test 2 of t0050 passes.  This is the minimal
change to make t0050 useful.  Eventually test_expect_failure should be
replaced with test_expect_success.

    Steffen

diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
index 3fbad77..cb109ff 100755
--- a/t/t0050-filesystem.sh
+++ b/t/t0050-filesystem.sh
@@ -34,6 +34,7 @@ test_expect_success 'see if we expect ' '
 
 test_expect_success "setup case tests" '
 
+	git config core.ignorecase true &&
 	touch camelcase &&
 	git add camelcase &&
 	git commit -m "initial" &&
-- 
1.5.4.4.613.gaa46e5

--

From: Dmitry Potapov
Date: Monday, March 24, 2008 - 11:57 pm

We already detect if symbolic links are supported by the filesystem.
This patch adds autodetect for case-insensitive filesystems, such
as VFAT and others.

Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
---

This patch goes on top of lt/case-insensitive

 builtin-init-db.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/builtin-init-db.c b/builtin-init-db.c
index 79eaf8d..62f7c08 100644
--- a/builtin-init-db.c
+++ b/builtin-init-db.c
@@ -254,8 +254,8 @@ static int create_default_files(const char *git_dir, const char *template_path)
 			git_config_set("core.worktree", work_tree);
 	}
 
-	/* Check if symlink is supported in the work tree */
 	if (!reinit) {
+		/* Check if symlink is supported in the work tree */
 		path[len] = 0;
 		strcpy(path + len, "tXXXXXX");
 		if (!close(xmkstemp(path)) &&
@@ -266,6 +266,12 @@ static int create_default_files(const char *git_dir, const char *template_path)
 			unlink(path); /* good */
 		else
 			git_config_set("core.symlinks", "false");
+
+		/* Check if the filesystem is case-insensitive */
+		path[len] = 0;
+		strcpy(path + len, "CoNfIg");
+		if (access(path, F_OK))
+			git_config_set("core.ignorecase", "true");
 	}
 
 	return reinit;
-- 
1.5.5.rc1.10.g3948

--

From: Johannes Schindelin
Date: Tuesday, March 25, 2008 - 2:59 am

Hi,


Clever!

Last time I checked, the "HEAD" file on VFAT was converted to "head" when 
the repository was initialised on Win32 (IIRC) and read on Linux (IIRC).  
Maybe this problem has gone away, but if not, it should definitely be 
fixed (depending on core.ignorecase).

Ciao,
Dscho

--

From: Dmitry Potapov
Date: Tuesday, March 25, 2008 - 3:49 am

We already detect if symbolic links are supported by the filesystem.
This patch adds autodetect for case-insensitive filesystems, such
as VFAT and others.

Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
---

There was a stupid mistake in a previous patch...  (Inadvertantly,
I sent the old version of my patch, which was incorrect, and did not
realize that until saw Johannes' reply to my email.)

 builtin-init-db.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/builtin-init-db.c b/builtin-init-db.c
index 79eaf8d..62f7c08 100644
--- a/builtin-init-db.c
+++ b/builtin-init-db.c
@@ -254,8 +254,8 @@ static int create_default_files(const char *git_dir, const char *template_path)
 			git_config_set("core.worktree", work_tree);
 	}
 
-	/* Check if symlink is supported in the work tree */
 	if (!reinit) {
+		/* Check if symlink is supported in the work tree */
 		path[len] = 0;
 		strcpy(path + len, "tXXXXXX");
 		if (!close(xmkstemp(path)) &&
@@ -266,6 +266,12 @@ static int create_default_files(const char *git_dir, const char *template_path)
 			unlink(path); /* good */
 		else
 			git_config_set("core.symlinks", "false");
+
+		/* Check if the filesystem is case-insensitive */
+		path[len] = 0;
+		strcpy(path + len, "CoNfIg");
+		if (!access(path, F_OK))
+			git_config_set("core.ignorecase", "true");
 	}
 
 	return reinit;
-- 
1.5.5.rc1.10.g3948

--

From: Dmitry Potapov
Date: Tuesday, March 25, 2008 - 4:03 am

It is result of mounting the VFAT disk on Linux with shortname=lower,
which is the default :( Perhaps, it was a reasonable default for DOS
days, but now it only creates troubles. I usually mount VFAT disks with
shortname=winnt on Linux, though some people prefer shortname=mixed.

Dmitry
--

From: Dmitry Potapov
Date: Tuesday, March 25, 2008 - 1:14 am

I also have observed this problem with VFAT on Linux, but the effect was not
stable. It looks like old information is preserved somewhere in caches...

Anyway, I have tested this series of patches a bit on Windows and so far
I have found the following:

- merge different branches were two file names are only differ by case
  will cause that the result branch has two file names that differ only
  by case and one of them will be overwritten by the other and shown as
  modified in the worktree by git status.

- git status cares only about case-insensitivity only for files and not
  for directories. Thus, if case of letters in a directory name is changed
  then this directory will be shown as untracked.

- pattern specified in .gitignore are match as case-sensitive despite
  core.ignorecase set to true.

Personally, I don't care about any of the above issues much as I rarely
work on Windows and when I do, I always check that all filenames are in
low case except Makefile (and a few more exceptions). So, I have never
had any problem with using Git on case-insensitive system...

Dmitry
--

From: Linus Torvalds
Date: Tuesday, March 25, 2008 - 2:04 pm

Ok. So there's two issues here:

 - the git trees themselves had two different names

   This is not something I'm *ever* planning on changing. All my "case 
   insensitive" patches were about the *working*tree*, not about core git 
   data structures themselves.

   In other words: git itself is internally very much a case-sensitive 
   program, and the index and the trees are case-sensitive and will remain 
   so forever as far as I'm concerned. So when you do a tree-level merge 
   of two trees that have two different names that are equivalent in case, 
   git will create a result that has those two names. Because git itself 
   is not case-insensitive.

 - HOWEVER - when checking things out, we should probably notice that 
   we're now writing the two different files out and over-writing one of 
   them, and fail at that stage. I don't know what a good failure 
   behaviour would be, though. I'll have to think about it.

IOW, all my case-insensitivity checking was very much designed to be about 
the working tree, not about git internal representations. Put another way, 
they should really only affect code that does "lstat()" to check whether 

Ahh, yes. This is sad. It comes about because while we can look up whole 
names in the index case-insensitively, we have no equivalent way to look 
up individual path components, so that still uses the "index_name_pos()" 
model and then looking aroung the failure point to see if we hit a 
subdirectory. Remember: the index doesn't actually contain directories at 
all, just lists of files.


This should probably be fairly straightforward. All the logic here is in 
the function "excluded_1()" in dir.c - and largely it would be about 
changing that "strcmp()" into a "strcasecmp()" and using the FNM_CASEFOLD 
flag to fnmatch().

The only half-way subtle issues here are

 - do we really want to use strcasecmp() (which may match differently than 
   our hashing matches!) or do we want to expand on our icase_cmp() or 
   ...
From: Dmitry Potapov
Date: Tuesday, March 25, 2008 - 7:46 pm

Of course, case-insensitivity is about the working tree only. But when
I merge another branch to the current one, git normally checks that it
is not going to overwrite existing files in the *work tree* and refuses
to do the merge if some files may be overwritten.

So if I work on a case-insensitive filesystem and have a file in a different
case and core.ignorecase=false, then the merge fails as expected!

But core.ignorecase=true, which is supposed to do a better job for case-
insensitive filesystems, actually causes the problem here.

Here is my test script:

====
mkdir git-test
cd git-test
git init
git config core.ignorecase true
echo foo > foo
git add foo
git commit -m 'initial commit'

git checkout -b other

echo file > file
git add file
git commit -m 'add file'

git checkout master
echo File > File
git add File
git commit -m 'add File'

# I expect merge to fail here... and it does fail if core.ignorecase
# is set to false, but with core.ignorecase = true, git will overwrite
# 'File'.
# git config core.ignorecase false
git merge other
===

Dmitry
--

From: Linus Torvalds
Date: Tuesday, March 25, 2008 - 8:37 pm

I do agree - but this is not really about the *merge*. It is about the 
checkout.

Imagine that the merge had been done on a sane filesystem, and you're just 
pulling it or cloning it. No merge on your machine, but the problem is the 
same.

		Linus
--

From: Derek Fawcus
Date: Tuesday, March 25, 2008 - 4:39 am

Interesting.
That sounds a bit like the claimed windows 95 properly of 'tunneling' renames.
ISTR that it was to catch a move via 'shortname' which then preserved the longname.

However I'd have expected the Linux version to always use the long name...

DF
--

From: Jan Hudec
Date: Tuesday, March 25, 2008 - 11:26 am

... if it's there -- which it might not. Linux may create it always, but
Windows will not if they think they don't need to.

-- 
						 Jan 'Bulb' Hudec <bulb@ucw.cz>
--

Previous thread: hi by Elena on Saturday, March 22, 2008 - 9:30 am. (1 message)

Next thread: Re: [SoC RFC] git statistics - information about commits by Junio C Hamano on Saturday, March 22, 2008 - 12:40 pm. (7 messages)