Re: Untracked working tree files

Previous thread: --diff-filter=T does not list x changes by Anders Melchiorsen on Wednesday, October 15, 2008 - 11:42 am. (6 messages)

Next thread: git-svnimport.perl bug when copy source path has a revision by Karl Chen on Wednesday, October 15, 2008 - 12:11 pm. (3 messages)
From: Andrew Morton
Date: Wednesday, October 15, 2008 - 11:56 am

I often get this (running git 1.5.6.rc0 presently):

y:/usr/src/git26> git-checkout linux-next
error: Untracked working tree file 'arch/x86/kernel/apic.c' would be overwritten by merge.

which screws things up.  I fix it by removing the offending file, which
gets irritating because git bails out after the first such instance, so
I need to rerun git-checkout once per file (there are sometimes tens of them).

Should this be happening?  I don't know what causes it, really.  All
I've been doing in that directory is running `git-checkout' against
various maintainers' trees.  95% of the time this works OK but
eventually git seems to get all confused and the above happens.

Is there some way in which I can work around this with a single command
rather than having to run git-checkout once per offending file?  I
suppose a good old `rm -rf *' would do it...

Thanks.
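
For the "single command" question above, one possible approach is to compute the whole list of offending files up front rather than discovering them one checkout at a time. The sketch below is not from the thread: it builds a throwaway repository (the names base, target, a.c, b.c and notes.txt are all made up) and intersects the untracked files with the files tracked by the target branch. It assumes file names without newlines or other characters that git would quote.

```shell
#!/bin/sh
# Sketch: list, in one pass, every untracked file that the target branch
# also contains, i.e. the files "git checkout <target>" refuses to overwrite.
# The throwaway repository, file names and branch names are hypothetical.
set -e
export LC_ALL=C                      # stable sort order for comm(1)

repo=$(mktemp -d)
lists=$(mktemp -d)                   # kept outside the repository on purpose
cd "$repo"
git init -q
git config user.email editor@example.com
git config user.name "Example Editor"

echo base > README
git add README
git commit -q -m "base"
git branch -M base

# "target" tracks two files that "base" does not have.
git checkout -q -b target
echo one > a.c
echo two > b.c
git add a.c b.c
git commit -q -m "add a.c and b.c"
git checkout -q base

# Simulate leftovers: untracked copies of both files plus an unrelated file.
echo stale > a.c
echo stale > b.c
echo scratch > notes.txt

# Untracked files that are also tracked in the target branch.
git ls-files --others | sort > "$lists/untracked"
git ls-tree -r --name-only target | sort > "$lists/in_target"
conflicts=$(comm -12 "$lists/untracked" "$lists/in_target")

printf 'would block checkout:\n%s\n' "$conflicts"
```

The resulting list can be reviewed before anything is deleted, which is more selective than `rm -rf *`.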
--

From: david
Date: Wednesday, October 15, 2008 - 12:09 pm

what I do when I run into this is "git reset --hard HEAD" which makes all 
files in the working directory match HEAD, and then I can do the other 
checkout.

--

From: david
Date: Wednesday, October 15, 2008 - 12:14 pm

I think you can also do "git checkout -f HEAD" to force the checkout to 
overwrite all files

the fact that git will happily leave modified things in the working 
directory appears to be very helpful for some developers, but it's also a 
big land mine for others.

is there a way to disable this?

David Lang
--

From: Andrew Morton
Date: Wednesday, October 15, 2008 - 12:24 pm

On Wed, 15 Oct 2008 12:14:34 -0700 (PDT)



These files weren't modified.  By me, at least.  git might have
"modified" them, but it has all the info necessary to know that the
--

From: Andrew Morton
Date: Wednesday, October 15, 2008 - 12:26 pm

On Wed, 15 Oct 2008 12:14:34 -0700 (PDT)

I do

	git-reset --hard HEAD
	git-reset --hard linux-next
	git-checkout linux-next

and get

error: Untracked working tree file 'Next/SHA1s' would be overwritten by merge.
y


--

From: Nicolas Pitre
Date: Wednesday, October 15, 2008 - 12:32 pm

What about simply:

	git-checkout -f linux-next


Nicolas
--

From: Nicolas Pitre
Date: Wednesday, October 15, 2008 - 12:34 pm

Never mind -- you apparently did that already with success.


Nicolas
--

From: Linus Torvalds
Date: Wednesday, October 15, 2008 - 12:31 pm

Hmm. It doesn't actually do that normally. If you switch between trees, 
git will (or _should_) remove the old files that it knows about. If you 
get a lot of left-over turds, there's something wrong.

It could be a git bug, of course. That said, especially considering the 
source of this, I wonder if it's just that Andrew ends up using all those 
non-git scripts on top of a git tree, and then that can result in git 
*not* knowing about a certain file, and then when switching between trees 
(with either git checkout or with git reset), the data that was created 
with non-git tools gets left behind and now git will be afraid to 
overwrite it.

So yes, there are ways to force it (both "git checkout -f"  and "git reset 
--hard" having already been mentioned), but the need for that - especially 
if it's common - is a bit discouraging.

Especially since it's still possible that it's some particular mode of git 
usage that leaves those things around. Andrew - have you any clue what it 
is that triggers the behavior?

(By the filename, I realize it's a file that doesn't exist in one tree or 
the other, and which doesn't get removed at some point. But have you had 
merge failures, for example? Is it perhaps a file that was created during 
a non-clean merge, and then got left behind due to the merge being 
aborted? It would be interesting to know what led up to this..)

			Linus
--

From: david
Date: Wednesday, October 15, 2008 - 12:42 pm

I see it fairly frequently when switching between different branches of a 
project.

I also see it when I try applying a patch to a tree, then want to get up 
to date with that tree (in this case it really is different)

It could be that git is looking to see if the file is the same as the old 
tree had it before checking out the new tree. if it isn't for any reason 
it sounds the alert.

--

From: Linus Torvalds
Date: Wednesday, October 15, 2008 - 12:56 pm

So, at least for any normal switch, assuming file 'a' doesn't exist in the 
other branch, you really should have a few different cases:

 - you have a dirty file, and git should say something like

	error: You have local changes to 'file'; cannot switch branches.

   because it refuses to modify the file to match the other branch (which 
   includes removing it) if it doesn't match the index.

   So this case shouldn't leave anything behind.

 - You have that extra file, but it's not in the index.

   If it's in your current HEAD, we should still notice it with something 
   like:

	error: Untracked working tree file 'tree' would be removed by merge.

   because now it's untracked (not in the index), but the switching 
   between branches tries to essentially "apply" the difference between 
   your current HEAD and the new branch, and finds that the difference 
   involves removing a file that git isn't tracking.

See?
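
The first case can be reproduced in a scratch repository. This is a minimal sketch, not something posted in the thread: the branch names with-a and without-a and the file name a are invented, and the exact error text differs between git versions, so only the exit status and the resulting state are checked.

```shell
#!/bin/sh
# Sketch of the first case: a tracked file has uncommitted edits and the
# other branch does not contain it, so a plain checkout should refuse.
# Repository layout and all names are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email editor@example.com
git config user.name "Example Editor"

echo v1 > a
git add a
git commit -q -m "add a"
git branch -M with-a

# "without-a" is the same history with the file removed.
git checkout -q -b without-a
git rm -q a
git commit -q -m "remove a"
git checkout -q with-a

echo "local edit" >> a               # dirty: no longer matches the index

# An unforced switch would have to delete the dirty file, so it fails.
if git checkout without-a >/dev/null 2>&1; then
	switched=yes
else
	switched=no
fi
branch=$(git symbolic-ref --short HEAD)
content=$(cat a)

echo "switched=$switched branch=$branch"
```

Nothing is left behind in this case because nothing was changed: the branch is still with-a and the local edit is intact.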

HOWEVER.

If you're used to doing "git checkout -f" or "git reset --hard", both of 
those checks are just ignored. After all, you asked for a forced switch. 

And at least in the second case, what I think happens is that git won't 
remove the file it doesn't know about, so you'll have a "turd" left 
around.

So yes, you can certainly get these kinds of left-overs, but they really 
should be only happening if you "force" something. Do you do that often?

			Linus
--

From: david
Date: Wednesday, October 15, 2008 - 1:17 pm

one place that I know I've run into it frequently is in an internal 
project where I did not properly set up .gitignore and did "git add ." and 
"git commit -a". that project's repository contains the compiled 
binaries and I frequently get these errors when switching trees.

that sounds like the first case.

I've seen discussion of a new sequencer functionality, would it allow me 
to define a .gitignore file and re-create the repository as if that file 
had existed all along?

--

From: Andrew Morton
Date: Wednesday, October 15, 2008 - 12:49 pm

On Wed, 15 Oct 2008 12:31:40 -0700 (PDT)

I treat my git directory as a read-only thing.  I only ever modify it


That's certainly a possibility - I get a lot of merge failures.  A real
lot.  And then quite a bit of rebasing goes on, especially in
linux-next.  And then there's all the other stuff which Stephen does on
top of the underlying trees to get something releasable happening.


--

From: Linus Torvalds
Date: Wednesday, October 15, 2008 - 1:08 pm

Is "git checkout -f" part of the scripting? Or "git reset --hard"?

So what I could imagine is happening is:

 - you have a lot of automated merging

 - a merge goes south with a data conflict, and since it's all automated, 
   you just want to throw it away. So you do "git reset --force" to do 
   that.

 - but what "git reset --hard" means is to basically ignore all error 
   cases, including any unmerged entries that it just basically ignores.

 - so it did set the tree back, but the whole point of "--hard" is that it 
   ignores error cases, and doesn't really touch them.

Now, I don't think we ever really deeply thought about what the error 
cases should do when they are ignored. Should the file that is in some 
state we don't like be removed? Or should we just ignore the error and 
return without removing the file? Generally git tries to avoid touching 
things it doesn't understand, but I do think this may explain some pain 
for you, and it may not be the right thing in this case.

(And when I say "this case", I don't really know whether you use "git 
checkout -f" or "git reset --hard" or something else, so I'm not even 
going to say I'm sure exactly _which_ case "this case" actually is :)

Of course, the cheesy way for you to fix this may be to just add a

	git clean -dqfx

to directly after whatever point where you decide to reset and revert to 
an earlier stage. That just says "force remove all files I don't know 
about, including any I might ignore". IOW, "git reset --hard" will 
guarantee that all _tracked_ files are reset, but if you worry about some 
other crud that could have happened due to a failed merge, that additional 
"git clean" may be called for.
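
As a rough illustration of the difference between the two commands, the following sketch (with made-up file names, in a throwaway repository) shows "git reset --hard" restoring a tracked file while leaving untracked and ignored files alone, and "git clean -dqfx" then removing them.

```shell
#!/bin/sh
# Sketch: "git reset --hard" restores tracked files only; "git clean -dqfx"
# then removes untracked files, including ignored ones.
# The repository and file names are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email editor@example.com
git config user.name "Example Editor"

echo '*.o' > .gitignore
echo v1 > tracked.c
git add .gitignore tracked.c
git commit -q -m "base"

echo "local edit" >> tracked.c       # modified tracked file
echo junk > leftover.c               # untracked
echo obj > build.o                   # untracked and ignored

git reset -q --hard HEAD
tracked_content=$(cat tracked.c)
if [ -e leftover.c ] && [ -e build.o ]; then
	survived_reset=yes
else
	survived_reset=no
fi

# -d: directories too, -q: quiet, -f: force, -x: ignored files as well
git clean -dqfx
remaining=$(git ls-files --others)

echo "tracked.c=$tracked_content survived_reset=$survived_reset"
```

Note that -x also deletes build products and anything else covered by .gitignore, so this only suits a tree that is not used for compiling.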

Of course, it's going to read the whole directory tree and that's not 
really cheap, but especially if you only do this for error cases, it's 
probably not going to be any worse. And I'm assuming you're not compiling 
in that tree, so you probably don't want to save object files (you can 
remove ...
--

From: Andrew Morton
Date: Wednesday, October 15, 2008 - 1:23 pm

On Wed, 15 Oct 2008 13:08:36 -0700 (PDT)

well, this script has been hacked on so many times I'm not sure what
it does any more.

Presently the main generate-a-diff function is

doit()
{
	tree=$1
	upstream=$2

	cd $GIT_TREE
	git checkout "$upstream"
	git reset --hard "$upstream"
	git fetch "$tree" || exit 1
	git merge --no-commit 'test merge' HEAD FETCH_HEAD > /dev/null

	{
		git_header "$tree"
		git log --no-merges ORIG_HEAD..FETCH_HEAD
		git diff --patch-with-stat ORIG_HEAD
	} >$PULL/$tree.patch
	{
		echo DESC
		echo $tree.patch
		echo EDESC
		git_header "$tree"
		git log --no-merges ORIG_HEAD..FETCH_HEAD
	} >$PULL/$tree.txt
	git reset --hard "$upstream"
}

usually invoked as

doit origin v2.6.27
doit origin linux-next

etc.

the above seemed fairly busted, so I'm now using

        git checkout -f "$upstream"
        git reset --hard "$upstream"
        git fetch "$tree" || exit 1

which seems a bit more sensible.  Perhaps I should do the reset before
the checkout, dunno.

That function has been through sooooooo many revisions and each time
some scenario gets fixed (more like "improved"), some other scenario
gets busted (more like "worsened").  The above sorta mostly works,
although it presently generates thirty-odd rejects against
git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip.git#auto-latest,
which is way above my fix-it-manually threshold.  linux-next is still
dead because it's taking Stephen over two days to fix the mess he's


Yeah, there's no easy solution here, and I suspect the real solution is
"read programmer's mind".  Providing a reliable override (like -f) is a



--

From: Paolo Ciarrocchi
Date: Thursday, October 16, 2008 - 1:42 am

On Wed, Oct 15, 2008 at 10:23 PM, Andrew Morton

Hi Andrew,
I was wondering whether you could share the scripts you built on top of git,
you might get some useful suggestions from this list and they could be
inspiration for further improvement in GIT (it just happened with this
thread ;-)

Thanks.

Ciao,
-- 
Paolo
http://paolo.ciarrocchi.googlepages.com/
--

From: Andrew Morton
Date: Thursday, October 16, 2008 - 2:32 am

oh gee, you don't want to look.  It should all be in
http://userweb.kernel.org/~akpm/stuff/patch-scripts.tar.gz

But really it's just the one script, pull-git-patches, below.  That
thing's been hacked around so much that I daren't breathe on it.

Fortunately as long as Stephen Rothwell is producing linux-next I don't
have much need for it any more.




#!/bin/sh

GIT_TREE=/usr/src/git26
PULL=/usr/src/pull

git_header()
{
	tree="$1"
	echo GIT $(cat .git/refs/heads/$tree) $(cat .git/branches/$tree)
	echo
}

# maybe use git clean -dqfx

doit()
{
	tree=$1
	upstream=$2

	cd $GIT_TREE
	git checkout -f "$upstream"
	git reset --hard "$upstream"
	git fetch "$tree" || exit 1
	git merge --no-commit 'test merge' HEAD FETCH_HEAD > /dev/null

	{
		git_header "$tree"
		git log --no-merges ORIG_HEAD..FETCH_HEAD
		git diff --patch-with-stat ORIG_HEAD
	} >$PULL/$tree.patch
	{
		echo DESC
		echo $tree.patch
		echo EDESC
		git_header "$tree"
		git log --no-merges ORIG_HEAD..FETCH_HEAD
	} >$PULL/$tree.txt
	git reset --hard "$upstream"
}

do_one()
{
	tree=$1
	upstream=$2
	if [ ! -e $PULL/$tree.patch ]
	then
		echo "*** doing $tree, based on $upstream"
		git branch -D $tree
		doit $tree $upstream
	else
		echo skipping $tree
	fi
}

mkdir -p $PULL

if [ $1"x" = "-x" ]
then
	exit
fi

cd $GIT_TREE
git checkout -f master

cd /usr/src

if [ $# -eq 0 ]
then
	trees=/usr/src/git-trees
else
	trees="$1"
fi

if [ $# -eq 2 ]
then
	do_one $1 $2
else
	while read x
	do
		if echo $x | grep '^#.*' > /dev/null
		then
			true
		else
			do_one $x
		fi
	done < $trees
fi



--

From: Linus Torvalds
Date: Wednesday, October 15, 2008 - 1:23 pm

Actually, with your filename, I suspect the conflict would be not a real 
file content, but more of a "delete" conflicting with a modification to 
that file. IOW, I'm guessing that the thing you hit with 
arch/x86/kernel/apic.c was that some branch you pulled:

 - created that file

 - deleted arch/x86/kernel/apic_[32|64].c

 - the old file got marked as a rename source for the new apic.c and 
   there was a data conflict when trying to apply the changes.

as a result, your working tree would have that "apic.c" file in it, but 
with conflict markers, and marked as unmerged.

When you then do "git reset --hard", it will just ignore unmerged entries, 
and since the original tree (and the destination tree) match, and neither 
of them contain apic.c either, git will totally ignore that file and not 

It's "--hard", not "--force". Yeah, the git reset flags are insane. As is 
the default action, for that matter. It's one of the earliest interfaces, 
and it's stupid and reflects git internal implementations rather than what 
we ended up learning about using git later. Oh, well.

But 'git checkout -f' (which is nicer from a user interface standpoint) 
has the exact same logic and I think shares all the implementation. I 
think they both end up just calling "git read-tree --reset -u".

It's quite possible that we should remove unmerged entries. Except that's 
not how our internal 'read_cache_unmerged()' function works. It really 
just ignores them, and throws them on the floor. We _could_ try to just 
turn them into a (since) stage-0 entry.

Junio?

			Linus
--

From: Andrew Morton
Date: Wednesday, October 15, 2008 - 1:30 pm

On Wed, 15 Oct 2008 13:23:50 -0700 (PDT)

That sounds likely.  I suspect things were especially bad today because
I accidentally pulled four-week-old linux-next, which had over 500
rejects in it.


--

From: Junio C Hamano
Date: Wednesday, October 15, 2008 - 3:06 pm

I'd agree that dropping unmerged entries to stage-0 when we can would make
sense.  A conflicted existing path would get a stage-0 entry in the
index, which is compared with the switched-to HEAD (which could be the
same as the current one when "git reset --hard" is run without a rev); we
notice that they are different, and the index entry and the work tree path
are overwritten by the version from the switched-to HEAD.  For a new path
that a failed merge tried to bring in, we notice that the switched-to HEAD
does not have that path and happily remove it from the index and from the
work tree.  All will go a lot smoother than the current code.

I am not sure what should happen when we can't drop the unmerged entry
down to stage-0 due to D/F conflicts, though.  IIRC, read-tree proper
would not touch the work tree in such a case, but merge-recursive creates
our and their versions with funny suffixes, which will not be known to the
index and will be left in the working tree.

--

From: Junio C Hamano
Date: Wednesday, October 15, 2008 - 4:00 pm

When aborting a failed merge that has brought in a new path using "git
reset --hard" or "git read-tree --reset -u", we used to first forget about
the new path (via read_cache_unmerged) and then matched the working tree
to what is recorded in the index, thus ending up leaving the new path in
the work tree.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Junio C Hamano <gitster@pobox.com> writes:

 > Linus Torvalds <torvalds@linux-foundation.org> writes:
 >
 >> On Wed, 15 Oct 2008, Linus Torvalds wrote:
 >>> 
 >> It's quite possible that we should remove unmerged entries. Except that's 
 >> not how our internal 'read_cache_unmerged()' function works. It really 
 >> just ignores them, and throws them on the floor. We _could_ try to just 
 >> turn them into a (since) stage-0 entry.
 >>
 >> Junio?
 >
 > I am not sure what should happen when we can't drop the unmerged entry
 > down to stage-0 due to D/F conflicts, though.  IIRC, read-tree proper
 > would not touch the work tree in such a case, but merge-recursive creates
 > our and their versions with funny suffixes, which will not be known to the
 > index and will be left in the working tree.

 I am still unsure what we should do when we hit D/F conflicts; this one
 simply replaces but it may be safer to drop ADD_CACHE_OK_TO_REPLACE from
 the options to trigger an error in such a case.  I dunno.

 read-cache.c               |   32 +++++++++++++++++++-------------
 t/t1005-read-tree-reset.sh |   30 ++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+), 13 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index c229fd4..efbab6a 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1489,25 +1489,31 @@ int write_index(const struct index_state *istate, int newfd)
 int read_index_unmerged(struct index_state *istate)
 {
 	int i;
-	struct cache_entry **dst;
-	struct cache_entry *last = NULL;
+	int unmerged = 0;
 
 	read_index(istate);
-	dst = istate->cache;
 	for (i = 0; i < istate->cache_nr; i++) {
 ...
--

From: Linus Torvalds
Date: Wednesday, October 15, 2008 - 4:16 pm

Looks good to me. And from my tests, I think "git checkout -f" didn't have 
this problem at all, because it ends up using not git read-tree, but doing 
its own "reset_tree()" that uses unpack_trees().

I do wonder if "git reset" should perhaps be written in those terms, 
instead of just being a wrapper around git read-tree. But the patch looks 
fine.

		Linus
--

From: Junio C Hamano
Date: Wednesday, October 15, 2008 - 11:27 pm

Let's do this for 'maint' and I'll let others think about possible
improvements, then ;-).

--

From: Ingo Molnar
Date: Thursday, October 16, 2008 - 12:20 am

i've met this problem in various variants in the past few months, and i 
always assumed that it's "as designed" - as Git's policy is to never 
lose information unless forced to do so. (which i find very nice in 
general, and which saved modifications from getting lost a couple of 
times in the past)

the situations where i end up with a messed up working tree [using 
git-c427559 right now]:

 - doing a conflicted Octopus merge will leave the tree in some weird 
   half-merged state, with lots of untracked working tree files that not 
   even a hard reset will recover from. The routine thing i do to clean 
   up is:

      git reset --hard HEAD
      git checkout HEAD .
      git ls-files --others | xargs rm              # DANGEROUS

   doing git checkout -f alone is not enough, as there might be various 
   dangling files left around.

 - git auto-gc thinking that it needs to do another pass in the middle 
   of a random git operation, but i don't have 10 minutes to wait so i 
   decide to Ctrl-C it.

 - doing the wrong "git checkout" and then Ctrl-C-ing it can leave the
   working tree in limbo as well, needing fixups. If i'm stuck between
   two branches that rename/remove files it might need the full fixup
   sequence above.

 - if a testbox has a corrupted system clock, its git repo and the 
   kernel build can get confused. This is to be expected i think - but
   the full sequence above will recover the corrupted tree. Not much Git
   can do about this i guess.
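
The step marked DANGEROUS in the cleanup sequence above splits file names on whitespace. A slightly more careful variant, sketched here under the assumption that xargs supports -0 (GNU and BSD xargs do, POSIX does not require it), passes NUL-delimited names instead. The repository and file names are invented for the example.

```shell
#!/bin/sh
# Sketch: NUL-delimited variant of "git ls-files --others | xargs rm".
# Assumes an xargs with -0 support; all names below are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email editor@example.com
git config user.name "Example Editor"

echo keep > tracked.c
git add tracked.c
git commit -q -m "base"

echo junk > "stray file.c"           # plain xargs would split this name
mkdir sub
echo junk > sub/other.tmp

# Preview first: this only lists, nothing is deleted yet.
git ls-files --others

# -z emits NUL-terminated names, -0 consumes them; "--" guards against
# names that start with a dash.
git ls-files --others -z | xargs -0 rm -f --
remaining=$(git ls-files --others)

echo "remaining untracked: [$remaining]"
```

This still deletes ignored files as well, and it leaves the now-empty directory sub behind; "git clean -n" offers a built-in dry run when directories should be handled too.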

Does your fix mean that all i have to do in the future is a hard reset 
back to HEAD, and that dangling files are not supposed to stay around?

	Ingo
--

From: Junio C Hamano
Date: Thursday, October 16, 2008 - 7:49 am

As long as the index *somehow* knows about these new files, they are
removed.

The situation is:

 (0) you start from a HEAD that does not have path xyzzy;
 (1) you attempt to merge a rev that has path xyzzy;
 (2) the merge conflicts, leaving higher staged index entries for the
     path.
 (3) you decide not to conclude the merge by saying "reset --hard".

The old logic for "reset" was to remove paths that exist in the index at
stage #0 (i.e. cleanly merged) and not in HEAD.  The patch changes the
rule to remove paths that exist in the index at any stage (i.e. including
the ones that have conflicted and not resolved yet) and not in HEAD.
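
Steps (0) to (3) can be walked through in a scratch repository. The sketch below is only one way to set the situation up and is not taken from the thread: it uses a modify/delete conflict so that xyzzy is absent from HEAD but present in the index at higher stages, and the branch names ours and theirs are made up. With a git that includes the fix described above, the path should be gone after the reset; older versions would be expected to leave it in the work tree.

```shell
#!/bin/sh
# Sketch of steps (0)-(3): a conflicted merge leaves "xyzzy" in the index at
# higher stages only, then "git reset --hard" abandons the merge.
# Branch names are hypothetical; assumes a git that contains the fix.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email editor@example.com
git config user.name "Example Editor"

echo readme > README
echo v1 > xyzzy
git add README xyzzy
git commit -q -m "base"
git branch -M ours

git checkout -q -b theirs
echo v2 >> xyzzy
git commit -q -a -m "modify xyzzy"

git checkout -q ours
git rm -q xyzzy
git commit -q -m "delete xyzzy"      # (0) HEAD no longer has xyzzy

# (1) + (2): the merge stops with a modify/delete conflict on xyzzy.
if git merge --no-edit theirs >/dev/null 2>&1; then
	merged=clean
else
	merged=conflict
fi
unmerged=$(git ls-files -u | wc -l)  # one line per higher-stage entry
if [ -e xyzzy ]; then present_during_merge=yes; else present_during_merge=no; fi

git reset -q --hard HEAD             # (3) give up on the merge

if [ -e xyzzy ]; then left_behind=yes; else left_behind=no; fi
echo "merged=$merged left_behind=$left_behind"
```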


--
