file rename causes history to disappear

To: Git Mailing List <git@...>
Date: Wednesday, September 6, 2006 - 10:52 am

I moved a bunch of SATA drivers in the Linux kernel from drivers/scsi to
drivers/ata.

When I tried to look at the past history of a file using
git-whatchanged, post-rename, it only shows the history from HEAD to the
point of rename. Everything prior to the rename is lost.

I also tried git-whatchanged on the old path, but that produces an error.

[jgarzik@pretzel libata-dev]$ rpm -q git-core
git-core-1.4.1-1.fc5

Repository ("upstream" branch):
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev.git

-

To: Jeff Garzik <jeff@...>
Cc: Git Mailing List <git@...>
Date: Wednesday, September 6, 2006 - 11:38 am

For filenames that don't exist right now, you need to clearly separate the
revision name from the filename (ie you need to use "--").

There were patches to do "--follow-rename" which I don't think got applied
yet, but in the meantime, just do

git whatchanged -M -- drivers/ata/filename.c drivers/scsi/filename.c

where the "-M" means "show diffs as renames if possible" (which is
different from having the history actually _follow_ them), and the "--" is
the filename separator to tell git that the nonexistent
"drivers/ata/filename.c" file isn't a (currently) nonexistent revision
name, it's a (currently) nonexistent _filename_.
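
A minimal sketch of the behaviour described above, run in a throwaway repository. The repository, commit messages, and file contents are hypothetical; it uses "git log" and "git rev-list --count" (which take the same path limiting) rather than git-whatchanged, and assumes a reasonably recent git.

```shell
#!/bin/sh
set -e

# Throwaway repository; every name below is made up for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

mkdir -p drivers/scsi
echo 'sata driver' > drivers/scsi/filename.c
git add drivers/scsi/filename.c
git commit -q -m 'add driver under drivers/scsi'

mkdir -p drivers/ata
git mv drivers/scsi/filename.c drivers/ata/filename.c
git commit -q -m 'move driver to drivers/ata'

# The old path no longer exists, so without "--" git cannot tell
# whether the argument is a revision or a path, and refuses.
if git log drivers/scsi/filename.c >/dev/null 2>&1; then
    ambiguous=no
else
    ambiguous=yes
fi

# Limiting to the new path only: history stops at the move.
new_only=$(git rev-list --count HEAD -- drivers/ata/filename.c)

# Naming both paths after "--": commits on either side of the move count.
both=$(git rev-list --count HEAD -- drivers/ata/filename.c drivers/scsi/filename.c)

# With -M and both paths in view, the move is reported as a rename.
rename_line=$(git log -M --name-status --format=%s -1 -- \
    drivers/ata/filename.c drivers/scsi/filename.c)

echo "ambiguous without --: $ambiguous"
echo "commits (new path only): $new_only"
echo "commits (both paths):    $both"
echo "$rename_line"
```

Note that -M only changes how the diff is displayed; listing both paths is what brings the pre-move commits into view.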

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Git Mailing List <git@...>
Date: Wednesday, September 6, 2006 - 11:46 am

Since I'm just interested in the log (ATM), even the lack of "-M" seems
to produce useful results. Thanks.

IMO it is highly counter-intuitive that renames are -not- followed. I
don't see the point of a "--follow-rename", it should Just Work(tm).

Jeff

-

To: Jeff Garzik <jeff@...>
Cc: Git Mailing List <git@...>
Date: Wednesday, September 6, 2006 - 12:14 pm

No, it should not.

You haven't thought it through, and I excuse you, because even people who
should know better (and design SCM's) often haven't thought it through.

There's a huge difference between "pathname" and "inode". And git operates
on _pathnames_, not on inodes. So when you give a pathname specifier,
that's _exactly_ what it is. It's a pathname specifier, _not_ an "inode"
specifier.

And pathnames don't change. They're just names for paths to possibly
_find_ a file/inode. They can't be "renamed". The data that is found
behind a pathname may be moved to _another_ pathname (and we call that a
rename), but that doesn't change the original pathname in any way, shape,
or form.

Now, you can say "git shouldn't work with pathnames, it should work with
inodes, and use the pathnames to look them up", but you'd be wrong. You'd
be wrong for many reasons, so let me explain:

- pathnames are actually often a hell of a lot more interesting than
"inodes". Doing things by pathname means that you have sane and
well-defined semantics for something like

git log -- drivers/scsi drivers/ata include/linux/ata.h

even if (for example) some of those files or directories don't
necessarily even exist at one particular point in time. Exactly
_because_ a pathname is not actually affected by the contents of the
repository.

So taking a filename-based approach is actually more _powerful_. You
can emulate the "follow a single file" behaviour on top of it, but you
can't sanely go the other way.

- following inodes/files instead of following pathnames happens to also
be fundamentally ambiguous when you split or merge the file contents.
What happens? You simply _cannot_ describe that in the form of "files".
It's impossible. Really. Yet it's actually fairly common.

In contrast, if you think of pathnames as _pathnames_ (rather than the
contents they point to), that particular sticky wicket simply doesn't
exist. It's a ...

To: Jeff Garzik <jeff@...>
Cc: Git Mailing List <git@...>
Date: Wednesday, September 6, 2006 - 12:37 pm

Side note: one thing that I wanted to do, but never got around to, is to
allow wildcards in the tree-parsing code. It might be too expensive, but
it's still occasionally something I'd like to do:

git log -- 'mm/*.c'

to track every single C file in the VM (even if they don't exist right
_now_).

Notice the difference between

git log mm/*.c

and the above idea - the latter does actually work, but it only tracks the
C files that exist right now under mm/. But it should be possible (and is
potentially useful) to let the wildcard act over the history, rather than
just a single point in time.

Because one additional advantage of thinking in terms of pathnames is
exactly the fact that wildcards make sense in a way that they do _not_
make sense if you think of tracking "inodes". Exactly because "pathnames
are forever", and a pathname has validity and exists regardless of whether
a repository contains a _file_ with that name at any particular point in
time.

So right now git does do the wildcard thing, but only for "git ls-files"
(and through that, things like "git add", which used to be implemented in
terms of ls-files). So you can do

git add '*.c'

to add all C files (recursively - it's not the shell matcher).
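
A small sketch of the difference between the shell expanding a pattern and git expanding it. It assumes a git recent enough that history commands accept glob pathspecs themselves (which the message above describes only as a wish); the repository and file names are made up, and "git rev-list --count" stands in for "git log" since both take the same path limiting.

```shell
#!/bin/sh
set -e

# Throwaway repository; every name below is made up for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

mkdir -p mm/sub
echo a > mm/old.c
echo b > mm/keep.c
echo c > mm/sub/deep.c

# Quoted, so git (not the shell) expands the pattern, and it recurses
# into subdirectories.
git add '*.c'
added=$(git ls-files | grep -c '\.c$')
git commit -q -m 'add C files under mm'

git rm -q mm/old.c
git commit -q -m 'remove mm/old.c'

# Unquoted: the shell expands mm/*.c against the working tree as it is
# right now, so git only ever sees mm/keep.c.
unquoted=$(git rev-list --count HEAD -- mm/*.c)

# Quoted: git applies the pattern across history, so the commit that
# removed mm/old.c is counted as well.
quoted=$(git rev-list --count HEAD -- 'mm/*.c')

echo "files added by git add '*.c': $added"
echo "commits with shell-expanded pattern: $unquoted"
echo "commits with git-expanded pattern:   $quoted"
```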

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: <git@...>
Date: Wednesday, September 6, 2006 - 3:29 pm

I am happy to see we are in agreement. I touched on this in the
closing note to

http://article.gmane.org/gmane.comp.version-control.git/26432

The only people who will get burnt by this change are the ones
with metacharacters in their pathnames, so it is a relatively safe
change.

I think 'git grep' pathspec code is probably the best to reuse
to convert diff-tree family. It knows how to match globs while
traversing a tree down without descending into a subtree that
would never match, which is what we need for them.

-

To: Junio C Hamano <junkio@...>
Cc: Linus Torvalds <torvalds@...>, <git@...>
Date: Thursday, September 7, 2006 - 6:16 am

Maybe make metacharacters the default behaviour, but provide a
command-line option to disable it? It would seldom be used, but it would
provide a way to disambiguate input for scripts and make it possible
(even if a bit harder) to use such filenames.

-

To: Junio C Hamano <junkio@...>
Cc: Linus Torvalds <torvalds@...>, <git@...>
Date: Wednesday, September 6, 2006 - 5:45 pm

>>>>> "Junio" == Junio C Hamano <junkio@cox.net> writes:

Junio> The only people who will get burnt by this change are the ones
Junio> with metacharacters in their pathnames, so it is a relatively safe
Junio> change.

But does that mean you'll provide the equivalent to "fgrep" for "grep",
as in a switch that turns this off, or a separate command?

I can think of times when I might be trying to track a file with a square
bracket in the name.

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
-

To: Randal L. Schwartz <merlyn@...>
Cc: Linus Torvalds <torvalds@...>, <git@...>
Date: Wednesday, September 6, 2006 - 8:52 pm

If your path is "foo.c[1]" then "foo.c[1]" as an fnmatch() pattern
would obviously not match it, which is sad.

However, we do try to match the path literally before falling
back to fnmatch() so in practice I do not think it is so bad.

$ git ls-files -s ;# everybody has "hello world".
100644 3b18e512dba79e4c8300dd08aeb37f8e728b8dad 0 foo.c
100644 3b18e512dba79e4c8300dd08aeb37f8e728b8dad 0 foo/bar[1]/baz/boa.c
100644 3b18e512dba79e4c8300dd08aeb37f8e728b8dad 0 foo/bar[2].c
$ git grep hello -- 'foo/bar[1]'
foo/bar[1]/baz/boa.c:hello world
$ git grep hello -- 'foo/bar[[]*[]]*'
foo/bar[1]/baz/boa.c:hello world
foo/bar[2].c:hello world
$ git grep hello -- 'fo*'
foo.c:hello world
foo/bar[1]/baz/boa.c:hello world
foo/bar[2].c:hello world
$ exit
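
For the "switch that turns this off" case raised earlier in the thread, later versions of git grew a global --literal-pathspecs option. It is not part of the git discussed here. The sketch below assumes such a later git and uses made-up file names.

```shell
#!/bin/sh
set -e

# Throwaway repository; the file names are made up for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q

: > 'foo[1].c'
: > foo1.c
git add .

# Default: the pathspec matches the literal name and, as a pattern,
# also matches foo1.c.
as_pattern=$(git ls-files -- 'foo[1].c')

# With --literal-pathspecs the brackets are ordinary characters.
as_literal=$(git --literal-pathspecs ls-files -- 'foo[1].c')

echo "as pattern:"
echo "$as_pattern"
echo "as literal:"
echo "$as_literal"
```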

-

To: Jeff Garzik <jeff@...>
Cc: <git@...>
Date: Wednesday, September 6, 2006 - 11:05 am

Try "git log -- old/path/...". Path limiting works without "--" only if
the path exists.

--
http://onion.dynserv.net/~timo/
-
