I would not be too specific here about the exact syntax. I would rather
have an example where this might be useful.
In git.git, for example, you could point to pretty_print_commit() which
was split out from commit.c into pretty.c in 93fc05e (Split off the pretty
print stuff into its own file), and mention that it is hard to verify
without much hassle that the code split was really only a code split,
rather than a split with an evil change.
Or you could point to 691f1a2 (replace direct calls to unlink(2) with
unlink_or_warn), where code was refactored into a new function
(unfortunately in two commits, so it might be a case not covered by your
project), and it might be somebody's task to find out the original author
of that function.
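To make the first example a bit more concrete, here is a rough sketch (in
Python, purely for illustration; nothing like this exists in git.git) of
what "really only a code split" means: every line removed from one file
shows up again, unchanged, among the lines added to the other. The name
unmatched_lines() is made up, and the inputs are assumed to be the removed
and added lines of a diff, already stripped of their "-"/"+" markers.

```python
from collections import Counter


def unmatched_lines(removed, added):
    """Return (removed lines that were not re-added,
               added lines that were not removed).

    Both lists are empty exactly when the change is a pure move,
    ignoring the order of the lines.
    """
    gone = Counter(removed) - Counter(added)
    new = Counter(added) - Counter(removed)
    return sorted(gone.elements()), sorted(new.elements())


if __name__ == "__main__":
    removed = ["a();", "b();"]
    print(unmatched_lines(removed, ["a();", "b();"]))
    print(unmatched_lines(removed, ["a();", "c();"]))
```

A check like this ignores ordering and context, so it is only a first
approximation of what the project would have to do.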
Basically, I would like to have a structure in the proposal like this:
what? why? how? when?
Do not forget the case where there is more than one source of a code
move. Think "refactoring".
I would like this not to be specified too much here. For example, we do
not know yet whether the matching will be fuzzy, or whether we will find
something cleverer than that.
So, I suggest listing not the command line options, but what you intend to
support. I.e.:
Here you do not need to say that it is -m<num>, but that you want to
support following code movements both inside and between files, but only
optionally, for performance reasons (or some such).
In any case, this would probably just reuse the -M option.
It would be more in line with the diff options to use -U, but you do not
have to state that. Just talk about a configurable amount of context.
Again, there are better options for "git log" already, but you do not need
to be too explicit on the syntax side. Just say that you want to be able
to use as many of the "git log" options as make sense in the context of
line-level history.
See above.
See above.
See above.
Again, do not be too specific about details that have to be fleshed out
while working on the project. For example, we do not know yet whether it
would make more sense to look for code movements automatically when we
detected a deletion, and maybe fall back automatically to detecting code
copies when we found an inter-file move.
s/ed/ing/
Good.
Good.
IMHO this should be split into
1a) have an initial version which does nothing else than parse
    git-log options and a single additional -L, requiring exactly
    one file to be specified
1b) implement the xdiff callback and identify the commits touching
    the line range (this is not completely trivial due to merges)
Again, this has to be split a little bit. Code can split, and it can also
unite. So, a single line range can easily become multiple ones.
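A small sketch of why one range can become several (Python, for
illustration only; Origin and map_range() are hypothetical names, not
anything in Git): assume we already know, for a given commit, which spans
of the new file were moved in from which places in the parent. A tracked
range that overlaps two such spans then has to be followed as two ranges,
possibly in two files.

```python
from typing import List, NamedTuple, Tuple


class Origin(NamedTuple):
    """A span of the new file that was moved in from the parent commit."""
    new_start: int   # first line in the new file (1-based)
    length: int      # number of lines in the span
    src_path: str    # file in the parent the lines came from
    src_start: int   # first line in that file (1-based)


def map_range(start: int, end: int,
              origins: List[Origin]) -> List[Tuple[str, int, int]]:
    """Map new-file lines start..end (inclusive) to ranges in the parent."""
    result = []
    for o in origins:
        # Intersect the tracked range with this span of the new file.
        lo = max(start, o.new_start)
        hi = min(end, o.new_start + o.length - 1)
        if lo <= hi:
            offset = lo - o.new_start
            src_lo = o.src_start + offset
            result.append((o.src_path, src_lo, src_lo + (hi - lo)))
    return result


if __name__ == "__main__":
    origins = [Origin(10, 5, "a.c", 100), Origin(15, 5, "b.c", 40)]
    # Lines 12..17 straddle both spans, so two ranges come back.
    print(map_range(12, 17, origins))
```

How the origins are found in the first place is exactly the part that
should stay open in the proposal.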
s/but not a file/between files/
You mean code copy from somewhere in the same file?
For fuzzy matching support, I would add some ideas, such as trying to
match alpha-numeric characters, or matching longest words or some such.
Also mention the possibility that this might be infeasible. In any case,
give an example of the case this is trying to help with.
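One case this might help with is code that got re-indented or re-spaced
while being moved, e.g. into a new function. A minimal sketch of the
"match only alpha-numeric characters" idea (Python, illustration only;
alnum_key() and fuzzy_similarity() are made-up names, and difflib merely
stands in for whatever matching would really be used):

```python
import re
from difflib import SequenceMatcher


def alnum_key(line):
    """Drop everything except ASCII letters and digits."""
    return re.sub(r"[^0-9A-Za-z]", "", line)


def fuzzy_similarity(old_block, new_block):
    """Similarity in [0, 1] of two blocks of lines, comparing only
    their alpha-numeric characters."""
    old_key = "".join(alnum_key(line) for line in old_block)
    new_key = "".join(alnum_key(line) for line in new_block)
    return SequenceMatcher(None, old_key, new_key).ratio()


if __name__ == "__main__":
    old = ["if (unlink(path))", "\treturn -1;"]
    new = ["    if ( unlink( path ) )", "        return -1;"]
    # Same code, different whitespace: compares as identical.
    print(fuzzy_similarity(old, new))
```

Throwing away punctuation also throws away real differences (a "-1"
and a "1" look the same here), which is one reason this might turn out
not to be good enough.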
Hmm. Maybe it would be better to be more precise. Like: 1st week: follow
the bird's eye view on Git's source code; 2nd week: analyze the rev-list
machinery (probably first looking at the code of merge-base, for easier
understanding); 3rd week: have a look at builtin/log.c; 4th week:
understand blame.c
This should probably be adjusted a bit to my suggestions above.
Ciao,
Dscho