Linux: Moving and Changing Code

Submitted by Jeremy
on July 16, 2007 - 1:59pm

In response to a recent merge request, Linus Torvalds explained a best practice when moving and changing code, "when doing renames it is generally *much* nicer to do a 100% rename (perhaps with just _trivial_ changes to make it compile - the include statements etc change, and maybe you want to change the name in the comment header too)." He went on to explain, "doing 'move the code and change it at the same time' is considered bad form. Movement diffs are much harder to read anyway (a traditional diff will show it as a new-file + delete, of course), so the general rule is: move code around _without_ modifying it, so that code movement (whether it's a whole file, or just a set of functions between files) doesn't really introduce any real changes, and is easier to look through the changes; do the actual changes to the code as a separate thing." He went on to note why this is especially important during Linux development, "where patches are the main way people communicate.":

"And when using git, the whole 'keep code movement separate from changes' has an even more fundamental reason: git can track code movement (again, whether moving a whole file or just a function between files), and doing a 'git blame -C' will actually follow code movement between files. It does that by similarity analysis, but it does mean that if you both move the code *and* change it at the same time, git cannot see that 'oh, that function came originally from that other file', and now you get worse annotations about where code actually originated."


From:	Linus Torvalds [email blocked]
To:	Latchesar Ionkov [email blocked]
Subject: Re: [V9fs-developer] [GIT PULL] 9p Patches for 2.6.23 merge window
Date:	Mon, 16 Jul 2007 09:29:35 -0700 (PDT)


On Sun, 15 Jul 2007, Latchesar Ionkov wrote:
>
> I thought that it is not a good idea to keep the v9fs_ prefix for code
> that is in different places (fs/9p and net/9p). If keeping the old
> prefix is more acceptable, I can create a new patch without the
> "v9fs_"->"p9_" renames.

It's fine, I don't care *that* much, and I already pulled. If it had been 
something more central, I'd have rejected it, but soemthing as specialized 
as the Plan9 fs, I just wanted to point out that this is now how we should 
do things.

In other words: when doing renames it is generally *much* nicer to do a 
100% rename (perhaps with just _trivial_ changes to make it compile - the 
include statements etc change, and maybe you want to change the name in 
the comment header too).

Doing "move the code and change it at the same time" is considered bad 
form. Movement diffs are much harder to read anyway (a traditional diff 
will show it as a new-file + delete, of course), so the general rule is:

 - move code around _without_ modifying it, so that code movement (whether 
   it's a whole file, or just a set of functions between files) doesn't 
   really introduce any real changes, and is easier to look through the 
   changes.

 - do the actual changes to the code as a separate thing.

This should be true in just about *any* development model, and it's 
especially true in Linux, where patches are the main way people 
communicate.

And when using git, the whole "keep code movement separate from changes" 
has an even more fundamental reason: git can track code movement (again, 
whether moving a whole file or just a function between files), and doing a 
"git blame -C" will actually follow code movement between files. It does 
that by similarity analysis, but it does mean that if you both move the 
code *and* change it at the same time, git cannot see that "oh, that 
function came originally from that other file", and now you get worse 
annotations about where code actually originated.

So next time, please don't move code and change it at the same time.

		Linus

Related Links:

GIT limitation, not good programming practice

Anonymous (not verified)
on
July 16, 2007 - 2:47pm

I think moving and renamming is not bad, unless you are using GIT. This is a limitation of GIT only. If you use another source code management tool, this is not a problem. GIT developpers should IMHO try to find a way to fix this buggy behavior. Developpers should not have to bother with conceptual bugs when they do something that follow good programming practice like a complete rework of a module.

renames & git

Anonymous (not verified)
on
July 16, 2007 - 3:20pm

i don't quite agree.

in other tools you have to tell the system explicitly you renamed a file. that's just the same hassle as doing a rename-only commit. the issue is certainly not a git limitation: if you rename AND completely change the contents of a file without telling the system, not even a human could detect the rename ... or is there a system that can handle "rename and splitup in 3 separate files" fine ?

And for my personal use (also git), i just rename and change in the same commit, and git always detected the rename just fine.

which systems have easier/better rename handling ?

"This is a limitation of GIT

Anonymous (not verified)
on
July 16, 2007 - 7:26pm

"This is a limitation of GIT only."

Absolutely not true. CVS and SVN do not like such changes. In fact with CVS you can only track a change in the file metadata - and you'd better remember to provide the entire previous history of the (older) file or else that will all be lost when you move the file. So what version control software do you use that magically tracks a move+change?

Bullshit. It is not a

Anonymous (not verified)
on
July 17, 2007 - 4:36am

Bullshit. It is not a conceptual _bug_ and especially not of git, it is a fundamental limitation of what _any_ tool can do.

Renames

andyjb (not verified)
on
July 17, 2007 - 5:30am

Hi,

Linus is not just talking about renaming *entire files*, he's also talking about variable and function renames, etc. In the general case you have to move the code before you can modify in order to keep a detectable link between the source and destination. git doesn't care about files, it just tracks the state of the *entire* repository and it can analyse the way in which its content moves around. The tracking works the only way it can: based on similarity matching across commits. Whether you rename an entire file or just move part is completely irrelevant.

Read the article again, you

Anonymous (not verified)
on
July 17, 2007 - 8:06am

Read the article again, you completely missed the point. This is not about working around some kind of GIT limitation, this is about working in a structured way.

git usage aside, commits

Anonymous (not verified)
on
July 17, 2007 - 8:59am

git usage aside, commits should be small and self-contained with a single purpose. Changing the functionality of functions/files and moving functions/files are two different actions, so they should be done in two separate commits. Easy as that.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.