Before actually implementing delete_ref(), we discussed this "deleted-refs/" idea, but I do not think it is a direction we would want to go. Ref deletion is an operation that happens far, far less often than updates and lookups, and I deliberately chose to optimize for the latter.

There are valid reasons to delete refs, and the most frequent one would be to discard throw-away temporary branches you may have needed to switch to when your work was interrupted. But even counting that kind of deletion, I imagine that you would not be creating and removing more than one branch for every 10 commits you make, and I also imagine you would be invoking no fewer than 5 operations that inspect project history, such as git-log and git-diff, between the commits you make.

An operation to build a new commit itself needs at least two lookups (one to see what's current upfront, and another to verify that nobody touched it during the lockless update). Most history-related operations need to look at at least one ref (typically, HEAD), and any refname you use to spell the name of an object or revision range needs a lookup as well (e.g. "v2.6.17..v2.6.18~10" needs to look at tags/v2.6.17 and tags/v2.6.18). Optimizing the deletion path at the expense of giving even a tiny penalty to the lookup path is optimizing for the wrong case, and that is why I rejected the deleted-refs/ idea when I originally did the delete_ref() implementation.

Having said that, I definitely think there is still room for optimization in the current implementation. For example, I do not recall offhand whether I made the code unconditionally repack without the deleted ref, or repack only when we know the ref being deleted exists in the packed refs file. The latter would obviously be more efficient, and if we currently do not do that, making it do so is a very welcome change. Especially given that the latest code does not pack branch heads by default, when a temporary throw-away branch is discarded, it is far more likely that it is not packed and we do not ...
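To make the estimate above concrete, here is a back-of-the-envelope calculation using only the numbers stated in the message: one branch created and deleted per 10 commits, at least 2 ref lookups per commit, and at least 5 history-inspecting commands per commit, each needing at least 1 lookup. Reading "5 operations between commits" as "5 per commit" is an assumption, and the function name is hypothetical.

```python
def min_lookups_per_deletion(
    commits_per_deletion: int = 10,
    lookups_per_commit: int = 2,
    history_ops_per_commit: int = 5,
    lookups_per_history_op: int = 1,
) -> int:
    """Lower bound on ref lookups performed for every ref deletion.

    Each commit needs at least two lookups (read the current value, then
    verify it during the lockless update); each history command such as
    git-log or git-diff needs at least one (typically HEAD).
    """
    per_commit = lookups_per_commit + history_ops_per_commit * lookups_per_history_op
    return commits_per_deletion * per_commit


print(min_lookups_per_deletion())  # prints 70
```

Even under these conservative assumptions, lookups outnumber deletions about 70 to 1, and every revision range such as "v2.6.17..v2.6.18~10" adds more lookups on top.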
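The "only repack when the ref is actually packed" optimization can be sketched as follows. This is a simplified Python illustration of the idea, not git's delete_ref(): the function name is hypothetical, it takes no lock, and it assumes packed-refs contains only "<sha1> <refname>" lines, "^<sha1>" peeled lines belonging to the preceding ref, and "#" comment lines.

```python
import os
import tempfile


def delete_ref_sketch(git_dir: str, refname: str) -> bool:
    """Delete refname, rewriting packed-refs only if the ref is packed there.

    Returns True if the ref existed as a loose file or in packed-refs.
    """
    found = False

    # A throw-away branch usually exists only as a loose file: just unlink it.
    loose_path = os.path.join(git_dir, refname)
    if os.path.isfile(loose_path):
        os.remove(loose_path)
        found = True

    packed_path = os.path.join(git_dir, "packed-refs")
    if not os.path.isfile(packed_path):
        return found

    with open(packed_path) as f:
        lines = f.readlines()

    kept = []
    dropping = False  # True while skipping the deleted ref's "^" peeled line
    in_pack = False
    for line in lines:
        if line.startswith("^"):
            if not dropping:
                kept.append(line)
            continue
        dropping = False
        if not line.startswith("#") and line.rstrip("\n").split(" ", 1)[-1] == refname:
            dropping = in_pack = True
            continue
        kept.append(line)

    # The optimization: leave packed-refs untouched unless the ref was in it.
    if in_pack:
        tmp_path = packed_path + ".new"
        with open(tmp_path, "w") as f:
            f.writelines(kept)
        os.replace(tmp_path, packed_path)  # rename over the old file

    return found or in_pack


# Usage: a loose-only throw-away branch is deleted without rewriting packed-refs.
with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, "refs", "heads"))
    with open(os.path.join(demo_dir, "refs", "heads", "tmp"), "w") as f:
        f.write("1" * 40 + "\n")
    packed = os.path.join(demo_dir, "packed-refs")
    with open(packed, "w") as f:
        f.write("2" * 40 + " refs/heads/master\n")
    inode_before = os.stat(packed).st_ino
    delete_ref_sketch(demo_dir, "refs/heads/tmp")
    print(os.stat(packed).st_ino == inode_before)  # prints True
```

Packed-refs is still read to find out whether the ref is there, but it is only rewritten when the deleted ref was actually packed, which is the rare case for short-lived branches.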
I think that how often refs get deleted depends on the policies of the people using git, and there may be people who delete refs very often. For example, when git becomes a major SCM, there may be people working on big projects who want to create a new branch for each new bug and then delete the branch when the code on the bug branch has been integrated into ...

The operations that inspect project history may use a ref cache or something similar, so that a lookup on disk may not be needed. So only the ref creation or ...

The lookup code is already using cached packed refs. It could also cache loose and deleted refs, so the lookup penalty would be very, very small. By the way, the OS may already cache the stat information of loose and deleted ref files, so the penalty may already be very small right now.

And at least, algorithmically speaking, with my patch the deletion path is now independent of the number of existing refs, so it's much better (while the lookup path stays the same). If there are thousands of refs and a heavy I/O load, rewriting the packed ref file for each deletion means writing to disk something that may not fit in the disk cache. That may be very bad.

My patch also has a few added benefits, like making it possible to have a read-only packed ref file while still letting people delete refs. It also makes it possible to resurrect a deleted ref, or to control branch deletion rights on a per-ref-directory basis (though that may not be very useful). And the fact that the packed ref file (which may be read-only for added safety) is not rewritten each time a ref is deleted makes things much safer if there are many users working on the same git repository.

Christian.
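To see both sides of the trade-off, here is a toy in-memory model of the deleted-refs/ proposal. It is a hypothetical sketch, not the actual patch: the class and method names are made up, the lookup order (loose ref, then deleted-refs/ marker, then packed-refs) is assumed, and packed-refs is assumed to be cached in memory so it costs no per-ref filesystem check.

```python
from typing import Dict, Optional, Set


class DeletedRefsStore:
    """Toy model of a deleted-refs/ scheme.

    Dicts and a set stand in for loose ref files, the packed-refs file and
    marker files under deleted-refs/. `probes` counts the per-ref
    filesystem checks (stat calls) a lookup would make.
    """

    def __init__(self, packed: Dict[str, str]) -> None:
        self.packed = dict(packed)      # never rewritten by delete()
        self.loose: Dict[str, str] = {}
        self.deleted: Set[str] = set()  # marker files under deleted-refs/
        self.probes = 0

    def update(self, ref: str, sha1: str) -> None:
        self.loose[ref] = sha1
        self.deleted.discard(ref)       # recreating a ref clears its marker

    def delete(self, ref: str) -> None:
        # Constant work no matter how many refs are packed:
        # unlink the loose file and drop a marker; packed-refs is untouched.
        self.loose.pop(ref, None)
        self.deleted.add(ref)

    def lookup(self, ref: str) -> Optional[str]:
        self.probes += 1                # stat the loose ref file
        if ref in self.loose:
            return self.loose[ref]
        self.probes += 1                # extra stat of deleted-refs/<ref>
        if ref in self.deleted:
            return None
        return self.packed.get(ref)


# Usage: deleting a packed tag leaves packed-refs alone, but the lookup
# needs two probes instead of one.
store = DeletedRefsStore({"refs/tags/v2.6.17": "a" * 40})
store.delete("refs/tags/v2.6.17")
print(store.lookup("refs/tags/v2.6.17"), store.probes)  # prints None 2
```

In this model deletion never touches `packed`, but every lookup of a ref that is not loose (a packed tag such as tags/v2.6.17, for example) pays one extra probe for the deleted-refs/ marker. That is exactly the lookup-path cost at issue in this thread.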
I would say that is a very valid way to work with git, regardless of the size of the project. Now, how often would you create such a per-bug branch and delete one, compared to the number of operations that would require ref lookups? Your example actually supports what I've said -- optimizing for ...

Stop and think about what you are saying. What's a ref cache? We do not have such a beast today (unless you equate it with the packed-refs file), and you would need to design and implement it, but think about how you would make it operate. You would need to invalidate it when you delete a ref using the deleted-refs/ approach; that's not much different from repacking the packed-refs file without the ref you just deleted, no? Of course, you can argue that instead of repacking you always stat the deleted-refs/ hierarchy; in other words, you can argue that you can make the deletion path faster by penalizing the lookup path. So I do not think using a "ref cache" (whatever it is, and however ...

If the goal is to optimize the deletion path, then that is true. My point is that we do not want to optimize the deletion path at the expense of a more costly lookup path.
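Here is a hypothetical sketch of the kind of "ref cache" under discussion, to show why it does not make the invalidation problem go away. The class, the stamp function and the loader are made-up stand-ins (git has no such cache beyond its in-memory copy of packed refs): `read_stamp` plays the role of something cheap like an mtime check, and `load_all` plays the role of reading every ref from disk.

```python
from typing import Callable, Dict, Optional


class RefCache:
    """In-process ref cache validated by a change stamp (illustrative only)."""

    def __init__(self, read_stamp: Callable[[], int],
                 load_all: Callable[[], Dict[str, str]]) -> None:
        self.read_stamp = read_stamp
        self.load_all = load_all
        self.stamp: Optional[int] = None
        self.refs: Dict[str, str] = {}
        self.reloads = 0

    def lookup(self, ref: str) -> Optional[str]:
        stamp = self.read_stamp()        # still one check on every lookup
        if stamp != self.stamp:          # any deletion changes the stamp...
            self.refs = self.load_all()  # ...so every ref is reloaded
            self.stamp = stamp
            self.reloads += 1
        return self.refs.get(ref)


# Usage with a fake store: repeated lookups hit the cache, a deletion forces a reload.
store = {"refs/heads/master": "a" * 40, "refs/heads/tmp": "b" * 40}
generation = [0]
cache = RefCache(lambda: generation[0], lambda: dict(store))
cache.lookup("refs/heads/master")
cache.lookup("refs/heads/master")
del store["refs/heads/tmp"]
generation[0] += 1                       # the deletion changes the stamp
print(cache.lookup("refs/heads/tmp"), cache.reloads)  # prints None 2
```

However the deletion is recorded, the cache has to notice it (here through a check on every lookup) and rebuild itself, which, like rewriting packed-refs, is work proportional to the number of refs.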
I agree completely with Junio. I make a lot of temporary "throw away" branches in Git; often they live on disk for 5 to 10 minutes at most before getting deleted again. I also make a smaller (but still significant) number of longer-lived branches that hang around for days or weeks before getting deleted.

In the former case (throw away) I wouldn't want those refs added to the packed refs file. They just don't live long enough to make it worth it. And when I delete them I want them gone. So moving them off to a 'deleted-refs' directory to indicate they are gone just delays the removal. Not something I want.

In the latter case (longer lived) I don't mind if I have to sit through an extra 500 ms to rewrite the entire packed refs file during a ref delete operation. I lived with the branch for weeks; I can probably spare a second to finally get rid of it once it's gone upstream. Heck, the push to move that branch upstream might actually take longer to unpack the loose objects contained on that ...

Absolutely. I figure I do at least 3x as many ref lookups as ref deletes. And that's just counting the sequence of commands I commonly perform against my "throw away" branches, which live for at most 10 minutes, let alone my longer-lived branches that hang around for weeks.

--
Shawn.
