>>>>> "d" == dhruva <dhruvakm@gmail.com> writes:

d> Also, if you clone from systems across time zones, what time do you
d> expect to set on the files.

I'm just used to tar, cpio, scp -a, rsync -a, ar, etc. using 'date +%s' seconds internally, so no timezone problem.

I hate it when I get some latest WhizBang.tgz, only to untar it and find all the files' dates the same, when in fact the README hasn't been touched in seven years, but you can't tell that from ls -l.

I recall some content tracker was involved.

Of course, I'm allowing you to know my delicate first-day impressions. I'm sure as I grow older I will learn the difference between a content tracker and an archiver.
--
Well, README was just touched; it wasn't on your disk at all shortly before.

This would make a big difference if, for example, you unpacked "foo-1.0" on top of "foo-1.1" and the timestamps were from when the files were originally created: all of the source files that changed would then be older than the object files, and the build system would do nothing. Of course, with archives, you don't unpack different versions into the same directory, but with a version control system, you'll do it all the time, so you really need the system to put on disk the times when those files were last put there.

If you want to know when the README you've got is from (and a whole lot more), "git log README" will tell you, although it won't tell you if somebody yesterday changed the README they're distributing from some other text to a file that's been sitting on their disk untouched for seven years.

-Daniel
*This .sig left intentionally blank*
--
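The build-system argument above can be made concrete with a small sketch. It assumes a make-style rule that rebuilds a target only when the source's mtime is newer than the object's; the function name `needs_rebuild` and all timestamps are hypothetical, chosen only for illustration.

```python
# Sketch of a make-style staleness check. The rule, the function name,
# and the timestamps are illustrative assumptions, not any real tool's code.

def needs_rebuild(source_mtime: float, object_mtime: float) -> bool:
    """Rebuild only when the source is newer than the object file."""
    return source_mtime > object_mtime

# Epoch seconds, chosen arbitrarily for the example.
SOURCE_CHANGED_IN_HISTORY = 1_000_000  # when foo.c was last changed upstream
OBJECT_BUILT = 2_000_000               # when foo.o was built locally
CHECKOUT_TIME = 3_000_000              # when the other version was put on disk

# Timestamps restored from history: foo.c looks older than foo.o,
# so the changed source is silently skipped.
stale_with_history_times = needs_rebuild(SOURCE_CHANGED_IN_HISTORY, OBJECT_BUILT)

# Timestamps set at checkout: foo.c is newer than foo.o and gets rebuilt.
stale_with_checkout_times = needs_rebuild(CHECKOUT_TIME, OBJECT_BUILT)

print(stale_with_history_times, stale_with_checkout_times)
```

With history-based timestamps the changed file is not rebuilt; with checkout-time timestamps it is, which is the behaviour a version control working tree needs.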
If this is the important bit, perhaps git-archive could be changed to create tarballs with file timestamps based on their commit dates.

- Chris
--
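As a rough illustration of the tarball idea, here is a minimal sketch using Python's standard `tarfile` module. It is not how git-archive works: the helper name `build_tar` is made up, and the per-file timestamps are passed in by the caller. One assumed way to obtain them would be `git log -1 --format=%ct -- <path>` for each file.

```python
import io
import tarfile

def build_tar(files):
    """Build an uncompressed tar archive in memory.

    files maps path -> (content bytes, mtime in epoch seconds).
    The mtime is whatever the caller supplies, e.g. a commit date
    looked up separately (that lookup is not part of this sketch).
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path, (data, mtime) in sorted(files.items()):
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            info.mtime = mtime  # per-file timestamp instead of "now"
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

# Hypothetical contents and commit times, for illustration only.
archive = build_tar({
    "README": (b"old text\n", 1_000_000_000),
    "src/main.c": (b"int main(void) { return 0; }\n", 1_200_000_000),
})

# Read the archive back and show the timestamp stored for each member.
with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
    for member in tar.getmembers():
        print(member.name, member.mtime)
```

Writing the timestamps is the cheap part; finding the right commit date for every path is where the cost lies.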
Based on the principle of least surprise, I'd consider this a rather good
idea.
--
Sincerely,
Stephen R. van den Berg.
To people that say "I could care less" - well, why don't you?
--
Unless I'm missing something, this would make git-archive rather more
expensive than it is now: Tree objects do not record any timestamps,
so figuring out the last commit that changed a file requires a full
history walk in the worst case[*]. (This is another side-effect of
not versioning files.) On the other hand, current git-archive's
running time depends only on the size of the tree-ish given, including
all subtrees and blobs.
My unscientific guesstimates on how much work this would be, in a
random (old) linux-2.6 clone:
$ git rev-parse HEAD
e013e13bf605b9e6b702adffbe2853cfc60e7806
$ time git ls-tree -r -t $(git rev-list HEAD~5000..HEAD) >/dev/null
real 0m1.385s
user 0m1.164s
sys 0m0.220s
$ git rev-list HEAD | wc -l
117812
So reading (and dumping) all those trees and subtrees incurs a penalty
on the order of 30 seconds (scaling the 5000-commit sample linearly to
all 117812 commits). Compare to the current running time of
git-archive:
$ time git archive --format=tar HEAD >/dev/null
real 0m2.790s
user 0m2.684s
sys 0m0.072s
Of course, the ratio will keep getting worse as history gets longer.
- Thomas
[*] I think to really have a "worst case" here, you need at least one
file in every leaf directory that has not changed since the root
commit, and another that changes in every commit to force the search
to really read every subtree.
--
Thomas Rast
trast@{inf,student}.ethz.ch
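
The 30-second guesstimate in Thomas's message can be reproduced with a few lines of arithmetic. This sketch assumes the cost of reading trees scales linearly with the number of commits, which is a simplification; the constants are copied from the timings quoted above, and the function name `extrapolate` is made up.

```python
# Figures copied from the measurements quoted in the message.
SAMPLE_COMMITS = 5000      # HEAD~5000..HEAD
SAMPLE_SECONDS = 1.385     # "real" time of the ls-tree run
TOTAL_COMMITS = 117812     # git rev-list HEAD | wc -l
ARCHIVE_SECONDS = 2.790    # "real" time of plain git archive

def extrapolate(sample_seconds, sample_commits, total_commits):
    """Scale a timing linearly from a sample of commits to all commits."""
    return sample_seconds * total_commits / sample_commits

estimate = extrapolate(SAMPLE_SECONDS, SAMPLE_COMMITS, TOTAL_COMMITS)
ratio = estimate / ARCHIVE_SECONDS

print(f"estimated tree-reading time: {estimate:.1f} s")
print(f"ratio to plain git archive:  {ratio:.1f}x")
```

Under that assumption the estimate comes out a little over 30 seconds, roughly an order of magnitude above the plain git archive run.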
How many people use git-archive, and how many times a day do they use it? For example, kernel.org seems to put out linux-2.x.y.z.tar.bz2 once every 2 to 7 days.

The overhead of this new option (and certainly it should be an option, not the default) should be measured not against the old running time, but against how often the tool is used. Looked at on those time scales, it may not be a big deal. By all accounts, this overhead will not affect the "giterate" [meaning git-literate ;-)] people too much.
--
