Re: timestamps not git-cloned

Previous thread: timestamps not git-cloned by jidanni on Thursday, November 27, 2008 - 7:24 pm. (7 messages)

Next thread: [PATCH] gitk: Updated German translation. by Christian Stimming on Friday, November 28, 2008 - 3:46 am. (1 message)
From: jidanni
Date: Thursday, November 27, 2008 - 10:06 pm

>>>>> "d" == dhruva  <dhruvakm@gmail.com> writes:

d> Also, if you clone from systems across time zones, what time do you
d> expect to set on the files?

I'm just used to tar, cpio, scp -a, rsync -a, ar, etc. using 'date +%s'
seconds internally, so no timezone problem.

I hate it when I get some latest WhizBang.tgz, only to untar it to
find all the files' dates the same, when in fact the README hasn't
been touched in seven years, but you can't tell that from ls -l. I
recall some content tracker was involved.

Of course I'm letting you know my delicate first-day impressions.
I'm sure as I grow older I will learn the difference between content
tracker and archiver.
--

From: Daniel Barkalow
Date: Thursday, November 27, 2008 - 11:59 pm

Well, README was just touched; it wasn't on your disk at all shortly 
before. This would make a big difference if, for example, you unpacked 
"foo-1.0" on top of "foo-1.1" and the timestamps were from when the files 
were originally created, and now all of the source files that changed are 
older than the object files and the build system does nothing.

Of course, with archives, you don't unpack different versions into the 
same directory, but with a version control system, you'll do it all the 
time, so you really need the system to put on disk the times when those 
files were last put there. If you want to know when the README you've got
dates from (and a whole lot more), "git log README" will tell you, although it
won't tell you if somebody yesterday changed the README they're 
distributing from some other text to a file that's been sitting on their 
disk untouched for seven years.

	-Daniel
*This .sig left intentionally blank*
--

From: Chris Frey
Date: Saturday, November 29, 2008 - 1:54 am

If this is the important bit, perhaps git-archive could be changed
to create tarballs with file timestamps based on their commit dates.

- Chris
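
Until git-archive grows such an option, a similar effect can be approximated by stamping the files of a fresh checkout with their commit dates before running tar. A minimal sketch, assuming GNU touch (for `-d "@epoch"`) and file names without newlines or other characters that `git ls-files` would quote; `restore_commit_mtimes` is a made-up helper name, not a git command:

```shell
# Set each tracked file's mtime to the committer date of the
# most recent commit that touched it.
# Assumptions: GNU touch, run from the top of a clean work tree.
restore_commit_mtimes() {
    git ls-files | while IFS= read -r path; do
        # %ct = committer date as seconds since the epoch
        ts=$(git log -1 --format=%ct -- "$path")
        if [ -n "$ts" ]; then
            touch -d "@$ts" -- "$path"
        fi
    done
}
```

After running it, an ordinary `tar` of the tree carries those dates. It runs one `git log` per file, so it will be slow on a large tree.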

--

From: Stephen R. van den Berg
Date: Saturday, November 29, 2008 - 2:22 am

Based on the principle of least surprise, I'd consider this a rather good
idea.
-- 
Sincerely,
           Stephen R. van den Berg.

To people that say "I could care less" - well, why don't you?
--

From: Thomas Rast
Date: Saturday, November 29, 2008 - 3:16 am

Unless I'm missing something, this would make git-archive rather more
expensive than it is now: Tree objects do not record any timestamps,
so figuring out the last commit that changed a file requires a full
history walk in the worst case[*].  (This is another side-effect of
not versioning files.)  On the other hand, current git-archive's
running time depends only on the size of the tree-ish given, including
all subtrees and blobs.
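
To make the history walk concrete: the per-file dates can be collected in a single pass over the log, newest commit first, keeping the first date seen for each path. This is only a sketch of the idea, not how git-archive works or would work; `last_commit_times` is a hypothetical name, and it assumes no tracked path starts with "@" or contains a newline. It also lists paths that were later deleted and ignores changes made only in merge commits.

```shell
# Print "<epoch>\t<path>" for the newest commit touching each path,
# walking the history of the given revision (default HEAD) once.
last_commit_times() {
    git log --format='@%ct' --name-only "${1:-HEAD}" |
    awk '
        /^@/ { ts = substr($0, 2); next }   # header line: remember this commit date
        NF && !($0 in seen) {               # first (newest) sighting of a path
            seen[$0] = 1
            print ts "\t" $0
        }
    '
}
```

The walk can only stop early once every path in the tree has been seen, which is why a single long-untouched file forces it all the way back to the root commit.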

My unscientific guesstimates on how much work this would be, in a
random (old) linux-2.6 clone:

  $ git rev-parse HEAD
  e013e13bf605b9e6b702adffbe2853cfc60e7806
  $ time git ls-tree -r -t $(git rev-list HEAD~5000..HEAD) >/dev/null

  real    0m1.385s
  user    0m1.164s
  sys     0m0.220s
  $ git rev-list HEAD | wc -l
  117812

So reading (and dumping) all those trees and subtrees incurs a penalty
on the order of 30 seconds.  Compare to the current running time of
git-archive:

  $ time git archive --format=tar HEAD >/dev/null

  real    0m2.790s
  user    0m2.684s
  sys     0m0.072s

Of course, the ratio will keep getting worse as history gets longer.
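
For reference, the 30-second figure is the 5000-commit timing scaled up to the full 117812 commits. A rough check, assuming the cost grows linearly with the number of commits walked (`estimate_walk_seconds` is just an illustrative name):

```shell
# Scale a timing measured on a sample of commits to the whole history.
# Assumption: cost is proportional to the number of commits walked.
# $1 = seconds for the sample, $2 = commits in the sample, $3 = total commits
estimate_walk_seconds() {
    awk -v t="$1" -v n="$2" -v total="$3" \
        'BEGIN { printf "%.1f\n", t * total / n }'
}

estimate_walk_seconds 1.385 5000 117812   # prints 32.6
```

That is roughly twelve times the 2.79 s measured for the plain git archive run.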

- Thomas

[*] I think to really have a "worst case" here, you need at least one
file in every leaf directory that has not changed since the root
commit, and another that changes in every commit to force the search
to really read every subtree.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch



From: Sitaram Chamarty
Date: Saturday, November 29, 2008 - 6:14 pm

How many people use git-archive and how many times a day do they use
it?  For example, kernel.org seems to put out linux-2.x.y.z.tar.bz2
once every 2 to 7 days.

The overhead of this new option (and certainly it should be an option,
not the default) should be measured not against the old running time,
but against the frequency of usage of the tool.  Looked at on those
time scales, it may not be a big deal.

By all accounts, this overhead will not affect the "giterate" [meaning
git-literate ;-)] people too much.
--
