Hi folks,
I'm using git to store some generated files, as well as their sources.
(This is in the context of Debian package development, where entire
upstream release tarballs are injected into an upstream branch, with
Debian releases merging the upstream branch, and adding the Debian
packaging files.)
The upstream release tarballs contain files such as
- yacc/lex code, and the corresponding generated sources
- Docbook/XML code, and corresponding HTML/PDF documentation
These are provided by upstream so that end users don't need these tools
installed (particularly docbook, since the toolchain is so flaky on
different systems). However, the fact that git isn't storing the
mtime of the files confuses make, so it then tries to regenerate these
(already up-to-date) files, and fails in the process since the tools
aren't available.
Would it be possible for git to store the mtime of files in the tree?
This would make it possible to do this type of work in git, since it's
currently a bit random as to whether it works or not. This only
started when I upgraded from powerpc32 to an amd64 architecture;
I guess it may be using high-resolution timestamps.
Thanks,
Roger
P.S. The repo I'm working on here is at
git://git.debian.org/git/collab-maint/gutenprint.git
-- 
.''`. Roger Leigh
: :' : Debian GNU/Linux http://people.debian.org/~rleigh/
`. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/
`- GPG Public Key: 0x25BFB848 Please GPG sign your mail.
Hi,
This subject comes up from time to time, but the answer always stays the same: No. The trees are purely defined by their content, and that's by design.
If you do not want to regenerate files that are already up-to-date, you need multiple checkouts of the same repository.
Thanks,
Matthias
--
Or a make-rule that touches the files you know are up to date. Since you control the build environment, that's probably the simplest solution. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 --
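A minimal sketch of the "touch" approach, written as a shell helper that a make rule or a build script could call. The function name touch_generated and the file names in the usage line are hypothetical; the only idea taken from the thread is to give each pre-generated file a fresh mtime so that make sees it as newer than its source.

```shell
#!/bin/sh
# Mark pre-generated files as up to date by giving them a fresh mtime.
# Hypothetical helper: pass the generated files you know are current.
touch_generated() {
    for f in "$@"; do
        if [ -e "$f" ]; then
            touch "$f"
        else
            # Never create a missing file; an empty one would hide the problem.
            echo "warning: $f does not exist, skipping" >&2
        fi
    done
}
```

It would be called after checkout and before make, for example: touch_generated src/parser.c doc/manual.html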
This is the approach I'm currently taking, since it's simple and doesn't require any tool changes. Ideally, I'd like to avoid such hackiness, though.
I understand all the arguments I've seen in favour of not using the mtime of the files when checking out. They make sense. However, in some situations (such as this), they do not: git is breaking something that was previously working. In my case, I'm injecting *release tarballs* into git, and the timestamps on the files really do matter. Regarding issues with branching and branch switching, I always do builds from clean in this case.
If an option was added to git-checkout to restore mtimes, it need not be the default, but git could record them on commit and then restore them if asked /explicitly/.
For this, and some other uses I have in mind for git, it would be great if git could store some more components of the inode metadata in the tree, such as:
- mtime
- user
- group
- full permissions
- and also allow storage of the full range of file types (i.e. block, character, pipe, etc.)
This would allow git to be used as the basis for a complete functional versioned filesystem (which I'd like to use for my lightweight virtualisation tool, schroot, which currently uses LVM snapshots for this purpose).
Regards,
Roger
--
You can. The way to ask explicitly right now is to write hooks that implement the functionality you want. It's not as easy as setting a config value, but since you'd have to write the patch to do that anyway (and it's likely it will get dropped), you'd be better off writing some hooks and submitting them as contrib. I believe someone else has done some work along the lines of turning git into a complete-with-metadata backup system before. Google might prove beneficial. -- Andreas Ericsson --
Although now that I come to think of it, storing "user" and "group" made it near-enough totally useless for anything a user had created, as the repos could hardly ever be shared. I'll say it again: hooks can be written to handle this. -- Andreas Ericsson --
Hi,
No, since this would wreck people's workflows:
- compile in branch "master"
- switch to branch "topic"
- compile
- switch back to branch "master"
Now you _want_ files in "master" that were changed in "topic" to be recompiled. This is a quite common case.
However, nothing hinders you from having your own ".gitmtimes" in the tree, and a script people can use as a hook, which applies the mtimes to the files.
Ciao,
Dscho
--
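A rough sketch of what such a ".gitmtimes" script could look like. Everything here is an assumption for illustration: the manifest format (seconds, a tab, then the path), the function names save_mtimes and restore_mtimes, and the reliance on GNU coreutils (stat -c %Y and touch -d @SECONDS). File names containing newlines are not handled.

```shell
#!/bin/sh
# Sketch: record and restore mtimes through a manifest kept in the tree.
# Assumes GNU coreutils; MTIME_FILE and both function names are hypothetical.
MTIME_FILE=${MTIME_FILE:-.gitmtimes}

# save_mtimes FILE...: write one "SECONDS<TAB>PATH" line per file.
save_mtimes() {
    : > "$MTIME_FILE"
    for f in "$@"; do
        printf '%s\t%s\n' "$(stat -c %Y "$f")" "$f" >> "$MTIME_FILE"
    done
}

# restore_mtimes: apply the recorded mtimes to the files that still exist.
restore_mtimes() {
    while IFS="$(printf '\t')" read -r secs path; do
        if [ -e "$path" ]; then
            touch -d "@$secs" "$path"
        fi
    done < "$MTIME_FILE"
}
```

One possible arrangement is to run save_mtimes on the generated files before committing and to call restore_mtimes from a post-checkout hook, so that nobody who does not install the hook is affected.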
I don't think this would be done as a change in core at all, or at least not soon. You can use Metastore, or some custom clean/smudge gitattribute filters with something like Metastore (or etckeeper) to store extra metadata about files in your tree. See http://git.or.cz/gitwiki/InterfacesFrontendsAndTools -- Jakub Narebski Poland ShadeHawk on #git --
On Wed, 2008-11-19 at 11:37 +0000, Roger Leigh wrote:
Unless I'm mistaken, the reason why git doesn't, and shouldn't, do this is _because_ it confuses make. Suppose you've got two branches, and you check out the other branch, resulting in changes in 3 files. Should git go and modify the mtime for every single file, and remove any file that isn't part of the repo (such as generated object files)?
If it modifies the dates on every file, but doesn't remove the generated object files, how does make handle that? It'll likely regenerate some of the object files, but not all of them. If it doesn't, but touches the files that changed, and the dates are now older than the corresponding object files, make would fail to recompile the project properly!
The only way this could work is if you never switch branches, which is quite limiting for git, and never check out an older revision, which is quite limiting for RCS systems in general.
You should probably fix your build script, or add a hook script that sets the dates on the files in question manually, but the former solution would be much better.
--
Not for docbook/flex/yacc stuff, which is what was causing trouble. -- Andreas Ericsson --
Hi, Only if you _allow_ the problem that makes ccache necessary. Which we don't. Ciao, Dscho P.S.: reminds me -- once again -- of the complicator's glove. --
You do, for some definition of "problem".
  make
  git checkout whatever
  make
  git checkout where-you-were-before
  make # <--- this one
The last make is correct with plain git and plain make, but it can be slow. With a cache in your build system, it can just reuse the objects created during the first "make". It doesn't change correctness, only performance.
-- Matthieu
--
Besides the obvious answer (this comes back often as a request), it is possible in theory to create a shell script which, for each file present in the sandbox in the current branch, would find the time of the last commit on that file (quite an expensive operation) and apply it as the mtime. I had a need for this once, then lost interest, since using git as it is is so much better than trying to mimic the behaviour of old scm tools and makefiles. You should store mostly the content of source files. You should do a make in your first cloned repo at least once before committing anything to the repo. That's what I did and I saved days... -- Christian -- http://detaolb.sourceforge.net/, a linux distribution for Qemu with Git inside ! --
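For illustration, a sketch of the kind of script described above. It assumes GNU touch (-d @SECONDS), uses the committer date of the last commit touching each path, and must be run inside a work tree; the function name apply_commit_mtimes is hypothetical. It runs one "git log" per file, which is why it is expensive on large trees, and it does not handle paths that git ls-files would quote.

```shell
#!/bin/sh
# Sketch: set each tracked file's mtime to the committer date of the
# last commit that touched it. Assumes GNU touch; name is hypothetical.
apply_commit_mtimes() {
    git ls-files | while IFS= read -r path; do
        # %ct is the committer date as seconds since the epoch.
        secs=$(git log -1 --format=%ct -- "$path")
        if [ -n "$secs" ] && [ -e "$path" ]; then
            touch -d "@$secs" "$path"
        fi
    done
}
```

Note that this gives source and generated files from the same commit identical mtimes, which may or may not be enough for a given set of make rules.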
Hi, I had a need like this, too, and solved it by teaching the build process to fall back to generated files if the tool to generate them was not available. Ciao, Dscho --
Surely this is only expensive because you're not already storing the information in the tree; if it was there, it would be (relatively) cheap? You could even compare the old and new trees to see if you
Except in this case I'm storing the content of *tarballs* (along with pristine-tar). I'm committing exactly what's in the tarball with no changes (this is a requirement). I can't change the source prior to commit.
Regards,
Roger
--
No, it's because git is *snapshot* based and doesn't care about anything but contents. Storing filestate information in the tree would be a backwards-incompatible change that would require a major version change.
Caring about metadata the way you mean it would mean that
  git add foo.c; git commit -m "kapooie"; touch foo.c; git status
would show "foo.c" as modified. How sane is that? Or should we introduce a new concept for altered metadata only? "metafied"?
So what do we do when the next user whizzes along and wants support for full ACLs? And what do we do when Windows (or some other bizarre system) adds some sort of extension so we have to have different types of ACL support on both
We already do that by matching the SHA1 hash for the index entries. Only content that is actually different between two branches is altered upon checkout (which is why it's so damn fast when you're using topic branches properly).
-- Andreas Ericsson
--
It's not strictly true that it's only caring about contents. The contents are of course in the blobs, but the tree is already effectively storing inode data, since it's a directory of filenames/subtrees, just one that only cares to store the permissions part of the total inode data.
I understand that git stores the permissions tacked onto the hash; would it be feasible to tack on the other bits as well? If I understand correctly, it's binary encoded in the pack format, and that would require updating the format to hold the additional
I've never come close to suggesting we do anything so insane. What I am suggesting is that on add/commit, the inode metadata be recorded in the tree (like we already store perms), so that it can be (**optionally**) reused/restored on checkout.
Whether it's stored in the tree or not is a separate concern from whether to *use* it or not. For most situations, it won't be useful, as has been made quite clear from all of the replies, and I don't disagree with this. However, for some, the ability to have this information to hand to make use of would be invaluable.
There have been quite a few suggestions to look into using hooks, and I'll investigate this. However, I do have some concerns about *where* I would store this "extended tree" data, since it is implicitly tied to a single tree object, and I wouldn't want to store it directly as content.
Regards,
Roger
--
No, that would break backwards compatibility with cross-repo
Then write a hook for it. You agree that for most users this will be totally insane, and yet you request that it's added in a place where everyone will have to pay the performance/diskspace penalty for it but only a handful will get any benefits. That's patently absurd. Especially since there are such easy workarounds that you can put in
Store it as a blob targeted by a lightweight tag named "metadata.$sha1" and you'll have the easiest time in the world when writing the hooks. Also, the tags won't be propagated by default, which is a good thing since your timestamps, uids or whatever almost certainly will not work well on other developers' repositories.
That's what I'd do anyway.
-- Andreas Ericsson
--
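A small sketch of the tag-based storage suggested above, using only plumbing that exists in git (rev-parse, hash-object -w, tag, cat-file). It assumes that "$sha1" means the tree id of HEAD, which the message does not spell out; the function names store_metadata and show_metadata are hypothetical, as is the format of the metadata file itself.

```shell
#!/bin/sh
# Sketch: keep a metadata file as a blob reachable through a lightweight
# tag named after the tree it describes.
# Assumption: "$sha1" is the tree id of HEAD. Function names are hypothetical.

# store_metadata FILE: write FILE as a blob and tag it metadata.<tree>.
store_metadata() {
    tree=$(git rev-parse 'HEAD^{tree}') || return 1
    blob=$(git hash-object -w "$1") || return 1
    git tag -f "metadata.$tree" "$blob" >/dev/null
}

# show_metadata: print the metadata recorded for the current tree.
show_metadata() {
    tree=$(git rev-parse 'HEAD^{tree}') || return 1
    git cat-file blob "metadata.$tree"
}
```

A hook could pipe the output of show_metadata into whatever restores the timestamps. As noted in the message, such tags are not propagated by default, so sharing them would be an explicit step.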
The cost is tiny. The extra space would be smaller than a single
And yet the fact that it won't propagate makes it totally useless: all the other people using the repo won't get the extra metadata that will prevent build failures. Having the extra data locally is nice, but not exactly what I'd call a solution. The whole point of what I want is to have it as an integral part of the repo.
Regards,
Roger
--
Easiest way is typically something like this in the makefile:
# Empty when docbook2man is not installed.
docbook_version = $(shell docbook2man --version 2>/dev/null)
ifneq ($(docbook_version),)
mymanpage.1:
	## Real docbook build rules here
else
# No docbook: accept a pregenerated page, fail only if it is missing.
mymanpage.1:
	if [ -e $@ ]; then \
		echo "No 'docbook' installed, using pregenerated man pages" >&2 ; \
	else \
		echo "Pregenerated manpages are missing and no docbook found!" >&2 ; \
		exit 1 ; \
	fi
endif
Such stuff will take an order of magnitude less time than trying to
patch GIT to preserve metadata that most projects don't want
preserved. You may also find it's easier to just comment out the
documentation build rules if you are always guaranteeing that the docs
have been compiled.
Cheers,
Kyle Moffett
--
Then make it signed tags and ship them along. Or do this properly and simply put in your buildsystem that some targets never need to be rebuilt. That's (by far) the simplest solution.
On a sidenote, I fail to see how the pre-generated stuff can avoid getting updated unless also the sources for that stuff was updated, in which case either of the following is true:
a) You really do need to rebuild, because upstream fucked up.
b) The pre-generated stuff should *also* be checked out and get new timestamps.
Either way, to me it sounds like your buildsystem needs some love.
-- Andreas Ericsson
--
No, the cost is huge. The SHA-1 for the tree with _exactly the same contents_ will be different, just because, for example, you applied a patch one second earlier than I did, and that's completely insane. Git is purely a content tracker, as has been said numerous times on this mailing list, and that is for good reasons. If the tree entries change just because some timestamps are different, the CPU time needed to generate a diff will grow considerably. Attempts to add additional information to the basic git objects have failed several times, and yours will probably fail too, since there are numerous reasons why you do _not_ want a timestamp in the tree _and_ there are several workarounds for your problem, which at --
>>>>> "Roger" == Roger Leigh <rleigh@codelibre.net> writes:
Roger> Except in this case I'm storing the content of *tarballs* (along with
Roger> pristine-tar). I'm committing exactly what's in the tarball with
Roger> no changes (this is a requirement). I can't change the source prior
Roger> to commit.
If you're not doing distributed source code development, why are you using git? It's hard to be angry at a screwdriver for not pounding in nails properly. Sounds like you want rsync or something.
-- Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095 <merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/> Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc. See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion
--
Err, it *is* being used for distributed development... of Debian packaging. We track upstream releases on one branch, merge this periodically onto the master branch containing the Debian packaging infrastructure, and also have other bits such as a continually-rebased patches branch to generate quilt patch series
I think not! Perhaps if you read my original mail, you might understand the reasoning behind this (whether you consider that valid reasoning or not is another matter).
Regards,
Roger
--
Can you store the tarballs in the repository, instead of the contents of the tarballs? The tarballs will contain the dates you want, and you can obviously get tar to set the timestamps the way you want. (Then you add a higher-level Makefile that knows how to unpack the tarball to a directory, maintaining the timestamps, patch anything you're changing, and run make in that directory.)
That is to say, from your perspective, the sources include the upstream distributed tarballs, but the individual files in upstream tarballs aren't source files for you, since you can't (by policy) modify them (within the pristine tarball). If you want to change the sources of the packaged project, you add a patch file to do it, rather than simply changing the source (which, as you say, you're required not to do).
Git really wants to store the inputs to your workflow, each of which might change independently. That's why the files in your work tree have timestamps based on when they came to be in your work tree (they get set to the current time whenever git puts different content there, and are left unchanged if their contents don't change when moving from commit to commit). The "sources" in your workflow are a different set of files from the sources in the project, and git really wants *your* repository to match *your* workflow and not the workflow of the upstream project, when you're acting as a packager rather than an upstream developer.
-Daniel
*This .sig left intentionally blank*
--
Note that pristine-tar will work no matter what the mtimes or other file metadata are; none of that affects generation of deltas or regeneration of tarballs from them.
Also, the source you commit does not really have to be identical to what's in the tarball. (Despite what it may say in the man page. ;-) A larger delta will be generated if something is different.
So, three possible approaches:
1. Run make or whatever you need to do before running pristine-tar, and put up with a larger delta.
2. Before building, you could use pristine-tar to extract the original tarball, and then have a program examine that tarball, and reset the mtimes in your build tree to match the mtimes of files in it. (Or you could duplicate the info with metastore -m, which could be restored quicker.)
3. Store uncompressed tarballs in git, so that they will pack efficiently, and use pristine-gz to regenerate the pristine .tar.gz. Only mentioned because this could be more space efficient than option #1, if the pristine-tar deltas get too large.
-- 
see shy jo
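Approach 2 could be sketched roughly as follows. This assumes the tarball has already been regenerated (for example with pristine-tar), that it unpacks into a single top-level directory, and that GNU tar is available for --strip-components. The function name sync_mtimes_from_tarball is hypothetical. It extracts to a scratch directory and copies each file's timestamp with touch -r, rather than parsing the tar listing.

```shell
#!/bin/sh
# Sketch of approach 2: copy mtimes from an extracted tarball onto the
# matching files in the build tree. Assumes GNU tar and a tarball with
# one top-level directory. The function name is hypothetical.
sync_mtimes_from_tarball() (
    tarball=$1
    tree=$(cd "$2" && pwd) || exit 1
    tmp=$(mktemp -d) || exit 1
    if tar -xf "$tarball" -C "$tmp" --strip-components=1; then
        (cd "$tmp" && find . -type f) | while IFS= read -r f; do
            # touch -r copies the timestamps of the reference file.
            if [ -e "$tree/$f" ]; then
                touch -r "$tmp/$f" "$tree/$f"
            fi
        done
        status=0
    else
        status=1
    fi
    rm -rf "$tmp"
    exit "$status"
)
```

Usage would be along the lines of: sync_mtimes_from_tarball ../gutenprint_X.orig.tar.gz . where the tarball name is only a placeholder.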
I don't get it. Why are end users running make in the first place? Why aren't those in the build-dependencies?
-- 
martin | http://madduck.net/ | http://two.sentenc.es/
it is better to have loft and lost than to never have loft at all. -- groucho marx
spamtraps: madduck.bogus@madduck.net
By end user, I mean the person downloading and building the sources. They are optional build dependencies. The generated files are provided pre-built, and won't be rebuilt unless they get outdated. In the release tarball, the timestamps are correct, ensuring this never happens. When checking out with git, the timestamps are incorrect, and it attempts to rebuild something that's *already built*.
Regards,
Roger
--
Hi, I'll try just one more time. Why don't you teach your build process to check if the generated files can be generated, and if not, fall back to the committed ones? Ciao, Dscho --
Well, it's definitely not a good idea to try rebuilding when the tools aren't available, and I'll update the Makefiles to only attempt a rebuild when the tools are present. So yes, making the build a bit more intelligent is definitely something to do. However, this is really a separate issue, since the repo dates back eight years, and I don't want to break older stuff. This will only fix things for the future.
Regards,
Roger
--
I know you will hate me, but I think the solution here is to fix the toolchain and make those build dependencies required.
-- 
martin
>>>>> "martin" == martin f krafft <madduck@madduck.net> writes:
martin> I know you will hate me, but I think the solution here is to
martin> fix the toolchain and make those build dependencies required.
I agree with martin here. Your planned solution of not rebuilding the files if the tools are not present may lead to serious problems if the user modifies the source files and happens not to have the tools around.
Moreover, requiring the build dependencies would allow you to drop the generated files from the repository and rebuild them in your packaging (source or binary) process.
Sam
-- Samuel Tardieu -- sam@rfc1149.net -- http://www.rfc1149.net/
--
