I have a rather awkward project to work with (~44k files, many binaries).
The normal "git commit", which seems to be more than enough
for anything and anyone else, is a really annoying procedure
in my context. It spends too much time refreshing the index and
generating the list of files for the commit message.

At first I stopped using git commit -a (doing only update-index);
now I'm about to start using write-tree/commit-tree/update-ref
directly. It helps, but sometimes I really miss -F/-C. It's also
ugly: I can (and almost did) commit an unchanged tree.

Is there any simple way to modify git commit for such a workflow?
Failing that, is there any simple and _fast_ way to find out if the index
is any different from HEAD (so that I don't produce empty commits)?
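
One way to approach the write-tree/commit-tree/update-ref workflow described above is to compare the tree that write-tree produces with the tree HEAD already points at, and to skip the commit when they match. The following is only a minimal sketch under stated assumptions: the throwaway repository, the demo identity, the file name and the helper `plumbing_commit` are all hypothetical and exist purely for illustration.

```shell
#!/bin/sh
# Sketch only: commit the index with plumbing commands and refuse to
# record a tree identical to the one HEAD already points at.
set -e

# Throwaway repository and identity (hypothetical, for the demo only).
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

echo "first version" > file.txt
git update-index --add file.txt
git commit -q -m "initial"

# plumbing_commit MESSAGE (hypothetical helper, not a git command)
plumbing_commit() {
    tree=$(git write-tree)                         # tree object for the current index
    head_tree=$(git rev-parse --verify 'HEAD^{tree}')
    if [ "$tree" = "$head_tree" ]; then
        echo "index matches HEAD, nothing to commit" >&2
        return 1
    fi
    commit=$(printf '%s\n' "$1" | git commit-tree "$tree" -p HEAD)
    git update-ref HEAD "$commit"                  # advance the current branch
}

# A staged change is committed ...
echo "second, longer version" > file.txt
git update-index file.txt
if plumbing_commit "second change"; then changed_result=committed; else changed_result=skipped; fi

# ... while a second call with an unchanged index is refused.
if plumbing_commit "no change"; then unchanged_result=committed; else unchanged_result=skipped; fi

echo "changed: $changed_result, unchanged: $unchanged_result"
```

Neither the tree comparison nor the commit itself stats the working tree; only the update-index call on the named path does.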
-
I am not sure what you are trying to do. Do you mean stat() is slow?
Maybe you want "assume unchanged"?
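
For reference, the "assume unchanged" bit is set and cleared per path with git update-index. The sketch below is an illustration under stated assumptions (a throwaway repository, a demo identity and a made-up file name `data.bin`); it also shows the trade-off, namely that a modification made while the bit is set is not reported by git diff-files.

```shell
#!/bin/sh
# Sketch only: set, observe and clear the assume-unchanged bit.
set -e

# Throwaway repository and identity (hypothetical, for the demo only).
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

echo "first version" > data.bin
git add data.bin
git commit -q -m "initial"

# Tell git to trust the index entry for this path.
git update-index --assume-unchanged data.bin

# "git ls-files -v" marks assume-unchanged entries with a lowercase letter.
flag_set=$(git ls-files -v data.bin | cut -c1)

# Modify the file while the bit is set: the change goes unnoticed.
echo "second, longer version" > data.bin
if git diff-files --quiet; then seen_while_assumed=no; else seen_while_assumed=yes; fi

# Clear the bit: the same change is now reported.
git update-index --no-assume-unchanged data.bin
if git diff-files --quiet; then seen_after_clear=no; else seen_after_clear=yes; fi

echo "flag=$flag_set assumed=$seen_while_assumed cleared=$seen_after_clear"
```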
-
Incredibly slow. That, and the matter of having 44000 files to process.
If it is core.ignoreStat you mean, then maybe this is what I mean.
I haven't tried it yet (now I wonder myself why I haven't tried it).
But (I'm repeating myself, in <81b0412b0612060235l5d5f93d0hd1aaf34924f7783@mail.gmail.com>)
I do not really understand how it _can_ help: "I ask because it does
not ignore stat info, as the name implies. Because if it would,
there'd be no point of calling lstat at all, wouldn't it?" That last
question was about refresh_cache_entry - it calls lstat
unconditionally.

Still, I guess I'll have to try it.
But aside from me trying ignoreStat, can anyone help me with that
question about checking whether the index is any different from HEAD?
Because even with a very broken filesystem and 40k files in a repo you
sometimes do want to call git-update-index --refresh just to be sure
you haven't missed anything. And then it'll quickly become annoying
flicking ignoreStat back and forth.
-
Tried. No noticeable difference:
$ git repo-config core.ignorestat true; time gup --refresh
real 0m8.004s
user 0m1.936s
sys 0m5.702s
$ git repo-config core.ignorestat false; time gup --refresh
real 0m7.787s
user 0m1.890s
sys 0m5.703s
$
(that's cygwin).
-
Comparing index and HEAD should be cheap on a system with slow
lstat(), I think, as "git-diff-index --cached HEAD" should just
ignore the working tree altogether. Is that what you want?
-
Yes, except that it'll compare the whole trees. Could I make it stop
at the first mismatch? "-q|--quiet" for git-diff-index perhaps?
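
Later Git releases accept `--quiet` on git diff-index (it implies `--exit-code`), which appears to be the option asked for here. The sketch below assumes such a Git version; the throwaway repository, the demo identity and the helper `index_differs_from_head` are hypothetical. Whether the comparison stops at the very first mismatch is an implementation detail that this sketch does not verify.

```shell
#!/bin/sh
# Sketch only: ask "does the index differ from HEAD?" via the exit status.
set -e

# Throwaway repository and identity (hypothetical, for the demo only).
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

echo "first version" > file.txt
git add file.txt
git commit -q -m "initial"

# Hypothetical helper. Any non-zero status from diff-index (including an
# error) is reported as "differs".
index_differs_from_head() {
    ! git diff-index --cached --quiet HEAD --
}

# Right after a commit the index equals HEAD.
if index_differs_from_head; then before=dirty; else before=clean; fi

# A working-tree edit that was not staged is ignored by --cached.
echo "second, longer version" > file.txt
if index_differs_from_head; then worktree_only=dirty; else worktree_only=clean; fi

# Once staged, the index differs from HEAD.
git update-index file.txt
if index_differs_from_head; then after=dirty; else after=clean; fi

echo "before=$before worktree_only=$worktree_only after=$after"
```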
It's not only stat: open, read, mmap (yes, I try to use
it for packs) and close are really slow here as well.
-
It's Cygwin/NTFS. lstat() is slow. readdir() is slow. I have the

Yes, basically. The Cygwin/NTFS issues Alex is pointing out are
exactly why git-gui has a "Trust File Modification Timestamp" option
on both a per-repository and global level. My larger repositories
(~10k files) are difficult to work with without that option enabled.
--
Shawn.
-
Then maybe "git grep assume.unchanged" would help?
-
Hmm. OK, maybe I should have answered "No" to your first question.
I keep looking at the assume unchanged feature of update-index,
but refuse to use it because I'm a lazy guy who will forget to tell
the index a file has been modified. Consequently I'm going to miss
a change during a commit.

What may help (and without using assume unchanged) is:
* skip the `update-index --refresh` part of git-status/git-commit
* skip the status template in COMMIT_MSG when using the editor

as Git will still at least make sure a `commit -a` includes
everything that is dirty.

Files whose modification dates may have been messed with (but
whose content is unchanged) will just go through expensive SHA1
computation to arrive at the same value, which is fine.

Users skipping the first part are doing so under the assumption that
their modification dates are usually correct, and that when
they aren't, the SHA1 computation of a handful of files is cheap
compared to stat'ing the entire set of files.

Users skipping the second part are doing so under the assumption
that knowing the names of the files they are committing doesn't
really improve their odds of writing a good commit message.
--
Shawn.
-
The second part is not about a good commit message but more
about a path that should have been updated but forgotten (the
same mistake you would be likely to make and that is the reason
assume-unchanged is not good for you).

I do not mind too much if you added a new --quick option to "git
commit" for this rather specialized need.
-
Just to be clear, I'm not trying to blame Cygwin here.
Windows' dir command is slow. Windows Explorer is slow while
browsing directories. Eclipse chugs hard while doing any directory
scans (it normally runs very fast if it's not rescanning the entire
directory structure). The drive is just plain slow.

Yea, I know, get a faster disk... but some bean counters don't
believe that a $50 more expensive disk could ever save enough time
to warrant the extra $50 capital expenditure...

I spend at least an hour a week waiting for enough IO to finish so
that the mouse pointer will move again. *sigh*
--
Shawn.
-
Before buying any new hardware, you could easily imagine the
following scenario (I'm also "stuck" with windows, so it's an idea
I've been toying around with for a week or so).

There are virtualizers around, on which networking capabilities can
be activated. And we could easily create a vm with linux+git
inside, using ext2/ext3/ext4 fs virtual disks (you'd benefit from
windows cache actually...)

example: YTech_Subversion_Appliance_v1.1 (ubuntu + subversion).
I've no prototype yet, but I have 2 possible scenarios:
1) use vmplayer and a minimal uclibc initramfs with git onboard
2) use qemu+kqemu and a similar mini-distro (but right now networking
is an issue on windows hosts: I'm exploring tunneling)

The 1st scenario is "easy". And I start to prefer this idea over
even mingw porting of git (I tried and it's hard, really).

But again, maybe jgit would be a better universal solution.
--
Christian
-
I think this is a very common scenario costing hideous amounts of
money around the globe.

If you have lots of files in a folder, don't even think of
accidentally touching those folders in Windows Explorer; if you do,
keep Process Explorer or similar ready. I've ended up using (even w/o
Cygwin) scripts, automatic compression and even a database functioning
as a directory cache - basically creating accessibility layers for a

Very interesting! Do you have a time-frame for this? Maybe even
something for the Git FAQ/wiki. Please keep us informed.
-
