In a repository under a mounted Windows directory (ntfs) I get this error: $ git status fatal: unable to create '.git/index.lock': Read-only file system $ Is this correct? there exist a workaround? I just need to know if current working directory is clean and report back to qgit user, so read-only access would be ok for me. All other commands commonly used to browse a repository seems to work well, without pretending to write stuff. Thanks Marco -
"git status" is kind of strange. It's really technically the engine behind the messages for "git commit": both in a very real technical sense (it's _literally_ the same script:"git-commit.sh" is not just "git-commit", but also "git-status"), but also in a very real historical "that is what the code was written for". And you wouldn't think that it really needs write access, and you'd be largely correct, EXCEPT for the fact that git status actually does a refresh of the index, to make sure that we don't claim that something is dirty just because somebody has touched the file. IOW, there's an implicit "git update-index --refresh" as part of calculating the status, and that's the thing that wants to lock the index "git status" doesn't "pretend" to write stuff. It really does. You *can* just use "git-runstatus" instead. That's the command that actually does all the heavy lifting. But you can see the difference by doing this: touch Makefile git runstatus vs touch Makefile git status Notice how the "runstatus" one claims that Makefile is "modified:". That's exactly because it doesn't do the index refresh. Linus -
Sorry, perhaps it is a silly question, but why git index should be different after just touching a file? IOW is it not possible that "git update-index --refresh" exists without modifing the index, just because ther's nothing to modify? So, finally, could be possible making "git status" taking the lock only _after_ has checked there's something new to write to the index? So to avoid write access in most cases ? (expecially with repo mounted on a read-only fs) Thanks Marco -
It uses the information from lstat(2) that is stored in the
index and taken from the filesystem -- if they are different
(touch changes it as you know, but so does "vi that-file") they
are reported as "modified".
git checkout -- Makefile
git diff-files Makefile
touch Makefile
git diff-files Makefile
The 'refresh' operation is to update the lstat(2) information in
the index for paths whose contents actually match what is in the
index (and it does not do anything else -- most notably paths
whose contents are different between the working tree and the
index are left as-is).
-
Running refresh internally in runstatus without writing the result out _might_ be an option, but that would largely be a hack to only help qgit. Other shapes of "git status", such as "git status <filename>" and "git status -a", still need to perform the same index manipulation as "git commit" with the same parameters before calling git-runstatus, and at that point the extra "internal refresh" in runstatus is an unwelcome extra cycle. -
I might be overlooking something, but couldn't that updated index be saved elsewhere? And subcommands be pointed at that, of course. Morten -
You should be able to set the GIT_INDEX_FILE environtment variable to point the index somewhere else than $GIT_DIR/index, so it's not located on a read-only filesystem. See the git(7) man page. - Ted -
Yes, I agree. If I modify qgit in running 'git runstatus' as a fallback in case 'git status' exits with an error (without checking what kind of error exactly) could be an acceptable path or could hide subtle side-effects? I have no the knowledge to answer this by hand. Thanks Marco -
It's probably better for you to just - run "git update-index --refresh" and don't care about the exit value - run "git runstatus" unconditionally which should basically get you something working. HOWEVER, it's also quite possible that "git-commit.sh" should just do this on its own. If the update-index fails, we really only care if we literally use the index later to *write* something, ie the commit case. For just "git status", maybe we should just silently ignore the error.. Linus -
Hi, So, why don't you just do a git diff --name-only HEAD and check for an empty output??? No need for a patch to Git (so late in the -rc phase), or backwards incompatibility... Ciao, Dscho -
It seems to have the same issues of 'git runstatus' in case of ntfs filesystems, so I would prefer, eventually, use 'git runstatus' that Well, it's a _new_ option so I fail to see backwards incompatibility. Perhaps you are referring to qgit backward incompatibility, but in any case a new version of qgit is due to fix a parsing bug that shows after a modification of 'git rev-list' output occurred in git 1.5 development. Marco -
Hi, Which issues? That the lstat data are not equal on Cygwin and Linux? The patch does not help here. Maybe a patch to Linux' ntfs driver would, but I fail to see how Git could possibly help here. If git-diff is trying to write files, _that_ would be a bug. As for your use of git-status: I think it is wrong. You said you want to check if the working directory is clean. Then just do that, and do not try You need a new version of _Git_ if you use that option. And if you can do without depending on a newer Git, it is _bad_ to do it nevertheless. Ciao, Dscho -
Well, I tested the patch and indeed it helps a lot ;-) It's correct that checking Linux lstat against cygwin one (stored in index) gives different results, but it's now where the patch makes the difference rechecking all the files (in memory) to see if are really That's a true point. Altough if git 1.5 ships _without_ '--refresh' option in 'git runstatus' for a porcelain tool point of view it means *do forget* that option until next major release. There's no point in adding the feature one day after git 1.5 is out; qgit will not use that feature anyway for next months. Marco -
I could opt for shipping qgit 1.5.5 _without_ using '--refresh' and then ship, as example in a month, qgit 1.5.6 that uses the feaure. But I can do this _only_ if git 1.5 has it. -
Hi, Not really. The thing is, git-status does a lot more than what you need. And what you need is _only_ what "git diff --name-only HEAD" does already! It _also_ checks the index, it _also_ only checks the files with different stat information, but it does _not_ try to update the index and prepare a message to be displayed when committing. So, what is the big problem about accepting that patching git-status for one obscure use is wrong, wrong, wrong, when git-diff already does what is needed??? Ciao, Dscho -
Probably I'm doing something wrong, but that's how working dir
detection is currently implemented in qgit:
void Git::getDiffIndex() {
QString status;
if (!run("git status", &status)) // git status refreshes the index,
run as first
return;
if (!run("git diff-index HEAD", &_wd.diffIndex))
return;
// check for files already updated in cache, we will
// save this information in status third field
if (!run("git diff-index --cached HEAD", &_wd.diffIndexCached))
return;
...... other stuff .....
The first call to git-status has been there for ages and with the only
goal to refesh the index so to avoid stale data in following 'git
diff-index' calls.
If I have understood correctly you suggest to remove that call because
useless? And rely 'git diff-index' info directly. Of course if there
are no side effects I'will be happy to drop the call, but I'm not sure
it's the safest way to go.
Another option would be to accept a broken working dir detection in
these corner cases. It's a realistic option and probably the best.
Indeed also subsitute 'git status' with 'git runstatus' as long as I
get back _all_ the repo files it seems to me a lesser option, IMHO
it's better failing with empty case than have a big flow of incorrect
data.
Marco
-
Because git-status itself is conceptually a read-only operation, and having it barf on a read-only file system is justifiably a bug. But I agree that attempting to fix it now is probably just too risky not to compromize the 1.5.0 release. Better leave it as a known issue for v1.5.0 and fix it later... especially if there is a possible workaround in the mean time. Nicolas -
I do not 100% agree that it is conceptually a read-only operation. -
It is. It's the technical issue that makes it not so. But when a user asks for a status, he's clearly not expecting to _write_ anything, even less for the command to fail if the file system is read-only. This is like if a file system driver refused to open a file when the file system is mounted read-only just because it cannot update the file's atime on disk. Just like the atime example, we may refresh the index while at it when preparing the status results. This is a valid technical concern. But it for sure should not be mandatory for the command to succeed if the file system is read-only. Nicolas -
I do not think so. It is a workflow issue that user indicates the cache cleanliness information does not matter anymore. Is it wrong for "git-status" to be losing the cache cleanliness information? The intended audience of that program is those who are about to make a commit in the repository, as they are asking "what would I be committing?" Up to that point, they may have cared about the reminder they get from "git diff" that they edited a file and then ended up reverting the whole edit they did to that file (I find that empty diff from "git diff" often very useful, although I felt "Huh?" when I was new to git). But when they ask "git status", they care more about the real change, and at that point (since they feel they may be ready to make a commit -- and that is the whole point of running "git-status") they do want to lose the cache cleanliness information. So "git-status" to be read-write application to discard the cache-cleanliness information is probably a good thing. -
You're making assumption about work flows and using that to justify I don't dispute that. But git-status should certainly not be restricted It is... when the file system lets you write. Like I said this is a technically good thing to do. But a command that is called "status" should provide a "status" even if the file system is read-only nevertheless. The index updating business that is done behind the scene is and should be an opportunistic optimization, but it should not prevent status reporting. It is pretty expected that a "commit" command would fail if the file system is ro, but not a "status" command. And this is true irrespectively of whatever workflow you might be most likely to use the "status" command for. Nicolas -
It really isn't. It's not even a "technical issue". It's a fundamental optimization. Sure, you can call optimizations just "technical issues", but the fact is, it's one of the things that makes git so _usable_ on large archives. At some point, an "optimization" is no longer just about making things slightly faster, it's about something much bigger, and has real semantic meaning. So the fact is, "git status" _needs_ to refresh the index. Because if it doesn't, you'll see every file that doesn't match the index as "dirty", and that is not just a "technical issue". And yes, doing an "internal" refresh, like Junio's patch does, hides the issue, but it hides it BY MAKING THE OPTIMIZATION POINTLESS! I suspect Marco is testing some reasonably small git archive. With something like git itself, with less than a thousand files (and most of them fairly small, so rehashing them all is quick), the optimization may _feel_ like just a small technical detail. Now, try the same thing on the Linux kernel or somethign similar, especially with cold caches or not a huge amount of memory. That "technical issue" is what makes "git status" take less than a second for me, and only a bit longer if things aren't cached - because we don't actually have to read all the file data. Now, it so happens that _if_ things are cached, at least under Linux, cached IO is so _incredibly_ fast that you won't even realize how expensive an operation you missed. I can SHA1 every file in the kernel archive (21432 files right now - 8 million LOC, and 230MB of data) in less than a couple of seconds. But that's only because it's all cached for me anyway, because I tend to run with lots of RAM, and I do things like "git grep so-and-so" which brings it all into cache. But try the same thing without caches. Here's something you can do under linux: sudo sh -c "echo 3 > /proc/sys/vm/drop_caches" git read-tree HEAD time git update-index --refresh and it takes me *40* seconds. ...
There is just a semantic issue that you seem to overlook completely. According to the dictionarry, "status" is a synonym to a "state". It is _not_ an action. So, from a _user_ perspective, the git-status command should give back a "status". Of _course_ the user will benefit from the index updating business, but as important as this update might be (and I do agree that it is fundamental for GIT's performance), this is still a by-product of the "status" command. Therefore, the fact that the index isn't writable should not prevent git-status from providing the very result for which its name was chosen. The index might as well be brought up to date on disk the next time the file system is writable. Nicolas -
I think a one paragraph summary of your argument is: - index is a good thing -- it is what makes the difference between usable and unusable. - git-status needs to refresh the index in order to do its thing efficiently and usably _anyway_, so once it spends cycles to do so, it is senseless not to write the refreshed index out when it can. I do not think anybody disputes that in a repository with 20k+ paths, it is sensible to leave the index stat-dirty for all paths. But I think your example read-tree HEAD misses the point by stressing the importance of index too much. Index is important for the usability and I do not think anybody is disputing it. The thing is, nobody switches the index that way without running "update-index --refresh" afterwards. Normal people would use git-reset to switch to a different tree object, and the command does that for you. If you are a hardcore, you would know to use "read-tree -m HEAD" at least to avoid making paths unnecessarily stat-dirty. Your example, while it is valid and demonstrates why the index is a good thing very well, is simply not part of a normal workflow and not very relevant when discussing the performance ramifications of what state "git-status" should leave the index in. When I said "calling 'update-index --refresh' in git-status loses stat-dirtiness information", I was certainly _NOT_ talking about losing the information that 20k+ paths used to be stat-dirty because the user did "read-tree HEAD" earlier. At least for me, it is very normal to do something like this. * start from a clean index. * edit cache.h, diff.h, and diff-lib.c. * stop, think, and realize that my earlier edit to change one function prototype in diff.h was not needed, and revert the change to that line still in the editor. * fix things up further by editing other files. And then, I would run "git diff" to see where I am. I still remember that I touched diff.h and I also remember that I once changed ...
Indeed. Except that `git-update-index --refresh` is itself not very fast on Cygwin+NTFS and large projects (about the size of the kernel). So git-status is a real slouch there. Not running `git update-index --refresh` saves at least a couple of seconds. This is why git-gui lets you disable the refresh, and is part of the reason why it computes the status on its own by diff-index, Yes. Which is why if git-gui finds a file that has an empty diff, but that was reported as modified by diff-files, it tells the user its about to go waste a few seconds running `update-index --refresh`, then does so. In practice I've found it rare that a file is dirty in the index, but is not actually modified. The typical culprit appears to actually be the virus scanner on a Windows system. For some reason it feels a need to modify some random XML 'source' files that are tracked by Git. Out of 30,000 files it likes to modify about 100. Not only that, but I think we can do much better with git-runstatus than we do now. If we scan the working directory (to search for untracked files), and we walk the index in parallel, we can update the index with new stat data if necessary. Of course that doesn't matter much on Linux; its VFS operations don't take hours. -- Shawn. -
Hi, Just to fuel the fire even more: Does it make _sense_ running git-status when you cannot write? I mean, the only reasonable use cases to ask git-status (even interpreting it in the "state" sense you are proposing), is when you are _working_ on the files. Which you cannot do without write access. BTW I was not aware that "git diff --name-only HEAD" would not check if the file is differing or not, but even then, it is arguably the right thing for qgit to show what the index' idea of the status is. Ciao, Dscho -
