This fixes the race in the last test int t3700.
Signed-off-by: Kristian Høgsberg <krh@redhat.com>
---
I'm basically just copying what git-commit.sh does, but I'll admit to not
quite understanding why it's necessary. It does fix the race in t3700 though.
Applies on top of pu.
cheers,
Kristian
builtin-commit.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/builtin-commit.c b/builtin-commit.c
index 7dc8977..19862df 100644
--- a/builtin-commit.c
+++ b/builtin-commit.c
@@ -81,6 +81,7 @@ static char *prepare_index(const char **files, const char *prefix)
if (all || also) {
add_files_to_cache(verbose, also ? prefix : NULL, files);
+ refresh_cache(REFRESH_QUIET);
if (write_cache(fd, active_cache, active_nr) || close(fd))
die("unable to write new_index file");
return lock_file.filename;
@@ -110,6 +111,7 @@ static char *prepare_index(const char **files, const char *prefix)
fd = hold_lock_file_for_update(next_index_lock,
git_path("next-index-%d", getpid()), 1);
add_files_to_cache(verbose, prefix, files);
+ refresh_cache(REFRESH_QUIET);
if (write_cache(fd, active_cache, active_nr) || close(fd))
die("unable to write new_index file");
--
1.5.3.4.206.g58ba4
-
Hi, On Fri, 9 Nov 2007, Kristian H
OK, I guess what I was wondering was why write_cache() doesn't write out an up-to-date index. I'll send an updated patch. Do we need a call to refresh_cache() when we update the user cache but commit an index created from read_tree+add_files? I.e. after the add_files_to_index() call on line 97? The shell script doesn't do this, it only runs update-index --refresh for the index that gets committed. Kristian -
Hi, > > On Fri, 9 Nov 2007, Kristian H
No, that's equivalent to "git commit -i <file>". If you just say "git commit <file>", that will create a temporary index initialized to HEAD, add file to that index and the regular (user) index, and then commit the temporary index file. The shell script doesn't refresh the regular index in this case. I think it makes sense to add that, but it will be a subtle difference in behaviour. cheers, Kristian -
Ah, it all makes sense. I should have been more relaxed and
careful when I first read your t3700 patch.
If this were a breakage in racy-git-avoidance code, then
refresh_cache() Kristian added (or "add --refresh" immediately
after "git commit" in the test script) would have been fooled by
the same raciness and wouldn't have changed a thing.
This discussion exposes a problem with add_files_to_cache()
function.
Try this in a clean repository that tracks Makefile:
$ git diff-files --name-only ;# no output
$ touch Makefile
$ git diff-files --name-only
Makefile
$ git add -u
$ git diff-files --name-only
Makefile
$ git add Makefile
$ git diff-files --name-only ;# no output
I think this is a broken behaviour. As long as we are adding
the contents from a path on the filesystem to the corresponding
location in the index, it should sync the stat bits for the
entry in the index with the filesystem to make the entry
stat-clean. IOW, we should not see stat-dirtiness reported
after "add -u".
The funny thing is that add_files_to_cache() run by "git add -u"
calls add_file_to_cache() to add the updated contents for
touched paths, but the latter function is used in the more
explicit "git add Makefile" codepath, without any magic
postprocessing after the function returns to sync the stat
info. IOW, both "add -u" and "add Makefile" ends up calling
add_file_to_cache("Makefile") and I do not see why we are
getting different results.
And add_file_to_cache(), which calls add_file_to_index() on
the_index, does call the fill_stat_cache_info() to sync the stat
information by itself, as it should be. I am still puzzled with
this.
So I think Kristian's two refresh_cache() do fix the issue, but
at the same time I _think_ it is a workaround for broken
add_files_to_cache() behaviour, which is what we should fix.
-
Yup, that's what I expected, and why I couldn't quite understand why the There is some timing involved in this, which may explain the different results you see. On my laptop I can trigger the add_files_to_cache() problem roughly 4 out of 5 times, but on my faster desktop, it can't OK, I'll resend the patch with a better description, and add a refresh call for the user index too, for the git commit <file> case. cheers, Kristian -
Earlier in commit 0781b8a9b2fe760fc4ed519a3a26e4b9bd6ccffe
(add_file_to_index: skip rehashing if the cached stat already
matches), add_file_to_index() were taught not to re-add the path
if it already matches the index.
The change meant well, but was not executed quite right. It
used ie_modified() to see if the file on the work tree is really
different from the index, and skipped adding the contents if the
function says "not modified".
This was wrong. There are three possible comparison results
between the index and the file in the work tree:
- with lstat(2) we _know_ they are different. E.g. if the
length or the owner in the cached stat information is
different from the length we just obtained from lstat(2), we
can tell the file is modified without looking at the actual
contents.
- with lstat(2) we _know_ they are the same. The same length,
the same owner, the same everything (but this has a twist, as
described below).
- we cannot tell from lstat(2) information alone and need to go
to the filesystem to actually compare.
The last case arises from what we call 'racy git' situation,
that can be caused with this sequence:
$ echo hello >file
$ git add file
$ echo aeiou >file ;# the same length
If the second "echo" is done within the same filesystem
timestamp granularity as the first "echo", then the timestamp
recorded by "git add" and the timestamp we get from lstat(2)
will be the same, and we can mistakenly say the file is not
modified. The path is called 'racily clean'. We need to
reliably detect racily clean paths are in fact modified.
To solve this problem, when we write out the index, we mark the
index entry that has the same timestamp as the index file itself
(that is the time from the point of view of the filesystem) to
tell any later code that does the lstat(2) comparison not to
trust the cached stat info, and ie_modified() then actually goes
to the filesystem to compare the contents for such a path.
That's ...Wouldn't it be much better to now - make it "unsigned int flags" - create a few enums or #define's to make the usage be more readable? just went from subtle to "incredibly non-obvious". I realize that 3 is "silent + assume_racy" and 2 is just "assume_racy", but I may realize that now, and in a month? I'd need to look it up. Linus -
ce_match_stat() can be told:
(1) to ignore CE_VALID bit (used under "assume unchanged" mode)
and perform the stat comparison anyway;
(2) not to perform the contents comparison for racily clean
entries and report mismatch of cached stat information;
using its "option" parameter. Give them symbolic constants.
Similarly, run_diff_files() can be told not to report anything
on removed paths. Also give it a symbolic constant for that.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Fri, 9 Nov 2007, Junio C Hamano wrote:
>>
>> -int run_diff_files(struct rev_info *revs, int silent_on_removed)
>> +int run_diff_files(struct rev_info *revs, int option)
>
> Wouldn't it be much better to now
> - make it "unsigned int flags"
> - create a few enums or #define's to make the usage be more readable?
>
> Because this:
>
>>- run_diff_files(&rev, 0);
>>+ run_diff_files(&rev, 2);
>> - !ie_modified(istate, istate->cache[pos], &st, 1)) {
>> + !ie_match_stat(istate, istate->cache[pos], &st, 3)) {
>
> just went from subtle to "incredibly non-obvious".
That really is true. Apparently I am getting much slower
lately. This is to just introduce the constants and change the
types.
builtin-apply.c | 2 +-
cache.h | 14 ++++++++++----
check-racy.c | 2 +-
diff-lib.c | 16 +++++++++-------
diff.h | 4 +++-
entry.c | 2 +-
read-cache.c | 47 ++++++++++++++++++++++++++++++-----------------
unpack-trees.c | 4 ++--
8 files changed, 57 insertions(+), 34 deletions(-)
diff --git a/builtin-apply.c b/builtin-apply.c
index 5cc90e6..0fff02e 100644
--- a/builtin-apply.c
+++ b/builtin-apply.c
@@ -2099,7 +2099,7 @@ static int verify_index_match(struct cache_entry *ce, struct stat *st)
return -1;
return 0;
}
- return ce_match_stat(ce, st, 1);
+ return ce_match_stat(ce, st, ...Earlier in commit 0781b8a9b2fe760fc4ed519a3a26e4b9bd6ccffe
(add_file_to_index: skip rehashing if the cached stat already
matches), add_file_to_index() were taught not to re-add the path
if it already matches the index.
The change meant well, but was not executed quite right. It
used ie_modified() to see if the file on the work tree is really
different from the index, and skipped adding the contents if the
function says "not modified".
This was wrong. There are three possible comparison results
between the index and the file in the work tree:
- with lstat(2) we _know_ they are different. E.g. if the
length or the owner in the cached stat information is
different from the length we just obtained from lstat(2), we
can tell the file is modified without looking at the actual
contents.
- with lstat(2) we _know_ they are the same. The same length,
the same owner, the same everything (but this has a twist, as
described below).
- we cannot tell from lstat(2) information alone and need to go
to the filesystem to actually compare.
The last case arises from what we call 'racy git' situation,
that can be caused with this sequence:
$ echo hello >file
$ git add file
$ echo aeiou >file ;# the same length
If the second "echo" is done within the same filesystem
timestamp granularity as the first "echo", then the timestamp
recorded by "git add" and the timestamp we get from lstat(2)
will be the same, and we can mistakenly say the file is not
modified. The path is called 'racily clean'. We need to
reliably detect racily clean paths are in fact modified.
To solve this problem, when we write out the index, we mark the
index entry that has the same timestamp as the index file itself
(that is the time from the point of view of the filesystem) to
tell any later code that does the lstat(2) comparison not to
trust the cached stat info, and ie_modified() then actually goes
to the filesystem to compare the contents for such a path.
That's ...