(In response to the earlier thread on the git 'notes' feature/idea: "[PATCH 00/15] git-note: A mechanism for providing free-form after-the-fact annotations on commits") Ok, this is getting too big for me to ship all in one patch series, so I'm gonna have to split it up. Right now I'm thinking three patch series, but as I'm not near finished yet, that may change. The three patch series I'm talking about are: 1. Refactoring the git tag object. This lays down part of the groundwork needed to support the second incarnation of git 'notes'. This is all ready to go and should follow shortly after this mail. 2. Introducing soft references (softrefs). Softrefs is the general mechanism behind the "reverse mapping" needed to support 'notes', as discussed in the previous thread. I'm still working on this patch series, and it will hopefully be ready soon-ish. 3. Reimplementing git 'notes' on top of (1) and (2). This should be fairly easy (once (2) is done), as it's mostly a matter of building a simple porcelain on top of (1) and (2). Have fun! ...Johan -- Johan Herland, <johan@herland.net> www.herland.net -
This patch series implements part of the ground work for the 'notes'
feature discussed earlier in the thread "[PATCH 00/15] git-note: A
mechanism for providing free-form after-the-fact annotations on commits".
The following patches refactors the tag object by:
1. Making the "tag" header optional
2. Introducing a new optional "keywords" header
3. Making the "tagger" header mandatory as far as possible
4. Do better and more thorough verification of tag objects
Unfortunately the first patch in the series is bigger than I would have
liked, but I couldn't find an easy way to split it up.
Here's the shortlog:
Johan Herland (6):
Refactor git tag objects; make "tag" header optional; introduce new optional "keywords" header
git-show: When showing tag objects with no tag name, show tag object's SHA1 instead of an empty string
git-fsck: Do thorough verification of tag objects.
Documentation/git-mktag: Document the changes in tag object structure
git-mktag tests: Fix and expand the mktag tests according to the new tag object structure
Add fsck_verify_ref_to_tag_object() to verify that refname matches name stored in tag object
Documentation/git-mktag.txt | 42 +++++--
builtin-fsck.c | 35 ++++++
builtin-log.c | 2 +-
mktag.c | 148 +++++-------------------
t/t3800-mktag.sh | 204 ++++++++++++++++++++++++++++++---
tag.c | 266 +++++++++++++++++++++++++++++++++++--------
tag.h | 4 +-
7 files changed, 507 insertions(+), 194 deletions(-)
Have fun!
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net
-
In order to support ref-less tag objects (aka. 'notes'), we want to do some changes to the tag object structure. The new structure implemented in this patch is backward-compatible in the way that all existing tag objects (valid in the old implementation) will remain valid in the new implementation. The following changes are done: 1. Make the "tag" header optional. The "tag" header contains the tag name, which is optional for 'notes'. The new semantics for the "tag" header are as follows: The tag header _must_ be given for signed tags (this is already enforced by git-tag.sh). When the tag header is not given, its value defaults to the empty string. 2. Introduce a new optional "keywords" header. The "keywords" header is a comma-separated list of free-form values. However, two certain value have special meaning: "tag" and "note": When the "keywords" header is missing, its default value is set to "tag" if a "tag" header is present; else the default "keywords" value is set to "note". The "keywords" header is meant to be used by porcelains for classifying different types of tag objects. This classification may then be used to filter tag objects in the presentation layer (e.g. by implementing extra filter options to --decorate, etc.) 3. Make the "tagger" header mandatory. This header has already for a long time been "de facto" mandatory, in that it is automatically generated by git-tag.sh. This patch verifies the existence of the header when creating tag objects. However, since there exists old tags without the "tagger" header, the verification should not be done when parsing tag objects. 4. Consolidate the parsing and verification of tag objects. Currently parsing is done in tag.c:parse_tag_buffer(), and verification is done with a very similar piece of code in mktag.c:verify_tag(). This patch unifies the parsing and verification of tag objects in a new function parse_and_verify_tag_buffer() which is then called ...
Why must signed tags have a tag header? Will notes optionally have a tag header? -
The purpose of signing a tag is to cryptographically verify the thing pointed at by the tag. But you also want to protect the tag itself. In order to make it harder for someone to rename a signed tag (thereby opening the door to replacing it with a different - possibly signed - malicious tag), you want to include the tag name in the signed data. This allows us to verify that the tag ref (as stored in '.git/refs') is identical to the tag name stored inside the signed object. Yes, 'notes' will optionally have a "tag" header. When I originally designed notes, I didn't think anybody would want to name their notes, but Linus requested it, and there's no technical argument against it. Note that if you name your note, and put a ref to it (under '.git/refs'), there's technically no distinction between a tag object and a note object, except what you choose to put in the "keywords" header, of course. Have fun! ...Johan -- Johan Herland, <johan@herland.net> www.herland.net -
This is a consequence of making the "tag" header in tag objects optional. Signed-off-by: Johan Herland <johan@herland.net> --- builtin-log.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/builtin-log.c b/builtin-log.c index 3744712..1a0f111 100644 --- a/builtin-log.c +++ b/builtin-log.c @@ -181,7 +181,7 @@ int cmd_show(int argc, const char **argv, const char *prefix) printf("%stag %s%s\n\n", diff_get_color(rev.diffopt.color_diff, DIFF_COMMIT), - t->tag, + *(t->tag) ? t->tag : name, diff_get_color(rev.diffopt.color_diff, DIFF_RESET)); ret = show_object(o->sha1, 1); -- 1.5.2 -
Teach git-fsck to do the same kind of verification on tag objects that is
already done by git-mktag.
Signed-off-by: Johan Herland <johan@herland.net>
---
builtin-fsck.c | 14 ++++++++++++++
1 files changed, 14 insertions(+), 0 deletions(-)
diff --git a/builtin-fsck.c b/builtin-fsck.c
index cbbcaf0..a8914ae 100644
--- a/builtin-fsck.c
+++ b/builtin-fsck.c
@@ -344,6 +344,20 @@ static int fsck_commit(struct commit *commit)
static int fsck_tag(struct tag *tag)
{
struct object *tagged = tag->tagged;
+ enum object_type type;
+ unsigned long size;
+ char *data = (char *) read_sha1_file(tag->object.sha1, &type, &size);
+ if (!data)
+ return error("Could not read tag %s", sha1_to_hex(tag->object.sha1));
+ if (type != OBJ_TAG) {
+ free(data);
+ return error("Internal error: Tag %s not a tag", sha1_to_hex(tag->object.sha1));
+ }
+ if (parse_and_verify_tag_buffer(0, data, size, 1)) { /* Thoroughly verify tag object */
+ free(data);
+ return error("Tag %s failed thorough tag object verification", sha1_to_hex(tag->object.sha1));
+ }
+ free(data);
if (!tagged) {
return objerror(&tag->object, "could not load tagged object");
--
1.5.2
-
The objerror() function prints the sha1 and object type, I think this one should be used instead of error() here. -
Teach git-fsck to do the same kind of verification on tag objects that is
already done by git-mktag.
Signed-off-by: Johan Herland <johan@herland.net>
---
Of course, you're right. Like this, I hope:
builtin-fsck.c | 14 ++++++++++++++
1 files changed, 14 insertions(+), 0 deletions(-)
diff --git a/builtin-fsck.c b/builtin-fsck.c
index cbbcaf0..71a5fd5 100644
--- a/builtin-fsck.c
+++ b/builtin-fsck.c
@@ -344,6 +344,20 @@ static int fsck_commit(struct commit *commit)
static int fsck_tag(struct tag *tag)
{
struct object *tagged = tag->tagged;
+ enum object_type type;
+ unsigned long size;
+ char *data = (char *) read_sha1_file(tag->object.sha1, &type, &size);
+ if (!data)
+ return objerror(&tag->object, "could not read tag");
+ if (type != OBJ_TAG) {
+ free(data);
+ return objerror(&tag->object, "not a tag (internal error)");
+ }
+ if (parse_and_verify_tag_buffer(0, data, size, 1)) { /* thoroughly verify tag object */
+ free(data);
+ return objerror(&tag->object, "failed thorough tag object verification");
+ }
+ free(data);
if (!tagged) {
return objerror(&tag->object, "could not load tagged object");
--
1.5.2
-
The tagger field was introduced mid July 2005; any repository with a tag object older than that would now get non-zero exit from fsck. This won't practically be problem in newer repositories, but it is somewhat annoying. Perhaps do this only under the new -v option to git-fsck, say "warning" not "error", and not exit with non-zero because of this? -
Teach git-fsck to do the same kind of verification on tag objects that is
already done by git-mktag.
Signed-off-by: Johan Herland <johan@herland.net>
---
Like this?
Or would you rather switch around the "verbose" and the
"parse_and_verify_tag_buffer()" (i.e. not even attempt the thorough
verification unless in verbose mode)?
Have fun!
...Johan
builtin-fsck.c | 13 +++++++++++++
1 files changed, 13 insertions(+), 0 deletions(-)
diff --git a/builtin-fsck.c b/builtin-fsck.c
index bacae5d..fb9a8bb 100644
--- a/builtin-fsck.c
+++ b/builtin-fsck.c
@@ -359,11 +359,24 @@ static int fsck_commit(struct commit *commit)
static int fsck_tag(struct tag *tag)
{
struct object *tagged = tag->tagged;
+ enum object_type type;
+ unsigned long size;
+ char *data = (char *) read_sha1_file(tag->object.sha1, &type, &size);
if (verbose)
fprintf(stderr, "Checking tag %s\n",
sha1_to_hex(tag->object.sha1));
+ if (!data)
+ return objerror(&tag->object, "could not read tag");
+ if (type != OBJ_TAG) {
+ free(data);
+ return objerror(&tag->object, "not a tag (internal error)");
+ }
+ if (parse_and_verify_tag_buffer(0, data, size, 1) && verbose)
+ objwarning(&tag->object, "failed thorough tag object verification");
+ free(data);
+
if (!tagged) {
return objerror(&tag->object, "could not load tagged object");
}
--
1.5.2
--
Johan Herland, <johan@herland.net>
www.herland.net
-
Actually I was thinking about doing something like this. - if (parse_and_verify_tag_buffer(0, data, size, 1) && verbose) + if (parse_and_verify_tag_buffer(0, data, size, verbose)) -
Well, after running fsck with --verbose, I take the whole suggestion back. I think it is a good idea to do the "thorough" tag validation in general, and it should not be buried under the verbose output, which is almost useless unless in a very narrow special case that you are really trying to see which exact object is corrupt. So I think your original patch to signal error on thorough tag validation failure is probably a good approach in general. People need to know that in git.git fsck would return non-zero because of v0.99 tag, but the people who get hit/annoyed by this ought to be minority. It may be the case that a major portion of git users currently are the ones who futz with the git.git repository, but there would be a serious problem if it continues to be the case ;-) -
I also noticed that a number of the early tags in the kernel repo use the ancient format, and would thus fail fsck. <stroke-of-madness> Could we replace the v0.99 tag (and other ancient tags) with "correct" versions, and then encourage users who have already cloned to delete their v0.99 tag and re-pull? New clones would of course never see the old tag at all. This sure as hell sounds similar to inserting foot into mouth before shooting oneself in said foot, but it might still be worth considering... </stroke-of-madness> Have fun! ...Johan -- Johan Herland, <johan@herland.net> www.herland.net -
I actually think that is not too bad. In the course of git development, the kernel folks had to do the wholesale repository conversion twice (once when the order of hashing and compression was swapped, another when the flat tree was made hierarchical), I think. Compared to that, tags are not referred to by other entities, so it's much easier to "convert" (iow, re-sign). -
The new structure of tag objects is documented. Also some much-needed cleanup is done. E.g. remove the paragraph on the 8kB limit, since this limit was removed ages ago. Signed-off-by: Johan Herland <johan@herland.net> --- Documentation/git-mktag.txt | 42 ++++++++++++++++++++++++++++++------------ 1 files changed, 30 insertions(+), 12 deletions(-) diff --git a/Documentation/git-mktag.txt b/Documentation/git-mktag.txt index 2860a3d..411105d 100644 --- a/Documentation/git-mktag.txt +++ b/Documentation/git-mktag.txt @@ -8,40 +8,58 @@ git-mktag - Creates a tag object SYNOPSIS -------- -'git-mktag' < signature_file +[verse] +'git-mktag' < tag_data_file + DESCRIPTION ----------- -Reads a tag contents on standard input and creates a tag object +Reads tag object data on standard input and creates a tag object that can also be used to sign other objects. The output is the new tag's <object> identifier. -Tag Format + +DISCUSSION ---------- -A tag signature file has a very simple fixed format: three lines of +Tag object data has the following format +[verse] object <sha1> type <typename> - tag <tagname> + tag <tagname> (optional) + keywords <keywords> (optional) + tagger <committer> + +followed by a blank line and a free-form message and an optional signature +that git itself doesn't care about, but that may be verified with gpg or +similar. -followed by some 'optional' free-form signature that git itself -doesn't care about, but that can be verified with gpg or similar. +In the above listing, `<sha1>` represents the object pointed to by this tag, +`<typename>` is the type of the object pointed to ("tag", "blob", "tree" or +"commit"), `<tagname>` is the name of this tag object (and must correspond +to the name of the corresponding ref (if any) in `.git/refs/`). `<keywords>` +is a comma-separated list of keywords associated with this tag object, and +`<committer>` holds the "`name <email>`" of the tag ...
The existing tests are updated to reflect the changes in the tag object. Additionally some more tests are added to test the new "keywords" header, and to test the more thorough verification routine. Signed-off-by: Johan Herland <johan@herland.net> --- t/t3800-mktag.sh | 204 ++++++++++++++++++++++++++++++++++++++++++++++++++---- 1 files changed, 190 insertions(+), 14 deletions(-) diff --git a/t/t3800-mktag.sh b/t/t3800-mktag.sh index 7c7e433..f6e3d10 100755 --- a/t/t3800-mktag.sh +++ b/t/t3800-mktag.sh @@ -34,7 +34,7 @@ too short for a tag EOF cat >expect.pat <<EOF -^error: .*size wrong.*$ +^error: .* size .*$ EOF check_verify_failure 'Tag object length check' @@ -46,6 +46,8 @@ cat >tag.sig <<EOF xxxxxx 139e9b33986b1c2670fff52c5067603117b3e895 type tag tag mytag +tagger foo + EOF cat >expect.pat <<EOF @@ -61,6 +63,8 @@ cat >tag.sig <<EOF object zz9e9b33986b1c2670fff52c5067603117b3e895 type tag tag mytag +tagger foo + EOF cat >expect.pat <<EOF @@ -76,6 +80,8 @@ cat >tag.sig <<EOF object 779e9b33986b1c2670fff52c5067603117b3e895 xxxx tag tag mytag +tagger foo + EOF cat >expect.pat <<EOF @@ -91,7 +97,7 @@ echo "object 779e9b33986b1c2670fff52c5067603117b3e895" >tag.sig printf "type tagsssssssssssssssssssssssssssssss" >>tag.sig cat >expect.pat <<EOF -^error: char48: .*"[\]n"$ +^error: char48: .*"[\]n" after "type"$ EOF check_verify_failure '"type" line eol check' @@ -103,10 +109,12 @@ cat >tag.sig <<EOF object 779e9b33986b1c2670fff52c5067603117b3e895 type tag xxx mytag +tagger foo + EOF cat >expect.pat <<EOF -^error: char57: no "tag " found$ +^error: char57: .*$ EOF check_verify_failure '"tag" line label check #1' @@ -118,21 +126,27 @@ cat >tag.sig <<EOF object 779e9b33986b1c2670fff52c5067603117b3e895 type taggggggggggggggggggggggggggggggg tag +keywords foo +tagger bar@baz.com + EOF cat >expect.pat <<EOF -^error: char87: no "tag " found$ +^error: char87: .*$ EOF ...
This patch adds the check described by Junio.
Signed-off-by: Johan Herland <johan@herland.net>
---
builtin-fsck.c | 21 +++++++++++++++++++++
1 files changed, 21 insertions(+), 0 deletions(-)
diff --git a/builtin-fsck.c b/builtin-fsck.c
index a8914ae..379317e 100644
--- a/builtin-fsck.c
+++ b/builtin-fsck.c
@@ -515,6 +515,25 @@ static int fsck_handle_reflog(const char *logname, const unsigned char *sha1, in
return 0;
}
+static void fsck_verify_ref_to_tag_object(const char *refname, struct object *obj)
+{
+ /* Verify that refname matches the name stored in obj's "tag" header */
+ struct tag *tagobj = (struct tag *) parse_object(obj->sha1);
+ size_t tagname_len = strlen(tagobj->tag);
+ size_t refname_len = strlen(refname);
+
+ if (!tagname_len) return; /* No tag name stored in tagobj. Nothing to do. */
+
+ if (tagname_len < refname_len &&
+ !memcmp(tagobj->tag, refname + (refname_len - tagname_len), tagname_len) &&
+ refname[(refname_len - tagname_len) - 1] == '/') {
+ /* OK: tag name is "$name", and refname ends with "/$name" */
+ return;
+ }
+ else
+ error("%s: Mismatch between tag ref and tag object's name %s", refname, tagobj->tag);
+}
+
static int fsck_handle_ref(const char *refname, const unsigned char *sha1, int flag, void *cb_data)
{
struct object *obj;
@@ -529,6 +548,8 @@ static int fsck_handle_ref(const char *refname, const unsigned char *sha1, int f
/* We'll continue with the rest despite the error.. */
return 0;
}
+ if (obj->type == OBJ_TAG) /* ref to tag object */
+ fsck_verify_ref_to_tag_object(refname, obj);
default_refs++;
obj->used = 1;
mark_reachable(obj, REACHABLE);
--
1.5.2
-
Johan, I gave only a cursory look at the series so far (that is, I looked at your log message and had a very quick pass over the code to see what the basic idea is and if it is sound, without really reading/reviewing the code). It all looks good. -
The code that was enabled by passing a non-zero 'thorough_verify' argument to parse_and_verify_tag_buffer() moved the 'tag_line' and 'keywords_line' pointer variables forward in memory while checking for illegal chars. These pointers were later used when setting the respective members on the parsed tag object. The fix refactors the verification loop so as to use offsets to the 'tag_line' and 'keywords_line' pointers, instead of moving the pointers directly. The patch also includes cleanup of the code associated with moving the various '*_line' pointers past their initial header field identifier. These operations are now done along with the calculation of their corresponding '*_len' variables. The patch also includes minor changes to expected output in associated testcases. The bug was discovered by inspection. Currently none of the callers of parse_and_verify_tag_buffer() that use thorough_verify != 0, also use the 'tag' and 'keywords' members of the parsed tag object. Signed-off-by: Johan Herland <johan@herland.net> --- This goes on top of the existing "Refactor the tag object" patch series. Have fun! ...Johan t/t3800-mktag.sh | 8 ++++---- tag.c | 49 ++++++++++++++++++++++++++----------------------- 2 files changed, 30 insertions(+), 27 deletions(-) diff --git a/t/t3800-mktag.sh b/t/t3800-mktag.sh index f6e3d10..ac9008a 100755 --- a/t/t3800-mktag.sh +++ b/t/t3800-mktag.sh @@ -186,7 +186,7 @@ tagger bar@baz.com EOF cat >expect.pat <<EOF -^error: char67: could not verify tag name$ +^error: char66: could not verify tag name$ EOF check_verify_failure 'verify tag-name check' @@ -240,7 +240,7 @@ tagger bar@baz.com EOF cat >expect.pat <<EOF -^error: char83: .*$ +^error: char82: .*$ EOF check_verify_failure '"keywords" line check #1' @@ -258,7 +258,7 @@ tagger bar@baz.com EOF cat >expect.pat <<EOF -^error: char87: .*$ +^error: char86: .*$ EOF check_verify_failure '"keywords" line check #2' @@ ...
This patch series introduces soft references (softrefs); a mechanism for declaring reachability between arbitrary (but existing) git objects. Softrefs are meant to provide the mechanism for "reverse mapping" that we determined was needed for tag objects (especially 'notes'). The patch series also teaches git-mktag to create softrefs for all tag objects. See the Discussion section in the git-softref manual page (patch #4/7) or the comments in the header file (patch #1/7) for more details on the design of softrefs. I've added some informal performance data at the bottom of this mail [1]. Note that this patch series is incomplete in that the following things have yet to be implemented: 1. Clone/fetch/push of softrefs 2. Packing of softrefs 3. General integration of softrefs into parts of git where they might be useful 4. Find appropriate value for MAX_UNSORTED_ENTRIES There are also some questions connected to the above list of todos: 1. Just how should softrefs affect reachability? Should softrefs be used/followed in _all_ reachability computations? If not, which? 2. How should softrefs propagate. I suggest they are pretty much always propagated under clone/fetch/push. (Note that the softrefs merge algorithm in softrefs.c removes duplicates and softrefs between non-existing objects, so pre-filtering of the softrefs to be clones/fetched/pushed may not be necessary) 3. Where can softrefs be used to improve performance by replacing existing techniques? 4. How to best pack softrefs? Keeping them in the same pack as the objects they refer to seems to be a good idea, but more thought needs to be put into this before we can make an implementation 5. How to find _all_ (even unreachable) tag objects in repo for 'git-softref --rebuild-tags'? 6. Optimization. Pretty much nothing has been done so far. Performance seems to be acceptable for now. Probably needs more testing to determine bottlenecks NOTE: After the 7 ...
See patch for documentation. Signed-off-by: Johan Herland <johan@herland.net> --- softrefs.h | 188 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 188 insertions(+), 0 deletions(-) create mode 100644 softrefs.h diff --git a/softrefs.h b/softrefs.h new file mode 100644 index 0000000..db0f8b9 --- /dev/null +++ b/softrefs.h @@ -0,0 +1,188 @@ +#ifndef SOFTREFS_H +#define SOFTREFS_H + +/* + * Softrefs is a general mechanism for declaring a relationship between two + * existing arbitrary objects in the repo. Softrefs differ from the existing + * reachability relationship in that a softref may be created after _both_ of + * the involved objects have been added to the repo. In contrast, the regular + * reachability relationship depends on the reachable object's name being + * stored _inside_ the other object. A reachability relationship can therefore + * not be created at a later time without violating the immutability of git + * objects. + * + * Softrefs are defined as going _from_ one object _to_ another object. Once + * a softref between two objects has been created, the "to" object is + * considered reachable from the "from" object. + * + * Also, softrefs are stored in a way that makes it easy and quick to find all + * the "to" objects reachable from a given "from" object. + * + * The softrefs db consists of two files: .git/softrefs.unsorted and + * .git/softrefs.sorted. Both files use the same format; one softref per line + * of the form "<from-sha1> <to-sha1>\n". Each sha1 sum is 40 bytes long; this + * makes each entry exactly 82 bytes long (including the space between the sha1 + * sums and the terminating linefeed). + * + * The entries in .git/softrefs.sorted are sorted on <from-sha1>, in order to + * make lookup fast. + * + * The entries in .git/softrefs.unsorted are _not_ sorted. This is to make + * insertion fast. + * + * When softrefs are created (by calling add_softref()/add_softrefs()), they + * are appended to ...
Hi, This is preposterous. Either you substitute the patch for a documentation, or you document it in the commit message. I consider commit messages like "See patch for documentation" as reasonable as all those CVS "** no This sure sounds like you buy the disadvantages of both. Last time I checked, it was recommended to look at your needs and pick _one_ appropriate data structure fitting _all_ your needs. Besides, your lines are way too long. Yes, it is not in Documentation/SubmittingPatches, but even just a cursory look into the existing source shows you that it is mostly 80-chars-per-line. I think it goes without saying that you should try to imitate the existing practices in any project, and since you have to read the source to get acquainted with it _anyway_, it would only be a duplication to have it in SubmittingPatches, too. Ciao, Dscho -
Well, maybe we should do this. diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches index 01354c2..4bdfdfe 100644 --- a/Documentation/SubmittingPatches +++ b/Documentation/SubmittingPatches @@ -5,6 +5,7 @@ Checklist (and a short version for the impatient): - make commits of logical units - check for unnecessary whitespace with "git diff --check" before committing + - tab width is 8, the terminal is 80-columns wide. - do not check in commented out code or unneeded files - provide a meaningful commit message - the first line of the commit message should be a short @@ -82,6 +83,14 @@ option). Another thing: NULL pointers shall be written as NULL, not as 0. +(1b) Tab width is 8, Terminal is 80-column wide. + +We generally follow the same coding style guidelines as the +Linux kernel project. Lines are indented with Tabs, each of +which are 8 columns wide. Lines should fit on 80-column wide +terminals. + + (2) Generate your patch using git tools out of your commits. git based diff tools (git, Cogito, and StGIT included) generate -
Hi, But where to stop? Many people want to put an opening curly bracket in its own line. Other indenting is subject for discussion, too. White space after operators, but not after function names should be included, too. I know you mean good, but I think it is not a bad idea to let people get familiar with the code (and the formatting rules) first. This way we can even tell who did, and who did not do that, before submitting a patch. Ciao, Dscho -
Well, I could have copied documentation from the header file into the commit First, the unsorted file is bounded in size to make sure it never gets large enough to really impact performance Second, I'd ask you to look at the performance numbers (in patch #0) I have indeed tried to follow the 80-chars-per-line rule. In softrefs.h I fail to see a _single_ line exceeding 80 chars per line. In the other files I believe the number of long lines is comparable to other files in the git repo. ...Johan -- Johan Herland, <johan@herland.net> www.herland.net -
This code tries to implement the softrefs API as straightforwardly as
possible. Virtually no optimization has been done, although I do have
a feeling the code has ok performance as is.
All functions that do not appear in the API docs have some comments
attached to them.
There are also a couple of things to be considered before inclusion:
- File locking. Currently no locking is performed on softrefs files
before reading or writing entries.
- Packing. We need a plan for how softrefs should be included in packs,
at which supporting code must be added to this implementation.
Signed-off-by: Johan Herland <johan@herland.net>
---
softrefs.c | 712 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 712 insertions(+), 0 deletions(-)
create mode 100644 softrefs.c
diff --git a/softrefs.c b/softrefs.c
new file mode 100644
index 0000000..c7308c8
--- /dev/null
+++ b/softrefs.c
@@ -0,0 +1,712 @@
+#include "cache.h"
+#include "softrefs.h"
+
+/* constants */
+static const char * UNSORTED_FILENAME = "softrefs.unsorted";
+static const char * SORTED_FILENAME = "softrefs.sorted";
+static const unsigned int MAX_UNSORTED_ENTRIES = 1000;
+
+
+/* softref entry as it appears in a softrefs file */
+struct softrefs_entry {
+ char from_sha1_hex[40];
+ char space;
+ char to_sha1_hex[40];
+ char lf;
+};
+
+/* simple encapsulation of a softrefs file */
+struct softrefs_file {
+ char *filename;
+ int fd;
+ struct softrefs_entry *data; /* mmap()ed softrefs_entry objects */
+ unsigned long data_len; /* # of softrefs_entry objects in data */
+};
+
+/* Internal file opened/closed by (de)init_softrefs_access() */
+static struct softrefs_file *internal_file = 0;
+
+/*
+ * Open and mmap() the given filename, Assign the file descriptior, data
+ * pointer and data length to the given softrefs_file object.
+ * Return 0 on success, -1 on failure.
+ *
+ * Note that a non-existing file is not a failure per se, but is rather treated
+ ...git-softref is meant to be used from shell scripts that need to interact
with the softrefs database. The builtin command provides most of the
functionality present in the softrefs C API.
Documentation to follow in a subsequent patch.
Signed-off-by: Johan Herland <johan@herland.net>
---
builtin-softref.c | 167 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 167 insertions(+), 0 deletions(-)
create mode 100644 builtin-softref.c
diff --git a/builtin-softref.c b/builtin-softref.c
new file mode 100644
index 0000000..f95db4e
--- /dev/null
+++ b/builtin-softref.c
@@ -0,0 +1,167 @@
+/*
+ * git softref builtin command
+ *
+ * Add, list and administer soft references (softrefs)
+ *
+ * Copyright (c) 2007 Johan Herland
+ */
+
+#include "cache.h"
+#include "tag.h"
+#include "refs.h"
+#include "softrefs.h"
+
+static const char builtin_softref_usage[] =
+ "git-softref [ --list [<from-object>]"
+ " | --has <from-object> <to-object>"
+ " | --add <from-object> <to-object>"
+ " | --rebuild-tags"
+ " | --merge-unsorted [<softrefs-file>]"
+ " | --merge-sorted <softrefs-file> ]";
+
+static int list_helper(
+ const unsigned char *from_sha1,
+ const unsigned char *to_sha1,
+ void *cb_data)
+{
+ printf("%s %s\n", sha1_to_hex(from_sha1), sha1_to_hex(to_sha1));
+ return 0;
+}
+
+int rebuild_tags_helper(
+ const char *refname,
+ const unsigned char *sha1,
+ int flags,
+ void *cb_data)
+{
+ struct object *o = parse_object(sha1);
+ if (o && o->type == OBJ_TAG) {
+ struct tag *t = (struct tag *) o;
+ struct softref_list **prev = (struct softref_list **) cb_data;
+ struct softref_list *current = xmalloc(sizeof(struct softref_list));
+ current->next = *prev;
+ hashcpy(current->from_sha1, t->tagged->sha1);
+ hashcpy(current->to_sha1, t->object.sha1);
+ *prev = current;
+ }
+ return 0;
+}
+
+int cmd_softref(int argc, const char **argv, const char *prefix)
+{
+ int i;
+ int show_usage = 0, list = 0, has = 0, add = 0, ...Signed-off-by: Johan Herland <johan@herland.net> --- Documentation/git-softref.txt | 119 +++++++++++++++++++++++++++++++++++++++++ 1 files changed, 119 insertions(+), 0 deletions(-) create mode 100644 Documentation/git-softref.txt diff --git a/Documentation/git-softref.txt b/Documentation/git-softref.txt new file mode 100644 index 0000000..6a3e13b --- /dev/null +++ b/Documentation/git-softref.txt @@ -0,0 +1,119 @@ +git-softref(1) +============== + +NAME +---- +git-softref - Create, list and administer soft references + + +SYNOPSIS +-------- +[verse] +'git-softref' --list [<from-object>] +'git-softref' --has <from-object> <to-object> +'git-softref' --add <from-object> <to-object> +'git-softref' --rebuild-tags +'git-softref' --merge-unsorted [<softrefs-file>] +'git-softref' --merge-sorted <softrefs-file> + + +DESCRIPTION +----------- +Query and administer soft references in a git repository. + +Soft references are used to declare reachability between already existing +objects. An object (called the 'to-object') may be declared reachable from +another object (the 'from-object') without affecting the immutability of either +object. + +The `--list` option will list existing softrefs in the database. If given the +optional <from-object>, the list is limited to softrefs from the given object. +The `--has` option is used to check for the existence of a softref between two +given objects, similarly the `--add` option is used to add such a softref. + +The `--rebuild-tags` option is used to generate softrefs for all tag objects in +the repository reachable from tag refs. Tag objects use softrefs to declare +reachability 'from' the tagged object, 'to' the tag object. This allows for tag +objects to the cloned/fetched/pushed along with their associated objects. + +Finally, the `--merge-unsorted` and `--merge-sorted` options are used to merge +softrefs files into the sorted softrefs db. The filename argument must point +to an existing file in unsorted/sorted softrefs ...
Adds testing of the basic options available to the git-softref command. Signed-off-by: Johan Herland <johan@herland.net> --- t/t3050-softrefs.sh | 314 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 314 insertions(+), 0 deletions(-) create mode 100755 t/t3050-softrefs.sh diff --git a/t/t3050-softrefs.sh b/t/t3050-softrefs.sh new file mode 100755 index 0000000..a925178 --- /dev/null +++ b/t/t3050-softrefs.sh @@ -0,0 +1,314 @@ +#!/bin/sh +# +# Copyright (c) 2007 Johan Herland +# + +test_description='Basic functionality of soft references' +. ./test-lib.sh + + +# Prepare repo and create some notes + +test_expect_success 'Populating repo with test data' ' + echo "foo" > foo && + git-add foo && + test_tick && + git-commit -m "Initial commit" && + git-tag -a -m "Tagging initial commit" footag && + echo "bar" >> foo && + test_tick && + git-commit -m "Second commit" foo && + git-tag -a -m "Tagging second commit" bartag +' + +# At this point we should have: +# - commit @ 301711b66fe71164f646b798706a2c1f7024da8d ("Initial commit") +# - tag @ ad60bc179c6874af6d97f181c67f11adcca5122b ("footag") +# - commit @ 9671cbee7ad26528645b2665c8f74d39a6288864 ("Second commit") +# - tag @ a927fc832d42f1f64d8318e8acec43545d9562de ("bartag") +# - The tag creation should also have created softrefs: +# - From "Initial commit" to "footag" +# - From "Second commit" to "bartag" + +# Testing git-softref --list + +test_expect_success 'Testing git-softref --list on initial test data (1)' ' + cat > expected_output << EOF && +301711b66fe71164f646b798706a2c1f7024da8d ad60bc179c6874af6d97f181c67f11adcca5122b +9671cbee7ad26528645b2665c8f74d39a6288864 a927fc832d42f1f64d8318e8acec43545d9562de +EOF + git-softref > actual_output 2>&1 && + cmp actual_output expected_output && + git-softref --list > actual_output 2>&1 && + cmp actual_output expected_output +' + +test_expect_success 'Testing git-softref --list on initial test data (2)' ' + cat > ...
Also cleans up sorting and whitespace in Documentation/cmd-list.perl Signed-off-by: Johan Herland <johan@herland.net> --- .gitignore | 1 + Documentation/cmd-list.perl | 7 ++++--- Makefile | 6 ++++-- builtin.h | 1 + git.c | 1 + 5 files changed, 11 insertions(+), 5 deletions(-) diff --git a/.gitignore b/.gitignore index 27e5aeb..7fd6904 100644 --- a/.gitignore +++ b/.gitignore @@ -119,6 +119,7 @@ git-show git-show-branch git-show-index git-show-ref +git-softref git-ssh-fetch git-ssh-pull git-ssh-push diff --git a/Documentation/cmd-list.perl b/Documentation/cmd-list.perl index a181f75..4e1f45b 100755 --- a/Documentation/cmd-list.perl +++ b/Documentation/cmd-list.perl @@ -90,6 +90,7 @@ git-clean mainporcelain git-clone mainporcelain git-commit mainporcelain git-commit-tree plumbingmanipulators +git-config ancillarymanipulators git-convert-objects ancillarymanipulators git-count-objects ancillaryinterrogators git-cvsexportcommit foreignscminterface @@ -101,13 +102,13 @@ git-diff-files plumbinginterrogators git-diff-index plumbinginterrogators git-diff mainporcelain git-diff-tree plumbinginterrogators -git-fast-import ancillarymanipulators +git-fast-import ancillarymanipulators git-fetch mainporcelain git-fetch-pack synchingrepositories git-fmt-merge-msg purehelpers git-for-each-ref plumbinginterrogators git-format-patch mainporcelain -git-fsck ancillaryinterrogators +git-fsck ...
For each tag object created, we create a corresponding softref from the
tagged object to the tag object itself. This is needed to enable efficient
lookup of which tag objects that point to a given commit/object.
Signed-off-by: Johan Herland <johan@herland.net>
---
mktag.c | 11 ++++++++++-
1 files changed, 10 insertions(+), 1 deletions(-)
diff --git a/mktag.c b/mktag.c
index af0cfa6..db8a6b8 100644
--- a/mktag.c
+++ b/mktag.c
@@ -1,5 +1,6 @@
#include "cache.h"
#include "tag.h"
+#include "softrefs.h"
/*
* Tag object data has the following format: two mandatory lines of
@@ -32,6 +33,7 @@ int main(int argc, char **argv)
{
unsigned long size = 4096;
char *buffer = xmalloc(size);
+ struct tag result_tag;
unsigned char result_sha1[20];
if (argc != 1)
@@ -46,7 +48,7 @@ int main(int argc, char **argv)
buffer[size] = 0;
/* Verify tag object data */
- if (parse_and_verify_tag_buffer(0, buffer, size, 1)) {
+ if (parse_and_verify_tag_buffer(&result_tag, buffer, size, 1)) {
free(buffer);
die("invalid tag data file");
}
@@ -57,6 +59,13 @@ int main(int argc, char **argv)
}
free(buffer);
+
+ /* Create reverse mapping softref */
+ if (add_softref(result_tag.tagged->sha1, result_sha1) < 0) {
+ die("unable to create softref for resulting tag object %s",
+ sha1_to_hex(result_sha1));
+ }
+
printf("%s\n", sha1_to_hex(result_sha1));
return 0;
}
--
1.5.2.1.144.gabc40
-
The text-based softrefs file format uses 82 bytes per entry (40 bytes from_sha1 in hex, 1 byte SP, 40 bytes to_sha1 in hex, 1 byte LF). The binary softrefs file format uses 40 bytes per entry (20 bytes from_sha1, 20 bytes to_sha1). Moving to a binary format increases performance slightly, but sacrifices easy readability of the softrefs files. Signed-off-by: Johan Herland <johan@herland.net> --- To illustrate the change in performance from changing the softrefs file format, I prepared a linux repo (holding 57274 commits) with 10 tag objects, and created softrefs from every commit to every tag object (572740 softrefs in total). The resulting softrefs db was 46964680 bytes when using the text format, and 22909600 bytes when using the binary format. The experiment was done on a 32-bit Intel Pentium 4 (3 GHz w/HyperThreading) with 1 GB RAM: ======== Operations on unsorted softrefs: (572740 (10 per commit) entries in random/unsorted order) ======== Listing all softrefs (sequential reading of unsorted softrefs file) -------- [text format] $ /usr/bin/time git softref --list > /dev/null 0.44user 0.02system 0:00.47elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+11786minor)pagefaults 0swaps [binary format] $ /usr/bin/time git softref --list > /dev/null 0.35user 0.01system 0:00.38elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+5913minor)pagefaults 0swaps Listing HEAD's softrefs (sequential reading of unsorted softrefs file) -------- [text format] $ /usr/bin/time git softref --list HEAD > /dev/null 0.11user 0.01system 0:00.14elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+11790minor)pagefaults 0swaps [binary format] $ /usr/bin/time git softref --list HEAD > /dev/null 0.02user 0.01system 0:00.03elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+5918minor)pagefaults 0swaps Sorting softrefs -------- [text format] $ /usr/bin/time git softref ...
Hi, It is bad style to introduce one type, and then change it to another in a backwards-incompatible way. Either you make it backwards compatible, or you start with the second format, never even mentioning that you had another format. Ciao, Dscho -
While I agree with that in principle, I think you are being a bit too harsh to a set of patches that shows possible alternatives for an idea that is not even in any unreleased version of git. Got out of the wrong side of bed this morning? -
Hi, Possibly. Except it was not a bed, but an airplane passenger seat. And it did not help that I totally disagree with the approach: "sorted list does not do this well, unsorted not that... let's take both!" So, please take my words with a grain of salt. But I still think that _what_ I was saying is correct. Yes, the patch series shows an alternative, but I would have wished for either a smaller quick-n-dirty proof-of-concept implementation ([RFC]), or a better thought-through one ([PATCH]). Ciao, Dscho -
As Junio correctly pointed out, this patch is only here to demonstrate an alternative solution. Whether or not this patch should be used should be determined _long_ before (even thinking about) putting this into a release. ...Johan -- Johan Herland, <johan@herland.net> www.herland.net -
The patch series to look-up and maintain "softref" relationship is trivially clean. Although I probably would have many nits to pick, I do not think it is _wrong_ in a major way per-se. I would not even mind saying that I liked the basic concept, until I thought things a bit deeper. Here are some initial notes I took while reading your patches. Semantics --------- Not all "softref" relationship is equal. "This object is referred to by these tags" is one obvious application, and only because we already try to follow tags when git-fetch happens anyway, it looks natural to make everybody follow such a softref relationship. However, as Pierre Habouzit wants to, we may want to make a bug tracking sheet (the details of the implementation of such a bug tracking sheet does not matter in this discussion -- it could be a blob, a commit, or a tag) refer to commits using this mechanism, by pointing at the blob from commits after the fact (i.e. "later it was verified that this commit fixes the bug described in this bug entry"). Side note: I am assuming a simplest implementation where one blob would always capture the latest status of the bug. refs/bugs/12127 would point at the latest version of bug 12127's tracking sheet. An alternative implementation would be to represent each entry of the tracking sheet for a single bug as a blob, and have multiple blobs associated to a commit on the main project, or the other way around, but then you would need a way to give order between referers to a single referent, which I do not find in your proposed "softref". Most users who want to download and compile the main project do not care about bug tracker objects. You would need to have a way to describe what kind of relationship a particular softlink represents, and adjust the definition of reachability for the purposes of traversal of objects. It gets worse when you actually start using softrefs. I do not think you would have a limited set of softrefs, such ...
Yes, I'm starting to see that it's not a good idea to put _all_ softrefs in We're getting a little ahead of ourselves, aren't we? IMHO, it would be up to the bug system to determine which (and how many) connections to make between the bug reports and the commits (or even if softrefs would be the correct mechanism for these connections at all). We shouldn't necessarily base the softrefs design on how we imagine a hypothetical bug system to work. But Pierre might have something to say on how he would want to use softrefs, and his system is hopefully _less_ hypothetical. :) But I can see the use of letting the user/porcelain/bugtracker define Yes, I see that different classes of softrefs would have different semantics for propagation. we could probably try to set up some sane defaults, and then let users put rules in their configs for how they would want to Isn't there a .used flag on objects we could easily check to see if we've seen an object before, thus preventing us from following the circular Hmm. First of all, I'm not sure it would be useful to add a _direct_ link between E and A, but even so... I'm thinking we can do the regular/current reachability calculation first, and after it's done, we analyze the softrefs to see if there are more objects to be fetched. In that scenario, we wouldn't need to send A again, As above, if the receving side gets the list of involved softrefs, it can make the decision on which objects it needs to fetch. Hmm. Thinking about it, this process would of course need to be recursive, Yeah, there are still may open questions. But I'm glad to see that most people seem to find the basic concept useful, at least. Thanks for taking the time and effort to comment. Have fun! ...Johan -- Johan Herland, <johan@herland.net> www.herland.net -
Hi, Has my lightweight annotation patch reached you? I like my approach better than yours, because it is 1) a way, way smaller patch, and 2) it automatically includes the versionability. After thinking about it a little more (my plane was slow, and as a result I am allowed to spend 8 more hours in Paris), I think that a small but crucial change would make this thing even more useful: Instead of having "core.showAnnotations" be a boolean config, it might be better to have "core.annotationsRef" instead, overrideable by the environment variable GIT_ANNOTATION_REF. With this, you can have different refs for different kinds of annotations. For example, some people might add bugtracker comments (even comments like "this commit was bad: introduced bug #798, solved by commit 9899fdadc.."). Those comments could live in refs/annotations/bugs. To see them, just say GIT_ANNOTATION_REF=refs/annotations/bugs gitk Voila. I am quite certain that treating annotations as branches, containing fan-out directories for the reverse lookup. I am even quite certain that in most cases, a working-directory-less merging is possible for such annotations. Ciao, Dscho -
I see your point, but your lightweight annotations are solving a different problem, aren't they? They do provide the after-the-fact annotations that sort of sparked of these discussions, but I can't see how your patch is a replacement of the general "relationships between arbitrary objects" concept that softrefs try to solve. Of course, it might be that the lightweight annotations are "good enough" for the use cases we currently see, and that softrefs are a bit overkill. I'm not convinced about the working-directory-less merging. AFAICS the lightweight annotations will behave pretty much like the "regular" version controlled filesystem, and you'll have the same kind of conflicts when you merge stuff between repos. I'd be glad to be proven wrong, of course. Have fun! ...Johan -- Johan Herland, <johan@herland.net> www.herland.net -
Sorry for the noise, but I'm really pissed with git@vger recently. That mail I'm answering to, never made it to the list. Neither did my quite long answer to Martin, who kindly forwared it back to me so that I can send it again, sadly git@vger just does not wants mail, whereas its SMTP seems to accept mail: Jun 10 16:02:09 pan postfix/smtp[20560]: DE84FCAD9: to=3D<git@vger.kernel= =2Eorg>, relay=3Dvger.kernel.org[209.132.176.167]:25, delay=3D4, delays=3D0= =2E31/0.02/0.5/3.1, dsn=3D2.7.0, status=3Dsent (250 2.7.0 nothing apparentl= y wrong in the message. BF:<H 0.0800319>; S1754569AbXFJOCJ) I've sent a mail to postmaster@vger.kernel.org a week ago, but it seems it remained a dead letter. Is anyone able to tell what's going wrong ? it's _really_ irritating, and let me want to give up discussions, as I _hate_ losing mail (I usually don't keep a copy of mails I send to a mail list as I expect it to send it back to me, and well, what would be a copy worth if nobody can read the mail anyway ?). So if anyone knows what can be done .... --=20 =C2=B7O=C2=B7 Pierre Habouzit =C2=B7=C2=B7O madcoder@debia= n.org OOO http://www.madism.org
el.org>, relay=3Dvger.kernel.org[209.132.176.167]:25, delay=3D4, delays=3D0= =2E31/0.02/0.5/3.1, dsn=3D2.7.0, status=3Dsent (250 2.7.0 nothing apparentl= SOrry, *I* screwed up. I did not checked first if the archived had the mails, and it had. So as I don't see the mails come, it's definitely a problem in between, and is not necessarilly vger.kernel.org's fault, and I apologies for the noise. So if anyone sent a mail to the list (without me in Cc) hoping that I would get the mail and never answered, it's time to send it again :/ That is also the reason why some of my mails have been sent many times: because I thought they were eaten at some point. That should not happen again, sorry about that. Cheers, --=20 =C2=B7O=C2=B7 Pierre Habouzit =C2=B7=C2=B7O madcoder@debia= n.org OOO http://www.madism.org
To be fair, I'm still struggling with the storage backend yet, trying to make things fast enough (My current import rate of mails is 10 per second, wich is not that brilliant I guess), and also to design some simple things like "answering" to a bug. For now, my design is the following, I've a 'bts' branch where the bugs reports (plain mailboxes) go. Grit is able to manage as many branch the user wants, bts is just the default name for it. Then, for a bts branch, you have $GIT_DIR/grit/<branch>.index and $GIT_DIR/grit/<branch>/. The former is the index of the tip of the bts branch, and the latter contains some bits of the tip of the branch checkouted (can be seen as some kind of cache, useful to run mutt -f on a mbox e.g.). Bugs have the sha id of the hash of the first imported mail, and are put in sha[:2]/sha[2:] files, =C3=A0 la .git/objects/. I also should have a second file with annotations about the bug, format not really clear for now, as "one file per bug" could be quite inefficient. OTOH if I mix too many bugs in the same file, the merge risk is bigger (but I suppose I could use a specific merge strategy on this). Here is the sole non hypothetical thing yet. My plans then was to use "links" (softlinks or not, I'm speaking generically, I hope softrefs will match my needs, I don't know yet) between specific commits, and bugs. Links would somehow carry information on wether this is an "opening" tag (like: this bug is present starting at that commit), an informationnal tag (like: this commit helps fixing that bug, but is not enough), or a closing/fixing tag (like: this commit fixes it). A fourth kind may be also used aka a not-found tag (like: well this commit does not fixes the bug, but for sure it's not there anymore at that commit). Though, softlinks do not need to "carry" the information for real, they just need to be linked somehow to the bug, bug that would have the annotations for those softlinks in them. What is somehow flawed for me, is that ...
(Resending this in plaintext; sorry to those who got it twice.)
Being able to specify relationships between commits after the fact seems
like a very useful facility.
Does it make sense to have type information to record what the
relationship between two objects means? Without that, it seems like
it'll be hard to build much of a tool set on top of this feature, since
no two tools that made use of it could unambiguously query just their
own softrefs.
A few use cases for relationships-after-the-fact come to mind in
addition to the one the patch itself mentions:
A facility like this could replace the info/grafts file, or at least
provide another way to turn a regular commit into a merge commit. Just
put a "manually specified merge parent" ref between the target revision
and the one you want git to think you've merged from. That would scale a
lot better than info/grafts does, I suspect, if only by virtue of being
O(log n) searchable thanks to the sorting.
One could easily imagine recording a "cherry picked" softref, which
could, e.g., be the rebase machinery to skip over an already-applied
revision. IMO the lack of any tool-readable history about cherry picking
-- which is, after all, a sort of merge, at least conceptually -- is a
shortcoming in present-day git. (Not a huge one, but if nothing else
it'd be great to see cherry picking represented in, e.g., the gitk
history display.)
It might be possible to annotate rebases to make pulling from rebased
branches less troublesome. If you have
A--B--C--D
\
-E--F--G
and you rebase E onto D, a "rebased from" softref could be recorded
between E and E':
A--B--C--D
\ \
-E....E'--F'--G'
Then a pulling client could potentially use that information to cleanly
replay the rebase operation to keep its history straight. Perhaps if you
could record historical rebases like that, you could do away with the
current gotchas involving rebasing shared repositories. One negative
side ...Actually MadCoder/Pierre had a similar idea on IRC. He wanted to separate softrefs into namespaces, so that softrefs for tags could live in a different place than softrefs associated with his "gits" bug tracker. I haven't thought very much about this, but it's certainly possible to do Yes, I _knew_ this was similar to grafts in some way :) While working on this, I tried to see if I could leverage grafts somewhere in my design, but I found them to be too commit-bound and specific. But when you look at it Whoa. I hadn't even imagined this, but I guess you're right. I actually thought about solving the same problem (using a much worse method) way back Thanks a lot for sharing your ideas. :) Have fun! ...Johan [1] http://article.gmane.org/gmane.comp.version-control.git/46137 -- Johan Herland, <johan@herland.net> www.herland.net -
Well, if we're two with the same idea, it's a good one, no ? :) In fact, the namespace idea like I told you on IRC isn't _that_ brilliant. But I'm sure recording a softref with: <from_sha> <to_sha> <token> token would help classify the softref. And I'm sure we'll end up with: <from_sha> <to_sha> <token> <flags> with the flags to say what behaviour (e.g.) the reachability resolver should have wrt that link ? --=20 =C2=B7O=C2=B7 Pierre Habouzit =C2=B7=C2=B7O madcoder@debia= n.org OOO http://www.madism.org
Interesting. But I'm not sure I want to give up the fixed-length softref records as I imagine it makes the lookup and processing _much_ faster. ...Johan -- Johan Herland, <johan@herland.net> www.herland.net -
The token (really a namespace identifier) could be defined as a string with a fixed, probably small, maximum size. Or, better IMO, it could be an integer with a bunch of enumerated values for internal git uses and a range reserved for unofficial / experimental use. I agree that fixed-length records seem like a win here, assuming this is the general storage layout we end up with when all is said and done. -Steve -
