I wanted to build an StGit command that coalesced adjacent patches to
a single patch. Because the end result tree would still be the same,
this should be doable without ever involving HEAD, the index, or the
worktree. StGit's existing infrastructure for manipulating patches
didn't lend itself to doing this kind of thing, though: it's not
modular enough. So I started to design a replacement low-level
interface to git, and things got slightly out of hand ... and I ended
up with a much bigger refactoring than I'd planned.
It's all outlined below, and all the code I currently have is
attached. Unless there's opposition, my plan is to convert one command
at a time to use the new infrastructure -- this can be done since the
on-disk format is unaffected.
Comments?
Python wrapping of git objects (gitlib.py)
----------------------------------------------------------------------
To make it easier to work with git objects, I've built a more
high-level interface than just calling git commands directly. Some of
it is trivial:
* Blobs and tags aren't covered (yet), since StGit never uses them.
I think. If they are needed in the future, they can easily be
wrapped.
* Trees are represented by the Python class Tree. They are
immutable, and have only one property: the sha1 of the tree. More
could be added if we need to look inside trees.
The interesting case is for commit objects:
* Commits are represented by the Python class Commit. They are
immutable, and have two properties: sha1 and commit data.
* Commit data is represented by the Python class CommitData. These
objects are also immutable, and have properties for author,
committer, commit message, tree, and list of parent commits. They
also have setter functions for these properties, which (since
CommitData objects are immutable) return a modified copy of the
original object.
* Author and committer are represented by a Person class, also
immutable with setter ...Wouldn't HEAD need to be modified since the commit log changes slightly, even though the tree is the same. Or am I misunderstanding Thanks for this. I'll need a bit of time to read it all and give feedback. In general, I welcome this refactoring. I'll go through the whole e-mail in the next days and get back to you. -- Catalin -
This reminds me of someone suggesting that some patches could be represented by more than one commit. But I'm not sure such a beast would be useful - I fear that would make StGIT much more complicated, but would it really make things better ? Best regards, -- Yann -
You might be remebering me pointing out that the old infrastructure Yes, it makes everything much more complicated, and no, it doesn't buy us anything new. After all, once we know the parent and the tree we want a patch to have, we can just manufacture a commit that has that tree and that parent. My proposed new infrastructure cannot represent such patches, very much by design. -- Karl Hasselström, kha@treskal.com www.treskal.com/kalle -
Thanks, I appreciate it. -- Karl Hasselström, kha@treskal.com www.treskal.com/kalle -
I eventually managed to have a (not so in-depth) look at this and I am OK with it (but merging after 0.14). As you said the current structure and the new one will work in parallel for some time. It would even be possible to implement unit testing. Some random comments below: Commitdata maybe should be written as CommitData (as in the e-mail text). The Repository.head property is not cached and it is probably OK to avoid some problems I had. We could run profiling afterwards to see As you said, that's a bit difficult for 'push' as it is made of a series of separate individual patch pushes. Can we not use a temporary index by setting GIT_INDEX_FILE during the whole transaction and freely update the working copy. Only if the transaction fails, we clean the working copy and check out the default index? This might slow down some operations though. In the future, it would be nice to record the stack state before transactions in a git object (via pickle) together with HEAD id and provide unlimited undo. Thanks. -- Catalin -
Yes. This is a very nice property that makes the whole enterprise
Yes. Unit testing would be easier with the new code, since it's more
Probably. I've alredy made that spelling error myself at least once
Yes. This applies to a few other things as well -- I try to be careful
to write things so that they could be cached transparently if it turns
out we need it. But I also try to not optimize prematurely.
In some cases the caching is more than just a speed optimization.
Commits and trees can be compared with regular Python object identity,
The sample code I posted does exactly this index trick. So yes, I
think it's a splendid idea. :-)
Basically, what it does (or eventually should do) is repeatedly use
the temp index to create the new commits without ever touching the
default index or the worktree. Only if a merge conflicts would we
check out one of the parents and spill the conflicts to the worktree.
(And once we're finished pushing we'd want to check out the new stack
top, obviously.)
The only problem is that while git-read-tree can do a three-way merge
in a temp index without touching the worktree, git-merge-recursive
assumes that the worktree reflects one of the trees to be merged and
can be written to. Which is _totally OK_ if there really is going to
be a conflict. But merge-recursive does fancier conflict resolution
than read-tree (I think -- I still have to look up exactly what, but I
don't think read-tree does rename detection for example), so it might
Yes, this was exactly what I was refering to when I talked about
transaction logging. The only difference is that I'd prefer a simple
text format instead of a pickle, but we can have a fight about that
detail later. :-)
The complete state that would have to be saved would be
* (patchname, commit sha1) pairs for all patches
* (branchname, branch head sha1) pair
* the ordered lists of applied, unapplied, and eventually also
hidden patches
This is somewhat redundant (applied/unapplied ...Oh, and * .git/config (one blob) * the worktree (one tree object?) * the index (one tree object per stage?) Otherwise, we won't be able to un-clobber them. But a first implementation could certainly skip this part. -- Karl Hasselström, kha@treskal.com www.treskal.com/kalle -
