Re: [RFC] Zit: the git-based single file content tracker

Previous thread: Tip: avoiding net overhead using git over sshfs by Felipe Carvalho Oliveira on Wednesday, October 22, 2008 - 6:26 pm. (5 messages)

Next thread: What's cooking in git.git (Oct 2008, #05; Wed, 22) by Junio C Hamano on Wednesday, October 22, 2008 - 10:41 pm. (1 message)
From: Giuseppe Bilotta
Date: Wednesday, October 22, 2008 - 6:29 pm

Hello all,

one of the common remarks made about git is that since it tracks
tree contents, it's not the best-suited tool to track a bunch of
independent files which happen to be in the same directory.

I've found myself in the situation of wanting to track my changes done
to one or more 'single' files in a directory (e.g. $HOME), and
deciding to use antiquated, clumsy, slow and inefficient but file-based
RCS (yes, you read that right) over git.

In other situations (e.g. for my UserJS folder) I ended up using git,
but not liking the idea of having things such as tags referring to all
of my UserJS projects instead of the single file they were intended
for, or having to put 'filename: ' at the beginning of commit messages
just because the history was shared.

So today I decided to start hacking at a git-based but file-oriented
content tracker, which I decided to name Zit.

The principle is extremely simple: when you choose to start tracking a
file with Zit,

zit track file

Zit will create a directory .zit.file to hold a git repository
tracking the single file .zit.file/file, which is just a hard link to
file.
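
A minimal sketch of the layout described above, done by hand with plain
git commands. It is not taken from the zit script itself, which may do
things differently; the scratch directory, the identity variables and
the commit message are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: reproduce the described .zit.file layout in a scratch directory.
set -e
GIT_AUTHOR_NAME=zit GIT_AUTHOR_EMAIL=zit@example.invalid
GIT_COMMITTER_NAME=zit GIT_COMMITTER_EMAIL=zit@example.invalid
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

work=$(mktemp -d)
cd "$work"
echo "hello" > file

# The repository lives in .zit.file and is a normal, non-bare repository.
mkdir .zit.file
git init -q .zit.file

# The tracked path is a hard link to the real file, not a copy.
ln file .zit.file/file

git -C .zit.file add file
git -C .zit.file commit -q -m "Start tracking file"

# Both names should report the same inode number.
ls -i file .zit.file/file
```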

The reason for using .zit.file as a non-bare repository rather than
just a GIT_DIR is that it allows things such as 'git status' to ignore
everything else. A possible alternative could have been to use
.zit.file as the GIT_DIR and create an all-encompassing
.zit.file/info/exclude, but the general idea of having this kind of
detached GIT_DIR felt less robust (or maybe I just forgot some
export).

I also don't like the idea of the hardlink, first of all because of
portability problems, and secondly because there are far too many ways
the hardlink could break somewhere along the way. For
example, I haven't tested any fancy git commands on my sample zit
implementation, and I'm not sure checking out some older version would
actually work.

If anybody is interested in trying out my quick hack for the idea,
there's a git repository for Zit at git://git.oblomov.eu/zit ...
From: Felipe Oliveira Carvalho
Date: Thursday, October 23, 2008 - 5:33 am

It sounds interesting. I have some single files that I would like to track
using git, zit seems to be a good solution.

--
Felipe
--

From: Nguyen Thai Ngoc Duy
Date: Thursday, October 23, 2008 - 5:50 am

Why not use one .zit repo and track each file on its own branch?
-- 
Duy
--

From: Giuseppe Bilotta
Date: Thursday, October 23, 2008 - 6:33 am

So your proposal is to have a single .zit repo which is actually a git
repo and where each additional tracked file becomes its own branch,
and zit would take care of switching from branch to branch when zit
commands are called?

I think this solution would have a number of problems, apart from
being generally quite messy. First of all, moving a file and its
history somewhere else means toying around with the history of a much
wider repo, whereas the current approach would mean just moving the
.zit.file dir together with the file (modulo hardlinks). Non-linear
histories for a single file would be more complex to handle, too. And
publishing just the history of one file would be damn complex.


-- 
Giuseppe "Oblomov" Bilotta
--

From: Nguyen Thai Ngoc Duy
Date: Thursday, October 23, 2008 - 6:51 am

I don't know if switching is necessary. With one file per branch, the

The history should be linear. Git (or zit) repository is just a
container for git branches. Each branch contains only one file. Moving
a file history is equivalent to "git push" + "git branch -D".
Something like this (not tested):

cd dst
git init
cd src
git push dst local-branch:remote-branch
git branch -D local-branch
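
A self-contained variant of the untested recipe above, as a sketch. The
repository paths, the file name and the branch name notes-branch are
made up for illustration, and the default branch of src simply stands in
for "any other branch" so that notes-branch can be deleted afterwards.

```shell
#!/bin/sh
# Sketch only: move the history of one branch from src to dst.
set -e
GIT_AUTHOR_NAME=zit GIT_AUTHOR_EMAIL=zit@example.invalid
GIT_COMMITTER_NAME=zit GIT_COMMITTER_EMAIL=zit@example.invalid
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

top=$(mktemp -d)
src=$top/src
dst=$top/dst
git init -q "$src"
git init -q "$dst"

# Create a one-file history in src and give it a branch name of its own.
cd "$src"
echo "some notes" > notes.txt
git add notes.txt
git commit -q -m "Add notes.txt"
git branch notes-branch

# Copy the branch to dst, then drop it from src.
git push -q "$dst" notes-branch:notes-branch
git branch -q -D notes-branch

# The branch should now be listed in dst only.
git -C "$dst" branch --list notes-branch
```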


-- 
Duy
--

From: Giuseppe Bilotta
Date: Thursday, October 23, 2008 - 7:21 am

Looks a little too clumsy for my taste. Also, I don't like the idea of
having to enforce linear history for files, or getting rid of the
index. I would like zit to be as lightweight a wrapper for git as
possible, retaining the whole functionality.





-- 
Giuseppe "Oblomov" Bilotta
--

From: Johannes Sixt
Date: Thursday, October 23, 2008 - 6:03 am

git breaks hard links, mind you! (Just in case you check out older
versions and you wonder why your "real" file is not updated).
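
The effect can be reproduced with a short sketch like the following. The
paths and file contents are invented for the demonstration; the point is
only that git replaces the file in the work tree when it checks out a
different version, so the second name keeps the old inode and content.

```shell
#!/bin/sh
# Sketch only: show that checking out an older version breaks a hard link.
set -e
GIT_AUTHOR_NAME=zit GIT_AUTHOR_EMAIL=zit@example.invalid
GIT_COMMITTER_NAME=zit GIT_COMMITTER_EMAIL=zit@example.invalid
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

top=$(mktemp -d)
real=$top/real
git init -q "$top/repo"
cd "$top/repo"

# The "real" file lives outside the repository; the repo sees it via a hard link.
echo "version 1" > "$real"
ln "$real" file
git add file
git commit -q -m "v1"

# Writing through either name updates the shared inode.
echo "second version" > "$real"
git commit -q -a -m "v2"

# Restore the older content inside the repository.
git checkout -q HEAD~1 -- file

# The two names now have different inodes and different contents.
ls -i "$real" file
cat "$real" file
```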

But there's a recent patch by Dscho floating around that takes care of the
hard link case.

-- Hannes
--

From: Giuseppe Bilotta
Date: Thursday, October 23, 2008 - 6:28 am

I feared that the hardlink choice was not the best one. I would
definitely prefer finding a solution that didn't depend on hardlinks:
not only would there be no worry about breaking them, it'd also be
more portable.

-- 
Giuseppe "Oblomov" Bilotta
--

From: Johannes Schindelin
Date: Friday, October 24, 2008 - 10:44 am

Hi,


Yep, I still want to work on it; it breaks on one of Junio's machines.

Ciao,
Dscho

--

From: Giuseppe Bilotta
Date: Friday, October 24, 2008 - 10:48 am

On Fri, Oct 24, 2008 at 7:44 PM, Johannes Schindelin

Well, it's not needed by Zit anymore, but there was someone else
asking about it on the ml recently, too 8-)



-- 
Giuseppe "Oblomov" Bilotta
--

From: Jean-Luc Herren
Date: Thursday, October 23, 2008 - 4:23 pm

Hi!


This sounds great and would seem very useful to manage my ~/bin/
directory which contains a set of unrelated one-file-tools that

If you have many files you want to track in a single directory
(like ~/bin/), all those additional directories will quickly feel
like clutter.  If you track every file, it will even double the
number of things you see with an "ls -a".

If you decide against a shared repository, maybe you want to
consider to not use ".zit.file/", but ".zit/file/" as the
repository?  This would reduce the clutter to a single directory,
just like with ".git".  And moving files around wouldn't be much
more complicated.

jlh
--

From: Giuseppe Bilotta
Date: Thursday, October 23, 2008 - 11:55 pm

Right. I'll give that a shot.

-- 
Giuseppe "Oblomov" Bilotta
--

From: Jakub Narebski
Date: Friday, October 24, 2008 - 3:31 am

By the way, RCS, which I use for version control of single files, uses
both approaches: it can store 'file,v' alongside 'file' (just like
your '.zit.file/' or '.file.git/'), but it can also store files on
per-directory basis in 'RCS/' subdirectory (proposed '.zit/file/' or
'.zit/file.git/' solution)

By the way, it would be nice to have VC interface for Emacs for Zit...
-- 
Jakub Narebski
Poland
ShadeHawk on #git
--

From: Giuseppe Bilotta
Date: Friday, October 24, 2008 - 3:52 am

Indeed, there's no particular reason why both solutions shouldn't be
available. I'll think about implementing it this way:

$ zit init

will indicate that we want to track many files, and thus it will
create a .zit directory under which RCS files will be available.

$ zit track somefile

will start tracking somefile by setting up .zit/somefile.git if .zit
is available or .somefile.git otherwise.

The only problem then is priority. When looking for a file's repo, do
we look at .file.git first, or .zit/file.git? How does RCS behave in

I'm afraid someone else will have to take care of that, since Emacs is
not really something I use.

-- 
Giuseppe "Oblomov" Bilotta
--

From: Jakub Narebski
Date: Friday, October 24, 2008 - 4:32 am

rcsintro(1) states:

  If you don't want to clutter your working directory with RCS files, create
  a  subdirectory called RCS in your working directory, and move all your RCS
  files there.  RCS commands will look *first* into that directory to find

I'll try to hack it using contrib/emacs/vc-git.el as a base...

-- 
Jakub Narebski
Poland
--

From: Giuseppe Bilotta
Date: Friday, October 24, 2008 - 5:15 am

Cool. I pushed changes to this end to git.oblomov.eu/zit --now zit
will look for .zit/file.git first, then for .file.git; if neither is
found, and .zit/ exists, the repo is set to .zit/file.git, otherwise
it's set to .file.git
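
The lookup order described above could be sketched as a small shell
function. The function name zit_repo_for is hypothetical and this is not
the code that was pushed to the zit repository; it only restates the
four cases from the paragraph.

```shell
#!/bin/sh
# Sketch only: print the repository directory for a given file,
# following the priority order described in the message above.
# zit_repo_for is a made-up name, not part of zit.
zit_repo_for() {
    dir=$(dirname -- "$1")
    base=$(basename -- "$1")
    if [ -d "$dir/.zit/$base.git" ]; then
        # 1. An existing repository under .zit/ wins.
        printf '%s\n' "$dir/.zit/$base.git"
    elif [ -d "$dir/.$base.git" ]; then
        # 2. Otherwise an existing per-file repository is used.
        printf '%s\n' "$dir/.$base.git"
    elif [ -d "$dir/.zit" ]; then
        # 3. No repository yet, but .zit/ exists: create it there.
        printf '%s\n' "$dir/.zit/$base.git"
    else
        # 4. No repository and no .zit/: fall back to .file.git.
        printf '%s\n' "$dir/.$base.git"
    fi
}

# Example in a scratch directory with no .zit/ and no repository yet.
scratch=$(mktemp -d)
zit_repo_for "$scratch/somefile"
```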

You can either manually mkdir .zit, or use zit init that does exactly

Cool, thanks.

-- 
Giuseppe "Oblomov" Bilotta
--

From: Junio C Hamano
Date: Friday, October 24, 2008 - 11:28 am

I am not opposed to the wish to track a single file (but I have to say I
am not personally in need for such a feature), but I have to wonder from
the technical point of view if one-repo-per-file is the right approach.

Running "git init" in an empty directory consumes about 100k of diskspace
on the machine I am typing this on, and you should be able to share most
of them (except one 41-byte file that is the branch tip ref) when you
track many files inside a single directory by using a single repository,
one branch per file (or "one set of branches per file") model.

--

From: david
Date: Friday, October 24, 2008 - 12:11 pm

the reason to use separate repos is to ease the work involved if you need 
to move that file (and its repo) elsewhere.

with the git directory being under .zit, would it be possible to link the 
things that are necessary together?

hmm, looking at this in more detail.

about 44K of diskspace is used by the .sample hook files, so those can be 
removed

the remaining 56K is mostly directories eating up a disk block

find . -ls
200367    4 drwxr-xr-x   7 dlang    users        4096 Oct 24 12:00 .
200368    4 drwxr-xr-x   4 dlang    users        4096 Oct 24 12:00 ./refs
200369    4 drwxr-xr-x   2 dlang    users        4096 Oct 24 12:00 ./refs/heads
200370    4 drwxr-xr-x   2 dlang    users        4096 Oct 24 12:00 ./refs/tags
200371    4 drwxr-xr-x   2 dlang    users        4096 Oct 24 12:00 ./branches
200372    4 drwxr-xr-x   2 dlang    users        4096 Oct 24 12:00 ./hooks
200373    4 drwxr-xr-x   2 dlang    users        4096 Oct 24 12:00 ./info
1798469   4 -rw-r--r--   1 dlang    users         240 Oct 24 12:00 ./info/exclude
1600716   4 -rw-r--r--   1 dlang    users          58 Oct 24 12:00 ./description
200374    4 drwxr-xr-x   4 dlang    users        4096 Oct 24 12:00 ./objects
200375    4 drwxr-xr-x   2 dlang    users        4096 Oct 24 12:00 ./objects/pack
200376    4 drwxr-xr-x   2 dlang    users        4096 Oct 24 12:00 ./objects/info
1600717   4 -rw-r--r--   1 dlang    users          23 Oct 24 12:00 ./HEAD
1600719   4 -rw-r--r--   1 dlang    users          92 Oct 24 12:00 ./config

how many of these are _really_ necessary?

tags, info, hooks, branches, and description could probably be skipped for 
the common zit case, as long as they can be created as needed.

If git has problems with these not existing, would it make sense to make 
git survive if they are missing and create them if needed?
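
One way to experiment with a slimmer repository, as a sketch: point
"git init" at an empty template directory, so that the sample hooks,
description and other template files are never copied. The directory
names are assumptions for the demonstration, and the exact set of
entries that remain may vary between git versions.

```shell
#!/bin/sh
# Sketch only: create a repository without the default template files.
set -e
top=$(mktemp -d)

# An empty directory used as the template: nothing is copied from it.
mkdir "$top/empty-template"
git init -q --template="$top/empty-template" "$top/repo"

# List what git created on its own.
find "$top/repo/.git" | sort
```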

the objects directory will eat up more space as revisions are checked in 
(and more sub-directories are created), would it make sense to have a 
config option to do a flat ...
From: Giuseppe Bilotta
Date: Friday, October 24, 2008 - 12:42 pm

I was slowly writing a reply but it seems David beat me to it, so here
goes a couple of additional comments.


Precisely. The one-repo-per-file is just the simplest and most
flexible solution. But yes, I have to admit I hadn't looked into disk


Exactly. I'm setting up zit to prepare its repos to a more compact

For starters, I'm wondering if setting core.preferSymlinkRefs would be

It seems that tags, hooks, branches and description can be done without.

info contains exclude which is rather essential, and this is something
that could be shared across repositories. Also, we could spare a block
by removing info, moving exclude to the .git dir and setting

This is probably the biggest remaining spacewaste. Typical zit usage
will generate a rather small number of objects, so flattening the
object store for the repo wouldn't be a bad idea. Is that possible?

-- 
Giuseppe "Oblomov" Bilotta
--

From: david
Date: Friday, October 24, 2008 - 12:46 pm

is it? by default everything in this file is commented out. And with you 
only adding files explicitly why would it ever need to exclude anything?

David Lang

--

From: Giuseppe Bilotta
Date: Friday, October 24, 2008 - 12:51 pm

Ahem. Yes. I've got a patch ready for zit that gets rid of them.


Zit does
    echo "*" > "$GIT_DIR/info/exclude"
and yes it sucks to use a whole block for a file that only contains
one character. Suggestions welcome.

The reason why we want the exclude is that when you do zit status
somefile you don't want every other file in the directory to come up
as 'not tracked'.
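
A sketch of the effect, assuming a repository kept in .somefile.git with
the surrounding directory as its work tree. The file names are invented
and the setup is done by hand rather than by zit. Because "*" ignores
everything, the tracked file has to be added with "git add -f".

```shell
#!/bin/sh
# Sketch only: an exclude pattern of "*" hides every untracked file.
set -e
GIT_AUTHOR_NAME=zit GIT_AUTHOR_EMAIL=zit@example.invalid
GIT_COMMITTER_NAME=zit GIT_COMMITTER_EMAIL=zit@example.invalid
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

dir=$(mktemp -d)
cd "$dir"
echo "tracked" > somefile
echo "noise" > otherfile

# Repository metadata in .somefile.git, work tree is the directory itself.
GIT_DIR=$dir/.somefile.git
GIT_WORK_TREE=$dir
export GIT_DIR GIT_WORK_TREE
git init -q

# Ignore everything by default.
mkdir -p "$GIT_DIR/info"
echo "*" > "$GIT_DIR/info/exclude"

# -f is required because somefile matches the "*" pattern too.
git add -f somefile
git commit -q -m "Track somefile"

# otherfile is not reported as untracked; only changes to somefile would show up.
git status --porcelain
```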

-- 
Giuseppe "Oblomov" Bilotta
--

From: david
Date: Friday, October 24, 2008 - 12:54 pm

good point.

David Lang
--

From: Giuseppe Bilotta
Date: Friday, October 24, 2008 - 1:13 pm

Yes, the file pointed at by the config key core.excludesfile is read
too, so we could have it point at $GIT_DIR/zitexclude, which would
allow us to spare a block. The most space saving would be achieved by
a core.excludepattern or similar key, which would allow us to get rid
of the exclude file altogether.

-- 
Giuseppe "Oblomov" Bilotta
--

From: Jakub Narebski
Date: Friday, October 24, 2008 - 1:30 pm

Well, with all zit repositories in '.zit/' directory (similar to RCS/)
you could have point core.excludesfile to _common_ '.zit/excludes';
the pattern doesn't change from zit repository to zit repository?
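
As a sketch of the shared-file idea: several repositories under .zit/
can all point core.excludesfile at one .zit/excludes. The file names
and layout are assumptions for the demonstration. Note that the path
stored here is absolute, which is a design choice of this sketch only.

```shell
#!/bin/sh
# Sketch only: two per-file repositories sharing a single excludes file.
set -e
dir=$(mktemp -d)
mkdir "$dir/.zit"

# One shared pattern file for every repository under .zit/.
echo "*" > "$dir/.zit/excludes"

for f in a b; do
    echo "$f" > "$dir/$f"
    GIT_DIR="$dir/.zit/$f.git" GIT_WORK_TREE="$dir" git init -q
    GIT_DIR="$dir/.zit/$f.git" git config core.excludesfile "$dir/.zit/excludes"
done

# Neither repository reports the other files in the directory as untracked.
GIT_DIR="$dir/.zit/a.git" GIT_WORK_TREE="$dir" git status --porcelain
GIT_DIR="$dir/.zit/b.git" GIT_WORK_TREE="$dir" git status --porcelain
```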

You could even use per-user ~/.zitignore (I'm not sure if git expands 
'~' in paths; there was some patch for it, but was it accepted?) or 
system-wide /usr/lib/zitignore or /usr/libexec/zitignore file.
-- 
Jakub Narebski
Poland
--

From: Giuseppe Bilotta
Date: Saturday, October 25, 2008 - 12:48 am

System-wide means maximum space savings, but it requires system
administration to install Zit, and considering that one of the things
I love about Zit now is its being self-contained, I would rather not
depend on anything system-wide anyway.

The user .zitignore file is probably the best approach: we can create
it ourselves (usually), and even if Git doesn't expand the pathname
itself, we can just use an absolute path. I'll go that way.

-- 
Giuseppe "Oblomov" Bilotta
--

From: Jakub Narebski
Date: Saturday, October 25, 2008 - 2:10 am

First, an absolute path to ~/.zitignore is a bit fragile: what if the layout
of users' home directories changes, for example because an increasing
number of users requires some fan-out (/home/nick -> /home/2/nick)?
Second, ~/.zitignore looks like something that user can change; if
you install zit, it can install libexec/zitignore somewhere... or just
use .zit/excludes (with 'do not edit' comment perhaps...).

-- 
Jakub Narebski
Poland
--

From: Giuseppe Bilotta
Date: Saturday, October 25, 2008 - 3:30 am

(Actually, I just found another interesting thing about the config, in
that it stores the path to the work tree. This is not a problem,
though, because zit_setup() sets GIT_WORK_TREE.)

As I said, I don't like depending on stuff that needs to be installed.
For example, what about user (non-system) installs? the libexec (or
whatever) solution would have the same problem as the ~/.zitignore
solution, with the moving $HOME.

I guess this leaves the .zit/ solution as the most robust one,
although it's not the most space-effective, especially if you have
many directories, each with a single tracked file. On the plus side,
going for the .zit/ solution and dropping support for .somefile.git/
means some significant code simplification.

-- 
Giuseppe "Oblomov" Bilotta
--

From: david
Date: Friday, October 24, 2008 - 12:53 pm

I just had what's probably a silly thought.

how close is a zit setup to a subproject setup?

David Lang
--

From: Giuseppe Bilotta
Date: Friday, October 24, 2008 - 1:06 pm

Honestly, I haven't the slightest idea how they work. My
understanding, which could be completely wrong, is that they are
full-fledged git repositories, and that additional metadata at the top
level takes care of understanding what ref is needed for each toplevel
project. If this is true, using them wouldn't simplify zit, but rather
make it more complex (and space intensive).

-- 
Giuseppe "Oblomov" Bilotta
--
