Re: Status of all files (was: Re: How can I tell if a file is ignored by git?)

Previous thread: [PATCH] Makefile: Remove excess backslashes from sed by Brian Gernhardt on Thursday, April 8, 2010 - 8:22 pm. (8 messages)

Next thread: [PATCH 0/3] send-email: --smtp-domain improvements by Brian Gernhardt on Thursday, April 8, 2010 - 10:11 pm. (12 messages)
From: Eric Raymond
Date: Thursday, April 8, 2010 - 9:04 pm

I'm planning some work on Emacs VC mode.

I need a command I can run on a path to tell if it's ignored by git.
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>

If gun laws in fact worked, the sponsors of this type of legislation
should have no difficulty drawing upon long lists of examples of
criminal acts reduced by such legislation. That they cannot do so
after a century and a half of trying -- that they must sweep under the
rug the southern attempts at gun control in the 1870-1910 period, the
northeastern attempts in the 1920-1939 period, the attempts at both
Federal and State levels in 1965-1976 -- establishes the repeated,
complete and inevitable failure of gun laws to control serious crime.
        -- Senator Orrin Hatch, in a 1982 Senate Report
--

From: Jacob Helwig
Date: Thursday, April 8, 2010 - 9:10 pm

What about a variant of:
    git ls-files -i -o --exclude-standard
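
A minimal sketch of how a front end might wrap the command above, in
Python rather than Emacs Lisp for brevity.  The function names
(parse_ignored, ignored_files, is_ignored) are hypothetical, and the -z
flag is added here only so that paths are NUL-separated and unquoted;
paths are reported relative to the directory the command runs in.

```python
import subprocess
from typing import Set


def parse_ignored(output: str) -> Set[str]:
    """Split NUL-separated `git ls-files -z` output into a set of paths."""
    return {path for path in output.split("\0") if path}


def ignored_files(repo_dir: str = ".") -> Set[str]:
    """Run the ls-files variant suggested above and collect ignored paths."""
    result = subprocess.run(
        ["git", "ls-files", "-i", "-o", "--exclude-standard", "-z"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_ignored(result.stdout)


def is_ignored(path: str, repo_dir: str = ".") -> bool:
    """True if `path` (relative to repo_dir) is an ignored, untracked file."""
    return path in ignored_files(repo_dir)
```

Listing everything and testing membership is wasteful for a single
path, but it matches the "list all files" use case discussed later in
the thread.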
--

From: Eric Raymond
Date: Friday, April 9, 2010 - 4:32 am

That will do nicely, thank you.

There could be something better.  Emacs VC mode, and other similar
front ends, would be greatly aided by a command that lists all files,
each with a status code it can understand.  Our canonical list
(omitting two that apply only to locking systems) is:

  'up-to-date        The working file is unmodified with respect to the
                     latest version on the current branch, and not locked.

  'edited            The working file has been edited by the user.

  'needs-update      The file has not been edited by the user, but there is
                     a more recent version on the current branch stored
                     in the master file.

  'needs-merge       The file has been edited by the user, and there is also
                     a more recent version on the current branch stored in
                     the master file.  This state can only occur if locking
                     is not used for the file.

  'added             Scheduled to go into the repository on the next commit.

  'removed           Scheduled to be deleted from the repository on next commit.

  'conflict          The file contains conflicts as the result of a merge.

  'missing           The file is not present in the file system, but the VC
                     system still tracks it.

  'ignored           The file showed up in a dir-status listing with a flag
                     indicating the version-control system is ignoring it,

  'unregistered      The file is not under version control.

The -t mode of ls-files appears to be almost what is wanted, but not quite.
(Among other things, it does not list ignored files.)  I request comment
on some related questions:

1. How do these statuses map to git terminology?  My tentative map, in terms
of git ls-files -t codes, is

up-to-date   = H?
edited       = C
needs-update = no equivalent
needs-merge  = no equivalent
added        = no equivalent
removed      = K
conflict     = no ...
From: Randal L. Schwartz
Date: Friday, April 9, 2010 - 5:11 am

>>>>> "Eric" == Eric Raymond <esr@thyrsus.com> writes:

Eric> There could be something better.  Emacs VC mode, and other similar
Eric> front ends, would be greatly aided by a command that lists all files,
Eric> each with a status code it can understand.  Our canonical list
Eric> (omitting two that apply only to locking systems) is:

A lot of these don't make sense for git and other DVCS.  How have
hg and bzr interpreted these "canonical" states?

For example:

Eric>   'needs-update      The file has not been edited by the user, but there is
Eric>                      a more recent version on the current branch stored
Eric>                      in the master file.

This makes sense only with a file-based VCS, not a tree-based VCS like
git.

Eric>   'needs-merge       The file has been edited by the user, and there is also
Eric>                      a more recent version on the current branch stored in
Eric>                      the master file.  This state can only occur if locking
Eric>                      is not used for the file.

Ditto.

Eric>   'removed           Scheduled to be deleted from the repository
Eric>   on next commit.

Not useful in git.

Eric>   'missing           The file is not present in the file system, but the VC
Eric>                      system still tracks it.

Not available in git.  (If it's not a real file, it can't be tracked. :)

Eric>   'ignored           The file showed up in a dir-status listing with a flag
Eric>                      indicating the version-control system is ignoring it,

Eric>   'unregistered      The file is not under version control.

These two would be identical in git.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion
--

From: Eric Raymond
Date: Friday, April 9, 2010 - 6:20 am

That asks the question the wrong way around.  These state codes
are used to change how VC *itself* performs when you fire various
commands; the VCSes called by the VC back ends never have to
'interpret' them.

It is not expected that every VCS will report all of them; in
particular, as you say, some only make sense in locking systems.  
When VC knows it's dealing with a merging system, it will never go
down a logic path where a locking-related state is checked for.

I deleted two of the locking-system-only states from what you saw, but
may have missed others; I don't completely understand all the states,
because at least eleven other people hacked on VC during the 15 years
I was doing other things and added several that were not in my
original design.

(There is some excuse for this. Emacs VC is probably unique in that
its ontology has to be rich enough to accommodate *every VCS there
is*. Nothing else even attempts that, AFAIK.)

But to answer your question at least in part, here is a piece of code
mapping status codes from Mercurial's hg status -A command to Emacs
state codes.

    ;; Excerpt from a larger function: `out' holds the "hg status -A"
    ;; output, and its first character is the status code.
    (when (eq 0 status)
      (when (null (string-match ".*: No such file or directory$" out))
        (let ((state (aref out 0)))
          (cond
           ((eq state ?=) 'up-to-date)
           ((eq state ?A) 'added)
           ((eq state ?M) 'edited)
           ((eq state ?I) 'ignored)
           ((eq state ?R) 'removed)
           ((eq state ?!) 'missing)
           ((eq state ??) 'unregistered)
           ((eq state ?C) 'up-to-date) ;; Older mercurials use this
           (t 'up-to-date)))))

This is failing to report at least one interesting state, 
which is 'conflict.  But otherwise it looks pretty complete.

What I'm really looking for is a git functional equivalent of hg status -A.
The git backend presently uses diff-index and interprets the output in
a way that I fear is rather brittle.

I'm inclined to think you are right that 'need-update and ...
From: Junio C Hamano
Date: Saturday, April 10, 2010 - 12:07 pm

This isn't about file vs tree, but more about centralized vs distributed.
In DVCS workflows "needs-update" as a concept does not even exist when you
are working on a topic branch to perfect one thing and one thing only.
You do not want to update only because somebody else did some work that
may be totally unrelated to what you wanted to achieve on the current
branch.

I presume that many people use git in centralized workflow where they use
only 'master' branch and "git pull ; work ; git commit; git push" are the
only things they do.  In that setting, "needs-update" may make sense.  The
VC backend implementation has to do "git fetch" to see if the origin has
advanced.

Almost the same comment applies to 'needs-merge', but the VC backend not
only needs to worry about "file has been edited", but also "commits that



Ignored is a subset of Unregistered, no?  Neither exists in the index
(i.e. not tracked); ignored ones are covered by .gitignore and you need to
force "git add" to start tracking them.
--

From: Jakub Narebski
Date: Friday, April 9, 2010 - 5:56 am

There is also


In Git you don't have locking, but you have three versions: in the
working area (the working file), in the index, and latest version on
the current branch (the HEAD version).

So 'up-to-date in Git would probably mean working tree = cached = HEAD

Does this include stat-dirty files, i.e. if the file has been modified
(mtime), but the contents are the same in the working file and in HEAD

Needs *update* looks like it came from centralized VCS like CVS and
Subversion, where you use update-then-commit method.  You can't say
that HEAD version is more recent than working file...

The rough equivalent would be that upstream branch for current branch
(e.g. 'origin/master' can be upstream for 'master' branch) is in
fast-forward state i.e. current branch is direct ancestor of

This, like 'needs-update, looks like it is relevant only in

Note that with Git you can have other merge conflict than simple
CONFLICT(contents).  With CONFLICT(rename/rename) for example the file
would not contain textual conflict, so e.g. it won't have conflict

Note that file might be missing only in working directory, and can be


Probably 'conflict.

-- 
Jakub Narebski
Poland
ShadeHawk on #git
--

From: Eric Raymond
Date: Friday, April 9, 2010 - 7:02 am

Not documented in my installed version, 1.6.3.3.  Where can I go in the

Yes, that was what I thought.  Is this what ls-files is reporting as 'H'?  

(The ls-files -t codes need better documentation.  If I get detailed enough

No, it does not.  Thank you for asking that question, I have just
added a note about this to the VC code exactly where it will do the

Agreed. But there's no way to tell that this is the case without 
doing a pull operation or otherwise querying origin, and I'm
not going to do that.

Explanation: My general rule for DVCS back ends is that the status commands
aren't allowed to do network operations, and it's OK for them not to
report a state code if that would be required.  This is so VC will fully
support disconnected operation when the VCS does.

I have, however, added a note to vc-git.el explaining that this is
possible if we ever teach the mode front end to behave differently when

Following your previous logic, I think it would make sense to set this if 
we could detect that the upstream of the current branch has forward commits 
touching this file.  Again, this would require a network operation in the

It is unclear what Emacs wants in this situation; I will try to find out.
The documentation says this:

                     For now the conflicts are text conflicts.  In the
                     future this might be extended to deal with metadata
                     conflicts too.


That was my best guess too.  Can anyone say more definitely?
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>
--

From: Matthieu Moy
Date: Friday, April 9, 2010 - 7:23 am

http://thread.gmane.org/gmane.comp.version-control.git/126516

In short, "git ls-files -t" was written long ago, never tested, and
probably mostly used by no one. It has a very strange behavior, it's
not just the doc. I'd advise against using it.

"git status --porcelain" is probably what you want:

       --porcelain
           Give the output in a stable, easy-to-parse format for
           scripts. Currently this is identical to --short output, but
           is guaranteed not to change in the future, making it safe
           for scripts.
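
As an illustration only, here is one way a front end could map the
two-letter XY codes of "git status --porcelain" onto the VC states
listed earlier in the thread.  The mapping itself is a judgment call,
not something Git or Emacs defines, and the names vc_state and
parse_porcelain are hypothetical.  The "!!" (ignored) lines assume a
Git new enough to support "git status --ignored"; paths that Git would
quote (unusual characters, without -z) are not handled.

```python
from typing import Dict

# XY pairs that "git status --short" documents as unmerged paths.
UNMERGED = {"DD", "AU", "UD", "UA", "DU", "AA", "UU"}


def vc_state(xy: str) -> str:
    """Map a porcelain XY code to a VC-style state name (assumed mapping)."""
    if xy == "??":
        return "unregistered"
    if xy == "!!":
        return "ignored"
    if xy in UNMERGED:
        return "conflict"
    index_code, worktree_code = xy[0], xy[1]
    if worktree_code == "D":
        return "missing"   # still in the index, gone from the work tree
    if index_code == "D":
        return "removed"   # deletion staged for the next commit
    if index_code == "A":
        return "added"
    return "edited"        # M, R, C and similar changes


def parse_porcelain(output: str) -> Dict[str, str]:
    """Parse "XY path" lines into {path: state}."""
    states: Dict[str, str] = {}
    for line in output.splitlines():
        if len(line) < 4:
            continue
        xy, path = line[:2], line[3:]
        # Renames and copies are shown as "old -> new"; keep the new name.
        if xy[0] in "RC" and " -> " in path:
            path = path.split(" -> ", 1)[1]
        states[path] = vc_state(xy)
    return states
```

Unmodified tracked files do not appear in this output at all, so a
caller would treat any tracked path missing from the result as
'up-to-date.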

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--

From: Eric Raymond
Date: Friday, April 9, 2010 - 9:24 am

It sounds very much to me as though this feature should be scheduled

Yes, this looks like what I would want, all right - if the status
codes were actually *comprehensible*! 

We should tackle this right now, because VC is not the last front end
that will need to parse the format and at least I am willing to patch
your docs based on what I learn.  Most of your other customers won't
do that.

I'm going to start a separate thread about this.
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>
--

From: Daniel Grace
Date: Friday, April 9, 2010 - 4:18 pm

Eric,

I am working on a similar program (not ready for announcing yet). I
have not gotten to the part that would need this, but I would be happy
to start planning that stage and work with you to make sure that this
feature met both of our needs, and help write the documentation if
need be.

(Sorry for the double everyone in To/Cc, gmail defaulted to HTML email
and it was rejected from the list. I had to To/Cc you all again so
that Reply All from list members would work as expected.)

Daniel
http://www.doomstick.com


--

From: Eric Raymond
Date: Friday, April 9, 2010 - 8:35 pm

I'm willing to cooperate.
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>
--

From: Junio C Hamano
Date: Friday, April 9, 2010 - 9:52 am

It was added primarily for Cogito, which is presumably dead by now.
--

From: Jakub Narebski
Date: Friday, April 9, 2010 - 7:50 am

It was *documented* in git version 1.7.0 in 
  7c9f703 (commit: support alternate status formats, 2009-09-05)
I am running git version 1.7.0.1.

BTW. it is only since git 1.7.0 that "git status" is no longer
"git commit --dry-run"... and has sane behaviour wrt. specifying paths.


Actually it would not require network access, but it would require extra
work, and equivalents of 'needs-update and 'needs-merge would not exist
in all cases (in all situations).

In Git you have remote-tracking branches, which are tracking where 
branches in remote repository point to.  Since quite some time by default
they reside in 'refs/remotes/<remote>/' namespace, while ordinary local
branches in 'refs/heads/' namespace.  For example remote-tracking branch
'refs/remotes/origin/master', usually referred to in short as 
'origin/master', tracks (follows) branch 'master' ('refs/heads/master')
in remote 'origin'.  Those branches might be out-of-date with respect
to remote repository, and to update them you need network connection.

Local branches can be created to "track" other branches, to base work
on the other branches.  In particular you need to create local branch
which "tracks", or in other words has as 'upstream' some remote-tracking
branch, as you cannot work on non-local branch (outside 'refs/heads/'
namespace).

Now, *if* you are on branch with some upstream, you can check without
need for network operation what "git pull" would do if there were
no new changes in remote, which means what "git merge <upstream>" would
do (pull = fetch + merge).

We can check if remote-tracking branch, which is upstream of current
branch, modified current file.  We can also check if remote-tracking
branch is in fast-forwardable state wrt. current branch (the equivalent
of 'needs-update state, I guess), or whether remote-tracking branch has
diverged from current branch (the equivalent of 'needs-merge state, I guess).
All this without need for network operation... but all this based on
current information that ...
From: Paolo Bonzini
Date: Saturday, April 10, 2010 - 3:12 pm

You can query the origin _as it was on the last fetch_.

If you are on branch X, the logic is as follows:

- Let R be the value of configuration key branch.X.remote,
- let M be the value of configuration key branch.X.merge,
- for all values S of configuration key remote.R.fetch,
   - strip an initial +
   - if S is M:N, return N
   - if S is P/*:Q/* where P is a prefix of M, take M, replace this
     prefix with Q and return the result

In the most common case you will have:

- X = master
- R = origin
- M = refs/heads/master
- one key S = +refs/heads/*:refs/remotes/origin/*

so the prefix "refs/heads/" is replaced with "refs/remotes/origin/" and 
the result is refs/remotes/origin/master.
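
The steps above can be sketched as a small pure function.  This is an
illustration, not Git's own implementation: upstream_ref is a
hypothetical name, and the caller is assumed to have already read M
(branch.X.merge) and the list of remote.R.fetch values, for example
with "git config --get" and "git config --get-all".

```python
from typing import Iterable, Optional


def upstream_ref(merge: str, fetch_specs: Iterable[str]) -> Optional[str]:
    """Return the remote-tracking ref for `merge`, or None if no spec matches.

    merge       -- value of branch.X.merge, e.g. "refs/heads/master"
    fetch_specs -- every value of remote.R.fetch
    """
    for spec in fetch_specs:
        if spec.startswith("+"):
            spec = spec[1:]                 # strip an initial +
        src, sep, dst = spec.partition(":")
        if not sep:
            continue                        # no destination side
        if src == merge:
            return dst                      # exact M:N mapping
        if src.endswith("/*") and dst.endswith("/*"):
            prefix = src[:-1]               # "refs/heads/*" -> "refs/heads/"
            if merge.startswith(prefix):
                # Replace the matched prefix P/ with Q/.
                return dst[:-1] + merge[len(prefix):]
    return None
```

With the common configuration shown above,
upstream_ref("refs/heads/master", ["+refs/heads/*:refs/remotes/origin/*"])
gives "refs/remotes/origin/master".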

Paolo
--

From: Jeff King
Date: Sunday, April 11, 2010 - 3:25 am

BTW, this procedure is complex enough that we have exposed it via a
plumbing interface:

  $ git for-each-ref --format='%(upstream)' refs/heads/master
  refs/remotes/origin/master

which does all of the correct magic internally.
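
Once the upstream ref is known, the fast-forward versus diverged
distinction raised earlier in the thread could be sketched roughly as
below, with no network access.  The names classify and ahead_behind are
hypothetical, the result describes the whole branch rather than a
single file, and it only reflects the remote as of the last fetch.  It
assumes a Git recent enough to accept "rev-list --left-right --count"
and the "@{upstream}" shorthand.

```python
import subprocess
from typing import Tuple


def classify(ahead: int, behind: int) -> str:
    """Name the branch's relation to its upstream in VC-like terms."""
    if behind == 0:
        return "up-to-date"      # upstream has nothing we lack
    if ahead == 0:
        return "needs-update"    # upstream is a fast-forward of us
    return "needs-merge"         # both sides have new commits


def ahead_behind(branch: str = "HEAD",
                 upstream: str = "@{upstream}",
                 repo_dir: str = ".") -> Tuple[int, int]:
    """Count commits only on `branch` and only on `upstream`."""
    result = subprocess.run(
        ["git", "rev-list", "--left-right", "--count",
         branch + "..." + upstream],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    left, right = (int(field) for field in result.stdout.split())
    return left, right
```

A per-file variant would additionally have to check whether the
upstream-only commits touch the file in question.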

-Peff
--

From: Ramkumar Ramachandra
Date: Thursday, April 8, 2010 - 9:50 pm

I personally use Magit [1]. Just thought you might want to look at it.

-- Ram

[1] http://zagadka.vm.bytemark.co.uk/magit/
--

From: Ævar Arnfjörð Bjarmason
Date: Thursday, April 8, 2010 - 10:01 pm

Eric might be a bit too personally invested in vc.el at this point :)

But yeah, magit is great; unlike vc-dir and vc, it makes really good
use of Git's index & stash features. Instead of staging individual
files for commit you stage chunks; the quality and granularity of my
commits have gone up since I switched to it from vc because of that.

But to help with the original question: magit has an ignore feature
but it doesn't check whether something is ignored, it just counts on
you not ignoring already ignored stuff because it isn't displayed to
you.

Depending on how you're planning to implement .gitignore support you
might want to go this route.
--

From: Eric Raymond
Date: Friday, April 9, 2010 - 3:50 am

Well, there's that, and then there's the fact that I really do use
multiple VCSes.  Consistent interface for all of them -> win. 
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>
--
