Re: [PATCH/RFC 0/3] Per-repository end-of-line normalization

From: mat
Date: Wednesday, May 5, 2010 - 3:01 am

Hi

I have two git projects:
-one (A) with linux people only
-one (B) with someone using windows

As we had "end of line" problems with the person using windows (B), I used:

git config --global core.autocrlf true

Following the advice from:
http://help.github.com/dealing-with-lineendings/

So everything is now fine with project B, but I now have a problem with
project (A): I wanted to copy the whole project to another dir, and
git now complains about the change with this warning:

CRLF will be replaced by LF in .../A.

So I don't know exactly what I should do... Should I convert all the
CRLFs in project A (but then other people will have problems too), or
can I switch the config depending on whether I'm working on project A
or B? It is not so clear in my mind and I would appreciate any advice!!

Thanks a lot

Matthieu Stigler



--

From: Ramkumar Ramachandra
Date: Wednesday, May 5, 2010 - 6:27 am

Hi,


I'm not sure what you should be doing because I've never worked with
Windows, but the following information might be useful: Yes, you can
have project-specific config quite easily.

Just drop `--global` and the setting becomes repository-specific.

-- Ram
--

From: mat
Date: Thursday, May 6, 2010 - 2:27 am

Thanks for your answer!!

I think what you suggest, Ramkumar, is indeed what I need, great! The
suggestion from hasen to keep those settings was not workable, as the
Windows guy had the problem that even after a clean clone, git was
signaling changes (see: http://help.github.com/dealing-with-lineendings/)

So I just did:

 git config --global --unset core.autocrlf

and then set it for this specific project:

 git config core.autocrlf true

Hope this is how you meant?

Thanks a lot!!

Matthieu


--

From: Erik Faye-Lund
Date: Thursday, May 6, 2010 - 3:03 am

This is a symptom of someone having checked files with CRLF into the
repo with core.autocrlf disabled, while the Windows guy has
core.autocrlf enabled.

I don't quite agree with Hasen about checking out LF on Windows,
though. There are just too many tools that get slightly confused (as
well as some that get REALLY confused) by this in my experience. It's
sometimes the best trade-off, but quite often not IMO.

What I'd do is set core.autocrlf to "input" on non-Windows
machines, and "true" on Windows machines. This makes sure that no
machine will check in CRLF. If there are already files checked in with
CRLF (as seems to be the case with your repo), the Windows people will
be annoyed. So you'd need to make sure that the repo contains only
LFs, and you have basically two options:
1) Just call dos2unix on all files and commit the changes. This will
still cause problems for the Windows users if they need to check out
commits older than the dos2unix one.
2) Use git filter-branch to rewrite the history to pretend no one ever
made the mistake of committing CRLFs. This will make trouble for
anyone who's working on a branch. But it's a one-time issue (unless
someone manages to commit CRLF-files again, that is).
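
Option 1 can be sketched without dos2unix, using only tr. The helper
name crlf_to_lf is made up for illustration, and the sketch assumes the
files passed in are plain text: it removes every CR byte (including a
lone CR, which dos2unix would leave alone), so binary files must be
kept out of the list.

```shell
# Hypothetical stand-in for dos2unix: strip every CR byte from each file given.
# Only safe for text files; the caller must exclude binaries.
crlf_to_lf() {
	for f in "$@"
	do
		tmp=$(mktemp) || return 1
		# tr deletes all carriage returns (octal 015), turning CRLF into LF.
		tr -d '\015' <"$f" >"$tmp" &&
		cat "$tmp" >"$f"	# overwrite in place, keeping the file's mode
		rm -f "$tmp"
	done
}
```

After running it on the affected files, the result still has to be
added and committed as described in option 1.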

-- 
Erik "kusma" Faye-Lund
--

From: hasen j
Date: Wednesday, May 5, 2010 - 7:35 pm

I personally find that autocrlf causes more confusion than it solves problems.

I've yet to see a text editor on windows that can't handle \n line
endings. (Notepad doesn't count)

Just keep the project with \n line endings, disable autocrlf, and make
sure that people are aware of this.
--

From: Wilbert van Dolleweerd
Date: Thursday, May 6, 2010 - 12:29 am

Editors may handle it gracefully but older Windows programs will have problems.

For instance, Visual Studio 6 will barf on Visual Basic project files
with non-Windows line endings. (And please don't ask why I know
this....)

-- 
Kind regards,

Wilbert van Dolleweerd
Blog: http://walkingthestack.blogspot.com/
Twitter: http://www.twitter.com/wvandolleweerd
--

From: hasen j
Date: Thursday, May 6, 2010 - 8:34 am

Well, this is the exception that proves the rule then :)

Anyway, If it's a VB project, might as well just keep the files with
CRLF endings then.

I don't know all linux editors, but I've yet to see one that can't
handle CRLF endings.
--

From: Linus Torvalds
Date: Thursday, May 6, 2010 - 10:15 am

A _lot_ of UNIX editors will handle CRLF endings, but if you change a 
file, they often write the result back with _mixed_ endings. Some will 
also show the CR as '^M' or some other garbage at the end.

A number of tools will also end up confused, including very fundamental 
things like "grep". Try this:

	echo -e "Hello\015" > f
	grep 'Hello$' f

and notice how the grep does _not_ find the Hello at the end of the line, 
because grep sees another random character there (this might be 
unportable, I could easily imagine some versions of grep finding it).
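
A slightly more portable variant of the same experiment, sketched with
printf instead of "echo -e" (whose behaviour differs between shells).
The helper name count_hello is made up for illustration. It also shows
that stripping the CR first lets the same pattern match.

```shell
# Count lines on stdin that end in "Hello"; prints 0 when nothing matches.
count_hello() {
	grep -c 'Hello$' || true	# grep exits non-zero on "no match"
}

# One DOS-style line: "Hello" followed by CR LF.
# With a grep that treats CR as an ordinary character this usually prints 0.
printf 'Hello\r\n' | count_hello

# The same line with the CR removed before grep sees it: prints 1.
printf 'Hello\r\n' | tr -d '\r' | count_hello
```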

So I would strongly suggest against CRLF on UNIX. It really doesn't work 
very well, even if some tools will handle it to some limited degree.

In short: having 'core.autocrlf' set will likely make it much more 
pleasant to work across different platforms. 

			Linus
--

From: Erik Faye-Lund
Date: Thursday, May 6, 2010 - 10:26 am

On Thu, May 6, 2010 at 7:15 PM, Linus Torvalds

Just for completeness: The inverse is also the case on Windows; a lot
of editors will handle LF endings, but a handful of them will gladly
insert CRLFs under certain circumstances. Microsoft Visual Studio is
one of these.

So yeah, neither CRLF nor LF everywhere is generally a good idea.

-- 
Erik "kusma" Faye-Lund
--

From: hasen j
Date: Thursday, May 6, 2010 - 1:00 pm

When I'm on windows, I prefer LF (unless the project already uses
CRLF, or it's outside my control).

VB is very windowsy; I *really* doubt most VB developers use (or even
know) grep, so I don't think it's a problem if a VB project
standardizes line endings to be CRLF.

My problem with autocrlf is that, well, it converts line endings in
the working directory to CRLF, even though I don't always want it to.
(most of the time, I don't).

The other problem is, git will get confused if you set autocrlf *after
the fact*; i.e. you already cloned and have the files checked out,
maybe even made some commits.

Overall, I ran into many awkward situations with autocrlf (and I can't
remember them now), but if you google you can find some of the issues
people are having.

The whole problem would go away if there was no crlf, and that's not
impossible: any decent text editor can read/write files with Unix line
endings.

I wasn't aware that Visual Studio doesn't have an easy way to have it
write LF endings by default; I'm sure there are addons to make that
easier. Plus most open source projects are not usually set up with VS
as the development environment anyway, so it's really not a big
problem.

So yeah, I think LF everywhere is the better way to go most of the time.
--

From: Linus Torvalds
Date: Thursday, May 6, 2010 - 1:23 pm

You can just set it to 'input' if you want to. It's not just on/off, you 
can also say "I want to check out with no conversion (ie "just LF"), but 
convert CRLF to LF on input".

Btw, one thing to keep in mind with autocrlf is the "auto" part: it tries 
to do a good job noticing when something is text vs binary, but it _is_ a 
heuristic. I think it's a pretty good one, but if you do set autocrlf 
(whether to "true" or to "input"), at least think about attributes ("man 
gitattributes")
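
As a sketch of what that might look like, the snippet below writes a
small .gitattributes file that takes the guesswork away for a few file
types. The patterns are examples only, and the file is written to a
scratch directory rather than a real repository. It uses the "crlf"
attribute as it existed when this thread was written; to my knowledge,
newer Git versions document the same switch as "text" / "-text".

```shell
# Write an example .gitattributes into a scratch directory.
# The patterns are illustrative; adjust them to the project.
dir=$(mktemp -d)
cat >"$dir/.gitattributes" <<'EOF'
# Always treat these as text, whatever the heuristic guesses.
*.c     crlf
*.h     crlf
# Never convert these, even if they happen to look like text.
*.png   -crlf
*.pdf   -crlf
EOF
cat "$dir/.gitattributes"
```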

		Linus
--

From: Erik Faye-Lund
Date: Thursday, May 6, 2010 - 1:40 pm

"When I'm on windows" leads me to believe Windows is not your primary


core.autocrlf being on by default in Git for Windows greatly reduces
the risk of this. I wish core.autocrlf was set to "input" by default

That's probably one of the things that makes a text editor decent in
your book, but this opinion might not be shared by everyone. Perhaps

The problem with Visual Studio isn't that it doesn't write LFs
normally... the problem is that when you paste text, it retains the
newline style from the source you copied from. But it is not the only
tool with such issues, so playing the "VS is the problem"-card doesn't
stick IMO.

Even if it did, open source isn't the only model for developing
software. And again... even if it were, working well together with
Visual Studio would be very beneficial for quite a few projects.
Visual Studio is probably the most used code editor among Windows
developers (by a good margin too, I suspect), so ignoring it would
just be sticking your head in the sand - or worse, asking for fewer
contributions from Windows users (which can often be a problem in
the first place).

So no, I strongly doubt LF everywhere is the better way ;)

-- 
Erik "kusma" Faye-Lund
--

From: hasen j
Date: Thursday, May 6, 2010 - 3:14 pm

I used to be, I only moved to linux about a year ago, but I use

But it's probably the most common scenario where people run into line
ending issues.

If the project is a VS project, then it's probably not multi-platform,
plus everyone at the company would be using windows anyway, so there's

The problem can be avoided with a little bit of education. VS is not a
multiplatform IDE anyway. Sure, it can't work with LF endings as well
as notepad++, but it's not git's responsibility to try to fix that.

I just don't think it's a big enough issue to be built into git.

IMHO it's much better to work around the problem (if and when it
arises) by using clean and smudge filters in .gitattributes, than
having it built in and enabled by default in the msysgit installer.
--

From: Erik Faye-Lund
Date: Thursday, May 6, 2010 - 4:25 pm

Closed source does not imply a single operating system, and you get
these issues whenever you have a project that targets systems with
different newline styles. In my day job I develop closed source,
multi-platform software, using git. So it's certainly not MY most
common scenario.

And even if it were, so what? When did we start only caring for the

Using VS on Windows does not exclude other platforms either. Either
one can maintain multiple build systems for Windows and Unix-y
systems, or one can use a system like CMake that automates the job.

A typical case where you pretty much have to build using Visual Studio
is when you develop a C++ library whose Windows users use Visual
Studio (due to C++'s symbol mangling you have to use the same
compiler). This is not an entirely uncommon situation for open source

Again, using VS on Windows does not exclude other platforms. I'm not
sure what you mean with "a little bit of education" here, though.

CRLF is Windows' native newline style. If git can't check out to that,
it'll look like a much less attractive solution to anybody who targets
Windows compared to the competition. If it wasn't for core.autocrlf, I

But it IS built in. And it's very unlikely that this feature will ever
be removed. So what's the problem with using it?

And it's a very common thing to want to do, so why make everybody who
does have to jump through hoops just because YOU don't need it?

-- 
Erik "kusma" Faye-Lund
--

From: Anthony W. Youngman
Date: Tuesday, May 18, 2010 - 8:13 am

In message 
<o2v40aa078e1005061625md5fede79h660a22227c4f22d1@mail.gmail.com>, Erik 

And there's a lot more line endings out there than just lf or crlf.

Okay, the two I'm about to quote have, I believe, gone the way of the 
dinosaur, but wasn't the mac just cr? And what is *still* my favourite 
system, Prime (a multics derivative too), used a "packed lf", so your 
line ending could be either lf or lfnull depending on the line length 
(it was always stored on disk as an integral word-length, a word being 
16 bits. So if your text was an even number of characters, the ending 
was lfnull to pad it to the next word boundary).

Cheers,
Wol
-- 
Anthony W. Youngman - anthony@thewolery.demon.co.uk

--

From: Eyvind Bernhardsen
Date: Thursday, May 6, 2010 - 3:27 pm

This discussion couldn't be more timely, as I've recently acquired a
desperate need to solve CRLF problems at $dayjob.  This patch series
introduces a new way of turning on autocrlf normalization by splitting
the configuration into two:

- An attribute called "auto-eol" is set in the repository to turn on
  normalization of line endings.  Since attributes are content, the
  setting is copied when the repository is cloned and can be changed in
  an existing repository (with a few caveats).  Setting this attribute
  is equivalent to setting "core.autocrlf" to "input" or "true".

- A configuration variable called "core.eolStyle" determines which type
  of line endings are used when checking files out to the working
  directory.

How does this solve the current problems with core.autocrlf?  First,
let's enumerate them:


1. Setting core.autocrlf in your global or system configuration is a
pain since git will get confused whenever you work in a repository which
contains CRLF line endings.  If you have to work in both repositories
with normalization and repositories with mixed line endings, you have no
choice but to set core.autocrlf in each repository individually.

2. Setting core.autocrlf in an individual repository would be okay
except that naive users will do it after they have already cloned:
unless core.autocrlf is set globally, the clone will have the wrong line
endings, and the user needs to know how to refresh it manually (rm -rf *
&& git checkout -f).

3. Once somebody does it, _everyone_ has to do it: if someone checks in
a file with CRLFs, that file will cause trouble for everyone who has
autocrlf set.  That someone can be a Linux user who just copied a file
from Windows and didn't think to convert the line endings (BT, DT).

4. Once a repository contains CRLFs autocrlf can never sanely be
enabled; the CRLFs can be normalized in a commit, but there's no way to
say "all commits after this one are normalized, those that came before
were not".

5. On the other ...
From: Eyvind Bernhardsen
Date: Thursday, May 6, 2010 - 3:27 pm

Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com>
---
 t/t0025-auto-eol.sh |  180 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 180 insertions(+), 0 deletions(-)
 create mode 100755 t/t0025-auto-eol.sh

diff --git a/t/t0025-auto-eol.sh b/t/t0025-auto-eol.sh
new file mode 100755
index 0000000..5acee2d
--- /dev/null
+++ b/t/t0025-auto-eol.sh
@@ -0,0 +1,180 @@
+#!/bin/sh
+
+test_description='CRLF conversion'
+
+. ./test-lib.sh
+
+has_cr() {
+	tr '\015' Q <"$1" | grep Q >/dev/null
+}
+
+test_expect_success setup '
+
+	git config core.autocrlf false &&
+
+	for w in Hello world how are you; do echo $w; done >one &&
+	for w in I am very very fine thank you; do echo ${w}Q; done | q_to_cr >two &&
+	git add . &&
+
+	git commit -m initial &&
+
+	one=`git rev-parse HEAD:one` &&
+	two=`git rev-parse HEAD:two` &&
+
+	for w in Some extra lines here; do echo $w; done >>one &&
+	git diff >patch.file &&
+	patched=`git hash-object --stdin <one` &&
+	git read-tree --reset -u HEAD &&
+
+	echo happy.
+'
+
+test_expect_success 'default settings cause no changes' '
+
+	rm -f .gitattributes tmp one two &&
+	git read-tree --reset -u HEAD &&
+
+	if has_cr one || ! has_cr two
+	then
+		echo "Eh? $f"
+		false
+	fi &&
+	onediff=`git diff one` &&
+	twodiff=`git diff two` &&
+	test -z "$onediff" -a -z "$twodiff"
+'
+
+test_expect_success 'no auto-eol, explicit eolstyle=native causes no changes' '
+
+	rm -f .gitattributes tmp one two &&
+	git config core.eolstyle native &&
+	git read-tree --reset -u HEAD &&
+
+	if has_cr one || ! has_cr two
+	then
+		echo "Eh? $f"
+		false
+	fi &&
+	onediff=`git diff one` &&
+	twodiff=`git diff two` &&
+	test -z "$onediff" -a -z "$twodiff"
+'
+
+test_expect_failure 'auto-eol=true, eolStyle=crlf <=> autocrlf=true' '
+
+	rm -f .gitattributes tmp one two &&
+	git config core.autocrlf false &&
+	git config core.eolstyle crlf &&
+	echo "* auto-eol" > .gitattributes &&
+	git read-tree --reset -u ...
From: Eyvind Bernhardsen
Date: Thursday, May 6, 2010 - 3:27 pm

Introduce a new attribute called "auto-eol" and a config variable,
"core.eolStyle", which will enable line ending normalisation using the
autocrlf mechanism.

The intent is to enable autocrlf in an alternative way, splitting the
existing "core.autocrlf" config variable into two:

- a per-repository "line endings should be normalised in this
  repository" setting, activated by setting the auto-eol attribute
  (usually on all files in the repository)

- a config variable, "core.eolStyle" which lets the user decide which
  line endings are preferred in the working directory

Possible values for "core.eolStyle" are:

- "lf", meaning that LF line endings are preferred
- "crlf", meaning that CRLF line endings are preferred
- "native" (the default), crlf or lf according to platform
- "false", which disables end-of-line conversion even when auto-eol is
  set
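
The four values can be modelled as a small shell function. This is
only an illustration of the list above, not code from the patch: the
name worktree_eol and its two arguments are made up, and the override
by "core.autocrlf" is deliberately left out.

```shell
# Hypothetical model of the proposed core.eolStyle values; not Git code.
# $1: value of core.eolStyle ("" when unset, which means "native")
# $2: the platform's native line ending, "lf" or "crlf"
# Prints the ending used for auto-eol files in the working directory,
# or "none" when conversion is disabled.
worktree_eol() {
	case "${1:-native}" in
	lf)	echo lf ;;
	crlf)	echo crlf ;;
	native)	echo "$2" ;;
	false)	echo none ;;
	*)	echo "unknown eolStyle: $1" >&2; return 1 ;;
	esac
}

worktree_eol native crlf	# a CRLF platform with the default setting
worktree_eol lf crlf		# the same platform, user prefers LF
```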

"core.autocrlf" will override auto-eol when set to anything but "false".

Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com>
---
 Makefile      |    3 +++
 cache.h       |   19 +++++++++++++++++++
 config.c      |   16 +++++++++++++++-
 environment.c |    1 +
 4 files changed, 38 insertions(+), 1 deletions(-)

diff --git a/Makefile b/Makefile
index 910f471..419532e 100644
--- a/Makefile
+++ b/Makefile
@@ -224,6 +224,8 @@ all::
 #
 # Define CHECK_HEADER_DEPENDENCIES to check for problems in the hard-coded
 # dependency rules.
+#
+# Define NATIVE_CRLF if your platform uses CRLF for line endings.
 
 GIT-VERSION-FILE: FORCE
 	@$(SHELL_PATH) ./GIT-VERSION-GEN
@@ -989,6 +991,7 @@ ifeq ($(uname_S),Windows)
 	NO_CURL = YesPlease
 	NO_PYTHON = YesPlease
 	BLK_SHA1 = YesPlease
+	NATIVE_CRLF = YesPlease
 
 	CC = compat/vcbuild/scripts/clink.pl
 	AR = compat/vcbuild/scripts/lib.pl
diff --git a/cache.h b/cache.h
index 5eb0573..690511e 100644
--- a/cache.h
+++ b/cache.h
@@ -561,6 +561,25 @@ enum safe_crlf {
 
 extern enum safe_crlf safe_crlf;
 
+enum auto_crlf {
+	AUTO_CRLF_FALSE = ...
From: Eyvind Bernhardsen
Date: Thursday, May 6, 2010 - 3:27 pm

Implement an alternative end-of-line conversion setting which uses a new
attribute, "auto-eol", and a new config variable, "core.eolStyle" to
enable end-of-line conversion.

The auto-eol attribute enables automatic line ending detection and
conversion for files on which it is set.  Since attributes are under
version control, this setting is copied when the repository is cloned.
It can also be changed over the history of a repository, with some
caveats.

The core.eolStyle variable is used to decide if LF or CRLF line endings
are preferred in the working directory.  It is only used when auto-eol
is set, and defaults to the platform-native line ending.

"core.autocrlf" overrides auto-eol when set to anything but "false".

Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com>
---
 Documentation/config.txt        |   11 ++++-
 Documentation/gitattributes.txt |   92 +++++++++++++++++++++++++++++++++------
 convert.c                       |   48 ++++++++++++++------
 t/t0025-auto-eol.sh             |    4 +-
 4 files changed, 123 insertions(+), 32 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 92f851e..7bbf8a0 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -207,9 +207,16 @@ core.autocrlf::
 	the file's `crlf` attribute, or if `crlf` is unspecified,
 	based on the file's contents.  See linkgit:gitattributes[5].
 
+core.eolStyle::
+	Sets the line ending type to use for text files in the working
+	directory when the `auto-eol` property is set.  Alternatives are
+	'lf', 'crlf', 'native' and 'false'.  'native', the default, uses
+	the platform's native line ending.  'false' disables `auto-eol`
+	line ending conversion.  See linkgit:gitattributes[5].
+
 core.safecrlf::
-	If true, makes git check if converting `CRLF` as controlled by
-	`core.autocrlf` is reversible.  Git will verify if a command
+	If true, makes git check if converting `CRLF` is reversible when
+	end-of-line conversion is active.  Git will ...
From: Avery Pennarun
Date: Thursday, May 6, 2010 - 4:38 pm

On Thu, May 6, 2010 at 6:27 PM, Eyvind Bernhardsen

I definitely like this.  The existing core.autocrlf setting does cause
a lot of confusion for precisely the reason you stated: people often
forget to set it until *after* they've checked out the repo, at which
time all the files are already checked out wrong and total confusion
ensues.

Being able to globally set my preferred eol style in one place, but
only have it take effect on projects (and individual files in that
project) that we already know have eol constraints, would be
wonderful.

Of course this new feature would be in addition to the existing
core.autocrlf setting, not replacing it.

This would definitely help our Windows users at work.

Have fun,

Avery
--

From: Avery Pennarun
Date: Thursday, May 6, 2010 - 4:54 pm

Oh, just to clarify the rationale a bit more:

Whether a developer wants autocrlf or not actually is
project-dependent, not user-dependent or "all Windows users want
autocrlf."  For example, if I'm running Cygwin and I checkout a copy
of the git source code to build with Cygwin gcc, I definitely don't
want autocrlf.  (Actually, almost always, for C source code I don't
want autocrlf, or I want autocrlf=input.)

If I'm checking out a copy of our Delphi project on Windows, though, I
need autocrlf or the IDE goes bananas.  And our team would be happy to
put the right magic incantation in a .gitattributes file in our Delphi
project if it would make this work out automatically.

Setting core.autocrlf on one of our Windows developers' systems can't
cover both of those cases automatically, whereas the settings Eyvind
has proposed would solve our problem.

Have fun,

Avery
--

From: Erik Faye-Lund
Date: Friday, May 7, 2010 - 1:45 am

On Fri, May 7, 2010 at 12:27 AM, Eyvind Bernhardsen

Beautiful! This approach addresses most (all?) issues I've had with
core.autocrlf in a very elegant way IMO! :)

-- 
Erik "kusma" Faye-Lund
--

From: Junio C Hamano
Date: Friday, May 7, 2010 - 9:33 am

In what way is this attribute different from existing "crlf" attribute?

It feels as if this series is fixing shortcomings of the combination of
core.autocrlf configuration and crlf attribute while trying very hard to
keep their shortcomings when the user doesn't say so.  What is the
downside of making the existing "core.autocrlf" + "crlf" combination do
what your patch wanted to do without retaining this "keep the existing

This is a wrong thing to do to begin with, and not worth discussing.  You
know and your readers know that line ending convention in the repository
data (i.e. blobs) is under project control while line ending convention in

This may be a worthy goal.  But if an "auto-eol" attribute "fixes" this,
perhaps the "crlf" attribute can be taught to fix it the same way, no?
--

From: Avery Pennarun
Date: Friday, May 7, 2010 - 9:57 am

Mostly that it relates to the new core.eolStyle config option instead
of core.autocrlf.  Arguably you could use the same gitattribute to set
both config options, but I don't know how you'd make that respond in a

Is this even possible?  If core.autocrlf is set, then files all over
the place start getting crlf conversion, even if no attributes are set
at all.  If core.eolStyle is set, only files with the auto-eol
attribute set appropriately will experience any conversion.

Maybe the options aren't named ideally.  "core.eolStyle" might better
be named "core.nativeEol" - it tells git what the native EOL style is
on your computer / in this repository, but it doesn't tell git to *do*
anything with this information.  The problem with core.autocrlf is
that it mixes two concepts: identifying your native EOL style, and
telling git to do stuff.  The existing gitattribute can then tell git
*not* to do stuff, but almost no projects have a .gitattributes file

Ha, doesn't msysgit do this by default?  It did at one point, anyway.
I use cygwin git (which doesn't because it thinks it's Unix) so I
don't know.

If this was ever the default behaviour, then it's at least not
*obviously* wrong.

The end result is that nobody really likes the current autocrlf
behaviour, though, so I'd agree that it *ends up* being wrong.  Just
as setting it on a per-checkout basis also ends up being wrong,


It fixes it by making the global setting actually do what people want.
I'm not sure the existing config option can be made to work like
that.

Again, maybe it would make sense to combine a single attribute but
have two config options (and people can eventually just stop using
core.autocrlf altogether).  I suspect it might subtly break some
existing projects, though.

Have fun,

Avery
--

From: Linus Torvalds
Date: Friday, May 7, 2010 - 10:10 am

The existing crlf attribute is a no-op _unless_ core.autocrlf is set, 
isn't it?

The whole point of Eyvind's series is to be able to set crlf attributes 
without having to set the config option - because he wants to make sure 
that a new clone always gets the proper crlf handling without users 
having to do anything extra.

And I do have to say that it makes sense.

I also do think that maybe we could just change the existing crlf 
attribute to work even without 'core.autocrlf'. 

			Linus
--

From: Linus Torvalds
Date: Friday, May 7, 2010 - 12:02 pm

Btw, another option might be to start searching ".gitconfig", but only 
allow a certain "safe subset" of config options in that. Things that can 
really be about the project itself, and not per-user or per-repository.

And parse it before ~/.gitconfig and .git/config, so that people can 
always override it.

I dunno. Looking at the config options, there really aren't a lot of them 
that make sense on a project scale. There's a few, though. Things like

	core.autocrlf
	i18n.commitEncoding

and possibly others..

		Linus
--

From: Avery Pennarun
Date: Friday, May 7, 2010 - 12:11 pm

On Fri, May 7, 2010 at 3:02 PM, Linus Torvalds

Unfortunately this option wouldn't be as flexible as Eyvind's current proposal.

What his method allows is to mark some files in a project as "these
should be the native EOL style" and others as "these should be left
alone."  Then each person can set a (usually global) config option
that states what the native EOL style should be.  Like core.autocrlf,
only it wouldn't affect projects without crlf attributes (like git.git
or linux.git) where CRLF translation is pretty much always wrong.
(And if one person disagrees that it's always wrong, well, he can
always set core.autocrlf for himself.)

Have fun,

Avery
--

From: Linus Torvalds
Date: Friday, May 7, 2010 - 12:16 pm

But that's what a .gitconfig would do too. We _already_ have that 
.gitattributes thing to then distinguish particular pathname rules. It's 
just that currently .git/config is needed to _enable_ it.

			Linus
--

From: Avery Pennarun
Date: Friday, May 7, 2010 - 12:35 pm

On Fri, May 7, 2010 at 3:16 PM, Linus Torvalds

Hmm, I don't think we're saying the same thing.  There are two
separate settings here:

1) Whether a project has files that should be EOL-converted
automatically (we seem to all agree that this is set in
.gitattributes, whichever attribute is used).

2) Whether a particular person wants those particular files to be
EOL-converted, and what to convert them to.

The existing semantics of core.autocrlf just don't let you express #2
in a useful way.  If I set --global core.autocrlf, it turns it on for
*all* projects, not just ones with the .gitattribute set.  If a
project has a .gitconfig inside that sets core.autocrlf, then it's
really just redundant with #1.  If I set .git/config on a particular
project, it works, but it's far too easy to forget (and there seems to
be no way to set this per-project at clone time, and setting it
*after* cloning causes git's index to get confused).

Eyvind's proposal is deceptively simple because it simply makes it
much less error prone for users to express something that's already
*technically* possible, but in practice, is very very frequently done
wrong.

Avery
--

From: Linus Torvalds
Date: Friday, May 7, 2010 - 12:45 pm

So? If we were to have a .gitconfig file, then both of those things would 
just work. It's no different from Eyvind's patch, except the exact details 
on syntax (and which file to set) would differ slightly.

So it's a syntactic difference, nothing more.

That said, I don't think the extra .gitconfig is even worth it, the same 
way I do _not_ think Eyvind's extra .gitattributes things are worth it. We 
already have perfectly good .gitattributes, and the only real issue is 
that they just don't take effect in some situations where people would 
_want_ them to take effect.

So just a small semantic change to how .gitattributes crlf works would 
likely make everybody happy.

The only downside is that it _is_ a semantic change. It really would 
change existing git behavior. Now, I think most people would consider the 
change in behavior to be a clear improvement, but hey...

			Linus
--

From: Avery Pennarun
Date: Friday, May 7, 2010 - 12:58 pm

On Fri, May 7, 2010 at 3:45 PM, Linus Torvalds

No!  The whole point is that each user *does* still want to be able to
decide how to convert the files tagged by the crlf gitattribute (or a
new attribute, I don't care).  Setting this in a .gitconfig file
inside the project is pointless; I need it in my *personal* config.
msysgit users want to set it globally to CRLF by default, Linux or
cygwin users probably want to set it to LF by default.

So #1 is useful to have in the repo, #2 is not.

I am a real live example of this.  For our Delphi projects at work, I
want to check it out with LF on my Linux machine (so I can
patch/diff/merge/grep/edit/etc easily), and CRLF on my Windows machine
(so that the Delphi IDE doesn't get confused).  Other projects I want
to have pure LF on both Linux and Windows, so setting
core.autocrlf=true globally will break things.

Eyvind's proposal (or a similar proposal where his new attribute is
just the crlf attribute) will get me and all my co-workers the
wonderful correct behaviour *by default*; the current behaviour, or an
in-repo .gitconfig, will not.  The key feature is the new

Do you even use any CRLF projects?  If not, then presumably none of
the options will seem worth it. :)

But the current behaviour really doesn't work for people who need CRLF
conversion, and an in-repo .gitconfig file won't help them.
core.eolStyle + a change to crlf attribute semantics will.

Have fun,

Avery
--

From: Linus Torvalds
Date: Friday, May 7, 2010 - 1:06 pm

Avery, you really don't _get_ it, do you?

If you want to set how the autocrlf conversion would be done, JUST DO IT. 
The .gitconfig file would be overridden by your personal settings.

So what you'd have is

 .gitconfig: core.autocrlf=true	# to enable .gitattributes

but then any .git/config setting (to "input", say) would still override 
that repository setting.

End result: exactly what you're talking about. With _simpler_ syntax than 
the one Eyvind had.

Now, the thing is, we can go for even simpler syntax still, by just making 
that ".gitconfig: core.autocrlf=true" entirely unnecessary. 

		Linus
--

From: Linus Torvalds
Date: Friday, May 7, 2010 - 1:17 pm

Exact semantics I'd suggest for 'core.autocrlf':

    Setting            path in .gitattributes    path _not_ in .gitattributes
    =======            ======================    ============================
 - not set at all      attribute value           no crlf
 - "off"/"false"       no crlf                   no crlf
 - "on"                attribute value           autocrlf
 - "input"             attribute "input"         autocrlf "input"

Which is different from what we do now for the "not set at all" case, 
in that it still takes the .gitattributes value for those cases if a path 
matches.
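The table above can be read as a small lookup function. What follows is only a sketch of the proposed semantics, not git's implementation: the name resolve_crlf, the use of None for "not set" and for "no matching .gitattributes rule", and the returned strings (copied from the table cells) are assumptions made for illustration.

```python
from typing import Optional


def resolve_crlf(core_autocrlf: Optional[str], attr: Optional[str]) -> str:
    """Return the table cell that applies to one path.

    core_autocrlf: None (not set at all), "off"/"false", "on"/"true", or "input".
    attr: the path's crlf attribute value, or None when no .gitattributes
          rule matches the path (hypothetical encoding for this sketch).
    """
    if core_autocrlf in ("off", "false"):
        # Explicitly disabled: attributes are ignored.
        return "no crlf"
    if core_autocrlf is None:
        # The proposed change: honour the attribute even without any config.
        return attr if attr is not None else "no crlf"
    if core_autocrlf in ("on", "true"):
        return attr if attr is not None else "autocrlf"
    if core_autocrlf == "input":
        return 'attribute "input"' if attr is not None else 'autocrlf "input"'
    raise ValueError(f"unknown core.autocrlf value: {core_autocrlf!r}")


print(resolve_crlf(None, "crlf"))   # attribute wins although nothing is configured
print(resolve_crlf("off", "crlf"))  # config switches conversion off entirely
```

The only row that differs from the behaviour described as current is the first one: an unset core.autocrlf no longer hides the attribute.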

We could add a few core.autocrlf entries, like "force" (to force output to 
be CRLF even on a platform where it isn't the default).

			Linus
--

From: Eyvind Bernhardsen
Date: Friday, May 7, 2010 - 1:42 pm

How can you say that this is simpler than my syntax?  I have an attribute that means "line endings should be normalised" and a configuration variable that decides what line endings should be used in the working directory for normalised files.  If you like CRLFs you set it to "crlf", if you like LFs you set it to "lf".

I'll replace "auto-eol" with something like "crlf=auto" because I actually think that's pretty neat, but I won't pretend that "true" and "input" are sane ways to indicate if you prefer CRLF or LF line endings in your working directory.
-- 
Eyvind

--

From: Linus Torvalds
Date: Friday, May 7, 2010 - 1:57 pm

Because your syntax adds totally new attributes, so now you can't even 
take an existing .gitattributes and make it do something sane - instead 
you have to write totally new rules.

My suggestion just makes any existing usage do the "what you'd expect".

THAT is simpler.

		Linus
--

From: Eyvind Bernhardsen
Date: Friday, May 7, 2010 - 2:17 pm

Well, sort of, but "simple for someone who already knows how core.autocrlf works" isn't what I'm aiming for :)
-- 
Eyvind

--

From: Linus Torvalds
Date: Friday, May 7, 2010 - 2:23 pm

I think "* auto-eol=true" is just crazy. We would _never_ want to do that. 
Any project that does that should be shot in the head.

So encouraging that as a format is just silly and stupid.

In contrast, the slight change in semantics (with no new config options 
_or_ attributes) that I suggest should just make everybody happy - because 
it takes care of the real life situation that people are in.

		Linus
--

From: Avery Pennarun
Date: Friday, May 7, 2010 - 2:30 pm

On Fri, May 7, 2010 at 5:23 PM, Linus Torvalds

In the interests of further making myself look like an idiot:

Just to clarify, is it crazy because that line would convert all
files, even binary ones, where core.autocrlf auto-detects whether
files are binary or text?

Avery
--

From: Linus Torvalds
Date: Friday, May 7, 2010 - 2:54 pm

No, presumably 'auto-eol' does the same auto-detection. Otherwise the name 
wouldn't make sense.

I just think that it's crazy because

 (a) you should try to avoid doing things like that in the first place. For 
     something like an attribute file, you should just list the files you 
     want to convert. That's the _point_ of an attribute. So it's much 
     nicer if you instead actually are explicit about it, ie

	*.[ch] crlf
	*.txt crlf
	*.jpg -crlf

     should be the _primary_ way you do it, since the autocrlf thing is a 
     bit dangerous in theory.

 (b) But let's say that you want to do it anyway (because you're lazy 
     and because autocrlf works pretty damn well in practice), isn't that 
     a really ugly and crazy thing to add _another_ attribute name for 
     that?

     IOW, if you really want to say "do automatic crlf for this set of 
     paths", the natural syntax for that would be

	* crlf=auto

     No? Not some totally new attribute name.

And in the end, you always do want to have a config variable for the 
actual type of conversion. And like it or not, we already do end up having 
this mix-up between .gitattributes and git "core.autocrlf" config entry, 
so my suggested rule was kind of a "minimally invasive" suggestion to just 
turn that mixing of attributes and config entries into something more 
practically useful.

		Linus
--

From: Linus Torvalds
Date: Friday, May 7, 2010 - 3:14 pm

Btw, since we're discussing this, I do think that our current "crlf=input" 
syntax for .gitattributes is pretty dubious. 

I don't really see why it should be a path-dependent thing on whether you 
do crlf conversion on just input or on checkout too.  It smells odd. It 
makes more sense to me to have a global policy for what the output/input 
conversion should be, and then the path rules are just about whether that 
conversion gets done or not.

And like it or not, we called that global rule "autocrlf", and then mixed 
it up with the decision on whether we should do conversion at all. I do 
think that that was a mistake too, and that we could try to fix it, but I 
also think that's a fairly independent issue.

So we _could_ introduce a new "core.crlf" config option that talks purely 
about what kind of conversion gets done - not about _whether_ it gets 
done. So you could do

	[core]
		crlf=input

and it would imply that crlf conversion is only done on input, but it 
would differ from "autocrlf=input" in that it would _not_ imply that any 
paths not matched by gitattributes crlf rules would be automatically 
converted.

[ And in the above model, "core.autocrlf = input" would just be a 
  shorthand for saying "core.autocrlf=true" + "core.crlf=input" ]

So I think we could improve the config file syntax a bit.

But I think that's really a separate issue from the .gitattributes file, 
and whether the "crlf" attribute means anything in the _absence_ of any 
config file rules about crlf.

			Linus
--

From: Avery Pennarun
Date: Friday, May 7, 2010 - 3:34 pm

On Fri, May 7, 2010 at 6:14 PM, Linus Torvalds

Me neither.  However, in the name of sanity, it sure would be great to
have the global configuration options exactly parallel the per-project
and per-file configuration options.  From that point of view, 'input'
exists just to keep things nice and symmetrical.  And considering how
complicated this discussion already is (compared to what a simple
concept CRLF conversion is), that's probably worth something in
itself.

Part of the confusion comes from the way the options are currently
declared.  set vs. unset vs. unspecified vs. "input" vs. "auto" for an
option named "crlf" is just very, very, unfriendly.  None of the words
*mean* anything.

Maybe we should rethink this from the top.  Imagine that we currently
have no crlf options whatsoever.  What *should* it look like?  I
suggest the following:

Config:
   core.eolOverride = lf / crlf / auto / binary / input
   core.eolDefault = lf / crlf / auto / binary / input

Attribute:
   eol = lf / crlf / auto / binary / input

If eolOverride is not "auto" or unspecified, we ignore eolDefault or
any attributes.

If the attribute is not "auto" or unspecified, we ignore eolDefault.

For all entries, unspecified is equivalent to "auto".
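The three precedence rules above fit in a few lines of Python. This is a sketch of the proposal as worded here, nothing more: core.eolOverride, core.eolDefault and the eol attribute are the proposed (not existing) names, and the function name resolve_eol and the use of None for "unspecified" are assumptions.

```python
from typing import Optional

VALID = {"lf", "crlf", "auto", "binary", "input"}


def resolve_eol(override: Optional[str] = None,
                attribute: Optional[str] = None,
                default: Optional[str] = None) -> str:
    """Pick the effective eol mode for one file.

    Order of precedence: eolOverride, then the eol attribute, then
    eolDefault. None (unspecified) behaves exactly like "auto".
    """
    for value in (override, attribute, default):
        if value is not None and value not in VALID:
            raise ValueError(f"unknown eol value: {value!r}")
        # The first entry that is neither unspecified nor "auto" wins.
        if value is not None and value != "auto":
            return value
    return "auto"


print(resolve_eol(override="lf", attribute="crlf", default="crlf"))
print(resolve_eol(attribute="crlf", default="lf"))
```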

Of course the eol attribute could be named "crlf", but that might not
increase the sanity as much as we would like.

And "input" means "auto, but strip CR when committing."  Or maybe the
problem is that it doesn't belong here at all: maybe it should be an
entirely separate attribute that takes effect whenever the eol
attribute/config resolves to "auto."

Or maybe I'm just not thinking about it the right way?

Avery
--

From: hasen j
Date: Friday, May 7, 2010 - 3:54 pm

If we forget everything git has now, I would suggest the following:

- eol-normalization is per repository, per filetype (fnmatch filter)
- in a file separate from .git/config, such as .git/eol
- when you clone, you get this file

You specify the 'standard' eol type for each file type in this project:

    *.c lf
    *.python lf
    *.vb crlf
    *.sln crlf
    etc (something like that)

committing and checking-out always normalize line endings; *always*

add (and commit) can take an option to keep eol as-is (i.e.
--no-eol-normalization or --keep-eol or --raw-eol)

In this model:

1- Anyone who clones gets the repository eol settings
2- No one can possibly commit in a different eol style unless he
explicitly says he wants to.
3- Naturally, eol-normalization doesn't apply to binary files

#2 is important, it's needed so you won't have someone making bad
commits because he has a setting somewhere in his global config to
always ignore eol normalization.
on the other hand, one can alias 'add --raw-eol' to something like
'eviladd', so he can do 'git eviladd file.c', which is fine because
it's explicit.

This would get rid of issues where an editor (such as VS) saves a file
with mixed line endings: we don't care because we normalize them.

This would also make it more transparent to windows users: they don't
even have to think about eol issues; they can't make bad commits
"by-accident". (provided the repo maintainer has set the eol filters
properly).

I have no idea what happens (or should happen) if the origin repo
maintainer updates the .git/eol file. Maybe it should be .giteol
instead of .git/eol
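A rough Python sketch of the model described in this message, under several assumptions that the message does not spell out: rules are tried in order and the first matching pattern wins, a NUL byte is used as a stand-in for "this is a binary file", lone CR characters are left alone, and the names EOL_RULES and normalize are made up for the example. The raw_eol flag plays the role of the suggested --raw-eol option.

```python
from fnmatch import fnmatchcase

# Hypothetical contents of the proposed per-repository eol file.
EOL_RULES = [
    ("*.c", "lf"),
    ("*.python", "lf"),
    ("*.vb", "crlf"),
    ("*.sln", "crlf"),
]


def normalize(path: str, data: bytes, rules=EOL_RULES, raw_eol: bool = False) -> bytes:
    """Normalize line endings of `data` according to the first matching rule."""
    if raw_eol or b"\0" in data:
        # Explicit opt-out, or content that looks binary: leave untouched.
        return data
    for pattern, style in rules:
        if fnmatchcase(path, pattern):
            # Collapse CRLF to LF first so mixed endings come out uniform.
            unified = data.replace(b"\r\n", b"\n")
            if style == "lf":
                return unified
            return unified.replace(b"\n", b"\r\n")
    # No rule for this file type: nothing to normalize.
    return data


print(normalize("main.c", b"int x;\r\nint y;\n"))
print(normalize("app.sln", b"one\r\ntwo\n"))
```

Because CRLF is collapsed before the target style is applied, a file saved with mixed line endings comes out uniform, which is the behaviour the message asks for.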
--

From: Linus Torvalds
Date: Friday, May 7, 2010 - 4:18 pm

Ugh. Hell no. What an ugly format. What does that crazy "override vs 
default" even _mean_?

So no.

Plus the above is confused anyway. The only reason to ever support 'lf' is 
if you're a total moron of a SCM, and you save files you know are text in 
CRLF format internally. That's just f*cking stupid.

So the above is just crazy talk.

The options that make sense are:

 - disabling all "text" issues, and considering everything to be pure 
   binary. This is the "I know I'm sane and unix" option, or the "doing 
   any conversion is always wrong" option.

   We'd call this "binary" or "off" or "false".

 - if you recognize a text-file, and consider it text and different from 
   binary, at a _minimum_ it needs what we call "input". Anything else is 
   crazy-talk. We don't save the same text-file in different formats, and 
   we know that CRLF (or CR) is just a stupid format for text.

   So there are zero options for the input side. If we don't do CRLF -> LF 
   conversion on input, it's worthless even _talking_ about text vs binary.

 - For output, there are exactly three choices: "do nothing" (aka just 
   "input", aka "LF"), output in native format (CRLF on Windows, LF on 
   UNIX), or "force CRLF" regardless of any defaults (and the last 
   probably doesn't make sense in practice, but is good for test-suites, 
   so that you can get CRLF output even on sane platforms).

So I think the _only_ sane choices are basically

	core.crlf=[off|input|on|force]

where you may obviously have aliases (ie "off", "false" and "binary" could 
all mean the same thing, and you could alias "input" to "lf" and "force" 
to "crlf").
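To make the four modes concrete, here is a small Python sketch. It assumes the proposed core.crlf option exactly as worded above (it is a proposal in this thread, not an existing setting); the helper names canonical, to_repository and to_worktree, and the native_crlf flag standing in for "the platform default is CRLF", are made up for the example.

```python
# Aliases suggested in the text: off/false/binary, input/lf, force/crlf.
ALIASES = {"false": "off", "binary": "off", "lf": "input", "crlf": "force"}
MODES = {"off", "input", "on", "force"}


def canonical(mode: str) -> str:
    """Map an alias to one of the four canonical modes."""
    mode = ALIASES.get(mode.lower(), mode.lower())
    if mode not in MODES:
        raise ValueError(f"unknown core.crlf value: {mode!r}")
    return mode


def to_repository(data: bytes, mode: str) -> bytes:
    """Input side: every text mode converts CRLF to LF; "off" does nothing."""
    if canonical(mode) == "off":
        return data
    return data.replace(b"\r\n", b"\n")


def to_worktree(data: bytes, mode: str, native_crlf: bool) -> bytes:
    """Output side: nothing, native format, or forced CRLF."""
    mode = canonical(mode)
    if mode in ("off", "input"):
        return data
    if mode == "force" or native_crlf:
        # Collapse first so already-CRLF content is not doubled up.
        return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    return data


print(to_worktree(b"line\n", "on", native_crlf=True))
print(to_worktree(b"line\n", "on", native_crlf=False))
```

Note that the input side has a single behaviour for all text modes, matching the "zero options for the input side" argument; only the output side varies.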

And the above is basically what we have. Except that for historical 
reasons (ie we didn't even _have_ any attributes) it got mixed up with 
"do we want to do this automatically", so "autocrlf=on" actually ends up 
being "yes, do automatic detection" _and_ what I'd call "core.crlf=force" 
above.

			Linus
--

From: hasen j
Date: Friday, May 7, 2010 - 4:47 pm

What if:

- The entire history of the file is stored in CRLF
- It's a windows-only file where the official "tool" that reads it
barfs on LF line endings.
- Third party tools also expect (or at least, handle) CRLF line endings.

Even if you end up deciding to store it with LF line endings
internally, it should still be *always* checked out with CRLF endings.

And no, just because I want certain files to be checked out with CRLF
endings, doesn't mean that I want all files to be checked out that
way. This is one of the areas where git's crlf handling is lacking
right now.

Also, git-diff should ignore eol differences by default, unless
explicitly asked not to (currently it's the other way around).
--

From: Linus Torvalds
Date: Friday, May 7, 2010 - 4:50 pm

Umm. Then it's not text, is it? What you are describing is a binary file 
that happens to look like text with CRLF.

If it's _text_, then you import it as such, and set crlf=true so that it 
gets checked out with crlf.

		Linus
--

From: hasen j
Date: Friday, May 7, 2010 - 5:19 pm

That depends on your definition of text.

Storing it with LF internally is ok, as long as we can have it

It should be the repository maintainer's responsibility to tell git to
always checkout that file with crlf.

Why?

Because it's part of the project. I never set crlf=true on windows,
but if some files just *have* to have crlf, then I wouldn't mind
having them that way.
This doesn't mean I should have to pollute all my files with crlf just
to please visual studio, or whatever tool requires the crlf endings.

Other developers (especially those new to git) shouldn't have to worry
about crlf issues: when they clone, git would automatically convert
some files to crlf on checkout, regardless of whether or not they set
crlf=true.

git currently has it backward: putting the onus on each individual
contributor to set autocrlf=true

This doesn't make any sense.

If someone did want everything to be crlf, sure, they can set crlf=true.

But there's another potential problem: what if some files just *can't*
have crlf? Say some build (or whatever) tool barfs on crlf files, and
the user sets crlf=true because that's his preferred eol style, but
the project has one of those lf-only files? In this case, we'd want
that file to be always checked out with LF, even if crlf=true is set.
--

From: Linus Torvalds
Date: Friday, May 7, 2010 - 5:33 pm

Well, my definition of text is "does it make sense to do any end-of-line 
conversions". That's the only definition that makes sense for an SCM, at 
least in the current context. If doing conversions on the line endings is 
wrong, then it's not text.

And your whole premise was that conversions were always wrong. So the way 

.. and that's what I suggested "core.crlf=on" would mean.

However, if you think that it needs to be CRLF on _all_ platforms, even 
platforms where CRLF is _wrong_ for a text-file, then see above: in that 
case it's not a text-file at all as far as the SCM is concerned.

In that case it's just a binary file, and CRLF is _not_ "end of text 
line", it's part of the definition of the format for that binary file.

			Linus
--

From: hasen j
Date: Friday, May 7, 2010 - 6:39 pm

(sorry about the previous message, forgot to make it reply all)

What does the platform care? This doesn't make any sense. Files that
need CRLF are not Unix files to begin with (e.g. sln).

My whole argument is based on a simple premise: LF -> CRLF doesn't
make sense because all windows editors can handle LF endings, and
because it just causes a lot of confusion.

Until Erik brought up the case where a multi-platform project uses
different build systems on each platform.

I don't know if .sln is one of these formats where the tools will
vomit if it's not crlf, but let's just assume so.

- *.sln is not a Unix file, so it's perfectly ok (maybe even
desirable) to check it out with crlf.
- it's an exception; git doesn't have to convert _all_ files to crlf;
just the .sln ones.
--

From: Linus Torvalds
Date: Friday, May 7, 2010 - 6:49 pm

Don't be silly.

The whole AND ONLY point of CRLF translation is that line-endings are 
different on different platforms.

So when you say "What does the platform care?", that is a totally idiotic 
and utterly stupid thing to ask.

And since you ask it, I can only assume that you don't understand anything 
about the whole CRLF discussion, that you don't care about cross-platform 
repositories, and that as a result you should NEVER EVER actually use any 
of the git crlf conversion code.

It's that simple. You seem to totally miss the whole point of the whole 
feature in the first place.

			Linus
--

From: hasen j
Date: Friday, May 7, 2010 - 7:49 pm

I worked on several projects on windows where ALL my files were LF;
the platform didn't give a shit and everything worked great.

I don't suppose you use the CRLF feature yourself, not to mention
doing any windows development (ever?).

The way git handles crlf is just confusing; in fact it's so confusing
that it's often better to just turn it off. I'm not the only person
who thinks that. It's specifically confusing because git thinks "if
you're on windows then ALL your files should be CRLF", which is
clearly what you think.

The platform is not windows, it's the development tools. Most
development tools don't actually mind if the line endings are LF only,
and since CRLF conversions in git cause endless confusion, it's better
to turn it off most of the time, unless you're dealing with a retarded
tool that thinks CRLF is the only line ending and fails to read files
with LF endings.

When that happens, it's most likely the case that these files are
platform-dependent anyway, and so converting them back and forth
between LF and CRLF is just a waste of time.

The whole idea behind my suggestion is to minimize confusion.
--

From: Robert Buck
Date: Friday, May 7, 2010 - 8:31 pm

Actually, Linus, that depends. And while you will recognize this, let
me state the obvious, that there are cases where for certain text
files the platform does not matter, that for all platforms they MUST
normalize to one setting. For instance there are cases where text
files MUST be LF ended on ALL platforms. Have you considered XML to be
one such example? The W3 XML spec states:

   ... [XML processors] MUST behave as if it normalized all line
breaks in external parsed entities (including the document entity) on
input, before parsing, by translating both the two-character sequence
#xD #xA and any #xD that is not followed by #xA to a single #xA
character.
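The normalization quoted above is easy to state in code. The Python sketch below applies it to a string; the function name is an assumption, and a real XML processor does this internally while reading its input, so this only illustrates the rule itself.

```python
def normalize_xml_line_breaks(text: str) -> str:
    """Translate #xD #xA pairs, and any #xD not followed by #xA, to #xA."""
    # Handle the two-character sequence first so that the second step
    # only ever sees carriage returns that stand alone.
    return text.replace("\r\n", "\n").replace("\r", "\n")


sample = "a\r\nb\rc\nd"
print(repr(normalize_xml_line_breaks(sample)))
```

After this step CRLF, lone CR and LF input are indistinguishable to the parser.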

So here is an example of a text file that by convention MUST be
LF-based, yes, even on Windows. And for the record, solution (sln)
files have been an XML format for seven years now. So in any one
workspace it is entirely reasonable that there may be some text files
that MUST have LF, while for other files they SHOULD have CR/LF. There
are also cases where some text files MUST have CR/LF (some scripting
languages barf on Windows otherwise).


Hasen makes a good point here. It is simply this, the LF issue does
not boil down to a single boolean switch. People who think of the
LF/CRLF issue as a boolean switch are not dealing with all the facts.
There's a lot of grey, not simply black and white.

Commercial systems, decent ones that is, have had this right for years
(12+ years as I recall). We wouldn't be asking Git to do the right
thing if we weren't sold on Git already. Git is otherwise fantastic
(with using it on Windows being the apparent exception, hence this
conversation).


I disagree on this one actually, this comment is not spot on. Again,
it depends. I'd generally say,

* perform conversions, or no conversions as the case may be, on the
obvious file types
* when conversions occur, normalize internally to only one convention

Confusion, yes. The Git documentation is very confusing on this
point... Linus and ...
--

From: Avery Pennarun
Date: Friday, May 7, 2010 - 8:45 pm

Erm, this seems to be a counterexample to your point.  It says very
clearly that the files can use either LF or CRLF line endings, and
will be parsed correctly either way, or your parser is broken.  So
pretty much any CRLF conversion rule (or none at all) will work with
such files.


True.  This discussion is about fixing that, though, so it seems

How on earth is anyone suggesting that it's a simple boolean switch?
Linus posted an 8-cell truth table earlier, and he hadn't even

Unfortunately those steps aren't clear enough to be helpful.  "as the
case may be" and "obvious file types" are definitely not obvious, or

I've learned that git people never learn from anyone's book.  svn has
also had this problem solved pretty much forever, and would be easy to
copy.  For better or for worse, it all has to be hashed out from

Well... obviously.  The former case is crlf=false; the latter is
crlf=true.  To bring up my point again about the confusing
configuration options, you might think that "crlf=true" means "always
CRLF", but in fact that's not the case.  In fact it works the way you
want.

Have fun,

Avery
--

From: hasen j
Date: Saturday, May 8, 2010 - 3:36 am

Sure, I won't deny, it always baffled me why it's built into git.

The only good reason I could think of is avoiding scenarios where someone
saves a file with different line endings and then all merging hell
would break loose because all lines are changed. Although
theoretically I think that can be avoided if the merge algorithm
normalized line endings before the merge (but really, I don't know
anything about merging).

Under this assumption, the point of autocrlf is that windows users
should commit with LF endings even if they use CRLF in the working
directory (e.g. some stupid text editor resaves files with crlf).

If that's not the reason, then why the hell does git care about
converting line ending styles?

If the only reason is "LF is not a new line in Windows", then I'll go
back to my previous opinion that autocrlf is useless most of the time
and shouldn't be builtin; use smudge/clean filters instead if you



That's cool and all, but we need to simplify it; not make it more
confusing. The name autocrlf is confusing all by itself: what does it
mean? is it a two way conversion or a one way conversion? Where the
hell did "input" come from? I always have to pull up the man pages.

I'd rather be able to say:

- My overall preference is 'lf'
- For this repo, this file here is always 'lf' (takes precedence over
the above preference)
- And this other file here is always 'crlf' (ditto)


No, I actually think git got source control right exactly because it
didn't bother copying other existing systems. The other system's
solutions don't necessarily fit with git's model.
--

From: Robert Buck
Date: Saturday, May 8, 2010 - 4:36 am

Perhaps I was not clear, or you did not understand my point.

Read "...by translating... to #xA", XSLT output to a file therefore
MUST be LF by definition for it to be canonical form. This is an
example of a TEXT file that MUST by definition of the spec be LF based
on all platforms. Looking at the "auto" code that exists in Git, it
does not appear to support this very obvious standard, whereby for
this "file-type" it should always be checked out of source control
with LF regardless of how it came in. This is equivalent to the Git
"input" setting I believe (?), but on a file-type basis. Yes, Git
apparently does not have the notion of file-types, does it (e.g. *.xml
maps to text)?

The point I am really trying to make clear is that there are multiple
dimensions to this problem, and not making that succinct will result
in a botched attempt. We need to carefully distinguish file-types from
other switches that control whether or not to perform automatic
conversions. The two dimensions are eol-style and file-type.

THE SWITCHES

So for the switches, here is what would be meaningful to me, short, sweet:

core.autocrlf  :: true false
core.eolstyle  :: local share lf crlf

If autocrlf is false, then what comes out is exactly what goes in.

EOL-STYLE

The eolstyle property only applies to text files (discussed later):

- "local" means normalize "text" files to LF when read in, and convert
to the platform preferred setting when materializing workspaces.
- "share" means accept anything, but when writing files to a workspace
normalize to LF (XML, XSLT, some scripting languages ...)
- "lf" means always accept anything, convert to LF, and output LF
- "crlf" means to accept anything and convert to CRLF on output
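A sketch of how the two switches could combine when files are written to a workspace. It covers the checkout side only, because the list above does not say what is stored internally for "share" and "crlf". The function name materialize and the platform_eol parameter are assumptions; core.eolstyle is the proposed name from this message, not an existing option.

```python
import os


def materialize(data: bytes, autocrlf: bool, eolstyle: str,
                platform_eol: bytes = os.linesep.encode()) -> bytes:
    """Return the bytes written to the workspace for one text file."""
    if not autocrlf:
        # "what comes out is exactly what goes in"
        return data
    targets = {
        "local": platform_eol,  # platform preferred line ending
        "share": b"\n",         # always LF in the workspace
        "lf": b"\n",
        "crlf": b"\r\n",
    }
    if eolstyle not in targets:
        raise ValueError(f"unknown eolstyle: {eolstyle!r}")
    # Collapse to LF first, then expand to the requested ending.
    unified = data.replace(b"\r\n", b"\n")
    return unified.replace(b"\n", targets[eolstyle])


print(materialize(b"a\r\nb\n", True, "local", platform_eol=b"\r\n"))
print(materialize(b"a\r\nb\n", True, "share", platform_eol=b"\r\n"))
```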

FILE-TYPES

Linus alluded above to file-types, and to being explicit about them. That's
great, I agree. Let me provide examples:

By extension:
    http://www.perforce.com/perforce/doc.current/manuals/cmdref/o.ftypes.html

By pathnames or extensions:
    ...
--

From: Avery Pennarun
Date: Friday, May 7, 2010 - 8:34 pm

On Fri, May 7, 2010 at 9:49 PM, Linus Torvalds

I guess there's your use case for being able to turn off crlf=input, then. :)

Hasen: you and Linus don't seem to be communicating clearly, but it
looks to me like Linus's proposed changes would work fine for your use
case.  What you want is for the repository maintainer to be able to
control whether a file is checked out with crlf or not; this is
possible with *either* a per-project .gitconfig or a crlf=true
attribute that works when core.autocrlf is unspecified, which are
Linus's two suggested options.  If you really, truly want your crlf
characters not to be messed with, then set crlf=false, which means
"binary." [1].

[1] Which reminds me of my opinion about it being too hard to tell
what you're specifying given the current set of config options. But
'man gitattributes' makes at least this point clear.

Have fun,

Avery
--

From: Avery Pennarun
Date: Friday, May 7, 2010 - 5:31 pm

On Fri, May 7, 2010 at 7:18 PM, Linus Torvalds

That's easy:

 - if "override" is set, it overrides any attribute setting.
 - if "default" is set, we use it when there's no attribute or override setting.

We can argue about whether having two config options is strictly
necessary from a formal truth table point of view, and you'll probably
win the argument because it all makes my head spin.  My argument is
simpler: if it makes my head spin, it probably makes other people's
heads spin.  The way I described is simple enough for anyone to

What I meant by "lf" is just what we currently mean by "crlf=false".
It's more clear for the average person to say "eol=lf" than
"crlf=false", because "crlf=false" doesn't say what you *do* want, it
only says what you *don't* want.

Clearly any repo storing some other weird line ending, then converting


That sounds good to me.  So this was a mistake in the original
implementation of autocrlf; let's just correct it, and make all text
modes do input conversion.

Note that, in prior threads on this topic, there was some objection to
doing crlf=anything by default because it wastes CPU in the common
case that people are running on Unix and aren't doing screwy things
with line endings.  Defaulting to crlf=input would require us to waste

One nice thing about my suggestion is that it completely avoids the
concept of a "native CRLF format."  Because nowadays, that's just not
very useful.  On Unix sometimes I need crlf files; on Windows
sometimes I need lf files.  Yes, we can still implement that in terms
of "native" terminology, but it seems to be a roundabout way of stating

Functionally, yes, we have this already.  Your new proposal is
essentially to make crlf=auto (= unspecified) to actually always
include crlf=input behaviour, which sounds good to me, but may be
backwards incompatible in some important way.  (I wouldn't think
anybody would want the non-fixing-stuff behaviour.  But I wonder what
it would do to git-svn... maybe it could just ...
--

From: Avery Pennarun
Date: Friday, May 7, 2010 - 3:19 pm

On Fri, May 7, 2010 at 5:54 PM, Linus Torvalds

Oh, good grief, I'm just getting more and more confused.

So just to keep all of this straight, I think there are still two
proposals under consideration here:

a) add an in-project .gitconfig, in which case the above crlf=auto is
exactly equivalent to "crlf attribute missing" (which is different
from "crlf unset", hee hee, are we having fun yet?) since the crlf
attribute is ignored unless core.autocrlf=true, and missing means to
use the core.autocrlf setting;

OR

b) change the semantics of the crlf attribute, in which case crlf=auto
is a new mode that means "use autocrlf on this file even if
core.autocrlf is unset or unspecified".

Right?  So in case (a), the new crlf=auto option is unneeded.  Though
it does seem as if we're trending toward case (b).

Thanks,

Avery
--

From: Dmitry Potapov
Date: Saturday, May 8, 2010 - 1:49 pm

I like your proposal and it makes perfect sense to me, but I am not new
to git and core.autocrlf. I have observed that many people who were new
to Git often got confused by the meaning of the crlf attribute. In essence,
at first, they thought that it means what you would probably describe as
crlf=force. Thus, seeing something like this:

    *.sln -crlf

baffled them, because sln files have CRLF as ending. So, it was very
counter-intuitive for them. Of course, you can explain that Git stores
text files with LF internally, and why it is the sane thing to do, and
why sln files are not exactly text files (at least, non-text in sense
of eol-conversion), etc... but I believe that all that discussion and
explanation could be easily avoided by renaming 'crlf' as 'eol'.  Now,
if you look at this:

      *.sln -eol
      *.jpg -eol
      *.txt eol
      *.[ch] eol

it is clear that .sln and .jpg files are stored "as is", while Git does
the end-of-line conversion for other files in accordance with user's
preference. Why should users bother at all how Git stores text files
internally? They do not need to know that Git stores text files with LF
internally. They just want to checkout those files with the right ending
for their platform.

So, perhaps, 'eol' would be a better name than 'crlf' for new Git users.



Dmitry
--

From: Linus Torvalds
Date: Saturday, May 8, 2010 - 2:54 pm

Right. Look at it. It's totally incomprehensible. It's _worse_ than "crlf" 
as a name.

What the f*ck does "jpg" have to do with "eol"? Nothing.

You could talk about "binary" vs "text", and it would make sense, but your 
argument that "eol" is somehow better than "crlf" is just insane.

So I could certainly see

	*.jpg binary
	*.txt text

making sense. But "eol" is certainly no better than "crlf". 

In the end, crlf is what we have. We're not getting rid of it, so if 
somebody were to actually rename it, that would just mean that there are 
_two_ different ways to say the same thing. And quite frankly, I think 
that's worse than what we have now, so I don't think it's worth it.

		Linus
--

From: Dmitry Potapov
Date: Saturday, May 8, 2010 - 4:42 pm

Right, nothing, in other words, no eol conversion... and "-eol" seems to
express this idea well. So, I don't see why it is worse than "crlf"...

Personally, I do not care whether it is "crlf", or "eol", but a lot of
people that I know were confused by crlf, because they thought that it
means that this file is stored with crlf, while this attribute actually

What about .sln files? They are xml files with CRLF ending. Does it mean
they are binary? Based on how it is stored, it is certainly binary, but
when it comes to "diff" or even "merge" you may want to think about them
as text, and, in general, people tend to think about them as text files.

Another example is shell scripts. You really want them to be LF even on
Windows. So, is it a binary file too?

So, this approach is not so intuitive as it may appear if you consider

I was not sure myself that the idea of renaming is worth it... I do
think that "eol" is a better name than "crlf", but not by a big margin,
and as you said crlf is what we have now... so be it...


Dmitry
--

From: Eyvind Bernhardsen
Date: Sunday, May 9, 2010 - 12:49 am

I think "binary" and "text" are the wrong things to talk about in this case.

If we were to follow Avery's suggestion that we look at what we would have implemented had autocrlf not already existed, it would be better to call "crlf" something like "eolconv".  You're not saying that a file is text or binary as such, rather that "I want eol conversion for this file" or "I don't want eol conversion for this file".

Flagging a file as "-eolconv" because it should always have LFs or always CRLFs seems logical to me.  "eolconv=auto" also makes sense.


Renaming "crlf" might not be worth it, but thinking about what it should look like definitely is worth it.  Since I already have a patch series that changes this area, I'd like for it to be future proof.

I think the idea that we're stuck with "crlf" (or any bad ui design) for ever and ever is depressing, and I reject it.  It would be easy to create a new attribute with a better name that is the same setting under the hood, and deprecate "crlf".  The old attribute would still work in existing repositories (indefinitely, if needs be), but new users wouldn't have to be confused by its poor name.

I'm not saying I want to replace "crlf" right now!  I'm just saying that it makes sense to think about how we would want to replace it, and try not to introduce any new change that will make it harder to do the right thing later.
-- 
Eyvind

--

From: Robert Buck
Date: Sunday, May 9, 2010 - 3:35 am

On Sun, May 9, 2010 at 3:49 AM, Eyvind Bernhardsen

Linus - Perhaps I missed this, but where would you have this typemap exist?
I like this sort of prescriptive approach; out of the box users would
get a bunch of reasonable defaults, but they could customize it by
adding/changing them.
--

From: Eyvind Bernhardsen
Date: Friday, May 7, 2010 - 2:37 pm

Just to clarify a bit more, that is _not_ what it would do.  The  
"crlf" attribute is still respected, of course.

Also, I meant to write "* crlf=auto", not "* auto-eol=true", if that  
makes it any less crazy.
-- 
Eyvind
--

From: Linus Torvalds
Date: Friday, May 7, 2010 - 2:58 pm

Oh, yes. See my other email. "* crlf=auto" is at least sensible, although 
somewhat scary. At least with core.autocrlf=true, the user has to have 
consciously set it. It was the "whole new attribute name" that I thought 
pushed it from "slightly scary" to "crazy".

		Linus
--

From: Avery Pennarun
Date: Friday, May 7, 2010 - 1:58 pm

On Fri, May 7, 2010 at 4:06 PM, Linus Torvalds

I was going to say that I do get it, but I guess I didn't.  You're
right, your proposal is functionally equivalent.  Feel free to stop
reading the rest of this post :)

For the benefit of those who might have misunderstood as I did, the
reason they're equivalent is that "core.eolStyle = LF" is the same as
saying "never do EOL conversion" since an unconverted file is
implicitly LF.  And there is already a way to say "never do EOL
conversion," which is to set core.autocrlf=False.

By adding core.autocrlf=True to an in-project .gitconfig file, we can
fix a mistake in the original definition of the crlf attribute, ie.,
it should be able to force CRLF conversion even when a user hasn't set
core.autocrlf explicitly.  But that new ability doesn't take away a
person's ability to override it globally because .git/config and
~/.gitconfig take precedence.  Notably, this solution doesn't break
any backward compatibility.

Linus's second proposed option would be to slightly change the way the
crlf attribute works, by making core.autocrlf a tri-state variable
instead of just true/false.  "Undefined" would mean "use the crlf
attribute" where currently it means (rather unhelpfully) "always use
LF even if .gitattributes says otherwise."  However, this would be a
backward-incompatible change.  Arguably, not one that anyone would
care about.  (For the record, none of my co-workers would care.  The
current behaviour is sufficiently unhelpful that we have to use
core.autocrlf=True anyway, so .gitattributes crlf hasn't been useful.)
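
A minimal sketch of the difference between the behaviour described above as current and the proposed tri-state reading of core.autocrlf. This is only a model of the rules as stated in this message, not git's implementation: the function name `converts_eol` and its parameters are hypothetical, the "input" value is ignored, and text auto-detection is reduced to a boolean flag.

```python
from typing import Optional


def converts_eol(autocrlf: Optional[bool],
                 crlf_attr: Optional[bool],
                 tri_state: bool,
                 looks_like_text: bool = True) -> bool:
    """Return True if EOL conversion applies to a file in this simplified model.

    autocrlf:  core.autocrlf as True/False, or None if it was never set.
    crlf_attr: the file's crlf attribute: True (set), False ("-crlf"),
               or None (not mentioned in .gitattributes).
    tri_state: False models the behaviour described as current,
               True models the proposed tri-state semantics.
    """
    if autocrlf is None:
        # Unset: currently the attribute is ignored; the proposal honours it.
        return tri_state and crlf_attr is True
    if autocrlf is False:
        # Explicitly disabled: no conversion in either model.
        return False
    if crlf_attr is None:
        # autocrlf is on and no attribute is given: rely on auto-detection.
        return looks_like_text
    return crlf_attr
```

Under this model the only case that changes is an unset core.autocrlf combined with an explicit crlf attribute, which is where the backward incompatibility would come from.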

Now, arguably, the current semantics, and even Linus's proposed
improved semantics, are still pretty hard to explain.  "This file
should always be unchanged" and "this file should always use native
line endings" and "this is my native line ending style" is very simple
and straightforward.  But I'm sure others would argue the opposite,
and it's just a matter of preference.

Have fun,

Avery
--

From: Eyvind Bernhardsen
Date: Friday, May 7, 2010 - 12:23 pm

Thanks for the support!

My objection to this idea is more practical: I suspect that parsing .gitconfig from the repository would be a lot more work than my simple hack :)
-- 
Eyvind

--

From: Nicolas Pitre
Date: Friday, May 7, 2010 - 12:31 pm

Given that only a subset of gitconfig could make sense to have 
distributed, I think the file should be named .gitparams to make the 
distinction clear.


Nicolas
--

From: Avery Pennarun
Date: Friday, May 7, 2010 - 12:36 pm

Since the options it *does* have are exactly the same as .git/config,
however, naming it .gitconfig makes sense.  I'd say just print a
warning when reading options that are going to be ignored for security
reasons (or because they're not known at all, or whatever).

Avery
--

From: Nicolas Pitre
Date: Friday, May 7, 2010 - 1:29 pm

Or just make it .gitparams (or anything you wish) which is not the same 
as gitconfig. This way it is less likely to get bogus bug reports for 
options that aren't supported.


Nicolas
--

From: Avery Pennarun
Date: Friday, May 7, 2010 - 2:00 pm

It has exactly the same syntax as ~/.gitconfig, and the options it
does support can all be carried over literally to ~/.gitconfig.
Calling it something else would imply that it deserves its own man
page, which would need to repeat all the options that are already
documented for ~/.gitconfig.

I'd say something that's syntactically identical, and in some cases
actually interchangeable, should have the same name.  Using a
different name could actually be *misleading*.

Have fun,

Avery
--

From: Nicolas Pitre
Date: Friday, May 7, 2010 - 2:12 pm

Absolutely not.

Most options for ~/.gitconfig simply make no sense in a distributed 

No because most of those options don't and can't apply to a distributed 

Indeed.  But this is not the case here.


Nicolas
--

From: Avery Pennarun
Date: Friday, May 7, 2010 - 2:26 pm

No, that's the converse of what I said.

Try this in your head:

    cp .gitconfig .git/config

Perfectly valid.  Copying the other way might (or might not) result in
invalid options in .gitconfig, which probably ought to be warned


Hmm, how to name the file is mostly a matter of opinion, but this last
bit is just factual ;)  They're syntactically identical.  And in some
cases, they're interchangeable.  I don't see how one could argue
otherwise.

Have fun,

Avery
--

From: A Large Angry SCM
Date: Friday, May 7, 2010 - 3:09 pm

Avery Pennarun wrote:
[...]

Which one takes precedence? I *MUST* be able to override a distributed 
.gitconfig/.gitparams/.gitparameters file.
--

From: Avery Pennarun
Date: Friday, May 7, 2010 - 3:10 pm

Yes, absolutely.  As Linus said, the in-project file is lower priority
than your .git/config and ~/.gitconfig files.
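
The precedence described here can be sketched as a layered lookup. This is an illustration only: the in-project file is a proposal in this thread, and `effective_value` and the dictionaries standing in for the three files are hypothetical names.

```python
from typing import Mapping, Optional, Sequence


def effective_value(key: str,
                    layers: Sequence[Mapping[str, str]]) -> Optional[str]:
    """Return the value from the first layer that defines key.

    Layers are ordered from highest to lowest priority.
    """
    for layer in layers:
        if key in layer:
            return layer[key]
    return None


# Stand-ins for the three files, highest priority first.
repo_config = {}                            # .git/config
user_config = {"core.autocrlf": "false"}    # ~/.gitconfig
in_project = {"core.autocrlf": "true"}      # proposed tracked .gitconfig

layers = [repo_config, user_config, in_project]
```

With these layers the user's "false" wins over the project's "true"; if neither private file sets the key, the project's value applies.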

Avery
--

From: Linus Torvalds
Date: Friday, May 7, 2010 - 12:40 pm

I went through the options listed in "man gitconfig", and quite frankly, I 
didn't find any new ones. I didn't grep the source, and I'm sure they're 
not all documented, but if it really is just two options, I doubt it's 
worth it at all.

Hopefully nobody sane uses any non-utf8 encoding for commit messages 
anyway (but what do I know - I have no idea about Asian usage, where it 
may make more sense than in US/Western Europe). So i18n.commitEncoding is 
not likely to be a big deal.

And just making the crlf attribute work regardless of core.autocrlf sounds 
like it wouldn't be a bad idea. Just _maybe_ we could actually make an 
_explicit_ "core.autocrlf = off/false" actually disable any .gitattributes 
crlf settings, but I'm not sure even that is a good idea.

So I'd suggest relegating "core.autocrlf" to just files that are _not_ 
covered by some explicit .gitattributes setting. After all, that just more 
solidly puts the "auto" in autocrlf.

		Linus
--

From: Nicolas Pitre
Date: Friday, May 7, 2010 - 1:32 pm

I don't dispute that.

I was merely pointing out that naming such a file .gitconfig is a bad 
idea if it doesn't duplicate the entire .git/config functionality.


Nicolas
--

From: Junio C Hamano
Date: Friday, May 7, 2010 - 12:06 pm

Yes, that is exactly what I was alluding to.
--

From: Eyvind Bernhardsen
Date: Friday, May 7, 2010 - 12:25 pm

Ah, of course.  Thanks for the clarification!  I didn't understand what Junio meant (and was composing a long email which may or may not have had a bitter tone); now I'm preparing a new patch series instead.
-- 
Eyvind

--

From: Finn Arne Gangstad
Date: Friday, May 7, 2010 - 12:41 pm

The crlf attribute says whether to enable autocrlf functionality for a
file, but that is not what is really wanted. auto-eol instead says how
line endings should be stored in the repository. Also, auto-eol will
only affect files auto-detected as text (or forced to be treated as

Maybe it is sufficient to add a new value to "crlf" that means:

- If the file is autodetected as text:
  - Convert to LF only on commit, and
  - Convert to your preferred EOL style on checkout.

I don't think autocrlf is a good place to specify preferred EOL
style, it is too dangerous to set autocrlf to true by default, but it should
not be dangerous to say that your preferred EOL style is CRLF.
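
A minimal sketch of the commit/checkout behaviour listed above, assuming a very crude text check (no NUL bytes) in place of git's real auto-detection. The function names and the `eol_style` parameter are hypothetical.

```python
def looks_like_text(data: bytes) -> bool:
    """Crude stand-in for text auto-detection: treat NUL-free data as text."""
    return b"\0" not in data


def to_repository(data: bytes) -> bytes:
    """On commit: store files detected as text with LF line endings."""
    if not looks_like_text(data):
        return data
    return data.replace(b"\r\n", b"\n")


def to_worktree(data: bytes, eol_style: str) -> bytes:
    """On checkout: write text files with the preferred EOL style."""
    if eol_style not in ("lf", "crlf"):
        raise ValueError("eol_style must be 'lf' or 'crlf'")
    if not looks_like_text(data) or eol_style == "lf":
        return data
    # Normalize first so an existing CRLF does not become CR CR LF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
```

In this sketch a preferred style of "lf" leaves the checked-out content identical to what is stored, which is why stating a preference is harmless where no conversion is wanted.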

- Finn Arne
--

From: Avery Pennarun
Date: Friday, May 7, 2010 - 1:06 pm

Assuming it's updated to reuse the existing crlf attribute instead of
adding a new one, that seems to be exactly what this patch series is
about.  "Your preferred EOL style" is the newly introduced
core.eolStyle config option.  So... good idea :)

Have fun,

Avery
--

From: Eyvind Bernhardsen
Date: Friday, May 7, 2010 - 1:11 pm

I think keeping the existing shortcomings is partly necessary because I don't want to break any existing repositories by changing the meaning of "core.autocrlf=input" and "core.autocrlf=true".

I also like "core.eolStyle" because I want a config setting that explicitly says "crlf" or "lf" rather than forcing the user to remember what "true" and "input" mean.  The new series will keep core.eolStyle.



Yes.  And it shall!
-- 
Eyvind

--

From: Gelonida
Date: Friday, May 7, 2010 - 12:15 am

I'm not convinced that one policy is a good solution, but it really
depends on your project.


What we do:
.bat files with Windows line endings
.cmd and .vbs files with Windows line endings
.sh files with Unix line endings
source files (.c .h .py .pl) with Unix line endings
.txt files with Unix line endings

The rest is left untouched.
You might add a pre-commit hook to verify this.
So far we didn't bother to automate it, but I must admit that we had
occasional rare hiccups.
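
A minimal sketch of the kind of check a pre-commit hook could run for the per-extension policy in the list above. The `POLICY` table and `violates_policy` are hypothetical names; wiring this into an actual hook (finding the staged files and reading their content) is left out.

```python
import os

# Expected line ending per extension, taken from the list above.
POLICY = {
    ".bat": "crlf", ".cmd": "crlf", ".vbs": "crlf",
    ".sh": "lf", ".c": "lf", ".h": "lf", ".py": "lf", ".pl": "lf",
    ".txt": "lf",
}


def violates_policy(filename: str, data: bytes) -> bool:
    """Return True if data contains line endings the policy forbids."""
    ext = os.path.splitext(filename)[1].lower()
    expected = POLICY.get(ext)
    if expected is None:
        return False  # extensions not listed are left untouched
    crlf_count = data.count(b"\r\n")
    # LF characters that are not part of a CRLF pair.
    bare_lf_count = data.replace(b"\r\n", b"").count(b"\n")
    if expected == "crlf":
        return bare_lf_count > 0
    return crlf_count > 0
```

A hook built on this would reject the commit when any staged file makes `violates_policy` return True.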


bye


N



--
