Hi! When I clone git://git.debian.org/git/turqstat/turqstat.git using the msys-Windows version of git (1.5.4-rc2), some but not all the files get autoconverted to CRLF. Is it possible to set properties for the files that are text, to make sure they are converted properly? -- \\// Peter - http://www.softwolves.pp.se/ -
By default, CRLF conversion is disabled in msysgit. Git should
not convert a single file. Does it really convert some?
You can verify that CRLF conversion is off by running
git config core.autocrlf
which should print nothing if the option is unset.
You can enable automatic conversion for all text files by running
git config core.autocrlf true
(This can be set on a per-repository basis, or you can set a
default for your account by passing the '--global' option.)
A difficulty you'll run into is that you need to set
"core.autocrlf true" before you check out. But because git clone
fuses git init, git fetch, and git checkout into a single
operation, you can't use it as is if you want to enable CRLF
conversion on a per-repository basis (it works if you set a global default).
You can either use
git clone -n URL # -n tells clone to stop before checkout
cd turqstat
git config core.autocrlf true
git checkout -b master origin/master
or you can manually do what clone would do for you, i.e.
mkdir turqstat
cd turqstat
git init
git config core.autocrlf true
git remote add origin git://git.debian.org/git/turqstat/turqstat.git
git fetch origin
git checkout -b master origin/master
(this is what I typically do).
BTW, I think that git clone should be improved to avoid the
workaround described above. Maybe it could ask the user if it
should set up a specific line ending conversion before checkout.
Unfortunately, I have not had time to write a patch yet.
Steffen
I use an alternate workaround that clones the repository, removes the checked out files, sets autocrlf, then checks out the files again:
$ git clone git://git.debian.org/git/turqstat/turqstat.git
$ cd turqstat
$ git config --add core.autocrlf true
$ rm -rf * .gitignore
$ git reset --hard
The result should now be the same as using Steffen's system. However, there is still an unresolved problem with git's way of treating cr/lf as an attribute only of the checkout and not the repository itself:
$ git status
# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#
#	modified:   visualc/.gitignore
#	modified:   visualc/turqstat.sln
#	modified:   visualc/turqstat.vcproj
#
no changes added to commit (use "git add" and/or "git commit -a")
So, checking out the repository with autocrlf true has now caused misalignment of files that were originally checked in with existing cr/lf's in place. Visual Studio in fact happily works with files that only have lf endings, _except_ *.sln and *.vcproj files, which it much prefers to have with cr/lf endings.
The _real_ solution to this problem for the moment is _not_ to mix files with both lf and cr/lf endings in the repository. So, the original author of the repository should _also_ have used core.autocrlf true, thus causing the *.sln and *.vcproj files to have their cr's stripped on checkin, but replaced on checkout when checking out with autocrlf true.
------------------------------------------------------------------------
Peter Klavins
I didn't verify, but it was only some files that had LFs, perhaps the files that I added while on the Windows machine had CRLFs. That's bad. The project files were added to the repository on the Windows box (obviously), so those are correct. So apparently my repository is a bit broken at the moment with LF on some files and CRLF on some. That's bad. I just assumed everything worked, it used to "just work" for CVS (except for when you actually tried to add binary files, of course). -- \\// Peter - http://www.softwolves.pp.se/ -
This is a typical problem. Once CRLFs are in your repository, autocrlf can't "just work" anymore. You need to commit a fixed version.
"Just works" has a different meaning for git than it has for CVS. For git, it means that once you have _told_ git how to convert line endings (that is, you have correctly configured autocrlf), git will automatically detect text files and convert them, but leave binary files untouched. It "just works" in the sense that you do not need to explicitly tell git about every single binary file (no cvs -kb needed). Git will auto-detect the file type.
But if you do not tell git to convert line endings, it "just works" as if every file were binary. By default, git does not modify your content. And for some people, "just works" means exactly this: leave my content as is.
So it really depends on the context, and therefore some configuration is inevitable. git requires you to configure autocrlf. cvs requires you to set -kb.
You may, though, set "core.autocrlf true" globally for your account. After you have done this, git should "just work" for you, if "just works" means converting CRLF in _all_ text files in _every_ repository.
Steffen
LOL. Exactly. That's my only gripe with git, there's still some way to go before it's as usable as CVS in this regard, but of course in every other feature it's way superior. If you follow the steps I listed, you will have new .sln and .vcproj files that you can commit over the top of the ones already there, and everything will be fixed! I checked out your project and it built fine. ------------------------------------------------------------------------ Peter Klavins -
You could try .gitattributes to exclude files from crlf
conversion. But I'd not recommend this, because the mechanism
has some deficiencies, as discussed in
For cross-platform projects, I recommend explicitly configuring
autocrlf on Windows and Unix. On Windows set
git config core.autocrlf true # on Windows
and on Unix set
git config core.autocrlf input # on Unix
This ensures that the repository only contains LF, even if someone
emails source code from Windows to Unix and commits it there.
Steffen
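The recommendation above (true on Windows, input on Unix) could be applied by a small setup script. The following is only a sketch under assumptions: autocrlf_for_os is a hypothetical helper name, and the MINGW/MSYS patterns are a guess at what uname reports under msysgit. It prints the suggested command rather than changing any configuration.

```shell
#!/bin/sh
# Sketch: pick a core.autocrlf value from the platform name.
# autocrlf_for_os is a hypothetical helper, not part of git.
autocrlf_for_os() {
    case "$1" in
        MINGW*|MSYS*) echo true ;;   # Windows (msys): CRLF in the work tree
        *)            echo input ;;  # Unix-like: only strip CRLF on commit
    esac
}

value=$(autocrlf_for_os "$(uname -s)")
echo "suggested: git config --global core.autocrlf $value"
```

Either value keeps CRLF out of the repository; the two differ only in what is written to the work tree on checkout.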
I don't know if there are other options that might impact how clone
works, but something like the patch below might make sense. It would
allow:
git clone -c core.autocrlf=true ...
Note that the patch should not be applied; it doesn't handle values with
whitespace (and hopefully builtin clone will come soon after v1.5.4,
which would make doing it right much simpler).
---
diff --git a/git-clone.sh b/git-clone.sh
index b4e858c..a002550 100755
--- a/git-clone.sh
+++ b/git-clone.sh
@@ -23,6 +23,7 @@ reference= reference repository
o,origin= use <name> instead of 'origin' to track upstream
u,upload-pack= path to git-upload-pack on the remote
depth= create a shallow clone of that depth
+c,config= set a config option of the form key=value
use-separate-remote compatibility, do not use
no-separate-remote compatibility, do not use"
@@ -127,6 +128,7 @@ use_separate_remote=t
depth=
no_progress=
local_explicitly_asked_for=
+config=
test -t 1 || no_progress=--no-progress
while test $# != 0
@@ -173,6 +175,9 @@ do
--depth)
shift
depth="--depth=$1" ;;
+ -c|--config)
+ shift
+ config="$config $1" ;;
--)
shift
break ;;
@@ -242,6 +247,12 @@ fi &&
export GIT_DIR &&
GIT_CONFIG="$GIT_DIR/config" git-init $quiet ${template+"$template"} || usage
+for i in $config; do
+ key=`echo $i | cut -d= -f1`
+ value=`echo $i | cut -d= -f2-`
+ git config $key $value
+done
+
if test -n "$bare"
then
GIT_CONFIG="$GIT_DIR/config" git config core.bare true
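The patch above is described as not handling values with whitespace; its loop `for i in $config` relies on word splitting, so a value such as a full name would be broken apart. A minimal sketch of splitting one key=value argument with POSIX parameter expansion instead is shown below. The helper names config_key and config_value are hypothetical and are not part of git-clone.sh.

```shell
#!/bin/sh
# Sketch: split "key=value" without word-splitting the value.
# config_key and config_value are hypothetical helper names.
config_key() {
    printf '%s\n' "${1%%=*}"    # everything before the first '='
}
config_value() {
    printf '%s\n' "${1#*=}"     # everything after the first '='
}

opt='user.name=Jane Q. Example'
key=$(config_key "$opt")
value=$(config_value "$opt")
echo "key:   $key"
echo "value: $value"
# A clone script could then run: git config "$key" "$value"
```

Quoting "$key" and "$value" at the point of use is what keeps the whitespace intact; each -c argument would also have to be stored separately rather than appended to one space-joined string.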
You can also set the option globally. Maybe something for the installer or a first-time wizard. But I do think git should have this option set right from the beginning. It could print out something to notify the user that some options (and which ones) are not set the same as on Unix.
-- robin
Hi, On Mon, 7 Jan 2008, Robin Rosenberg wrote: > m
Indeed, but the most common SCMs detect binary files automatically, either by suffix or content analysis, so I think that is what users expect. It will be right for more projects than the current behaviour.
-- robin
As a user, I expect an SCM to only modify a file when I have explicitly asked it to do so. Automatic conversion by guessing file types is evil, as it _will_ go wrong and then mess up some files. This "intelligent" file handling is a pain to use. You end up in situations where builds work on some platforms but not on others, which gets even more confusing with NFS home directories.
So, please do not enable core.autocrlf by default on Windows. It might be reasonable for some projects, but not for all of them, and it will break some projects. Perhaps a project should be able to enable (or "suggest") it in a repo-wide setting somehow, which would avoid the git clone problem.
Thomas
As a user, I expect things to just work. With RCS/CVS/Subversion, it does, because it differentiates between text files (internally encoding NLs with "LF", but I couldn't care less what it uses there) and binary files (which it doesn't change). With git it currently doesn't, since it treats everything as binary files.
Yes, it's the whole text vs. binary file issue. We do live in a world where different systems store text differently. We have to deal with it. Preferably, the computer should deal with it without me having to do anything about it. After all, that's what computers are good at. If I occasionally need to do a
git add -kb binary.txt
to flag a file explicitly, that's a small price to pay for everything else to work out of the box.
FWIW, I wouldn't care if git internally stored all texts as SCSU/BOCU (or UTF-32, for that matter, if Git's compression engine is better than SCSU or BOCU) using PARAGRAPH SEPARATOR to separate lines, just as long as I could get back the text I checked in. Come to think about it, locale autoconversion of text files would be a nice way to work between systems that want different encodings, like how Windows prefers UTF-16LE, Mac OS X prefers UTF-8 and Linux systems prefer whatever I have set my locale to (I still use iso-8859-1, so shoot me).
-- \\// Peter - http://www.softwolves.pp.se/
With Subversion you must explicitly enable it to "just" work. Subversion auto-tags files with specified extensions, when they are added, with the svn:eol-style property specifying how the file should be converted, and then converts (everywhere) the files to the specified line endings. However, AFAIK, it does not convert anything unless the properties are set, and the default config has the automatic setting *commented out*.
-- Jan 'Bulb' Hudec <bulb@ucw.cz>
Actually, Subversion does the Right Thing, and treats everything as a binary file until and unless you explicitly set the svn:eol-style property on each file that you want it to mangle. Maybe you set up Subversion auto-props and forgot about it? That would be almost (but not really) like setting autocrlf=true in your global git config. Peter Harris -
Not exactly. Actually, Subversion detects whether a file is binary or text based on a heuristic. http://subversion.tigris.org/faq.html#binary-files But the status of a file as binary or text has no effect whatsoever on CRLF conversion, which is controlled by another property. By default, most of your text files will be detected as text (unless you use non-ASCII characters like Cyrillic), but they will not have CRLF conversion.
Now, you have to set svn:eol-style=native for each new file, which of course can be done automatically based on file extension, but that should be configured by each user in his or her global config file. Obviously, it does not work well for cross-platform projects, because many users forget to set svn:eol-style=native for some extensions. Moreover, the issue tends to repeat itself for every newly introduced file extension...
IMHO, having the binary or text status completely independent from CRLF conversion is insanity...
Dmitry
I'd actually like a feature like this. On the internal subversion tree I'm working on (using git-svn), there are quite a few files that have CRLF endings -- we are a cross-platform development group. The solution to this in subversion was that everyone had the same .subversion/config with a bunch of autoprops set; i.e.:
[auto-props]
*.H = svn:eol-style=native
*.h = svn:eol-style=native
*.CPP = svn:eol-style=native
*.cpp = svn:eol-style=native
and I can't do the same using git-svn. Thankfully emacs detects CRLFs and adjusts accordingly, and that's my workaround for it, but it would be nice to have some kind of gitattribute that allows you to set the autocrlf according to a filter.
-- Kelvie Wong
Actually, I've never actively set up a Subversion server myself, nor created any projects in Subversion (I have checked out some Subversion repos, though). I started using RCS and CVS, and now I'm migrating at least parts of that to Git (not all). Since Git is better than CVS in many ways, I would like it to be better than CVS in this one as well. -- \\// Peter - http://www.softwolves.pp.se/ -
Hi, <tongue-in-cheek>Hey, if Subversion does what you want, why not just use it?</tongue-in-cheek> Ciao, Dscho -
For you, perhaps, since you apparently infrequently commit binary files and derive some benefit from CRLF conversion. But please bear in mind that there are people on the other end of the spectrum who want the opposite (i.e., who could care less about CRLF, but _do_ have binary files). -Peff -
Hi, Do not forget the people who say that git is a content tracker (as opposed to a content munger). Git was really intended as a tracker of octet strings which are organised in tree structures, and where you can have revisions over those tree structures. That is the beauty of git: it keeps simple things simple. Now, for some, this is a curse ;-) Ciao, Dscho -
Yeah, I suspect it's not only the "expected" behavior, but people have had years of getting used to the whole binary issue, and are much more likely to expect binary corruption than to expect to have to worry about CRLF. And while it's true that it probably doesn't matter at all as long as you stay windows-only (and everything is CRLF), it's also true that (a) maybe you don't necessarily even know that some day you might want to cast off the shackles of MS and (b) even under Windows you do end up having some strange tools end up using LF (ie you may be using some tools that were just straight ports from Unix, and that write just LF). So defaulting to (or asking) "autocrlf" at install time is probably the safest thing, and then people can edit their global .gitconfig to turn it off. Linus -
Indeed. A checkbox in the Windows installer (like Cygwin has) would be nice. -- \\// Peter - http://www.softwolves.pp.se/ -
Hi, No. There are different needs for different projects, and having different defaults just adds to the confusion. I am no longer opposed to setting crlf=true by default for Git (although this does not necessarily hold true for msysGit, but that could be helped by explicitly unsetting crlf for the repositories we check out with the netinstaller). Ciao, Dscho
I'll further think about "crlf=safe" (see another mail in this thread). I like the idea of safe because it guarantees that data will never be corrupted. But I have no time to think about it immediately. Steffen -
crlf=safe [i.e. munging CRLFs only if there are no bare LFs] sounds appealing to me as well because it looks like munging that is always reversible. However there could still be problems at checkout. To be really safe, it seems to me that it must be 1) reversible in practice and 2) ALWAYS reversed unless we explicitly ask for no gnuming at checkout. Why? Re point (1) to be reversible in practice, we need to know who we've munged. Otherwise when gnuming blindly at checkout we might damage some innocent bystander file that only ever had LFs in the first place. So it seems we would have to keep track of who was munged. But do we want to store this in the repository? Re (2) well if we happen to munge a file on checkin that is actually binary, it must be gnumed on the way out otherwise it will be broken for the user. -
If you work on Windows and you have crlf=safe, you cannot put a text file that has only LFs, because naked LF is not allowed. If you want to have naked LF in some file, you have to say that explicitly in .gitattributes. If you work on a cross-platform project, and somebody else put a file with bare LFs, which is not text though the heuristic wrongly detected it as text, then you can remove this file from your working directory, correct .gitattributes and check out this file again.
The idea of crlf=safe is that information is never lost. It is always fully reversible, and if you put something into the repository, you always get back exactly the same.
Of course, it will, because the same heuristic will detect it as text, and convert it back. So as long as you stay on the same platform and with the same .gitattributes, you always get back exactly what you put.
Dmitry
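The crlf=safe rule discussed in this part of the thread (convert CRLF to LF only when the file contains no bare LF, so the conversion stays reversible) can be illustrated with a small check. This is a sketch of the proposal, not of anything git implements; has_bare_lf is a hypothetical helper name, and the sketch ignores the separate question of text/binary detection.

```shell
#!/bin/sh
# Sketch of the proposed crlf=safe check: conversion is reversible
# only if every LF in the file is preceded by a CR.
# has_bare_lf is a hypothetical helper, not a git command.
has_bare_lf() {
    cr=$(printf '\r')
    # Number of LF characters in the file.
    total=$(wc -l < "$1" | tr -d ' ')
    # Number of CRLF pairs. The appended "x" line ensures that a final
    # line without a trailing LF is never counted as ending in CR.
    crlf=$( { cat "$1"; printf 'x\n'; } | grep -c "${cr}\$" || true )
    [ "$total" -ne "$crlf" ]
}

tmp=$(mktemp)
printf 'one\r\ntwo\n' > "$tmp"
if has_bare_lf "$tmp"; then
    echo "bare LF found: refuse to convert"
else
    echo "all lines end in CRLF: safe to convert"
fi
rm -f "$tmp"
```

A file that passes this check can be converted to LF on commit and restored byte for byte by converting every LF back to CRLF on checkout.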
Dmitry, I think all of your comments are correct, BUT, this behaviour as currently proposed still does not seem to me safe (or perhaps transparent) enough to be enabled by default on a Windows platform (or for that matter a Unix one). If LF text files checked in on Windows get turned into CRLF files on checkout by default then I think plenty of people would be surprised and probably unhappy. Similarly I think it would be a bad thing if a binary file that looked like LF only text got mangled on checkout by LF->CRLF conversion - although I agree that it would be possible to recover from this situation with a bit of juggling. So my view is still that this behaviour would be a useful option when explicitly enabled by .gitattributes (as opposed to the current auto CRLF implementation, which could lead to irreversible munging) but that it is not an appropriate system-wide default. I could however see that sane people might disagree! For that matter autocrlf=true,input,safe are all slightly dubious when used as config vars rather than as attributes for the same collateral damage reason discussed above. The only way to prevent collateral damage is to consult .gitattributes on checkout (as Dmitry seemed to be assuming above) rather than gnuming anything in the repository that looked like LF only text. Of course even .gitattributes can change over time, so only by storing a "munged" metadata attribute in the repository could you guarantee that everything came out as it went in - which I think is a highly desirable base state. Greg. -
Hi Gregory,
LF text cannot be checked in with autocrlf=safe without marking that there
is no CRLF conversion for this file. So, what you describe is impossible.
Again, you can't do that with autocrlf=safe. Yes, it is possible for
someone else on Unix to put a file like this, but it is a rare event and
easy to recover from. So, it is a very small price to pay for cross-platform
Yes, I assumed this. Isn't it how it is implemented now?
static int crlf_to_worktree(const char *path, const char *src, size_t len,
                            struct strbuf *buf, int action)
{
        char *to_free = NULL;
        struct text_stat stats;
        if ((action == CRLF_BINARY) || (action == CRLF_INPUT) ||
            auto_crlf <= 0)
                return 0;
If crlf=false for some file then action will be CRLF_BINARY, and
crlf_to_worktree will not convert LF to CRLF. Did I miss something?
Dmitry
Hi, There is a bigger problem here, though: As of now, you can add a (loose) object from a big file pretty easily even on a small machine, because you do not need the whole buffer, but you stream it to hash-object. IIRC Junio wrote a patch to allow this with "git-add", using fast-import, but that patch probably hasn't been applied. Ciao, Dscho
I don't think that crlf=safe requires that the whole file was put into the buffer. It can work with stream, but it will call die() if a file that was detected as text has a naked LF. Dmitry -
Hi, [msysGit Cc'ed, since it is massively concerned by this thread] On Mon, 7 Jan 2008, Robin Rosenberg wrote: > m
Eventually I gave in and even voted for "Git does not modify content unless explicitly requested otherwise". Here's the full discussion: http://code.google.com/p/msysgit/issues/detail?id=21 I believe the main question is which type of projects we would like to support by our default. For real cross-platform projects that will be checked out on Windows and Unix we should choose "core.autocrlf true" as our default. But if our default are native Windows projects that will never be checked out on Unix, then we should not set core.autocrlf by default. I once fought for "real cross-platform", because this is what I need in my daily work. Note, however, that this setting bears the slight chance of git failing to correctly detect a binary file. In this case git would corrupt the file. So there is a tiny chance of data loss with "core.autocrlf true". The safest choice is to leave core.autocrlf unset. Steffen -
If the policy really depends on the project, then surely the default behavior should be determined by information carried in the project itself (e.g., the .gitattributes)? For that reason it strikes me as a mistake to ignore the crlf attribute by default (assuming that is indeed the current behavior; apologies for not checking). If crlf is set then I think it should be assumed that crlf conversion should be done unless that has been explicitly turned off somehow. --b. -
That sounds like a mistake if you are installing a port to a platform whose native line ending convention is different from where plain git natively runs on (i.e. UNIX). -
I'm not sure that I understand the whole deal about platform default line
endings. Isn't plain git functionally agnostic about line endings? You can
check in CRLF text files to git and it doesn't care. You can diff, show etc
just fine. I haven't yet found anything that breaks with CRLF files. In
this sense plain git is already Windows ready. Maybe I'm missing something?
Doesn't the problem only come if you try to diff a CRLF file with a new
version that has LF only line endings? Then right now you have to use
something like:
git diff --ignore-space-at-eol
Or if a Windows user clones a repository created on another system. For
these cross-platform circumstances, it seems to me sensible to have an
option (probably enabled by default on all platforms) that allows files to
be munged on check in to whatever EOL style the repository creator preferred
(probably stored in .gitattributes and could be different for different
files in the repo - e.g. a windows vendor src dir on a cross-platform
project). Note that this means that munging would only happen if someone
actually asked for it - which would be a sensible thing to do as the
administrator of a cross-platform project.
Then there would be a separate option (probably not enabled by default) to
check out with the platform's native line ending instead of whatever is in
the repo. This would allow people to work with inflexible toolsets.
Finally for people who want to work with native line endings that are
different from repository line endings, then it might be necessary to
improve the handling of diffs by providing a config var to make
--ignore-space-at-eol the default (or perhaps more correctly
--ignore-line-endings) for text files. From my preliminary reading of list
history improving the inspection of content rather than trying to change
content might be the more gitish thing to do.
In conclusion all of these CRLF options are designed to help Windows users
play nicely with others. But it seems to me naïve Win...
One example that bit me recently was "git-apply --whitespace=strip". I have files with CRLF in my repo, but git was stripping the CR from lines that I applied via a patch. I worked around it with a smudge/clean filter of "dos2unix | unix2dos" (the first removes all CRs, the second puts one back on each line).
Rogan
You might want to go back through the list archive for a few days to
find this patch:
[PATCH 2/2] core.whitespace cr-at-eol-is-ok
and try it out.
OK, so that's interesting. Is it a case where core git is not crlf agnostic? Looks like CR is being considered whitespace. I think git diff --ignore-space-at-eol also works because CR is considered whitespace. Maybe that's the wrong behaviour.
So the big question for me: should git expect that text files inside a repository have to have LF-only line endings? I don't think that it should; it should accommodate both CRLF and LF. I guess at the moment git normally accommodates CRLF files because they look like an LF file that happens to have a funky whitespace char in front of the LFs. Maybe it would be better if edge cases like the one you described were ironed out.
If you work together with other people on other platforms, then CRLF is a major pain in the *ss. So you have various options:
- only develop on unix-like platforms: lines end with LF, and nobody has any problems regardless of autocrlf behaviour. Might as well consider everything binary.
- only develop on windows, using only one set of basic tools: lines normally end with CRLF, and nobody cares. Might as well consider everything binary.
- Mixed windows/unix platforms, but the Windows people are constrained to use only tools that write text-files with LF. Might as well consider everything binary.
  Quite frankly, Johannes seems to argue that this is a viable alternative, but I seriously doubt that is really true. Yes, there are lots of Windows tools (pretty much all of them by now, I suspect) that *understand* LF-only line endings, but it's also undoubtedly the case that if you allow windows developers to use their normal tools, a number of them *will* write files with CRLF.
- Mixed windows usage - either with other UNIX users, or even just *within* a windows environment if *some* of the tools are basically UNIX ports (ie MinGW or Cygwin without text-mounts)
  In this case, some tools will write files with CRLF, and others will write them with LF. Again, usually all tools can *read* either form, but the writing is mixed and depends on the tool (so if you work in a group where different people use different editors, you will literally switch back-and-forth between LF and CRLF, sometimes mixing the two in the same file!).
  This one - at the very least - basically requires "autocrlf=input". Anything else is just madness, because otherwise you'll get files that get partly or entirely rewritten in the object database just due to line ending changes.
So in *most* of the situations, you probably don't need to worry about autocrlf. But the thing is, I'm almost 100% convinced t...
So this is what has to be accommodated. But instead of having autocrlf always set on Windows and always converting to LF in the repository, why not do nothing by default unless the repository contains some information specifying that it wants some or all text files to have a particular kind of line ending (e.g. in gitattributes). Then the choice of line ending inside the repository is up to the people creating/maintaining the repo, which just seems right. Insisting that repos created on windows should have textfiles munged to LF by default doesn't seem right. Even using Dmitry's clever autocrlf=safe option on Windows would lead to inconvenience since all LF files have to be explicitly attributed as text. We should be helping Windows people to use LF files rather than hindering them! -
Why? You can screw yourself more, and much more easily (and much more subtly), by leaving CRLF alone on Windows. The thing is, 99.9% of all people will be *much* better off with autocrlf=true on Windows than with it defaulting to off (or even fail). Isn't *that* the whole point of having a default? Pick the thing that is the right thing for almost everybody?
And no, "but think of the children.." is not a valid argument. Sure, you *can* corrupt binary images with CRLF conversion. But it's really quite hard, since the git heuristics for guessing are rather good. You really have to work at it, and if you do, you're pretty damn likely to know about the issue, so that 0.1% that really needs to not convert (and it's usually one specific file type!) would probably not even turn off CRLF, but rather add a .gitattributes entry for that one filetype!
(Side note: if there are known filetype extensions that have problems with the git guessing, we sure as heck could take the filename into account when guessing! There's absolutely nothing that says that we only have to look at the contents when guessing about the text/binary thing!)
Linus
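The per-filetype exemption mentioned above is a one-line .gitattributes entry that unsets the crlf attribute for a pattern. A minimal sketch of adding such an entry follows; add_no_crlf_pattern is a hypothetical helper name and *.dat is only an example pattern, not a file type known to cause trouble.

```shell
#!/bin/sh
# Sketch: exempt one file type from CRLF conversion via .gitattributes.
# "-crlf" unsets the crlf attribute for paths matching the pattern.
# add_no_crlf_pattern is a hypothetical helper name.
add_no_crlf_pattern() {
    # $1 = directory holding .gitattributes, $2 = glob pattern
    printf '%s -crlf\n' "$2" >> "$1/.gitattributes"
}

demo=$(mktemp -d)
add_no_crlf_pattern "$demo" '*.dat'
cat "$demo/.gitattributes"
rm -rf "$demo"
```

Because .gitattributes can be committed, an entry like this travels with the project instead of depending on each user's configuration.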
Are you also for "autocrlf=input" as the default on Unix? This ... and then Windows and Unix users would have the same chance of data corruption. Which is very low, yes, but unfortunately it already hit me once and I didn't immediately recognize what happened. I guess that less experienced git users would have a harder time understanding it. However, I don't have a test case at hand. I should probably go and find one. So for now, you may just want to ignore this comment.
Yet, I'm a bit paranoid about the potential data corruption. The way data would be corrupted during commit can't be easily fixed. You only have a chance of fixing this if you recognize the problem before you delete the file in your work tree. But because git is extremely good at preserving your data once you have committed a file, I tend to feel _very_ safe after I have committed, and I am teaching all people that once they have committed data to git they'll not lose it until the reflog expires (well and obviously
Looking at the content seems the right thing to do. The filetype extension could be misleading. Maybe a mechanism similar to the file command would be more valuable. I guess a stripped down variant should be sufficient.
Steffen
No. What would it help? "autocrlf" on Windows actually helps you (big upside, very small downside). On Unix or other sane systems, it has zero upside, so while the risk is still very small, there is now no big upside to counteract it. Again, what is "default" supposed to be? I argue that it's supposed to be the thing that is right for 99.9% of all people. And that simply isn't true on Unix. Linus -
You may later decide that you want to check out your project on Windows. In this case your repository should not contain CRLF. autocrlf=input ensures this. So given the current options, autocrlf=input is the only reasonable default on Unix if git wants to support cross-platform
autocrlf=input is true for the very same people that need autocrlf=true on Windows. Every developer who ever plans to check out his code on Windows and on Unix should have this default.
I don't think the CRLF problem is a Windows vs. Unix discussion. In my view, the discussion is whether git will have real cross-platform support as its default or not. The current default is sane for native Unix or native Windows projects. For cross-platform projects the default needs to be changed in the way described above. Git needs to ensure that CRLF never enters the repository for text files.
If you did not set autocrlf=true, copying source code from Windows to Unix would not be supported. But as you earlier mentioned, this seems to be a common operation and I am observing the same.
So I recommend autocrlf=input on Unix if you plan to ever go cross-platform.
Steffen
Sticking my head above the parapet again ... LF-only repositories are the model that everyone is tending towards, but I feel that there are (sane) people out there who would sometimes like to have CRLF files in the repository and do cross-platform development (I would if I were developing on a Mac for a Windows-originated Win/Mac project, or if I were keeping vendor source code in a tree). In spite of the plethora of autocrlf variants so far, there is still none that on unix would give you LF->CRLF on check in and CRLF->LF on checkout! This should be perfectly compatible with git's internals and I think it should be possible to allow this without breaking anything for other situations. One solution, which would have other uses, would be to allow checkin conversion to a specified line ending and checkout conversion to platform line ending as separately configurable options.
If this seems outrageous then it should be made perfectly clear that the git project strongly discourages CRLF text files in cross-platform repositories, that to prevent CRLF creep we disallow them by default even in the privacy of your own OS (if it's Windows) and that if you want to do this you're on your own mate. But I think that would be a shame, inflexible and definitely
For me this is kind of the mathematician vs the engineer. I think Steffen is logically correct in saying that autocrlf=input on unix is the direct orthologue of autocrlf=true on windows, and I dislike the idea that git should show logically different behaviour on different platforms. However, I think Linus's cost/benefit analysis is right: CRLF files appear infrequently on unix systems, and as often as not it's because someone specifically wants them to stay that way. So I think autocrlf=input is a useful option but not a necessary default on unix.
Git internally considers only LF as the EOL marker. I think there are more than three hundred places in Git where the decision about end-of-line is made based on that. CRLF may appear to work, but that is more an artifact caused by its LF ending, so what actually works is LF and nothing else. IOW, CRLF from Git's point of view is no better EOL Because LF is the only true EOL marker, and CRLF is not and never will be. In fact, Git is written in C, and the decision of what is EOL in C was made many years ago. So, it is the only sane choice to use LF for _internal_ representation. It can be said that *nix users are lucky in that their OS uses the same symbol, but it is similar to big-endian platforms being lucky with byte order when it comes to TCP/IP. That is not because TCP/IP wants to discourage little-endian platforms, but having a single encoding is the only sane choice if you care about interoperability, and any other decision will end up being much worse. Dmitry -
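To illustrate the point that CRLF only "works" because it happens to end in LF, here is a minimal sketch in C. The helper names are hypothetical and are not taken from git's source. It splits on LF alone, as the message describes, and shows that the CR of a CRLF file then ends up as the last byte of the line content.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from git): length of the first line in buf,
 * treating LF as the only end-of-line marker. The LF is not counted.
 */
static size_t first_line_len(const char *buf, size_t len)
{
	const char *nl = memchr(buf, '\n', len);

	return nl ? (size_t)(nl - buf) : len;
}

/* Returns 1 if the first line carries a CR as its last content byte. */
static int first_line_has_trailing_cr(const char *buf, size_t len)
{
	size_t n = first_line_len(buf, len);

	return n > 0 && buf[n - 1] == '\r';
}
```

With an LF-only splitter like this, a CRLF file still splits into the right number of lines, but each line carries a stray CR that every consumer has to tolerate.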
But under Unix, it would never do that *anyway*, unless the file for some reason really needs it (which I cannot imagine, but I've never seen anything so craptastically stupid that some crazy person hasn't done it) So your argument is bogus. Linus -
Ah sorry, I misunderstood you in [1]. I thought your last point "Mixed Windows usage" meant what I have in mind: A user working in a mixed Windows/Unix environment who creates a file using Windows tools and commits it in the Unix environment. In this case the CRLF file will be transferred from Windows to Unix without git being involved. The right thing for git on Unix is to remove CRLF during a commit but still write only LF during check out. So autocrlf=input is the right choice. [1] http://article.gmane.org/gmane.comp.version-control.git/70082 It happens that people working in a mixed environment do such things. They just copy files from Windows to Unix and commit there. Not very often, but it happens. So it would be nice if git would handle this situation and it actually can by setting autocrlf=input. My point is that perfect support for mixed environments requires that git removes CRLF from any input on any platform. However, git should behave differently during checkout. In this case the native line ending should be written (LF on Unix, CRLF on Windows). The difference happens during check out; commit should be handled identically. Steffen -
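The asymmetry described above (identical handling on commit, platform-specific handling on checkout) can be sketched in C as follows. This is only an illustration under stated assumptions: the function names and the native_crlf flag are hypothetical, and this is not git's actual convert.c code.

```c
#include <stddef.h>

/*
 * Commit direction, the same on every platform in this sketch:
 * drop the CR of each CRLF pair. dst must have room for len bytes.
 * Returns the number of bytes written.
 */
static size_t to_repository(const char *src, size_t len, char *dst)
{
	size_t i, out = 0;

	for (i = 0; i < len; i++) {
		if (src[i] == '\r' && i + 1 < len && src[i + 1] == '\n')
			continue;	/* skip the CR, keep the LF */
		dst[out++] = src[i];
	}
	return out;
}

/*
 * Checkout direction, platform dependent: when native_crlf is set
 * (the Windows case), write CRLF for each LF not already preceded
 * by CR; otherwise copy the bytes unchanged (the Unix case).
 * dst must have room for 2 * len bytes.
 */
static size_t to_worktree(const char *src, size_t len, char *dst,
			  int native_crlf)
{
	size_t i, out = 0;

	for (i = 0; i < len; i++) {
		if (native_crlf && src[i] == '\n' &&
		    (i == 0 || src[i - 1] != '\r'))
			dst[out++] = '\r';
		dst[out++] = src[i];
	}
	return out;
}
```

In these terms, autocrlf=input on Unix would correspond to to_repository() on commit and to_worktree() with native_crlf set to 0 on checkout, while autocrlf=true on Windows would use the same commit path with native_crlf set to 1.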
Oh, ok, I didn't realize. But yes, if you use a network share across windows and Unix and actually *share* the working tree over it, then yes, you'd want "autocrlf=input" on the unix side. However, I think that falls under the "0.1%" case, not the "99.9%" case. I realize that people probably do that more often with centralized systems, but with a distributed thing, it probably makes a *ton* more sense to have separate trees. But I could kind of see having a shared development directory and accessing it from different types of machines too. I'd also bet that crlf behavior of git itself will be the *least* of your problems in that situation. You'd have all the *other* tools to worry about, and would probably be very aware indeed of any CRLF issues. So at that point, the "automatic" or default behaviour is probably not a big deal, because everything _else_ you do likely needs special effort too! Linus -
On Fri, 11 Jan 2008 10:10:00 -0800 (PST) That's how I work all the time. My Linux box is a Samba server where I check things out from perforce (with the "share" settings for end of line which means that text files are checked out with LF only and CRLF is converted to LF on checkin). Having the data on the Linux box is nice since I can have all the nice Unix tools such as sed, find, grep, and they run fast on a native Linux system, which is not true about We're working in a mixed environment, and even though I do most of my development on Linux I usually want to make sure that things build in Visual Studio before I check in, so the easiest thing to do is to point Visual Studio at the files on the Samba share. Same thing when using Altera's tools to do CPLD development, I run the Altera tools on Windows (their free version is Windows only) but all the files are on the Linux box. My tools that take the SVF file (the "binary image" for the CPLD) and program the CPLD all run under Linux though. A lot of my colleagues have Windows on the desktop, and when they develop on Linux they usually edit the files locally using the Samba share, and then they have a Putty (ssh) connected to the Linux box where they build and test the software. Actually I seldom have any problems with CRLF at all. Sometimes the Xilinx or Altera editors will insert some stray CRLFs in some files, but all the tools I use seem to tolerate that. And as soon as I check in the CRLFs disappear anyway. We just have to make sure to turn on the "share" setting in our Perforce views and everything just works. /Christer -
It just happened yesterday that I copied a file from Unix to Windows (lucky I am ;) for a quite simple reason. I fetched and merged and realized that another developer forgot to check in a new file. He had already left. So I just looked into his workspace and copied the file. This has nothing to do with centralized system or not. We're just working in a mixed OS environment with shared filesystems. I didn't even think about the line endings in this situation because everything just worked. Actually I like the idea that I do not need to think about the endings because git will take care of them. Actually many other tools work well with CRLF. For example, vi just displays [dos] in its status bar; but besides this everything I don't think so. In the setting I described above, the questions I receive are not about the other tools but about git. I already started to teach everyone the new "autocrlf=input" policy to avoid these questions. I don't care that much about potential file corruption (though I'd feel more comfortable if I knew git would have stronger guarantees). During the next checkout on Windows file corruption would happen anyway. Steffen -
I certainly don't think "autocrlf=input" is wrong. It might even be a reasonable default on Unix, although I don't think it's nearly as obvious as the Windows case. I wouldn't mind using it myself, for example, although probably only because I know that for the stuff I work on it simply cannot possibly ever do the wrong thing. In fact, we had a case of bogus CRLF in one of the kernel documentation files for some reason that we ended up fixing by hand. "autocrlf=input" would have fixed it (except in that case it wouldn't have, since it came from the original kernel tree, long before crlf was an issue for git ;) So I'd say that autocrlf=input is quite possibly a good idea on Unix in general, but my gut feel is still that it's not a big enough issue to be actually worth making a default change over. But there's absolutely nothing wrong with having it as a policy at a company that has mixed Unix and Windows machines. (Every place I've ever been at, people who had a choice would never ever develop under Windows, so I've never seen any real mixing - even when some parts of the project were DOS/Windows stuff, there was a clear boundary between the stuff that was actually done under Windows) Linus -
I promised to think about the CRLF discussion and here is what
I believe we could do:
- Leave the current core.autocrlf mechanism as is.
- Add a mechanism to warn the user if an irreversible conversion happens
- After we have the mechanisms for configuring the conversion and for
configuring the safety level, we can decide which defaults to use on
the different platforms, namely Windows and Unix.
I propose to set the following defaults:
- Unix: core.autocrlf=input, core.safecrlf=warn
- Windows: core.autocrlf=true, core.safecrlf=warn
This patch is declared as WIP because tests and documentation are missing.
I'm also not sure if calling warning() and die() is the right thing to do at
this place. Interestingly, in some (all?) cases, crlf_to_git() is called two
times for a path during git add, resulting in the warning printed two times. I
didn't yet analyze why this happens. Maybe the warnings and errors printed
should be more verbose?
[ Linus, Dimitry was right about stats.lf. ]
Steffen
---- snip snap ---
CRLF conversion bears a slight chance of corrupting data.
autocrlf=true will convert CRLF to LF during commit and LF to
CRLF during checkout. A file that contains a mixture of LF and
CRLF before the commit cannot be recreated by git. For text
files this does not really matter because we do not care about
the line endings anyway; but for binary files that are
accidentally classified as text the conversion can result in
corrupted data.
If you recognize such corruption during commit you can easily fix
it by setting the conversion type explicitly in .gitattributes.
Right after committing you still have the original file in your
work tree and this file is not yet corrupted.
However, in mixed Windows/Unix environments text files quite
easily can end up containing a mixture of CRLF and LF line
endings and git should handle such situations gracefully. For
example a user could copy a CRLF file from Windows to Unix and
mix it with an existing LF fi...this is not, because if you really want to be sure that file will not be mangled
by checkout, you should not allow a text file with naked LF when autocrlf=true.
And the following lines after gather_stats() can cause:
/* No CR? Nothing to convert, regardless. */
if (!stats.cr)
	return 0;
So, I propose a slightly different patch for convert.c:
diff --git a/convert.c b/convert.c
index 4df7559..9fd88d9 100644
--- a/convert.c
+++ b/convert.c
@@ -90,9 +90,6 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
 		return 0;
 	gather_stats(src, len, &stats);
-	/* No CR? Nothing to convert, regardless. */
-	if (!stats.cr)
-		return 0;
 	if (action == CRLF_GUESS) {
 		/*
@@ -108,8 +105,23 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
 		 */
 		if (is_binary(len, &stats))
 			return 0;
+
+		if (safe_crlf) {
+			/* check if we have "naked" LFs */
+			if (stats.lf != stats.crlf) {
+				if (safe_crlf == SAFE_CRLF_WARN)
+					warning(
+						"Checkout will replace LFs with CRLF in %s", path);
+				else
+					die("Checkout would replace LFs with CRLF in %s", path);
+			}
+		}
 	}
+	/* No CR? Nothing to convert, regardless. */
+	if (!stats.cr)
+		return 0;
+
 	/* only grow if not in place */
 	if (strbuf_avail(buf) + buf->len < len)
 		strbuf_grow(buf, len - buf->len);
@@ -131,6 +143,16 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
 			if (! (c == '\r' && (1 < len && *src == '\n')))
 				*dst++ = c;
 		} while (--len);
+
+		if (safe_crlf && (action == CRLF_INPUT || auto_crlf <= 0)) {
+			/* autocrlf=input: check if we removed CRLFs */
+			if (buf->len != dst - buf->buf) {
+				if (safe_crlf == SAFE_CRLF_WARN)
+					warning("Stripped CRLF from %s.", path);
+				else
+					die("Refusing to strip CRLF from %s.", path);
+			}
+		}
 	}
 	strbuf_setlen(buf, dst - buf->buf);
 	return 1;
Dmitry
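
The "naked LF" test in the patch above (stats.lf != stats.crlf) can be illustrated in isolation. The following is a minimal sketch. It assumes that the lf counter includes the LF of every CRLF pair, which is how the comparison in the patch appears to be meant. The struct and function names are hypothetical and only loosely modelled on the stats used in convert.c.

```c
#include <stddef.h>

/* Hypothetical counters, loosely modelled on the stats in the patch. */
struct eol_stats {
	unsigned cr;	/* every CR byte */
	unsigned lf;	/* every LF byte, including the LF of a CRLF pair */
	unsigned crlf;	/* every CR immediately followed by LF */
};

static void count_eols(const char *buf, size_t len, struct eol_stats *st)
{
	size_t i;

	st->cr = st->lf = st->crlf = 0;
	for (i = 0; i < len; i++) {
		if (buf[i] == '\r') {
			st->cr++;
			if (i + 1 < len && buf[i + 1] == '\n')
				st->crlf++;
		} else if (buf[i] == '\n') {
			st->lf++;
		}
	}
}

/*
 * Number of LFs not preceded by CR. A non-zero result means that a
 * checkout with autocrlf=true would not reproduce the original file,
 * because those LFs would come back as CRLF.
 */
static unsigned naked_lf_count(const char *buf, size_t len)
{
	struct eol_stats st;

	count_eols(buf, len, &st);
	return st.lf - st.crlf;
}
```

Under this assumption a pure CRLF file has lf == crlf and passes the check, while a mixed or LF-only file has at least one naked LF and would trigger the warning or the die().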
-This version gets the naked LF/autocrlf=true case right.
However, different from what Dimitry suggested, the safety check
is run for all cases that are irreversible. Dimitry suggested to
run it only for the CRLF_GUESS case. I believe this is not
sufficient: the explicit CRLF_TEXT case should also be checked.
The user explicitly marked the file as text but the conversion is
nonetheless irreversible in the current setting. This might be
unexpected and we should warn about it. Paranoid users can even
ask git to fail in this case. Such users would need to manually
fix the file, e.g. running dos2unix.
I also added basic tests.
Documentation is still missing.
Steffen
---- snip snap ---
CRLF conversion bears a slight chance of corrupting data.
autocrlf=true will convert CRLF to LF during commit and LF to
CRLF during checkout. A file that contains a mixture of LF and
CRLF before the commit cannot be recreated by git. For text
files this does not really matter because we do not care about
the line endings anyway; but for binary files that are
accidentally classified as text the conversion can result in
corrupted data.
If you recognize such corruption during commit you can easily fix
it by setting the conversion type explicitly in .gitattributes.
Right after committing you still have the original file in your
work tree and this file is not yet corrupted.
However, in mixed Windows/Unix environments text files quite
easily can end up containing a mixture of CRLF and LF line
endings and git should handle such situations gracefully. For
example a user could copy a CRLF file from Windows to Unix and
mix it with an existing LF file there. The result would contain
both types of line endings.
Unfortunately, the desired effect of cleaning up text files
with mixed line endings and the undesired effect of corrupting binary
files can not be distinguished. In both cases CRLF are removed
in an irreversible way. For text files this is the right thing
to do, while for binary f...The reality I see is the other way around as common practice. For people who have never tried a Linux box the barrier is quite high and they prefer to stick with Windows. Where I work today and in several other places I know of, the default choice is to work on Windows and use a Linux box only for cross compilation. This is common practice in many smaller embedded companies and it is also these companies that like to be able to build Linux on a Windows box. Sam -
And for those who have never tried Windows, it would be a great learning barrier as well, and it is far from obvious what would be easy to learn for someone who has never had any experience with either of them before... Of course, most people who have used computers for some time could not escape having at least some experience with Windows, and, naturally, people prefer to stick to what they know, especially those who do not like or find it difficult to learn new stuff. Based on my observation, I would say that those who found learning Linux difficult would also find it difficult to learn other new things (like a new programming language), and usually had more trouble dealing with novel situations or doing anything that required out-of-the-box thinking... Usually, they are good at only one thing -- doing what they were told. There are some exceptions, of course, but take a look at the number of open source projects (where people write for the fun of programming) and compare how many of them are done by *nix users and Windows users. Isn't it obvious what most people who like programming prefer to use? Dmitry -
Hi, Not in my world. I see a few people who are stuck to Windows, but they are so because they are lazy. They do not ever do something interesting with computers in their free time, and while working, they only do what they are told to do. That might sound cynical, but you will have to _show_ me different examples to make me reconsider. And no, my work with msysgit did a poor job to convince me otherwise. Ciao, Dscho -
Some of the people I have in mind I will certainly not call lazy, but the I just wanted to say that things look different in some places of the world and for some types of development. I do not even know what I should try to make you reconsider - as I did not follow the full thread. Just stumbled over this statement. Sam -
You do not have to yell. Instead, just give yourself a pat on the back for having the brilliant foresight to give the "path" parameter to the convert_to_git() function when you did 6c510bee2013022fbce52f4b0ec0cc593fc0cc48 (Lazy man's auto-CRLF), even though the code did not use it back then ;-). -
I think people may have different preferences about that. Some people may want to have text files with CRLF but others with LF. Some trust the Git heuristic for detecting text files (which seems to work rather well for most commonly used formats) but others are paranoid about losing some data. Finally, there are some people who just want to store their messy files as is. Based on that, the following options are possible:
1. autocrlf=input for those who want LF and trust Git text heuristic
2. autocrlf=true is for those who want CRLF and trust Git text heuristic
3. autocrlf=fail for those who want LF but do not trust Git heuristic
4. autocrlf=safe for those who want CRLF but do not trust Git heuristic
5. autocrlf=false for those who like messy files with different EOLs
All these options have been mentioned in this thread, and I don't think we are likely to come up with a better solution, because "better" depends on which category of people you fall into. IMHO, #5 is the least reasonable of all. Dmitry -
