Marco Costalba <mcostalba@yahoo.it> writes:

I suspect that trying to have a global single encoding is the wrong approach overall. There are a handful of issues to think about.

First the easiest one -- the commit log messages. We encourage use of UTF-8, because they hold people's names, which are a significant source of i18n needs. Even if your project starts out with just a group of people whose names can be spelled in ASCII just fine, or all Western European, in which case you might be tempted to use latin-1 or 8859-15, or all Japanese with EUC-JP, your project _might_ someday have a member whose name cannot be spelled if you pick one of these encodings [*1*]. So we encourage use of UTF-8 and help people by having the -u flag in git-mailinfo, for example. But we do not _enforce_ UTF-8 [*2*].

I've never worked actively with non-English free software projects, and I do not know if people in, say, Japan write their commit log messages in Japanese. I do think, however, that it is a reasonable thing to do if the project is local and exclusive (e.g. a company-internal project). And if a local encoding is more convenient for users to handle (e.g. their "less" is configured to expect something other than UTF-8), it is perfectly reasonable to write all their commit log messages in their local encoding, as long as the members understand that this is the project's policy. And it is probably reasonable to assume that within one project, a single encoding is used for commit log messages, although it may not be UTF-8.

What this means for qgit is that it is often sufficient for its log browser to support one encoding at a time, provided it allows the user to switch which encoding to use depending on what project is being viewed.

The trouble you had, however, is not with commit messages but with the blob contents (i.e. user data). First of all, user data is just binary goo to git, and its interpretation is left to the Porcelains and the users. There are two things to talk about here.

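
Going back to the commit log case for a moment: the "one encoding at a time, switchable per project" idea could look roughly like the sketch below. It is only an illustration in Python, not qgit code; the function name, the project names and the settings table are all made up for the example.

```python
def decode_log_message(raw: bytes, project_encoding: str = "utf-8") -> str:
    """Decode raw commit-log bytes with the encoding chosen for this project."""
    # errors="replace" keeps the viewer usable when the chosen encoding is
    # wrong: undecodable bytes become U+FFFD instead of raising an exception.
    return raw.decode(project_encoding, errors="replace")


# Hypothetical per-project setting that the user can switch in the viewer.
project_encodings = {"git.git": "utf-8", "internal-jp": "euc_jp"}

# A log message stored in the project's local encoding (EUC-JP here).
message = "日本語のコミットログ".encode("euc_jp")
print(decode_log_message(message, project_encodings["internal-jp"]))
```

The log browser never needs to guess; it just decodes with whatever the user selected for the project being viewed.
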
I imagine qgit has, in addition to a diff-between-revisions viewer, a blob viewer that lets your users browse a whole file in a given revision. The files in the Documentation directory in git.git/ right now happen to be encoded in UTF-8 because they are asciidoc sources. But other projects may use different encodings, and even a single project can have its i18n message files in different encodings and charsets in different files. Users probably want to be able to view all of them, even if they only understand a couple of the languages and not the others.

What this means for qgit is that at the very least you should be able to show a whole file in a single encoding, but if you show two or more files at the same time, one in each window, these windows may be showing their contents in different encodings and charsets. So you would need to give your users a way to tell you what encoding each file is in. Using the global locale as the default and having a way to override that on a per-file basis would be sufficient.

A more interesting thing about user data is the diff between revisions. How should you show the diff for the git-pack-redundant documentation change that re-encodes Lukas' name from latin-1 to UTF-8?

First, git-diff (and the "diff" it calls) is encoding agnostic. The lines that come out of it are taken from its input files, and if the two revisions you compared both store your words in Italian encoded in iso-8859-15, the output would be in iso-8859-15. IOW, diff would end up doing the right thing, but without knowing it.

The patch that re-encodes Lukas' name would be handled the same way. The diff output would just say "these lines have different bytes", and spit out the '-' line encoded in latin-1 and the '+' line in UTF-8. This is a mess in one sense, but not necessarily so.

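
To make the byte-level point concrete, here is a small Python sketch (an illustration only; the name in it is made up and is not the one from the actual patch). It encodes the same text both ways and compares the results the way an encoding-agnostic diff would, i.e. as bytes.

```python
# Hypothetical name, used only for illustration.
name = "Jörg"

old_line = name.encode("latin-1")  # what the '-' line would carry
new_line = name.encode("utf-8")    # what the '+' line would carry

# A byte-wise comparison, which is all diff does, sees a change.
bytes_differ = old_line != new_line

# Decoding each side with its own encoding yields identical text.
text_differs = old_line.decode("latin-1") != new_line.decode("utf-8")

print("-", old_line)
print("+", new_line)
print("bytes differ:", bytes_differ, "/ text differs:", text_differs)
```

The two lines differ only in how the non-ASCII character is stored: one byte in latin-1, two bytes in UTF-8.
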
If we could say "Mr diff, the first file is in latin-1 and the second one is in UTF-8, so compare them accordingly, and by the way we would like the output in UTF-8 please", then it would be interesting and might sometimes be useful, but in this particular case it is useless -- there would not be any diff. And the reason _I_ ran diff before committing that change was to make sure I did the encoding conversion correctly, so noticing the byte-level differences was the right thing to do.

What should qgit do about this? This "encoding gotcha fix patch" is a special case, so I think it is perfectly reasonable if qgit did not do anything special about it. So the "per-file encoding override" I suggested for the whole-file viewer would be adequate. For the special "encoding fix" case, as I discussed above, the most useful output is "raw bytes", so maybe throw in a special encoding "raw" as one of the choices and I think you are done.

[Footnote]

*1* Some people may argue unicode cannot spell all names, but let's pretend it can -- or at least it can cover wider character sets than any single local encoding. Some people may further argue that iso-2022 does better, but let's not go there.

*2* We do not forbid you to store arbitrary binary blobs in your commit objects.
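
For what it is worth, the "global default plus per-file override, with a special raw choice" suggestion could be sketched like this in Python. Everything here is hypothetical: the class, its method names and the file paths are invented for the example and say nothing about how qgit is actually structured.

```python
import locale


class EncodingChooser:
    """Per-file encoding override on top of a global default (hypothetical)."""

    def __init__(self, default=None):
        # Fall back to the locale's preferred encoding when none is given.
        self.default = default or locale.getpreferredencoding(False)
        self.overrides = {}

    def set_override(self, path, encoding):
        self.overrides[path] = encoding

    def render(self, path, data: bytes) -> str:
        encoding = self.overrides.get(path, self.default)
        if encoding == "raw":
            # Show each non-ASCII byte as \xNN so encoding changes stay visible.
            return data.decode("ascii", errors="backslashreplace")
        return data.decode(encoding, errors="replace")


chooser = EncodingChooser(default="utf-8")
chooser.set_override("po/it.po", "iso-8859-15")    # made-up path
chooser.set_override("doc/authors.txt", "raw")     # made-up path

print(chooser.render("po/it.po", "più".encode("iso-8859-15")))
print(chooser.render("doc/authors.txt", b"J\xf6rg"))
```

With "raw" selected, the viewer shows escapes instead of guessing at characters, which is what you want when checking an encoding conversion.
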
