[PATCH] treat any file with NUL as binary

To: Junio C Hamano <gitster@...>
Cc: <git@...>, Steffen Prohaska <prohaska@...>, Dmitry Potapov <dpotapov@...>
Date: Tuesday, January 15, 2008 - 9:59 pm

Git has two heuristics for detecting whether a file is binary or
text. The one in xdiff-interface.c (taken from GNU diff) checks for
a NUL byte at the beginning of the file. convert.c, however, uses a
different heuristic based on the percentage of non-printable
characters (less than 1% for text files).

Because the two heuristics can disagree, a file that diff treats as
binary could be treated as text by CRLF conversion. This is very
confusing for a user who sees 'git diff' show the file as binary
and expects it to be added as binary.

This patch makes is_binary() consider any file that contains at
least one NUL character as binary, to ensure that the heuristic
used for CRLF conversion is tighter than the one used by diff.

Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
---
 convert.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/convert.c b/convert.c
index 5adef4f..a51da1f 100644
--- a/convert.c
+++ b/convert.c
@@ -17,8 +17,8 @@
 #define CRLF_INPUT	2
 
 struct text_stat {
-	/* CR, LF and CRLF counts */
-	unsigned cr, lf, crlf;
+	/* NUL, CR, LF and CRLF counts */
+	unsigned nul, cr, lf, crlf;
 
 	/* These are just approximations! */
 	unsigned printable, nonprintable;
@@ -51,6 +51,9 @@ static void gather_stats(const char *buf, unsigned long size, struct text_stat *
 			case '\b': case '\t': case '\033': case '\014':
 				stats->printable++;
 				break;
+			case 0:
+				stats->nul++;
+				/* fall through */
 			default:
 				stats->nonprintable++;
 			}
@@ -66,6 +69,8 @@ static void gather_stats(const char *buf, unsigned long size, struct text_stat *
 static int is_binary(unsigned long size, struct text_stat *stats)
 {
 
+	if (stats->nul)
+		return 1;
 	if ((stats->printable >> 7) < stats->nonprintable)
 		return 1;
 	/*
-- 
1.5.3.5
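
To illustrate the mismatch described in the commit message, here is a
minimal standalone sketch, not the actual convert.c code. The names
sketch_stat, sketch_gather, sketch_is_binary_ratio_only and
sketch_is_binary_with_nul are hypothetical, and the byte
classification is a simplified approximation of gather_stats(). It
compares the ratio-only check with the NUL-aware check from this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct text_stat in convert.c. */
struct sketch_stat {
	unsigned nul, printable, nonprintable;
};

/* Classify each byte roughly the way gather_stats() does (approximation). */
void sketch_gather(const char *buf, size_t size, struct sketch_stat *st)
{
	size_t i;

	assert(st != NULL);
	st->nul = st->printable = st->nonprintable = 0;
	for (i = 0; i < size; i++) {
		unsigned char c = (unsigned char)buf[i];

		if (c == '\r' || c == '\n')
			continue;	/* line endings are counted separately in Git */
		if (c == 127) {
			st->nonprintable++;	/* DEL */
		} else if (c < 32) {
			switch (c) {
			case '\b': case '\t': case '\033': case '\014':
				st->printable++;
				break;
			case 0:
				st->nul++;
				/* fall through */
			default:
				st->nonprintable++;
			}
		} else {
			st->printable++;
		}
	}
}

/* Ratio-only check: binary if non-printables exceed printable/128. */
int sketch_is_binary_ratio_only(const struct sketch_stat *st)
{
	return (st->printable >> 7) < st->nonprintable;
}

/* Check with the NUL rule added by this patch: any NUL means binary. */
int sketch_is_binary_with_nul(const struct sketch_stat *st)
{
	if (st->nul)
		return 1;
	return sketch_is_binary_ratio_only(st);
}
```

For a 1000-byte buffer of 'a' with a single NUL in the middle, the
counts are 999 printable and 1 non-printable, so the ratio-only check
reports text (999 >> 7 is 7, which is not less than 1), while the
NUL-aware check reports binary.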


Messages in current thread:
[PATCH] treat any file with NUL as binary, Dmitry Potapov, (Tue Jan 15, 10:28 am)
Re: [PATCH] treat any file with NUL as binary, Junio C Hamano, (Tue Jan 15, 9:21 pm)
[PATCH] treat any file with NUL as binary, Dmitry Potapov, (Tue Jan 15, 9:59 pm)
Re: [PATCH] treat any file with NUL as binary, Steffen Prohaska, (Tue Jan 15, 5:03 pm)