[PATCH 4/4] utf8.c: speculatively assume utf-8 in strbuf_add_wrapped_text()

From: René Scharfe
Date: Friday, February 19, 2010 - 3:20 pm

is_utf8() works by calling utf8_width() for each character of the
supplied string.  strbuf_add_wrapped_text() does that anyway while
wrapping the lines.  So instead of checking the encoding beforehand,
optimistically assume that the text is utf-8 and wrap along.  If an
invalid character is hit, discard the output produced so far and start
over, this time counting each byte as one column.
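The speculative approach can be sketched in C roughly as follows.  This
is a simplified illustration, not the code from the patch:
toy_utf8_width() and text_width() are hypothetical helpers, every
character is counted as one column, and the decoder only checks lead
and continuation bytes.  It does follow the convention the patch relies
on, namely that a failed decode sets the text pointer to NULL.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for utf8_width(): decodes one character,
 * advances *s past it and returns its display width.  For brevity
 * every character counts as one column (the real function looks up a
 * width table) and overlong forms are not rejected.  On an invalid
 * sequence *s is set to NULL, the signal the patch tests with
 * "if (!text)".
 */
static int toy_utf8_width(const char **s)
{
	const unsigned char *p = (const unsigned char *)*s;
	int len, i;

	if (p[0] < 0x80)
		len = 1;
	else if ((p[0] & 0xe0) == 0xc0)
		len = 2;
	else if ((p[0] & 0xf0) == 0xe0)
		len = 3;
	else if ((p[0] & 0xf8) == 0xf0)
		len = 4;
	else {
		*s = NULL;	/* stray continuation byte or bad lead byte */
		return 0;
	}
	for (i = 1; i < len; i++) {
		if ((p[i] & 0xc0) != 0x80) {	/* also stops at the NUL */
			*s = NULL;
			return 0;
		}
	}
	*s += len;
	return 1;
}

/*
 * Speculative width calculation: walk the text assuming utf-8 and, on
 * the first invalid sequence, start over from the beginning counting
 * one column per byte.  *used_utf8 reports which mode was used.
 * (strbuf_add_wrapped_text() additionally truncates its output buffer
 * back to orig_len before retrying.)
 */
static int text_width(const char *text, int *used_utf8)
{
	const char *start = text;
	int assume_utf8 = 1, w;

	assert(text != NULL);
retry:
	w = 0;
	text = start;
	while (*text) {
		if (assume_utf8) {
			w += toy_utf8_width(&text);
			if (!text) {
				assume_utf8 = 0;
				goto retry;
			}
		} else {
			w++;
			text++;
		}
	}
	*used_utf8 = assume_utf8;
	return w;
}
```

For example, text_width("caf\xc3\xa9", &u) returns 4 with u set to 1,
while the Latin-1 string "caf\xe9" trips the decoder at its last byte
and is re-measured byte by byte, again giving 4 but with u set to 0.
Only invalid input pays for a second pass.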

This pays off if the text consists only of valid utf-8 characters.
The following command was run against the Linux kernel repo with
git 1.7.0:

	$ time git log --format='%b' v2.6.32 >/dev/null

	real	0m2.679s
	user	0m2.580s
	sys	0m0.100s

	$ time git log --format='%w(60,4,8)%b' >/dev/null

	real	0m4.342s
	user	0m4.230s
	sys	0m0.110s

And with this patch series:

	$ time git log --format='%w(60,4,8)%b' >/dev/null

	real	0m3.741s
	user	0m3.630s
	sys	0m0.110s

So the cost of wrapping (the time spent beyond the plain '%b' run) is
reduced from about 1.66s to about 1.06s, i.e. to roughly 64% in this
case.
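The percentage can be checked by treating the wrapping overhead as the
extra time a %w(...) run needs on top of the plain '%b' run, assuming
all three runs walk the same history.  A minimal sketch of that
arithmetic (remaining_wrap_cost() is a hypothetical helper):

```c
#include <assert.h>

/*
 * Share of the original wrapping overhead that remains after the
 * patch.  "Overhead" is assumed to be the time of a wrapped run minus
 * the time of the plain --format='%b' run.
 */
static double remaining_wrap_cost(double plain, double old_wrap,
				  double new_wrap)
{
	assert(old_wrap > plain);	/* otherwise there is no overhead */
	return (new_wrap - plain) / (old_wrap - plain);
}
```

With the real times, remaining_wrap_cost(2.679, 4.342, 3.741) is
1.062 / 1.663, about 0.64; the user times (2.580, 4.230, 3.630) give
1.050 / 1.650, also about 0.64.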

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
Missing: numbers for a non-utf-8 repo.

 utf8.c |   23 +++++++++++++++++------
 1 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/utf8.c b/utf8.c
index 87437b0..84cfc72 100644
--- a/utf8.c
+++ b/utf8.c
@@ -324,16 +324,21 @@ static size_t display_mode_esc_sequence_len(const char *s)
  * consumed (and no extra indent is necessary for the first line).
  */
 int strbuf_add_wrapped_text(struct strbuf *buf,
-		const char *text, int indent, int indent2, int width)
+		const char *text, int indent1, int indent2, int width)
 {
-	int w = indent, assume_utf8 = is_utf8(text);
-	const char *bol = text, *space = NULL;
+	int indent, w, assume_utf8 = 1;
+	const char *bol, *space, *start = text;
+	size_t orig_len = buf->len;
 
 	if (width <= 0) {
-		strbuf_add_indented_text(buf, text, indent, indent2);
+		strbuf_add_indented_text(buf, text, indent1, indent2);
 		return 1;
 	}
 
+retry:
+	bol = text;
+	w = indent = indent1;
+	space = NULL;
 	if (indent < 0) {
 		w = -indent;
 		space = text;
@@ -385,9 +390,15 @@ new_line:
 			}
 			continue;
 		}
-		if (assume_utf8)
+		if (assume_utf8) {
 			w += utf8_width(&text, NULL);
-		else {
+			if (!text) {
+				assume_utf8 = 0;
+				text = start;
+				strbuf_setlen(buf, orig_len);
+				goto retry;
+			}
+		} else {
 			w++;
 			text++;
 		}
-- 
1.7.0
