Re: [PATCH 8/8] Add SVN dump parser

Previous thread: COX.NET SUPPORT by COX.NET WEBMAIL ACCOUNT on Wednesday, July 14, 2010 - 3:08 am. (1 message)

Next thread: [PATCH] Get rr/svn-fe-contrib merged by Ramkumar Ramachandra on Thursday, July 15, 2010 - 9:25 am. (2 messages)
From: Ramkumar Ramachandra
Date: Thursday, July 15, 2010 - 9:22 am

Hi,

I've decided to get this series merged now instead of waiting for the
ternary treap refactor. It's in excellent shape, thanks to Jonathan's
constant reviews/ fixes and David's constant refactoring. Since
Jonathan's last send, I've incoporated some suggestions from my own
review of Jonathan's series and split it into two series as I proposed
earlier; I'll send the second series (rr/contrib-svn-fe) shortly after
this.

Once this series is merged, I estimate that the following will come in
as incremental patches when the work is finished:
rr/ternary-trp-refactor
rr/zero-tree-refactor
rr/dumpfilev3-parser

Thanks.

-- Ram

David Barr (5):
  Add memory pool library
  Add string-specific memory pool
  Add stream helper library
  Add infrastructure to write revisions in fast-export format
  Add SVN dump parser

Jason Evans (1):
  Add treap implementation

Jonathan Nieder (2):
  Export parse_date_basic() to convert a date string to timestamp
  Introduce vcs-svn lib

 Makefile              |   12 ++-
 cache.h               |    1 +
 date.c                |   14 +--
 vcs-svn/LICENSE       |   33 +++++
 vcs-svn/fast_export.c |   75 +++++++++++
 vcs-svn/fast_export.h |   14 ++
 vcs-svn/line_buffer.c |   93 ++++++++++++++
 vcs-svn/line_buffer.h |   14 ++
 vcs-svn/obj_pool.h    |   80 ++++++++++++
 vcs-svn/repo_tree.c   |  335 +++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/repo_tree.h   |   26 ++++
 vcs-svn/string_pool.c |  114 +++++++++++++++++
 vcs-svn/string_pool.h |   15 +++
 vcs-svn/svndump.c     |  289 ++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/svndump.h     |    8 ++
 vcs-svn/trp.h         |  220 ++++++++++++++++++++++++++++++++
 vcs-svn/trp.txt       |  102 +++++++++++++++
 17 files changed, 1436 insertions(+), 9 deletions(-)
 create mode 100644 vcs-svn/LICENSE
 create mode 100644 vcs-svn/fast_export.c
 create mode 100644 vcs-svn/fast_export.h
 create mode 100644 vcs-svn/line_buffer.c
 create mode 100644 vcs-svn/line_buffer.h
 create ...
From: Ramkumar Ramachandra
Date: Thursday, July 15, 2010 - 9:23 am

[Empty message]
From: Jonathan Nieder
Date: Thursday, July 15, 2010 - 12:09 pm

I still haven’t checked this implementation in detail, but it seemed
to work in practice and is about to change anyway.

I like the documentation updates.  What else changed from the
previous round?
--

From: Ramkumar Ramachandra
Date: Thursday, July 15, 2010 - 12:18 pm

Hi again,


Nothing else. I should have posted a diff from last time.

-- Ram
--

From: Ramkumar Ramachandra
Date: Thursday, July 15, 2010 - 9:22 am

From: David Barr <david.barr@cordelta.com>

Add a memory pool library implemented using C macros. The obj_pool_gen()
macro creates a type-specific memory pool API.

The memory pool library is distinguished from the existing specialized
allocators in alloc.c by using a contiguous block for all allocations.
This means that on one hand, long-lived pointers have to be written as
offsets, since the base address changes as the pool grows, but on the
other hand, the entire pool can be easily written to the file system.
This allows the memory pool to persist between runs of an application.

For the svn importer, such a facility is useful because each svn
revision can copy trees and files from any previous revision.  The
relevant information for all revisions has to persist somehow to
support incremental runs, and for now it is simplest to avoid relying
on the target VCS for that.

obj_pool_gen(pre, obj_t, initial_capability)

	pre: Prefix for generated functions (example: string).
	obj_t: Type for treap data structure (example: char).
	initial_capacity: Initial size of the memory pool (example: 4096).

void pre_init(void);

	Read values from a previous run to initialize the pool.
	If this function is not called, the pool begins valid but empty.

uint32_t pre_alloc(uint32_t nmemb);

	Reserve space for a few objects in the pool and return an
	offset to the first one.

uint32_t pre_free(uint32_t nmemb);

	Unreserve the last few objects reserved.

uint32_t pre_offset(obj_t *pointer);
obj_t *pre_pointer(uint32_t offset);

	Convert between pointers into the in-memory pool and offsets
	from the beginning (or ~0 for the NULL pointer).  Pointers are
	not guaranteed to remain valid after a pre_alloc() operation
	or pre_reset() followed by pre_init(), but offsets are.

void pre_commit(void);

	Write the pool to file.  A pre_reset() followed by pre_init()
	(pehaps with exit() in between) will return the pool to the
	last committed state.

void pre_reset(void);

	Deinitialize ...
From: Jonathan Nieder
Date: Thursday, July 15, 2010 - 11:57 am

Except as a proof of concept, this is the wrong API to have.  The problem
is that the caller cannot choose the filename, so it ends up being a .bin
file in the current directory, wherever that is.

The log message leaves out a subtlety: this also increases the
‘committed’ value, and bookkeeping for that might be useful to some
callers.


If you just want something working, I’d suggest stubbing this out:

 static MAYBE_UNUSED void pre##_init(void) \
 { \
 } \

It even almost makes sense as API: the _init function does all
initialization tasks required, which is to say, none.  (The {0, ...}

This can be simplified

 static MAYBE_UNUSED void pre##_commit(void) \
 { \
	pre##_pool.committed = pre##_pool.size; \
 } \

In other words, maybe something like this on top?  This includes the
vestigal _init() function which really should not be there (it is
confusing that some callers use it and others don’t).  I did not
spend much time on it because in the end I suspect we might throw
obj_pool away anyway.

---
diff --git a/Makefile b/Makefile
index fc31ee0..386a586 100644
--- a/Makefile
+++ b/Makefile
@@ -409,6 +409,7 @@ TEST_PROGRAMS_NEED_X += test-delta
 TEST_PROGRAMS_NEED_X += test-dump-cache-tree
 TEST_PROGRAMS_NEED_X += test-genrandom
 TEST_PROGRAMS_NEED_X += test-match-trees
+TEST_PROGRAMS_NEED_X += test-obj-pool
 TEST_PROGRAMS_NEED_X += test-parse-options
 TEST_PROGRAMS_NEED_X += test-path-utils
 TEST_PROGRAMS_NEED_X += test-run-command
diff --git a/t/t0070-fundamental.sh b/t/t0070-fundamental.sh
index 680d7d6..262f304 100755
--- a/t/t0070-fundamental.sh
+++ b/t/t0070-fundamental.sh
@@ -12,4 +12,10 @@ test_expect_success 'character classes (isspace, isalpha etc.)' '
 	test-ctype
 '
 
+test_expect_success 'allocator for svn importer' '
+	printf "%s\n" 0 0 0 0 0 2 2 2 2 2 -1 >expected &&
+	test-obj-pool >actual &&
+	test_cmp expected actual
+'
+
 test_done
diff --git a/test-obj-pool.c b/test-obj-pool.c
new file mode 100644
index 0000000..1300049
--- ...
From: Ramkumar Ramachandra
Date: Thursday, July 15, 2010 - 12:12 pm

Hi Jonathan,


Oh, right. I remember that you asked to turn off persistence for this
merge. We can include persistence it in a later series.

Junio: Could you squash this diff into the commit?

diff --git a/vcs-svn/obj_pool.h b/vcs-svn/obj_pool.h
index f60c872..7a256d4 100644
--- a/vcs-svn/obj_pool.h
+++ b/vcs-svn/obj_pool.h
@@ -20,17 +20,6 @@ static struct { \
 } pre##_pool = { 0, 0, 0, NULL, NULL}; \
 static MAYBE_UNUSED void pre##_init(void) \
 { \
-	struct stat st; \
-	pre##_pool.file = fopen(#pre ".bin", "a+"); \
-	rewind(pre##_pool.file); \
-	fstat(fileno(pre##_pool.file), &st); \
-	pre##_pool.size = st.st_size / sizeof(obj_t); \
-	pre##_pool.committed = pre##_pool.size; \
-	pre##_pool.capacity = pre##_pool.size * 2; \
-	if (pre##_pool.capacity < initial_capacity) \
-		pre##_pool.capacity = initial_capacity; \
-	pre##_pool.base = malloc(pre##_pool.capacity * sizeof(obj_t)); \
-	fread(pre##_pool.base, sizeof(obj_t), pre##_pool.size, pre##_pool.file); \
 } \
 static MAYBE_UNUSED uint32_t pre##_alloc(uint32_t count) \
 { \
@@ -62,9 +51,7 @@ static MAYBE_UNUSED obj_t *pre##_pointer(uint32_t offset) \
 } \
 static MAYBE_UNUSED void pre##_commit(void) \
 { \
-	pre##_pool.committed += fwrite(pre##_pool.base + pre##_pool.committed, \
-		sizeof(obj_t), pre##_pool.size - pre##_pool.committed, \
-		pre##_pool.file); \
+	pre##_pool.committed = pre##_pool.size; \
 } \
 static MAYBE_UNUSED void pre##_reset(void) \
 { \

--

From: Ramkumar Ramachandra
Date: Thursday, July 15, 2010 - 9:23 am

From: David Barr <david.barr@cordelta.com>

Intern strings so they can be compared by address and stored without
wasting space.

This library uses the macros in the obj_pool.h and trp.h to create a
memory pool for strings and expose an API for handling them.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Makefile              |    4 +-
 vcs-svn/string_pool.c |  114 +++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/string_pool.h |   15 ++++++
 3 files changed, 131 insertions(+), 2 deletions(-)
 create mode 100644 vcs-svn/string_pool.c
 create mode 100644 vcs-svn/string_pool.h

diff --git a/Makefile b/Makefile
index 663a366..e11e588 100644
--- a/Makefile
+++ b/Makefile
@@ -1740,7 +1740,7 @@ ifndef NO_CURL
 endif
 XDIFF_OBJS = xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
 	xdiff/xmerge.o xdiff/xpatience.o
-VCSSVN_OBJS =
+VCSSVN_OBJS = vcs-svn/string_pool.o
 OBJECTS := $(GIT_OBJS) $(XDIFF_OBJS) $(VCSSVN_OBJS)
 
 dep_files := $(foreach f,$(OBJECTS),$(dir $f).depend/$(notdir $f).d)
@@ -1864,7 +1864,7 @@ xdiff-interface.o $(XDIFF_OBJS): \
 	xdiff/xutils.h xdiff/xprepare.h xdiff/xdiffi.h xdiff/xemit.h
 
 $(VCSSVN_OBJS): \
-	vcs-svn/obj_pool.h vcs-svn/trp.h
+	vcs-svn/obj_pool.h vcs-svn/trp.h vcs-svn/string_pool.h
 endif
 
 exec_cmd.s exec_cmd.o: EXTRA_CPPFLAGS = \
diff --git a/vcs-svn/string_pool.c b/vcs-svn/string_pool.c
new file mode 100644
index 0000000..bd5a380
--- /dev/null
+++ b/vcs-svn/string_pool.c
@@ -0,0 +1,114 @@
+/*
+ * Licensed under a two-clause BSD-style license.
+ * See LICENSE for details.
+ */
+
+#include "git-compat-util.h"
+#include "trp.h"
+#include "obj_pool.h"
+#include "string_pool.h"
+
+static struct trp_root tree = { ~0 };
+
+struct node {
+	uint32_t offset;
+	struct trp_node children;
+};
+
+/* Two memory pools: one for struct node, ...
From: Ramkumar Ramachandra
Date: Thursday, July 15, 2010 - 9:22 am

From: Jonathan Nieder <jrnieder@gmail.com>

Teach the build system to build a separate library for the
upcoming subversion interop support.

The resulting vcs-svn/lib.a does not contain any code, nor is
it built during a normal build.  This is just scaffolding for
later changes.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 Makefile |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/Makefile b/Makefile
index 9aca8a1..6441dcb 100644
--- a/Makefile
+++ b/Makefile
@@ -468,6 +468,7 @@ export PYTHON_PATH
 
 LIB_FILE=libgit.a
 XDIFF_LIB=xdiff/lib.a
+VCSSVN_LIB=vcs-svn/lib.a
 
 LIB_H += advice.h
 LIB_H += archive.h
@@ -1739,7 +1740,8 @@ ifndef NO_CURL
 endif
 XDIFF_OBJS = xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
 	xdiff/xmerge.o xdiff/xpatience.o
-OBJECTS := $(GIT_OBJS) $(XDIFF_OBJS)
+VCSSVN_OBJS =
+OBJECTS := $(GIT_OBJS) $(XDIFF_OBJS) $(VCSSVN_OBJS)
 
 dep_files := $(foreach f,$(OBJECTS),$(dir $f).depend/$(notdir $f).d)
 dep_dirs := $(addsuffix .depend,$(sort $(dir $(OBJECTS))))
@@ -1860,6 +1862,8 @@ http.o http-walker.o http-push.o remote-curl.o: http.h
 xdiff-interface.o $(XDIFF_OBJS): \
 	xdiff/xinclude.h xdiff/xmacros.h xdiff/xdiff.h xdiff/xtypes.h \
 	xdiff/xutils.h xdiff/xprepare.h xdiff/xdiffi.h xdiff/xemit.h
+
+$(VCSSVN_OBJS):
 endif
 
 exec_cmd.s exec_cmd.o: EXTRA_CPPFLAGS = \
@@ -1908,6 +1912,8 @@ $(LIB_FILE): $(LIB_OBJS)
 $(XDIFF_LIB): $(XDIFF_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) rcs $@ $(XDIFF_OBJS)
 
+$(VCSSVN_LIB): $(VCSSVN_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) rcs $@ $(VCSSVN_OBJS)
 
 doc:
 	$(MAKE) -C Documentation all
-- 
1.7.1

--

From: Jonathan Nieder
Date: Thursday, July 15, 2010 - 10:46 am

$ make vcs-svn/lib.a V=1
 rm -f vcs-svn/lib.a && ar rcs vcs-svn/lib.a 
 ar: vcs-svn/lib.a: No such file or directory
 make: *** [vcs-svn/lib.a] Error 1

That is because the vcs-svn directory does not exist.  So
probably the LICENSE should be added with the same patch
(and git should learn to track empty directories).

Jonathan
--

From: Ramkumar Ramachandra
Date: Thursday, July 15, 2010 - 12:15 pm

Hi Jonathan,


Oops. Sorry about not checking this: it looked alright at a
glance. Yes, we can add LICENSE with this patch.

Junio: Could you squash in this diff?

--- /dev/null
+++ b/vcs-svn/LICENSE
@@ -0,0 +1,26 @@
+Copyright (C) 2010 David Barr <david.barr@cordelta.com>.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+1. Redistributions of source code must retain the above copyright
+   notice(s), this list of conditions and the following disclaimer
+   unmodified other than the allowable addition of one or more
+   copyright notices.
+2. Redistributions in binary form must reproduce the above copyright
+   notice(s), this list of conditions and the following disclaimer in
+   the documentation and/or other materials provided with the
+   distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER(S) ``AS IS'' AND ANY
+EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE COPYRIGHT HOLDER(S) BE
+LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
+EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

-- Ram
--

From: Ramkumar Ramachandra
Date: Thursday, July 15, 2010 - 9:23 am

From: David Barr <david.barr@cordelta.com>

This library provides thread-unsafe fgets()- and fread()-like
functions where the caller does not have to supply a buffer.  It
maintains a couple of static buffers and provides an API to use
them.

NEEDSWORK: what should buffer_copy_bytes do on error?

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Makefile              |    5 ++-
 vcs-svn/line_buffer.c |   93 +++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/line_buffer.h |   14 +++++++
 3 files changed, 110 insertions(+), 2 deletions(-)
 create mode 100644 vcs-svn/line_buffer.c
 create mode 100644 vcs-svn/line_buffer.h

diff --git a/Makefile b/Makefile
index e11e588..8223d9b 100644
--- a/Makefile
+++ b/Makefile
@@ -1740,7 +1740,7 @@ ifndef NO_CURL
 endif
 XDIFF_OBJS = xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
 	xdiff/xmerge.o xdiff/xpatience.o
-VCSSVN_OBJS = vcs-svn/string_pool.o
+VCSSVN_OBJS = vcs-svn/string_pool.o vcs-svn/line_buffer.o
 OBJECTS := $(GIT_OBJS) $(XDIFF_OBJS) $(VCSSVN_OBJS)
 
 dep_files := $(foreach f,$(OBJECTS),$(dir $f).depend/$(notdir $f).d)
@@ -1864,7 +1864,8 @@ xdiff-interface.o $(XDIFF_OBJS): \
 	xdiff/xutils.h xdiff/xprepare.h xdiff/xdiffi.h xdiff/xemit.h
 
 $(VCSSVN_OBJS): \
-	vcs-svn/obj_pool.h vcs-svn/trp.h vcs-svn/string_pool.h
+	vcs-svn/obj_pool.h vcs-svn/trp.h vcs-svn/string_pool.h \
+	vcs-svn/line_buffer.h
 endif
 
 exec_cmd.s exec_cmd.o: EXTRA_CPPFLAGS = \
diff --git a/vcs-svn/line_buffer.c b/vcs-svn/line_buffer.c
new file mode 100644
index 0000000..0f83426
--- /dev/null
+++ b/vcs-svn/line_buffer.c
@@ -0,0 +1,93 @@
+/*
+ * Licensed under a two-clause BSD-style license.
+ * See LICENSE for details.
+ */
+
+#include "git-compat-util.h"
+
+#include "line_buffer.h"
+#include "obj_pool.h"
+
+#define LINE_BUFFER_LEN 10000
+#define ...
From: Jonathan Nieder
Date: Thursday, July 15, 2010 - 12:19 pm

For consistency with the rest of vcs-svn, it should do nothing. :)

I would love to see svn-fe diagnosing and recovering somehow from
faulty input.  For now it follows the easier route of just ignoring
(and skipping) confusing input.

Probably this should be mentioned in the man page somewhere.


The next input/output operation will fail, causing svn-fe to quit
early, so it would not be easy for such an error to go unnoticed.
--

From: Ramkumar Ramachandra
Date: Thursday, July 15, 2010 - 9:23 am

From: David Barr <david.barr@cordelta.com>

repo_tree maintains the exporter's state and provides a facility to to
call fast_export, which writes objects to stdout suitable for
consumption by fast-import.

The exported functions roughly correspond to Subversion FS operations.

 . repo_add, repo_modify, repo_copy, repo_replace, and repo_delete
   update the current commit, based roughly on the corresponding
   Subversion FS operation.

 . repo_commit calls out to fast_export to write the current commit to
   the fast-import stream in stdout.

 . repo_diff is used by the fast_export module to write the changes
   for a commit.

 . repo_reset erases the exporter's state, so valgrind can be happy.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Makefile              |    5 +-
 vcs-svn/fast_export.c |   75 +++++++++++
 vcs-svn/fast_export.h |   14 ++
 vcs-svn/repo_tree.c   |  335 +++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/repo_tree.h   |   26 ++++
 5 files changed, 453 insertions(+), 2 deletions(-)
 create mode 100644 vcs-svn/fast_export.c
 create mode 100644 vcs-svn/fast_export.h
 create mode 100644 vcs-svn/repo_tree.c
 create mode 100644 vcs-svn/repo_tree.h

diff --git a/Makefile b/Makefile
index 8223d9b..7c66dcc 100644
--- a/Makefile
+++ b/Makefile
@@ -1740,7 +1740,8 @@ ifndef NO_CURL
 endif
 XDIFF_OBJS = xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
 	xdiff/xmerge.o xdiff/xpatience.o
-VCSSVN_OBJS = vcs-svn/string_pool.o vcs-svn/line_buffer.o
+VCSSVN_OBJS = vcs-svn/string_pool.o vcs-svn/line_buffer.o \
+	vcs-svn/repo_tree.o vcs-svn/fast_export.o
 OBJECTS := $(GIT_OBJS) $(XDIFF_OBJS) $(VCSSVN_OBJS)
 
 dep_files := $(foreach f,$(OBJECTS),$(dir $f).depend/$(notdir $f).d)
@@ -1865,7 +1866,7 @@ xdiff-interface.o $(XDIFF_OBJS): \
 
 $(VCSSVN_OBJS): \
 ...
From: Jonathan Nieder
Date: Thursday, July 15, 2010 - 12:28 pm

David tweaked the API nicely upstream to enforce this constraint.  So

The usual convention within git is to rely on .c files to include

Are the semicolons necessary?  (A nitpick, I know).

That said, this part is looking pretty good.
--

From: Ramkumar Ramachandra
Date: Thursday, July 15, 2010 - 9:22 am

From: Jonathan Nieder <jrnieder@gmail.com>

approxidate() is not appropriate for reading machine-written dates
because it guesses instead of erroring out on malformed dates.
parse_date() is less convenient since it returns its output as a
string.  So export the underlying function that writes a timestamp.

While at it, change the return value to match the usual convention:
return 0 for success and -1 for failure.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 cache.h |    1 +
 date.c  |   14 ++++++--------
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/cache.h b/cache.h
index c9fa3df..68258be 100644
--- a/cache.h
+++ b/cache.h
@@ -811,6 +811,7 @@ const char *show_date_relative(unsigned long time, int tz,
 			       char *timebuf,
 			       size_t timebuf_size);
 int parse_date(const char *date, char *buf, int bufsize);
+int parse_date_basic(const char *date, unsigned long *timestamp, int *offset);
 void datestamp(char *buf, int bufsize);
 #define approxidate(s) approxidate_careful((s), NULL)
 unsigned long approxidate_careful(const char *, int *);
diff --git a/date.c b/date.c
index 3c981f7..00f9eb5 100644
--- a/date.c
+++ b/date.c
@@ -586,7 +586,7 @@ static int date_string(unsigned long date, int offset, char *buf, int len)
 
 /* Gr. strptime is crap for this; it doesn't have a way to require RFC2822
    (i.e. English) day/month names, and it doesn't work correctly with %z. */
-int parse_date_toffset(const char *date, unsigned long *timestamp, int *offset)
+int parse_date_basic(const char *date, unsigned long *timestamp, int *offset)
 {
 	struct tm tm;
 	int tm_gmt;
@@ -642,17 +642,16 @@ int parse_date_toffset(const char *date, unsigned long *timestamp, int *offset)
 
 	if (!tm_gmt)
 		*timestamp -= *offset * 60;
-	return 1; /* success */
+	return 0; /* success */
 }
 
 int parse_date(const char *date, char *result, int ...
From: Jonathan Nieder
Date: Thursday, July 15, 2010 - 10:25 am

Junio: I think this should be ejected from the series as an
independently useful cleanup.

Currently parse_date_toffset() is exported but not declared anywhere.
This patch gives it a more predictable API and adds a declaration.

Ram: thanks for the reminder.
--

From: Junio C Hamano
Date: Thursday, July 15, 2010 - 3:54 pm

Yeah, that makes sense.  What this patch does seems to be what c5043cc
(Refactor parse_date for approxidate functions, 2010-06-03) should have
done from the beginning.

Thanks.
--

From: Ramkumar Ramachandra
Date: Thursday, July 15, 2010 - 9:23 am

From: David Barr <david.barr@cordelta.com>

svndump parses data that is in SVN dumpfile format produced by
`svnadmin dump` with the help of line_buffer and uses repo_tree and
fast_export to emit a git fast-import stream.

Based roughly on com.hydrografix.svndump 0.92 from the SvnToCCase
project at <http://svn2cc.sarovar.org/>, by Stefan Hegny and
others.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Makefile          |    5 +-
 vcs-svn/LICENSE   |    4 +
 vcs-svn/svndump.c |  289 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/svndump.h |    8 ++
 4 files changed, 304 insertions(+), 2 deletions(-)
 create mode 100644 vcs-svn/svndump.c
 create mode 100644 vcs-svn/svndump.h

diff --git a/Makefile b/Makefile
index 7c66dcc..e7b37e0 100644
--- a/Makefile
+++ b/Makefile
@@ -1741,7 +1741,7 @@ endif
 XDIFF_OBJS = xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
 	xdiff/xmerge.o xdiff/xpatience.o
 VCSSVN_OBJS = vcs-svn/string_pool.o vcs-svn/line_buffer.o \
-	vcs-svn/repo_tree.o vcs-svn/fast_export.o
+	vcs-svn/repo_tree.o vcs-svn/fast_export.o vcs-svn/svndump.o
 OBJECTS := $(GIT_OBJS) $(XDIFF_OBJS) $(VCSSVN_OBJS)
 
 dep_files := $(foreach f,$(OBJECTS),$(dir $f).depend/$(notdir $f).d)
@@ -1866,7 +1866,8 @@ xdiff-interface.o $(XDIFF_OBJS): \
 
 $(VCSSVN_OBJS): \
 	vcs-svn/obj_pool.h vcs-svn/trp.h vcs-svn/string_pool.h \
-	vcs-svn/line_buffer.h vcs-svn/repo_tree.h vcs-svn/fast_export.h
+	vcs-svn/line_buffer.h vcs-svn/repo_tree.h vcs-svn/fast_export.h \
+	vcs-svn/svndump.h
 endif
 
 exec_cmd.s exec_cmd.o: EXTRA_CPPFLAGS = \
diff --git a/vcs-svn/LICENSE b/vcs-svn/LICENSE
index a3d384c..0a5e3c4 100644
--- a/vcs-svn/LICENSE
+++ b/vcs-svn/LICENSE
@@ -4,6 +4,10 @@ All rights reserved.
 Copyright (C) 2008 Jason Evans <jasone@canonware.com>.
 All rights reserved.
 
+Copyright (C) ...
From: Jonathan Nieder
Date: Thursday, July 15, 2010 - 12:52 pm

Probably worth mentioning the this requires a dumpfile v2 (i.e., it

A strbuf would do just as well.  But using obj_pool looks like a fine

Neat.  This is a textbook example of where to use a perfect hash, but
comparing interned strings is simpler and fast enough (and the

Unknown properties are ignored.  Adding stream comments to allow

A simple reader does not cope well with this kind of function.  It is
hard to know if it exhaustively deals with all cases.


Too long.  I realize that writing a state machine can be hard in C;
maybe it would be easiest to package up the state in a struct and
have a separate function for the main loop body.

The patches I didn’t comment on all look good.  I don’t think anything I did
comment on should prevent this reaching a wider audience.

Thanks for the pleasant read,
Jonathan
--

From: Jonathan Nieder
Date: Thursday, July 15, 2010 - 1:04 pm

Except for the obj_pool persistence bit, but you already commented on
that.

Happy travels,
Jonathan
--

From: Jonathan Nieder
Date: Friday, July 16, 2010 - 3:13 am

Hi,

While Ram travels halfway across the globe, I have rerolled at his
request.  You can find the result at
  git://repo.or.cz/git/jrn.git rr/svn-fe

The main difference from the version sent already is the addition of
tests and a small tweak to the treap implementation to make them
pass.  I will send the two patches with new tests as replies to this
message.  Thoughts welcome, as always.

David Barr (5):
  Add memory pool library
  vcs-svn: Add string-specific memory pool
  Add stream helper library
  Add infrastructure to write revisions in fast-export format
  Add SVN dump parser

Jason Evans (1):
  Add treap implementation

Jonathan Nieder (3):
  Export parse_date_basic() to convert a date string to timestamp
  Introduce vcs-svn lib
  Add a sample user for the svndump library

 .gitignore                |    2 +
 Makefile                  |   14 ++-
 cache.h                   |    1 +
 contrib/svn-fe/.gitignore |    4 +
 contrib/svn-fe/Makefile   |   64 +++++++++
 contrib/svn-fe/svn-fe.c   |   15 ++
 contrib/svn-fe/svn-fe.txt |   68 +++++++++
 date.c                    |   14 +-
 t/t0080-vcs-svn.sh        |  101 ++++++++++++++
 test-obj-pool.c           |  116 ++++++++++++++++
 test-treap.c              |   65 +++++++++
 vcs-svn/LICENSE           |   33 +++++
 vcs-svn/fast_export.c     |   75 ++++++++++
 vcs-svn/fast_export.h     |   14 ++
 vcs-svn/line_buffer.c     |   91 +++++++++++++
 vcs-svn/line_buffer.h     |   12 ++
 vcs-svn/obj_pool.h        |   61 +++++++++
 vcs-svn/repo_tree.c       |  331 +++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/repo_tree.h       |   26 ++++
 vcs-svn/string_pool.c     |  101 ++++++++++++++
 vcs-svn/string_pool.h     |   12 ++
 vcs-svn/svndump.c         |  289 +++++++++++++++++++++++++++++++++++++++
 vcs-svn/svndump.h         |    8 +
 vcs-svn/trp.h             |  223 ++++++++++++++++++++++++++++++
 vcs-svn/trp.txt           |   98 +++++++++++++
 25 files changed, 1829 insertions(+), 9 deletions(-)
 create mode ...
From: Jonathan Nieder
Date: Friday, July 16, 2010 - 3:16 am

From: David Barr <david.barr@cordelta.com>

Add a memory pool library implemented using C macros. The
obj_pool_gen() macro creates a type-specific memory pool.

The memory pool library is distinguished from the existing specialized
allocators in alloc.c by using a contiguous block for all allocations.
This means that on one hand, long-lived pointers have to be written as
offsets, since the base address changes as the pool grows, but on the
other hand, the entire pool can be easily written to the file system.
This could allow the memory pool to persist between runs of an
application.

For the svn importer, such a facility is useful because each svn
revision can copy trees and files from any previous revision.  The
relevant information for all revisions has to persist somehow to
support incremental runs.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Stripped out pool_init.  And added tests!  There is not really
much an allocator can do, so it is fun to play around with.

 .gitignore         |    1 +
 Makefile           |    4 +-
 t/t0080-vcs-svn.sh |   79 +++++++++++++++++++++++++++++++++++
 test-obj-pool.c    |  116 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/obj_pool.h |   61 +++++++++++++++++++++++++++
 5 files changed, 260 insertions(+), 1 deletions(-)
 create mode 100755 t/t0080-vcs-svn.sh
 create mode 100644 test-obj-pool.c
 create mode 100644 vcs-svn/obj_pool.h

diff --git a/.gitignore b/.gitignore
index 14e2b6b..1e64a6a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -167,6 +167,7 @@
 /test-genrandom
 /test-index-version
 /test-match-trees
+/test-obj-pool
 /test-parse-options
 /test-path-utils
 /test-run-command
diff --git a/Makefile b/Makefile
index d6a779b..3b873cd 100644
--- a/Makefile
+++ b/Makefile
@@ -409,6 +409,7 @@ TEST_PROGRAMS_NEED_X += test-delta
 ...
From: Jonathan Nieder
Date: Friday, July 16, 2010 - 3:23 am

From: Jason Evans <jasone@canonware.com>

Provide macros to generate a type-specific treap implementation and
various functions to operate on it. It uses obj_pool.h to store memory
nodes in a treap.  Previously committed nodes are never removed from
the pool; after any *_commit operation, it is assumed (correctly, in
the case of svn-fast-export) that someone else must care about them.

Treaps provide a memory-efficient binary search tree structure.
Insertion/deletion/search are about as about as fast in the average
case as red-black trees and the chances of worst-case behavior are
vanishingly small, thanks to (pseudo-)randomness.  The bad worst-case
behavior is a small price to pay, given that treaps are much simpler
to implement.

From http://www.canonware.com/download/trp/trp_hash/trp.h

[db: Altered to reference nodes by offset from a common base pointer]
[db: Bob Jenkins' hashing implementation dropped for Knuth's]
[db: Methods unnecessary for search and insert dropped]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Tweaked treap_search() to always return the node after a missing
node, like it is documented to.  For vcs-svn this doesn’t matter
but the predictable semantics should make debugging easier.

The rest of the patches are almost identical to the versions Ram
sent; see the aforementioned git tree if you are interested in
trying them out.  Testing would be quite welcome.

 .gitignore         |    1 +
 Makefile           |    3 +-
 t/t0080-vcs-svn.sh |   22 +++++
 test-treap.c       |   65 +++++++++++++++
 vcs-svn/LICENSE    |    3 +
 vcs-svn/trp.h      |  223 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/trp.txt    |   98 +++++++++++++++++++++++
 7 files changed, 414 ...
From: Jonathan Nieder
Date: Friday, July 16, 2010 - 11:26 am

Or rather, it does.  Sorry about that.

-- 8< --
Subject: vcs-svn: treap_search should return NULL for missing items

In a misguided attempt to make the code match the documentation,
commit 4692f8e7d (Add treap implementation, 2010-07-15) changed
the semantics of treap_search to return the /next/ node when a
node is missing.

That is great in some circumstances (and the new tests even rely on
it), but the rest of vcs-svn relies on treap_search to return
NULL in that case instead.  The documentation only suggested
otherwise because of a typo.

So fix it: now treap_search can do what it was always supposed
to (return NULL on failure) and Jason Evans’s treap_nsearch function
can be used to keep the test suite working.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 test-treap.c    |    2 +-
 vcs-svn/trp.h   |   13 +++++++++++++
 vcs-svn/trp.txt |    9 +++++++--
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/test-treap.c b/test-treap.c
index eae7324..cdba511 100644
--- a/test-treap.c
+++ b/test-treap.c
@@ -52,7 +52,7 @@ int main(int argc, char *argv[])
 		next = node_offset(treap_next(&root, node_pointer(item)));
 
 		treap_remove(&root, node_pointer(item));
-		item = node_offset(treap_search(&root, tmp));
+		item = node_offset(treap_nsearch(&root, tmp));
 
 		if (item != next && (!~item || node_pointer(item)->n != tmp->n))
 			die("found %"PRIuMAX" in place of %"PRIuMAX"",
diff --git a/vcs-svn/trp.h b/vcs-svn/trp.h
index 49940cf..1f5f51f 100644
--- a/vcs-svn/trp.h
+++ b/vcs-svn/trp.h
@@ -138,6 +138,19 @@ a_attr a_type MAYBE_UNUSED *a_pre##search(struct trp_root *treap, a_type *key) \
 	uint32_t ret = treap->trp_root; \
 	while (~ret && (cmp = (a_cmp)(key, trpn_pointer(a_base,ret)))) { \
 		if (cmp < 0) { \
+			ret = trp_left_get(a_base, a_field, ret); \
+		} else { \
+			ret = trp_right_get(a_base, a_field, ret); \
+		} \
+	} \
+	return trpn_pointer(a_base, ret); \
+} \
+a_attr a_type MAYBE_UNUSED *a_pre##nsearch(struct ...
From: Jonathan Nieder
Date: Monday, August 9, 2010 - 2:57 pm

Hi!

svn-fe has some serious changes on the horizon.  As a preparation,
let’s round up what we have now.

The most controversial change is probably the new svn-fe test, which
takes about 15 seconds (for the “svnadmin load”, not the svn-fe
step :)).  It is in the t9* series, so hopefully that will not
dissuade people from running the earlier tests.

The main highlight in the changes is a new

	Input error

to stderr if a system call failed in reading in the dump file.
It still returns status 0 in this and other error situations,
though.

Based on maint (for no good reason; that’s just where I tried it).
Intended to replace rr/svn-export in pu (only if Ram likes it, of
course).

Thoughts welcome.

David Barr (5):
  Add memory pool library
  Add string-specific memory pool
  Add stream helper library
  Infrastructure to write revisions in fast-export format
  SVN dump parser

Jason Evans (1):
  Add treap implementation

Jonathan Nieder (4):
  Export parse_date_basic() to convert a date string to timestamp
  Introduce vcs-svn lib
  Update svn-fe manual
  svn-fe manual: Clarify warning about deltas in dumpfiles

 .gitignore                |    5 +
 Makefile                  |   25 +++-
 cache.h                   |    1 +
 contrib/svn-fe/svn-fe.c   |    1 +
 contrib/svn-fe/svn-fe.txt |   19 ++--
 date.c                    |   14 +-
 t/t0080-vcs-svn.sh        |  171 +++++++++++++++++++++++
 t/t9010-svn-fe.sh         |   32 +++++
 test-line-buffer.c        |   46 +++++++
 test-obj-pool.c           |  116 ++++++++++++++++
 test-string-pool.c        |   31 +++++
 test-svn-fe.c             |   18 +++
 test-treap.c              |   65 +++++++++
 vcs-svn/LICENSE           |   33 +++++
 vcs-svn/fast_export.c     |   74 ++++++++++
 vcs-svn/fast_export.h     |   11 ++
 vcs-svn/line_buffer.c     |  102 ++++++++++++++
 vcs-svn/line_buffer.h     |   12 ++
 vcs-svn/line_buffer.txt   |   62 +++++++++
 vcs-svn/obj_pool.h        |   61 +++++++++
 vcs-svn/repo_tree.c       ...
From: Jonathan Nieder
Date: Monday, August 9, 2010 - 3:01 pm

approxidate() is not appropriate for reading machine-written dates
because it guesses instead of erroring out on malformed dates.
parse_date() is less convenient since it returns its output as a
string.  So export the underlying function that writes a timestamp.

While at it, change the return value to match the usual convention:
return 0 for success and -1 for failure.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
As before, I think this improves code clarity, independently of
its use for svn-fe.  So I would not be unhappy if it is applied
as a separate topic.

No change from last round.

 cache.h |    1 +
 date.c  |   14 ++++++--------
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/cache.h b/cache.h
index c9fa3df..68258be 100644
--- a/cache.h
+++ b/cache.h
@@ -811,6 +811,7 @@ const char *show_date_relative(unsigned long time, int tz,
 			       char *timebuf,
 			       size_t timebuf_size);
 int parse_date(const char *date, char *buf, int bufsize);
+int parse_date_basic(const char *date, unsigned long *timestamp, int *offset);
 void datestamp(char *buf, int bufsize);
 #define approxidate(s) approxidate_careful((s), NULL)
 unsigned long approxidate_careful(const char *, int *);
diff --git a/date.c b/date.c
index 3c981f7..00f9eb5 100644
--- a/date.c
+++ b/date.c
@@ -586,7 +586,7 @@ static int date_string(unsigned long date, int offset, char *buf, int len)
 
 /* Gr. strptime is crap for this; it doesn't have a way to require RFC2822
    (i.e. English) day/month names, and it doesn't work correctly with %z. */
-int parse_date_toffset(const char *date, unsigned long *timestamp, int *offset)
+int parse_date_basic(const char *date, unsigned long *timestamp, int *offset)
 {
 	struct tm tm;
 	int tm_gmt;
@@ -642,17 +642,16 @@ int parse_date_toffset(const char *date, unsigned long *timestamp, int *offset)
 
 	if (!tm_gmt)
 		*timestamp -= *offset * 60;
-	return 1; /* success */
+	return ...
From: Jonathan Nieder
Date: Monday, August 9, 2010 - 3:04 pm

Teach the build system to build a separate library for the
upcoming subversion interop support.

The resulting vcs-svn/lib.a does not contain any code, nor is
it built during a normal build.  This is just scaffolding for
later changes.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
This is just for reference; no change from last round.

 Makefile        |    8 +++++++-
 vcs-svn/LICENSE |   26 ++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 1 deletions(-)
 create mode 100644 vcs-svn/LICENSE

diff --git a/Makefile b/Makefile
index f33648d..71cca35 100644
--- a/Makefile
+++ b/Makefile
@@ -468,6 +468,7 @@ export PYTHON_PATH
 
 LIB_FILE=libgit.a
 XDIFF_LIB=xdiff/lib.a
+VCSSVN_LIB=vcs-svn/lib.a
 
 LIB_H += advice.h
 LIB_H += archive.h
@@ -1739,7 +1740,8 @@ ifndef NO_CURL
 endif
 XDIFF_OBJS = xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
 	xdiff/xmerge.o xdiff/xpatience.o
-OBJECTS := $(GIT_OBJS) $(XDIFF_OBJS)
+VCSSVN_OBJS =
+OBJECTS := $(GIT_OBJS) $(XDIFF_OBJS) $(VCSSVN_OBJS)
 
 dep_files := $(foreach f,$(OBJECTS),$(dir $f).depend/$(notdir $f).d)
 dep_dirs := $(addsuffix .depend,$(sort $(dir $(OBJECTS))))
@@ -1860,6 +1862,8 @@ http.o http-walker.o http-push.o remote-curl.o: http.h
 xdiff-interface.o $(XDIFF_OBJS): \
 	xdiff/xinclude.h xdiff/xmacros.h xdiff/xdiff.h xdiff/xtypes.h \
 	xdiff/xutils.h xdiff/xprepare.h xdiff/xdiffi.h xdiff/xemit.h
+
+$(VCSSVN_OBJS):
 endif
 
 exec_cmd.s exec_cmd.o: EXTRA_CPPFLAGS = \
@@ -1908,6 +1912,8 @@ $(LIB_FILE): $(LIB_OBJS)
 $(XDIFF_LIB): $(XDIFF_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) rcs $@ $(XDIFF_OBJS)
 
+$(VCSSVN_LIB): $(VCSSVN_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) rcs $@ $(VCSSVN_OBJS)
 
 doc:
 	$(MAKE) -C Documentation all
diff --git a/vcs-svn/LICENSE b/vcs-svn/LICENSE
new file mode 100644
index 0000000..6e52372
--- ...
From: Jonathan Nieder
Date: Monday, August 9, 2010 - 3:11 pm

From: David Barr <david.barr@cordelta.com>

Add a memory pool library implemented using C macros. The
obj_pool_gen() macro creates a type-specific memory pool.

The memory pool library is distinguished from the existing specialized
allocators in alloc.c by using a contiguous block for all allocations.
This means that on one hand, long-lived pointers have to be written as
offsets, since the base address changes as the pool grows, but on the
other hand, the entire pool can be easily written to the file system.
This could allow the memory pool to persist between runs of an
application.

For the svn importer, such a facility is useful because each svn
revision can copy trees and files from any previous revision.  The
relevant information for all revisions has to persist somehow to
support incremental runs.

[rr: minor cleanups]
[jn: added tests; removed file system backing for now]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
The only change from last round is the notes at the end of the commit
message.  Hopefully David is less likely to be blamed for bugs I
introduced this way. :)

 .gitignore         |    1 +
 Makefile           |    4 +-
 t/t0080-vcs-svn.sh |   79 +++++++++++++++++++++++++++++++++++
 test-obj-pool.c    |  116 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/obj_pool.h |   61 +++++++++++++++++++++++++++
 5 files changed, 260 insertions(+), 1 deletions(-)
 create mode 100755 t/t0080-vcs-svn.sh
 create mode 100644 test-obj-pool.c
 create mode 100644 vcs-svn/obj_pool.h

diff --git a/.gitignore b/.gitignore
index 14e2b6b..1e64a6a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -167,6 +167,7 @@
 /test-genrandom
 /test-index-version
 /test-match-trees
+/test-obj-pool
 /test-parse-options
 /test-path-utils
 /test-run-command
diff --git a/Makefile b/Makefile
index ...
From: Jonathan Nieder
Date: Monday, August 9, 2010 - 3:17 pm

From: Jason Evans <jasone@canonware.com>

Provide macros to generate a type-specific treap implementation and
various functions to operate on it. It uses obj_pool.h to store memory
nodes in a treap.  Previously committed nodes are never removed from
the pool; after any *_commit operation, it is assumed (correctly, in
the case of svn-fast-export) that someone else must care about them.

Treaps provide a memory-efficient binary search tree structure.
Insertion/deletion/search are about as about as fast in the average
case as red-black trees and the chances of worst-case behavior are
vanishingly small, thanks to (pseudo-)randomness.  The bad worst-case
behavior is a small price to pay, given that treaps are much simpler
to implement.

From http://www.canonware.com/download/trp/trp_hash/trp.h

[db: Altered to reference nodes by offset from a common base pointer]
[db: Bob Jenkins' hashing implementation dropped for Knuth's]
[db: Methods unnecessary for search and insert dropped]
[rr: Squelched compiler warnings]
[db: Added support for immutable treap nodes]
[jn: Reintroduced treap_nsearch(); with tests]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
With the treap_nsearch() fixup from last time squashed in and some
more history in the log message.

 .gitignore         |    1 +
 Makefile           |    3 +-
 t/t0080-vcs-svn.sh |   22 +++++
 test-treap.c       |   65 ++++++++++++++
 vcs-svn/LICENSE    |    3 +
 vcs-svn/trp.h      |  236 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/trp.txt    |  103 +++++++++++++++++++++++
 7 files changed, 432 insertions(+), 1 deletions(-)
 create mode 100644 test-treap.c
 create mode 100644 vcs-svn/trp.h
 create mode 100644 vcs-svn/trp.txt

diff --git a/.gitignore b/.gitignore
index 1e64a6a..af47653 100644
--- a/.gitignore
+++ b/.gitignore
@@ -173,6 +173,7 @@
 /test-run-command
 /test-sha1
 ...
From: Junio C Hamano
Date: Thursday, August 12, 2010 - 10:22 am

SP after "," (same for nsearch)
--

From: Jonathan Nieder
Date: Thursday, August 12, 2010 - 3:02 pm

Good catch.  checkpatch also notices some long lines, but I think
that’s worth ignoring.

-- 8< --
Subject: treap: style fix

Missing spaces in while (0) and trpn_pointer(a, b).

Remove parentheses around return value.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/trp.h |   30 +++++++++++++++---------------
 1 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/vcs-svn/trp.h b/vcs-svn/trp.h
index 1f5f51f..ee35c68 100644
--- a/vcs-svn/trp.h
+++ b/vcs-svn/trp.h
@@ -37,7 +37,7 @@ struct trp_root {
 			*trpn_pointer(a_base, a_offset) = \
 				*trpn_pointer(a_base, old_offset); \
 		} \
-	} while (0);
+	} while (0)
 
 /* Left accessors. */
 #define trp_left_get(a_base, a_field, a_node) \
@@ -46,7 +46,7 @@ struct trp_root {
 	do { \
 		trpn_modify(a_base, a_node); \
 		trp_left_get(a_base, a_field, a_node) = (a_left); \
-	} while(0)
+	} while (0)
 
 /* Right accessors. */
 #define trp_right_get(a_base, a_field, a_node) \
@@ -55,7 +55,7 @@ struct trp_root {
 	do { \
 		trpn_modify(a_base, a_node); \
 		trp_right_get(a_base, a_field, a_node) = (a_right); \
-	} while(0)
+	} while (0)
 
 /*
  * Fibonacci hash function.
@@ -72,7 +72,7 @@ struct trp_root {
 	do { \
 		trp_left_set(a_base, a_field, (a_node), ~0); \
 		trp_right_set(a_base, a_field, (a_node), ~0); \
-	} while(0)
+	} while (0)
 
 /* Internal utility macros. */
 #define trpn_first(a_base, a_field, a_root, r_node) \
@@ -90,7 +90,7 @@ struct trp_root {
 		trp_right_set(a_base, a_field, (a_node), \
 			trp_left_get(a_base, a_field, (r_node))); \
 		trp_left_set(a_base, a_field, (r_node), (a_node)); \
-	} while(0)
+	} while (0)
 
 #define trpn_rotate_right(a_base, a_field, a_node, r_node) \
 	do { \
@@ -98,7 +98,7 @@ struct trp_root {
 		trp_left_set(a_base, a_field, (a_node), \
 			trp_right_get(a_base, a_field, (r_node))); \
 		trp_right_set(a_base, a_field, (r_node), (a_node)); \
-	} while(0)
+	} while (0)
 
 #define trp_gen(a_attr, a_pre, a_type, a_field, ...
From: Jonathan Nieder
Date: Thursday, August 12, 2010 - 3:11 pm

Here are a few more (but feel free to ignore them).

-- 8< --
Subject: Standardize do { ... } while (0) style

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 base85.c   |    6 +++---
 cache.h    |    2 +-
 diffcore.h |    8 ++++----
 http.h     |    4 ++--
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/base85.c b/base85.c
index e459fee..781b575 100644
--- a/base85.c
+++ b/base85.c
@@ -7,9 +7,9 @@
 #define say1(a,b) fprintf(stderr, a, b)
 #define say2(a,b,c) fprintf(stderr, a, b, c)
 #else
-#define say(a) do {} while(0)
-#define say1(a,b) do {} while(0)
-#define say2(a,b,c) do {} while(0)
+#define say(a) do { /* nothing */ } while (0)
+#define say1(a,b) do { /* nothing */ } while (0)
+#define say2(a,b,c) do { /* nothing */ } while (0)
 #endif
 
 static const char en85[] = {
diff --git a/cache.h b/cache.h
index 68258be..37ef9d8 100644
--- a/cache.h
+++ b/cache.h
@@ -449,7 +449,7 @@ extern int init_db(const char *template_dir, unsigned int flags);
 				alloc = alloc_nr(alloc); \
 			x = xrealloc((x), alloc * sizeof(*(x))); \
 		} \
-	} while(0)
+	} while (0)
 
 /* Initialize and use the cache information */
 extern int read_index(struct index_state *);
diff --git a/diffcore.h b/diffcore.h
index fed9b15..05ebc11 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -98,7 +98,7 @@ struct diff_queue_struct {
 		(q)->queue = NULL; \
 		(q)->nr = (q)->alloc = 0; \
 		(q)->run = 0; \
-	} while(0);
+	} while (0)
 
 extern struct diff_queue_struct diff_queued_diff;
 extern struct diff_filepair *diff_queue(struct diff_queue_struct *,
@@ -118,9 +118,9 @@ void diff_debug_filespec(struct diff_filespec *, int, const char *);
 void diff_debug_filepair(const struct diff_filepair *, int);
 void diff_debug_queue(const char *, struct diff_queue_struct *);
 #else
-#define diff_debug_filespec(a,b,c) do {} while(0)
-#define diff_debug_filepair(a,b) do {} while(0)
-#define diff_debug_queue(a,b) do {} while(0)
+#define diff_debug_filespec(a,b,c) do { ...
From: Junio C Hamano
Date: Thursday, August 12, 2010 - 3:44 pm

This is a _bad_ one.  Thanks.
--

From: Jonathan Nieder
Date: Monday, August 9, 2010 - 3:34 pm

From: David Barr <david.barr@cordelta.com>

Intern strings so they can be compared by address and stored without
wasting space.

This library uses the macros in the obj_pool.h and trp.h to create a
memory pool for strings and expose an API for handling them.

[rr: added API docs]
[jn: with some API simplifications, new documentation and tests]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
New test.  The return value from pool_tok_seq is not checked by
the vcs-svn lib but trying to use it in tests revealed it was not
so intuitive.  pool_tok_seq() was behaving strangely when passed
an array of size 0; I think there is nothing sane to do in that
case --- maybe it should abort().  The API was passing around
char * that cannot be modified; changed to const char *.

Another set of eyes on this would be welcome.

 .gitignore              |    1 +
 Makefile                |    9 +++-
 t/t0080-vcs-svn.sh      |   16 +++++++
 test-string-pool.c      |   31 ++++++++++++++
 vcs-svn/string_pool.c   |  102 +++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/string_pool.h   |   11 +++++
 vcs-svn/string_pool.txt |   43 ++++++++++++++++++++
 7 files changed, 210 insertions(+), 3 deletions(-)
 create mode 100644 test-string-pool.c
 create mode 100644 vcs-svn/string_pool.c
 create mode 100644 vcs-svn/string_pool.h
 create mode 100644 vcs-svn/string_pool.txt

diff --git a/.gitignore b/.gitignore
index af47653..9f109db 100644
--- a/.gitignore
+++ b/.gitignore
@@ -173,6 +173,7 @@
 /test-run-command
 /test-sha1
 /test-sigchain
+/test-string-pool
 /test-treap
 /common-cmds.h
 *.tar.gz
diff --git a/Makefile b/Makefile
index e7c33ec..24103c9 100644
--- a/Makefile
+++ b/Makefile
@@ -415,6 +415,7 @@ TEST_PROGRAMS_NEED_X += test-path-utils
 TEST_PROGRAMS_NEED_X += test-run-command
 TEST_PROGRAMS_NEED_X += test-sha1
 TEST_PROGRAMS_NEED_X += ...
From: Junio C Hamano
Date: Thursday, August 12, 2010 - 10:22 am

Please fix decl-after-stmt here.
--

From: Jonathan Nieder
Date: Thursday, August 12, 2010 - 2:30 pm

Good catch.  Here’s a fixup for patch 2 (“Introduce vcs-svn lib”).

-- 8< --
Subject: vcs-svn: remove build artifacts on “make clean”

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
diff --git a/Makefile b/Makefile
index 2418820..24c4b3d 100644
--- a/Makefile
+++ b/Makefile
@@ -2184,8 +2184,8 @@ distclean: clean
 	$(RM) configure
 
 clean:
-	$(RM) *.o block-sha1/*.o ppc/*.o compat/*.o compat/*/*.o xdiff/*.o \
-		builtin/*.o $(LIB_FILE) $(XDIFF_LIB)
+	$(RM) *.o block-sha1/*.o ppc/*.o compat/*.o compat/*/*.o xdiff/*.o vcs-svn/*.o \
+		builtin/*.o $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
 	$(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X
 	$(RM) $(TEST_PROGRAMS)
 	$(RM) -r bin-wrappers
-- 
--

From: Jonathan Nieder
Date: Monday, August 9, 2010 - 3:39 pm

From: David Barr <david.barr@cordelta.com>

This library provides thread-unsafe fgets()- and fread()-like
functions where the caller does not have to supply a buffer.  It
maintains a couple of static buffers and provides an API to use
them.

[rr: allow input from files other than stdin]
[jn: with tests, documentation, and error handling improvements]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
New tests and API docs.  The return value from buffer_deinit
can be used to check for errors now (I found this useful when
writing tests).

 .gitignore              |    1 +
 Makefile                |    8 +++-
 t/t0080-vcs-svn.sh      |   54 ++++++++++++++++++++++++++
 test-line-buffer.c      |   46 ++++++++++++++++++++++
 vcs-svn/line_buffer.c   |   97 +++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/line_buffer.h   |   12 ++++++
 vcs-svn/line_buffer.txt |   58 ++++++++++++++++++++++++++++
 7 files changed, 274 insertions(+), 2 deletions(-)
 create mode 100644 test-line-buffer.c
 create mode 100644 vcs-svn/line_buffer.c
 create mode 100644 vcs-svn/line_buffer.h
 create mode 100644 vcs-svn/line_buffer.txt

diff --git a/.gitignore b/.gitignore
index 9f109db..8c0512e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -166,6 +166,7 @@
 /test-dump-cache-tree
 /test-genrandom
 /test-index-version
+/test-line-buffer
 /test-match-trees
 /test-obj-pool
 /test-parse-options
diff --git a/Makefile b/Makefile
index 24103c9..a76cce5 100644
--- a/Makefile
+++ b/Makefile
@@ -408,6 +408,7 @@ TEST_PROGRAMS_NEED_X += test-date
 TEST_PROGRAMS_NEED_X += test-delta
 TEST_PROGRAMS_NEED_X += test-dump-cache-tree
 TEST_PROGRAMS_NEED_X += test-genrandom
+TEST_PROGRAMS_NEED_X += test-line-buffer
 TEST_PROGRAMS_NEED_X += test-match-trees
 TEST_PROGRAMS_NEED_X += test-obj-pool
 TEST_PROGRAMS_NEED_X += test-parse-options
@@ -1743,7 +1744,7 @@ ifndef NO_CURL
 endif
 ...
From: Jonathan Nieder
Date: Monday, August 9, 2010 - 3:48 pm

From: David Barr <david.barr@cordelta.com>

repo_tree maintains the exporter's state and provides a facility to to
call fast_export, which writes objects to stdout suitable for
consumption by fast-import.

The exported functions roughly correspond to Subversion FS operations.

 . repo_add, repo_modify, repo_copy, repo_replace, and repo_delete
   update the current commit, based roughly on the corresponding
   Subversion FS operation.

 . repo_commit calls out to fast_export to write the current commit to
   the fast-import stream in stdout.

 . repo_diff is used by the fast_export module to write the changes
   for a commit.

 . repo_reset erases the exporter's state, so valgrind can be happy.

[rr: squelched compiler warnings]
[jn: removed support for maintaining state on-disk, though we may
want to add it back later]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
No tests for this one, since the next few patches exercise it and this
is not a general-purpose API.  Relative to last round, the
`pool_commit` and `commit_commit` calls have been eliminated; unlike
the `dir_commit` et al calls, those were only meant for committing
state to disk, and the changing high-water mark was not being used.

 Makefile              |    5 +-
 vcs-svn/fast_export.c |   74 +++++++++++
 vcs-svn/fast_export.h |   11 ++
 vcs-svn/repo_tree.c   |  328 +++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/repo_tree.h   |   26 ++++
 5 files changed, 442 insertions(+), 2 deletions(-)
 create mode 100644 vcs-svn/fast_export.c
 create mode 100644 vcs-svn/fast_export.h
 create mode 100644 vcs-svn/repo_tree.c
 create mode 100644 vcs-svn/repo_tree.h

diff --git a/Makefile b/Makefile
index a76cce5..b873399 100644
--- a/Makefile
+++ b/Makefile
@@ -1744,7 +1744,8 @@ ifndef NO_CURL
 endif
 XDIFF_OBJS = xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
 ...
From: Jonathan Nieder
Date: Monday, August 9, 2010 - 3:55 pm

From: David Barr <david.barr@cordelta.com>

svndump parses data that is in SVN dumpfile format produced by
`svnadmin dump` with the help of line_buffer and uses repo_tree and
fast_export to emit a git fast-import stream.

Based roughly on com.hydrografix.svndump 0.92 from the SvnToCCase
project at <http://svn2cc.sarovar.org/>, by Stefan Hegny and
others.

[rr: allow input from files other than stdin]
[jn: with test, more error reporting]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
New test.  It is slow; work by svn gurus to speed this up would
be nice.  The test program is very similar to svn-fe from contrib,
except it exercises Ram’s change to read from a file other than
stdin.

 .gitignore              |    1 +
 Makefile                |    8 +-
 contrib/svn-fe/svn-fe.c |    1 +
 t/t9010-svn-fe.sh       |   32 +++++
 test-svn-fe.c           |   18 +++
 vcs-svn/LICENSE         |    4 +
 vcs-svn/svndump.c       |  302 +++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/svndump.h       |    9 ++
 8 files changed, 373 insertions(+), 2 deletions(-)
 create mode 100644 t/t9010-svn-fe.sh
 create mode 100644 test-svn-fe.c
 create mode 100644 vcs-svn/svndump.c
 create mode 100644 vcs-svn/svndump.h

diff --git a/.gitignore b/.gitignore
index 8c0512e..258723f 100644
--- a/.gitignore
+++ b/.gitignore
@@ -175,6 +175,7 @@
 /test-sha1
 /test-sigchain
 /test-string-pool
+/test-svn-fe
 /test-treap
 /common-cmds.h
 *.tar.gz
diff --git a/Makefile b/Makefile
index b873399..6228f66 100644
--- a/Makefile
+++ b/Makefile
@@ -417,6 +417,7 @@ TEST_PROGRAMS_NEED_X += test-run-command
 TEST_PROGRAMS_NEED_X += test-sha1
 TEST_PROGRAMS_NEED_X += test-sigchain
 TEST_PROGRAMS_NEED_X += test-string-pool
+TEST_PROGRAMS_NEED_X += test-svn-fe
 TEST_PROGRAMS_NEED_X += test-treap
 TEST_PROGRAMS_NEED_X += test-index-version
 
@@ -1745,7 +1746,7 @@ endif
 ...
From: Junio C Hamano
Date: Thursday, August 12, 2010 - 10:22 am

Style: static char *log_copy(...)

--

From: Jonathan Nieder
Date: Monday, August 9, 2010 - 3:55 pm

The svn-fe example does not litter the working directory with
.bin files any more (hoorah!).

The permissive error handling implies a known bug.  We should
be flagging iffy input and, even if we continue, reporting it
on exit.

Cc: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 contrib/svn-fe/svn-fe.txt |   14 ++++++--------
 1 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/contrib/svn-fe/svn-fe.txt b/contrib/svn-fe/svn-fe.txt
index de30f83..fb0ee56 100644
--- a/contrib/svn-fe/svn-fe.txt
+++ b/contrib/svn-fe/svn-fe.txt
@@ -43,11 +43,9 @@ user <user@UUID>
 as committer, where 'user' is the value of the `svn:author` property
 and 'UUID' the repository's identifier.
 
-To support incremental imports, 'svn-fe' will put a `git-svn-id`
-line at the end of each commit log message if passed an url on the
-command line.  This line has the form `git-svn-id: URL@REVNO UUID`.
-
-Empty directories and unknown properties are silently discarded.
+To support incremental imports, 'svn-fe' puts a `git-svn-id` line at
+the end of each commit log message if passed an url on the command
+line.  This line has the form `git-svn-id: URL@REVNO UUID`.
 
 The resulting repository will generally require further processing
 to put each project in its own repository and to separate the history
@@ -56,9 +54,9 @@ may be useful for this purpose.
 
 BUGS
 ----
-Litters the current working directory with .bin files for
-persistence. Will be fixed when the svn-fe infrastructure is aware of
-a Git working directory.
+Empty directories and unknown properties are silently discarded.
+
+The exit status does not reflect whether an error was detected.
 
 SEE ALSO
 --------
-- 
1.7.2.1.544.ga752d.dirty

--

From: Jonathan Nieder
Date: Monday, August 9, 2010 - 3:58 pm

Those in the know would notice that dump file format version 2
means "svnadmin dump --no-deltas", but for the rest of us, an
explicit reminder is useful.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
That’s the end of the series.  Thanks for reading.

On the horizon, as you may have guessed, are changes to use
dumpfile format v3.  Doing so sanely requires two-way communication
with fast-import, I think (as discussed).  Ram has already put
together a prototype delta applier, so it seems to be mostly a matter
of plumbing now.

 contrib/svn-fe/svn-fe.txt |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/contrib/svn-fe/svn-fe.txt b/contrib/svn-fe/svn-fe.txt
index fb0ee56..35f84bd 100644
--- a/contrib/svn-fe/svn-fe.txt
+++ b/contrib/svn-fe/svn-fe.txt
@@ -12,7 +12,7 @@ svnadmin dump --incremental REPO | svn-fe [url] | git fast-import
 DESCRIPTION
 -----------
 
-Converts a Subversion dumpfile (version: 2) into input suitable for
+Converts a Subversion dumpfile into input suitable for
 git-fast-import(1) and similar importers. REPO is a path to a
 Subversion repository mirrored on the local disk. Remote Subversion
 repositories can be mirrored on local disk using the `svnsync`
@@ -25,6 +25,9 @@ Subversion's repository dump format is documented in full in
 Files in this format can be generated using the 'svnadmin dump' or
 'svk admin dump' command.
 
+Dumps produced with 'svnadmin dump --deltas' (dumpfile format v3)
+are not supported.
+
 OUTPUT FORMAT
 -------------
 The fast-import format is documented by the git-fast-import(1)
-- 
1.7.2.1.544.ga752d.dirty

--

From: Ramkumar Ramachandra
Date: Tuesday, August 10, 2010 - 5:53 am

Hi Jonathan,




Thanks for re-rolling (again)! You've also added a note to the commit
messages briefly explaining what each contributor has done. I'd
expected some incremental patches instead of a full re-roll, but




We have to fix this real soon- I'm waiting for the weekend so I get
some solid chunks of hacking time.

-- Ram
--

From: Jonathan Nieder
Date: Tuesday, August 10, 2010 - 6:53 pm

Yeah, I think after this series the topic has stabilized enough

Thanks for checking.
--

From: Jonathan Nieder
Date: Sunday, October 10, 2010 - 7:34 pm

Hi,

The svndiff format has proved more difficult to parse than expected.
This series documents the current state of things, and though it is
not complete, it should be ready for nitpicking by the masses.

Patches 1-4 modify the line_buffer API by introducing a struct
line_buffer to collect state that was previously held in global
variables.  Callers can use multiple line_buffers to manage input from
multiple files at a time.

Patches 5-10 add various utility functions to the line_buffer API
(wrapping strbuf_fread(), fgetc(), etc).  Putting the helpers there
instead of having callers work with the FILE* directly means one
could easily

 - tweak the input stream (to insert "link: " at the beginning
   for symlinks?);
 - trace reads, for debugging; or
 - use read() directly in place of stdio and limit the number of bytes
   buffered

if one wants to.

Patch 11 adds a data structure and function to manage a "sliding
window" without using mmap() or fseek().  See the svndiff0 spec[1] for
how this would be used.

Patches 12 and 13 are some basic components for reading an svndiff0
file: reading variable-length integers and the opening magic bytes.

Patch 15 makes the svn-fe test usable on systems (like Ram's) without
libsvn-perl installed.  It also should make the test easier to read
for people unfamiliar with lib-git-svn.sh.

Patch 16 is the delta parser/applier.  This patch does _not_ add it to
contrib/svn-fe, even though that would be useful, since the
command-line interface is not set in stone yet.  If you want to try it
out, use the test-svn-fe command:

	test-svn-fe -d <preimage> <delta> <delta length>

The preimage or delta arg can be /dev/stdin for use in a pipeline.
Both are only read sequentially; they do not need to be regular files.

One of the test cases is enormous.  The svn delta lib doesn't use
multiple windows except when dealing with relatively big files, but
probably the test case should be replaced with a smaller, artificial
example.

One of the ...
From: Jonathan Nieder
Date: Sunday, October 10, 2010 - 7:37 pm

The data stored in byte_buffer[] is always either discarded or
written to stdout immediately.  No need for it to persist between
function calls.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/line_buffer.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/vcs-svn/line_buffer.c b/vcs-svn/line_buffer.c
index 1543567..f22c94f 100644
--- a/vcs-svn/line_buffer.c
+++ b/vcs-svn/line_buffer.c
@@ -14,7 +14,6 @@
 obj_pool_gen(blob, char, 4096)
 
 static char line_buffer[LINE_BUFFER_LEN];
-static char byte_buffer[COPY_BUFFER_LEN];
 static FILE *infile;
 
 int buffer_init(const char *filename)
@@ -68,6 +67,7 @@ char *buffer_read_string(uint32_t len)
 
 void buffer_copy_bytes(uint32_t len)
 {
+	char byte_buffer[COPY_BUFFER_LEN];
 	uint32_t in;
 	while (len > 0 && !feof(infile) && !ferror(infile)) {
 		in = len < COPY_BUFFER_LEN ? len : COPY_BUFFER_LEN;
@@ -83,6 +83,7 @@ void buffer_copy_bytes(uint32_t len)
 
 void buffer_skip_bytes(uint32_t len)
 {
+	char byte_buffer[COPY_BUFFER_LEN];
 	uint32_t in;
 	while (len > 0 && !feof(infile) && !ferror(infile)) {
 		in = len < COPY_BUFFER_LEN ? len : COPY_BUFFER_LEN;
-- 
1.7.2.3

--

From: Jonathan Nieder
Date: Sunday, October 10, 2010 - 7:39 pm

Prepare for the line_buffer lib to support input from multiple files,
by collecting global state in a struct that can be easily passed around.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/line_buffer.c |   45 ++++++++++++++++++++++-----------------------
 vcs-svn/line_buffer.h |   11 +++++++++++
 2 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/vcs-svn/line_buffer.c b/vcs-svn/line_buffer.c
index 6f32f28..e7bc230 100644
--- a/vcs-svn/line_buffer.c
+++ b/vcs-svn/line_buffer.c
@@ -7,17 +7,16 @@
 #include "line_buffer.h"
 #include "strbuf.h"
 
-#define LINE_BUFFER_LEN 10000
 #define COPY_BUFFER_LEN 4096
-
-static char line_buffer[LINE_BUFFER_LEN];
-static struct strbuf blob_buffer = STRBUF_INIT;
-static FILE *infile;
+static struct line_buffer buf_ = LINE_BUFFER_INIT;
+static struct line_buffer *buf;
 
 int buffer_init(const char *filename)
 {
-	infile = filename ? fopen(filename, "r") : stdin;
-	if (!infile)
+	buf = &buf_;
+
+	buf->infile = filename ? fopen(filename, "r") : stdin;
+	if (!buf->infile)
 		return -1;
 	return 0;
 }
@@ -25,10 +24,10 @@ int buffer_init(const char *filename)
 int buffer_deinit(void)
 {
 	int err;
-	if (infile == stdin)
-		return ferror(infile);
-	err = ferror(infile);
-	err |= fclose(infile);
+	if (buf->infile == stdin)
+		return ferror(buf->infile);
+	err = ferror(buf->infile);
+	err |= fclose(buf->infile);
 	return err;
 }
 
@@ -36,13 +35,13 @@ int buffer_deinit(void)
 char *buffer_read_line(void)
 {
 	char *end;
-	if (!fgets(line_buffer, sizeof(line_buffer), infile))
+	if (!fgets(buf->line_buffer, sizeof(buf->line_buffer), buf->infile))
 		/* Error or data exhausted. */
 		return NULL;
-	end = line_buffer + strlen(line_buffer);
+	end = buf->line_buffer + strlen(buf->line_buffer);
 	if (end[-1] == '\n')
 		end[-1] = '\0';
-	else if (feof(infile))
+	else if (feof(buf->infile))
 		; /* No newline at end of file.  That's fine. */
 	else
 		/*
@@ -51,23 +50,23 @@ char ...
From: Jonathan Nieder
Date: Sunday, October 10, 2010 - 7:41 pm

Collect the line_buffer state in a newly public line_buffer struct.
Callers can use multiple line_buffers to manage input from multiple
files at a time.

The Subversion-format delta applier will use this to stream a delta
and the preimage it applies to at the same time.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/line_buffer.txt |    5 +++--
 vcs-svn/fast_export.c   |    6 +++---
 vcs-svn/fast_export.h   |    5 ++++-
 vcs-svn/line_buffer.c   |   20 ++++++++------------
 vcs-svn/line_buffer.h   |   14 +++++++-------
 vcs-svn/svndump.c       |   29 ++++++++++++++++-------------
 test-line-buffer.c      |   17 +++++++++--------
 7 files changed, 50 insertions(+), 46 deletions(-)

diff --git a/vcs-svn/line_buffer.txt b/vcs-svn/line_buffer.txt
index 8906fb1..f8eaa4d 100644
--- a/vcs-svn/line_buffer.txt
+++ b/vcs-svn/line_buffer.txt
@@ -14,14 +14,15 @@ Calling sequence
 
 The calling program:
 
+ - initializes a `struct line_buffer` to LINE_BUFFER_INIT
  - specifies a file to read with `buffer_init`
  - processes input with `buffer_read_line`, `buffer_read_string`,
    `buffer_skip_bytes`, and `buffer_copy_bytes`
  - closes the file with `buffer_deinit`, perhaps to start over and
    read another file.
 
-Before exiting, the caller can use `buffer_reset` to deallocate
-resources for the benefit of profiling tools.
+When finished, the caller can use `buffer_reset` to deallocate
+resources.
 
 Functions
 ---------
diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
index 6cfa256..260cf50 100644
--- a/vcs-svn/fast_export.c
+++ b/vcs-svn/fast_export.c
@@ -63,14 +63,14 @@ void fast_export_commit(uint32_t revision, uint32_t author, char *log,
 	printf("progress Imported commit %"PRIu32".\n\n", revision);
 }
 
-void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len)
+void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len, struct line_buffer *input)
 {
 	if (mode == REPO_MODE_LNK) {
 		/* svn symlink blobs start with "link ...
From: Jonathan Nieder
Date: Sunday, October 10, 2010 - 7:44 pm

Tell the caller know how many bytes were actually skipped.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/line_buffer.txt |    3 ++-
 vcs-svn/line_buffer.c   |   15 ++++++++-------
 vcs-svn/line_buffer.h   |    2 +-
 3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/vcs-svn/line_buffer.txt b/vcs-svn/line_buffer.txt
index f8eaa4d..d06db24 100644
--- a/vcs-svn/line_buffer.txt
+++ b/vcs-svn/line_buffer.txt
@@ -53,7 +53,8 @@ Functions
 
 `buffer_skip_bytes`::
 	Discards `len` bytes from the input stream (stopping early
-	if necessary because of an error or eof).
+	if necessary because of an error or eof).  Return value is
+	the number of bytes successfully read.
 
 `buffer_reset`::
 	Deallocates non-static buffers.
diff --git a/vcs-svn/line_buffer.c b/vcs-svn/line_buffer.c
index 806932b..999368b 100644
--- a/vcs-svn/line_buffer.c
+++ b/vcs-svn/line_buffer.c
@@ -72,15 +72,16 @@ void buffer_copy_bytes(struct line_buffer *buf, uint32_t len)
 	}
 }
 
-void buffer_skip_bytes(struct line_buffer *buf, uint32_t len)
+uint32_t buffer_skip_bytes(struct line_buffer *buf, uint32_t nbytes)
 {
-	char byte_buffer[COPY_BUFFER_LEN];
-	uint32_t in;
-	while (len > 0 && !feof(buf->infile) && !ferror(buf->infile)) {
-		in = len < COPY_BUFFER_LEN ? len : COPY_BUFFER_LEN;
-		in = fread(byte_buffer, 1, in, buf->infile);
-		len -= in;
+	uint32_t done = 0;
+	while (done < nbytes && !feof(buf->infile) && !ferror(buf->infile)) {
+		char byte_buffer[COPY_BUFFER_LEN];
+		uint32_t len = nbytes - done;
+		uint32_t in = len < COPY_BUFFER_LEN ? len : COPY_BUFFER_LEN;
+		done += fread(byte_buffer, 1, in, buf->infile);
 	}
+	return done;
 }
 
 void buffer_reset(struct line_buffer *buf)
diff --git a/vcs-svn/line_buffer.h b/vcs-svn/line_buffer.h
index fb37390..2796ba7 100644
--- a/vcs-svn/line_buffer.h
+++ b/vcs-svn/line_buffer.h
@@ -17,7 +17,7 @@ int buffer_deinit(struct line_buffer *buf);
 char *buffer_read_line(struct line_buffer *buf);
 char ...
From: Jonathan Nieder
Date: Sunday, October 10, 2010 - 7:46 pm

Tweak the line_buffer API to permit seeking and cat-ing segments
longer than 4 GiB.  This would be particularly useful for applying
deltas that remove a large segment from the middle of a file.

Callers would still have to be updated to take advantage of this.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Since off_t is a signed type, on systems with 32-bit file offsets,
this might make things worse.  Is that worth worrying about?

 vcs-svn/line_buffer.c |    8 ++++----
 vcs-svn/line_buffer.h |    4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/vcs-svn/line_buffer.c b/vcs-svn/line_buffer.c
index 999368b..fd1d3c3 100644
--- a/vcs-svn/line_buffer.c
+++ b/vcs-svn/line_buffer.c
@@ -56,7 +56,7 @@ char *buffer_read_string(struct line_buffer *buf, uint32_t len)
 	return ferror(buf->infile) ? NULL : buf->blob_buffer.buf;
 }
 
-void buffer_copy_bytes(struct line_buffer *buf, uint32_t len)
+void buffer_copy_bytes(struct line_buffer *buf, off_t len)
 {
 	char byte_buffer[COPY_BUFFER_LEN];
 	uint32_t in;
@@ -72,12 +72,12 @@ void buffer_copy_bytes(struct line_buffer *buf, uint32_t len)
 	}
 }
 
-uint32_t buffer_skip_bytes(struct line_buffer *buf, uint32_t nbytes)
+off_t buffer_skip_bytes(struct line_buffer *buf, off_t nbytes)
 {
-	uint32_t done = 0;
+	off_t done = 0;
 	while (done < nbytes && !feof(buf->infile) && !ferror(buf->infile)) {
 		char byte_buffer[COPY_BUFFER_LEN];
-		uint32_t len = nbytes - done;
+		off_t len = nbytes - done;
 		uint32_t in = len < COPY_BUFFER_LEN ? len : COPY_BUFFER_LEN;
 		done += fread(byte_buffer, 1, in, buf->infile);
 	}
diff --git a/vcs-svn/line_buffer.h b/vcs-svn/line_buffer.h
index 2796ba7..2849faa 100644
--- a/vcs-svn/line_buffer.h
+++ b/vcs-svn/line_buffer.h
@@ -16,8 +16,8 @@ int buffer_init(struct line_buffer *buf, const char *filename);
 int buffer_deinit(struct line_buffer *buf);
 char *buffer_read_line(struct line_buffer *buf);
 char *buffer_read_string(struct line_buffer *buf, uint32_t ...
From: Jonathan Nieder
Date: Sunday, October 10, 2010 - 7:47 pm

buffer_read_binary() writes to a strbuf so the caller does not need
to keep track of the number of bytes read.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/line_buffer.c |    6 ++++++
 vcs-svn/line_buffer.h |    1 +
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/vcs-svn/line_buffer.c b/vcs-svn/line_buffer.c
index fd1d3c3..6dd0189 100644
--- a/vcs-svn/line_buffer.c
+++ b/vcs-svn/line_buffer.c
@@ -56,6 +56,12 @@ char *buffer_read_string(struct line_buffer *buf, uint32_t len)
 	return ferror(buf->infile) ? NULL : buf->blob_buffer.buf;
 }
 
+void buffer_read_binary(struct strbuf *sb, uint32_t size,
+			struct line_buffer *buf)
+{
+	strbuf_fread(sb, size, buf->infile);
+}
+
 void buffer_copy_bytes(struct line_buffer *buf, off_t len)
 {
 	char byte_buffer[COPY_BUFFER_LEN];
diff --git a/vcs-svn/line_buffer.h b/vcs-svn/line_buffer.h
index 2849faa..873b0e4 100644
--- a/vcs-svn/line_buffer.h
+++ b/vcs-svn/line_buffer.h
@@ -16,6 +16,7 @@ int buffer_init(struct line_buffer *buf, const char *filename);
 int buffer_deinit(struct line_buffer *buf);
 char *buffer_read_line(struct line_buffer *buf);
 char *buffer_read_string(struct line_buffer *buf, uint32_t len);
+void buffer_read_binary(struct strbuf *sb, uint32_t len, struct line_buffer *f);
 void buffer_copy_bytes(struct line_buffer *buf, off_t len);
 off_t buffer_skip_bytes(struct line_buffer *buf, off_t len);
 void buffer_reset(struct line_buffer *buf);
-- 
1.7.2.3

--

From: Jonathan Nieder
Date: Sunday, October 10, 2010 - 7:47 pm

The buffer_at_eof() function returns 1 if and only if all input from
the input stream has been exhausted (because of EOF or error).  The
implementation calls fgetc() followed by ungetc() to force an EOF
condition when there is no more input remaining.

Like many functions in the line_buffer API, this function is not
thread-safe.  It could be made to be so with a mutex if needed.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/line_buffer.c |   10 ++++++++++
 vcs-svn/line_buffer.h |    1 +
 2 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/vcs-svn/line_buffer.c b/vcs-svn/line_buffer.c
index 6dd0189..19caa21 100644
--- a/vcs-svn/line_buffer.c
+++ b/vcs-svn/line_buffer.c
@@ -27,6 +27,16 @@ int buffer_deinit(struct line_buffer *buf)
 	return err;
 }
 
+int buffer_at_eof(struct line_buffer *buf)
+{
+	int ch;
+	if ((ch = fgetc(buf->infile)) == EOF)
+		return 1;
+	if (ungetc(ch, buf->infile) == EOF)
+		return error("cannot unget %c: %s\n", ch, strerror(errno));
+	return 0;
+}
+
 /* Read a line without trailing newline. */
 char *buffer_read_line(struct line_buffer *buf)
 {
diff --git a/vcs-svn/line_buffer.h b/vcs-svn/line_buffer.h
index 873b0e4..0269aed 100644
--- a/vcs-svn/line_buffer.h
+++ b/vcs-svn/line_buffer.h
@@ -14,6 +14,7 @@ struct line_buffer {
 
 int buffer_init(struct line_buffer *buf, const char *filename);
 int buffer_deinit(struct line_buffer *buf);
+int buffer_at_eof(struct line_buffer *buf);
 char *buffer_read_line(struct line_buffer *buf);
 char *buffer_read_string(struct line_buffer *buf, uint32_t len);
 void buffer_read_binary(struct strbuf *sb, uint32_t len, struct line_buffer *f);
-- 
1.7.2.3

--

From: Jonathan Nieder
Date: Sunday, October 10, 2010 - 7:51 pm

Add a buffer_ferror() function to read the error flag from the input
stream, so callers can do:

	some_error_prone_operation(f, ...);
	if (buffer_ferror(f))
		return error("input error: %s", strerror(errno));

instead of waiting until it is time to close the file.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/line_buffer.c |    5 +++++
 vcs-svn/line_buffer.h |    1 +
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/vcs-svn/line_buffer.c b/vcs-svn/line_buffer.c
index 19caa21..43da509 100644
--- a/vcs-svn/line_buffer.c
+++ b/vcs-svn/line_buffer.c
@@ -27,6 +27,11 @@ int buffer_deinit(struct line_buffer *buf)
 	return err;
 }
 
+int buffer_ferror(struct line_buffer *buf)
+{
+	return ferror(buf->infile);
+}
+
 int buffer_at_eof(struct line_buffer *buf)
 {
 	int ch;
diff --git a/vcs-svn/line_buffer.h b/vcs-svn/line_buffer.h
index 0269aed..4899289 100644
--- a/vcs-svn/line_buffer.h
+++ b/vcs-svn/line_buffer.h
@@ -14,6 +14,7 @@ struct line_buffer {
 
 int buffer_init(struct line_buffer *buf, const char *filename);
 int buffer_deinit(struct line_buffer *buf);
+int buffer_ferror(struct line_buffer *buf);
 int buffer_at_eof(struct line_buffer *buf);
 char *buffer_read_line(struct line_buffer *buf);
 char *buffer_read_string(struct line_buffer *buf, uint32_t len);
-- 
1.7.2.3

--

From: Jonathan Nieder
Date: Sunday, October 10, 2010 - 7:52 pm

buffer_read_char() can be used in place of buffer_read_string(1)
to avoid consuming valuable static buffer space.  The delta applier
will use this to read variable-length integers one byte at a time.

Underneath, it is fgetc(), wrapped so the line_buffer library can
maintain its role as gatekeeper of input.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/line_buffer.c |    5 +++++
 vcs-svn/line_buffer.h |    1 +
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/vcs-svn/line_buffer.c b/vcs-svn/line_buffer.c
index 43da509..c54031b 100644
--- a/vcs-svn/line_buffer.c
+++ b/vcs-svn/line_buffer.c
@@ -42,6 +42,11 @@ int buffer_at_eof(struct line_buffer *buf)
 	return 0;
 }
 
+int buffer_read_char(struct line_buffer *buf)
+{
+	return fgetc(buf->infile);
+}
+
 /* Read a line without trailing newline. */
 char *buffer_read_line(struct line_buffer *buf)
 {
diff --git a/vcs-svn/line_buffer.h b/vcs-svn/line_buffer.h
index 4899289..2375ee1 100644
--- a/vcs-svn/line_buffer.h
+++ b/vcs-svn/line_buffer.h
@@ -18,6 +18,7 @@ int buffer_ferror(struct line_buffer *buf);
 int buffer_at_eof(struct line_buffer *buf);
 char *buffer_read_line(struct line_buffer *buf);
 char *buffer_read_string(struct line_buffer *buf, uint32_t len);
+int buffer_read_char(struct line_buffer *buf);
 void buffer_read_binary(struct strbuf *sb, uint32_t len, struct line_buffer *f);
 void buffer_copy_bytes(struct line_buffer *buf, off_t len);
 off_t buffer_skip_bytes(struct line_buffer *buf, off_t len);
-- 
1.7.2.3

--

From: Jonathan Nieder
Date: Sunday, October 10, 2010 - 7:53 pm

Subversion's delta format has the convenient property that applying
each section of the delta only requires examining (and keeping in
memory) a small portion of the preimage.  At any moment, this portion
begins at a well-defined file offset and has a well-defined length,
and as the delta is applied, it moves from the beginning to the end
of the file.  Add a move_window() function to keep track of such a
window into a file.

You can use it like this:

	struct line_buffer preimage = LINE_BUFFER_INIT;
	buffer_init(&preimage, NULL);
	struct view window = {&preimage, 0, STRBUF_INIT};
	move_window(&window, 3, 7);	/* (1) */
	move_window(&window, 5, 5);	/* (2) */
	move_window(&window, 12, 2);	/* (3) */
	strbuf_release(&window.buf);
	buffer_deinit(&preimage);

In this example: (1) reads 10 bytes and discards the first 3;
(2) discards the first 2, which are not needed any more; and (3)
skips 2 bytes and reads 2 new bytes to work with.

Whenever move_window() returns, the file position indicator is at
position window->off + window->buf.len and the data from positions
window->off to the current file position are stored in window->buf.

This function does only sequential access and never seeks, so it
can be safely used on pipes and sockets.

On end-of-file, move_window() just silently reads less than the
caller requested.  On other errors, it prints a message to stderr
and returns -1.

Helped-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Makefile                 |    5 ++-
 vcs-svn/sliding_window.c |   65 ++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/sliding_window.h |   14 ++++++++++
 vcs-svn/LICENSE          |    2 +
 4 files changed, 84 insertions(+), 2 deletions(-)
 create mode 100644 vcs-svn/sliding_window.c
 create mode 100644 vcs-svn/sliding_window.h

diff --git a/Makefile b/Makefile
index 1f1ce04..d99da33 100644
--- a/Makefile
+++ b/Makefile
@@ -1765,7 +1765,8 @@ endif
 XDIFF_OBJS = xdiff/xdiffi.o ...
From: Jonathan Nieder
Date: Sunday, October 10, 2010 - 7:55 pm

The humble beginnings of the svn-format delta applier.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Maybe this should be squashed with patch 16.

Ideas for eliminating the code duplication?

 vcs-svn/svndiff.c |   59 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 59 insertions(+), 0 deletions(-)
 create mode 100644 vcs-svn/svndiff.c

diff --git a/vcs-svn/svndiff.c b/vcs-svn/svndiff.c
new file mode 100644
index 0000000..4d122a5
--- /dev/null
+++ b/vcs-svn/svndiff.c
@@ -0,0 +1,59 @@
+/*
+ * Licensed under a two-clause BSD-style license.
+ * See LICENSE for details.
+ */
+
+#include "git-compat-util.h"
+#include "line_buffer.h"
+
+/*
+ * svndiff0 applier
+ *
+ * See http://svn.apache.org/repos/asf/subversion/trunk/notes/svndiff.
+ *
+ * int ::= highdigit* lowdigit;
+ * highdigit ::= # binary 1000 0000 OR-ed with 7 bit value;
+ * lowdigit ::= # 7 bit value;
+ */
+
+#define VLI_CONTINUE	0x80
+#define VLI_DIGIT_MASK	0x7f
+#define VLI_BITS_PER_DIGIT 7
+
+static int read_int(struct line_buffer *in, uintmax_t *result, off_t *len)
+{
+	off_t sz = *len;
+	uintmax_t rv = 0;
+	while (sz) {
+		int ch = buffer_read_char(in);
+		if (ch == EOF)
+			break;
+		sz--;
+		rv <<= VLI_BITS_PER_DIGIT;
+		rv += (ch & VLI_DIGIT_MASK);
+		if (!(ch & VLI_CONTINUE)) {
+			*result = rv;
+			*len = sz;
+			return 0;
+		}
+	}
+	return error("Invalid delta: incomplete integer %"PRIuMAX, rv);
+}
+
+static int parse_int(const char **buf, size_t *result, const char *end)
+{
+	const char *pos;
+	size_t rv = 0;
+	for (pos = *buf; pos != end; pos++) {
+		unsigned char ch = *pos;
+		rv <<= VLI_BITS_PER_DIGIT;
+		rv += (ch & VLI_DIGIT_MASK);
+		if (!(ch & VLI_CONTINUE)) {
+			*result = rv;
+			*buf = pos + 1;
+			return 0;
+		}
+	}
+	return error("Invalid instruction: incomplete integer %"PRIu64,
+		     (uint64_t) rv);
+}
-- 
1.7.2.3

--

From: Jonathan Nieder
Date: Sunday, October 10, 2010 - 7:58 pm

The magic number of svn deltas is SVN followed by a null byte.
An alternative format (with compressed text) uses magic number SVN\1,
but that is deprecated in favor of compressing the deltas as a whole
as far as I can tell.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/svndiff.c |   18 ++++++++++++++++++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/vcs-svn/svndiff.c b/vcs-svn/svndiff.c
index 4d122a5..df0b1a2 100644
--- a/vcs-svn/svndiff.c
+++ b/vcs-svn/svndiff.c
@@ -11,6 +11,7 @@
  *
  * See http://svn.apache.org/repos/asf/subversion/trunk/notes/svndiff.
  *
+ * svndiff0 ::= 'SVN\0' window window*;
  * int ::= highdigit* lowdigit;
  * highdigit ::= # binary 1000 0000 OR-ed with 7 bit value;
  * lowdigit ::= # 7 bit value;
@@ -20,6 +21,23 @@
 #define VLI_DIGIT_MASK	0x7f
 #define VLI_BITS_PER_DIGIT 7
 
+static int read_magic(struct line_buffer *in, off_t *len)
+{
+	static const char magic[] = {'S', 'V', 'N', '\0'};
+	struct strbuf sb = STRBUF_INIT;
+	if (*len < sizeof(magic))
+		return error("Invalid delta: no file type header");
+	buffer_read_binary(&sb, sizeof(magic), in);
+	if (sb.len != sizeof(magic))
+		return error("Invalid delta: no file type header");
+	if (memcmp(sb.buf, magic, sizeof(magic)))
+		return error("Unrecognized file type %.*s",
+			     (int) sizeof(magic), sb.buf);
+	*len -= sizeof(magic);
+	strbuf_release(&sb);
+	return 0;
+}
+
 static int read_int(struct line_buffer *in, uintmax_t *result, off_t *len)
 {
 	off_t sz = *len;
-- 
1.7.2.3

--

From: Jonathan Nieder
Date: Sunday, October 10, 2010 - 7:59 pm

The idiom (a + b < a) works fine for detecting that an unsigned
integer has overflowed, but the more explicit

	unsigned_add_overflows(a, b)

might be easier to read.

Define such a macro, expanding roughly to ((a) < UINT_MAX - (b)).
Because the expansion uses each argument only once outside of sizeof()
expressions, it is safe to use this macro with arguments that have
side-effects.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 git-compat-util.h |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index 2af8d3e..817f045 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -31,6 +31,9 @@
 #define maximum_signed_value_of_type(a) \
     (INTMAX_MAX >> (bitsizeof(intmax_t) - bitsizeof(a)))
 
+#define maximum_unsigned_value_of_type(a) \
+    (UINTMAX_MAX >> (bitsizeof(uintmax_t) - bitsizeof(a)))
+
 /*
  * Signed integer overflow is undefined in C, so here's a helper macro
  * to detect if the sum of two integers will overflow.
@@ -40,6 +43,9 @@
 #define signed_add_overflows(a, b) \
     ((b) > maximum_signed_value_of_type(a) - (a))
 
+#define unsigned_add_overflows(a, b) \
+    ((b) > maximum_unsigned_value_of_type(a) - (a))
+
 #ifdef __GNUC__
 #define TYPEOF(x) (__typeof__(x))
 #else
-- 
1.7.2.3

--

From: Jonathan Nieder
Date: Sunday, October 10, 2010 - 8:00 pm

From: Ramkumar Ramachandra <artagnon@gmail.com>

The svn-fe test script only requires git and the svn command-line
tools.  Make these tests easier to read and run by not using the perl
libsvn bindings and instead duplicating only the relevant code from
lib-git-svn.sh.

Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 t/t9010-svn-fe.sh |   14 ++++++++++++--
 1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/t/t9010-svn-fe.sh b/t/t9010-svn-fe.sh
index a713dfc..fd851a4 100755
--- a/t/t9010-svn-fe.sh
+++ b/t/t9010-svn-fe.sh
@@ -2,9 +2,19 @@
 
 test_description='check svn dumpfile importer'
 
-. ./lib-git-svn.sh
+. ./test-lib.sh
 
-test_dump() {
+svnconf=$PWD/svnconf
+export svnconf
+
+svn_cmd () {
+	subcommand=$1 &&
+	shift &&
+	mkdir -p "$svnconf" &&
+	svn "$subcommand" --config-dir "$svnconf" "$@"
+}
+
+test_dump () {
 	label=$1
 	dump=$2
 	test_expect_success "$dump" '
-- 
1.7.2.3

--

From: Jonathan Nieder
Date: Sunday, October 10, 2010 - 8:11 pm

The buffer_read_string() function returns a temporary string of
size specified by the caller.  It currently uses an obj_pool to
store the return value, but that is overkill: all we need is a
buffer that can grow between requests to accomodate larger
strings.

Use a strbuf instead.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
[Resent after messing up the message header; sorry for the noise.]

 vcs-svn/line_buffer.c |   16 ++++++----------
 1 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/vcs-svn/line_buffer.c b/vcs-svn/line_buffer.c
index f22c94f..6f32f28 100644
--- a/vcs-svn/line_buffer.c
+++ b/vcs-svn/line_buffer.c
@@ -5,15 +5,13 @@
 
 #include "git-compat-util.h"
 #include "line_buffer.h"
-#include "obj_pool.h"
+#include "strbuf.h"
 
 #define LINE_BUFFER_LEN 10000
 #define COPY_BUFFER_LEN 4096
 
-/* Create memory pool for char sequence of known length */
-obj_pool_gen(blob, char, 4096)
-
 static char line_buffer[LINE_BUFFER_LEN];
+static struct strbuf blob_buffer = STRBUF_INIT;
 static FILE *infile;
 
 int buffer_init(const char *filename)
@@ -58,11 +56,9 @@ char *buffer_read_line(void)
 
 char *buffer_read_string(uint32_t len)
 {
-	char *s;
-	blob_free(blob_pool.size);
-	s = blob_pointer(blob_alloc(len + 1));
-	s[fread(s, 1, len, infile)] = '\0';
-	return ferror(infile) ? NULL : s;
+	strbuf_reset(&blob_buffer);
+	strbuf_fread(&blob_buffer, len, infile);
+	return ferror(infile) ? NULL : blob_buffer.buf;
 }
 
 void buffer_copy_bytes(uint32_t len)
@@ -94,5 +90,5 @@ void buffer_skip_bytes(uint32_t len)
 
 void buffer_reset(void)
 {
-	blob_reset();
+	strbuf_release(&blob_buffer);
 }
-- 
1.7.2.3

--

From: Jonathan Nieder
Date: Sunday, October 10, 2010 - 9:01 pm

Implement an svndiff 0 interpreter, for use by the dumpfilev3
importer.  It is slower than it needs to be (e.g., it does not use
fseek() on input) for simplicity.

This is based only on the spec and not Subversion's implementation of
the svndiff0 spec.

The tests come from various deltas encountered in importing the
Apache SVN repo.

The svndiff0 semantics are not completely documented, meaning that
some of this work had to be done by guesswork.  It is not complete.

This version of the patch omits the enormous Xerces.cpp test (which
is not so interesting because it passes, anyway).

Helped-by: Ramkumar Ramachandra <artagnon@gmail.com>
Helped-by: Thomas Rast <trast@student.ethz.ch>
Helped-by: David Barr <david.barr@cordelta.com>
Not-signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
[alternate ending, for those who do not like reading 3-MiB messages -
sorry about that]

Still not signed off because I haven't checked the copyright of the
tests.  I would prefer tiny deltas or deltas of some public-domain
work (e.g., drafts of foundation documents of some country).

This patch owes a great deal to David and Ram, probably more than
it owes to me.  David made lots and lots of fixes.  Ram introduced
the tests and helped with design.  Thomas provided an early sanity
check for code clarity.

Thanks for reading.

 Makefile              |    4 +-
 t/t9010-svn-fe.sh     |   14 ++
 t/t9010/newdata.diff0 |  Bin 0 -> 19392 bytes
 t/t9010/newdata.done  |  522 +++++++++++++++++++++++++++++++++++++++++++++++++
 t/t9010/src.diff0     |  Bin 0 -> 74 bytes
 t/t9010/src.done      |  522 +++++++++++++++++++++++++++++++++++++++++++++++++
 test-svn-fe.c         |   37 +++-
 vcs-svn/svndiff.c     |  265 +++++++++++++++++++++++++
 vcs-svn/svndiff.h     |    9 +
 9 files changed, 1364 insertions(+), 9 deletions(-)
 create mode 100644 t/t9010/blank.done
 create mode 100644 t/t9010/newdata.diff0
 create mode 100644 t/t9010/newdata.done
 create mode 100644 t/t9010/src.diff0
 create ...
From: Jonathan Nieder
Date: Wednesday, October 13, 2010 - 2:17 am

I hear that was nigh unreadable, so here's a reroll.  Less
cargo-cult support for broken deltas, more readability and tests.
Patches apply on top of "[PATCH 15/16] t9010 (svn-fe): Eliminate
dependency on svn perl bindings".  As before, the end result
includes a 'test-svn-fe -d' command that can apply svndiff0-format
deltas, meaning less binary garbage to worry about as you puzzle
over that confusing "svnrdump dump" output in debugging sessions.

Questions?  Improvements?  Bugs?

Patch 1 is a fixup to the variable-length integer parsing code, to
report unexpected EOF (i.e., declared content length too long)
correctly when it occurs in the middle of such an integer.

Patch 2 is the svndiff0 interpreter in broad strokes: read window,
read window, read window, ....  The patch doesn't encode any
knowledge about what actually goes _in_ a window aside from the
header, so it will error out for nonempty windows.

Patch 3 teaches the nacent interpreter to keep the appropriate
piece of the preimage in memory.  This is probably earlier in the
series than it ought to be, but I wanted to try out the sliding
window code.

With patches 4 and 5, the interpreter learns to read the "data"
and "instructions" section of a window.  The effect is observable
because it finds the beginning of the next window correctly.

Patch 6 is an example instruction (copyfrom_data).

Patches 7-8 introduce some sanity checks.

Patches 9 and 10 are another instruction (copyfrom_target) and
another sanity check.

Patch 11 is the last instruction (copyfrom_source).  That's it.
You can apply deltas now!

If anything seems unclear, please don't spend time puzzling it
out --- just yell at me, so the code or documentation can be
cleaned up.  Happy reading.
--

From: Jonathan Nieder
Date: Wednesday, October 13, 2010 - 2:19 am

Report EOF correctly in integer parsing code.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
This patch is meant for squashing.

 vcs-svn/svndiff.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/vcs-svn/svndiff.c b/vcs-svn/svndiff.c
index df0b1a2..36d2b30 100644
--- a/vcs-svn/svndiff.c
+++ b/vcs-svn/svndiff.c
@@ -45,7 +45,8 @@ static int read_int(struct line_buffer *in, uintmax_t *result, off_t *len)
 	while (sz) {
 		int ch = buffer_read_char(in);
 		if (ch == EOF)
-			break;
+			return error("Delta ends early (%"PRIu64" bytes remaining)",
+				     (uint64_t) sz);
 		sz--;
 		rv <<= VLI_BITS_PER_DIGIT;
 		rv += (ch & VLI_DIGIT_MASK);
-- 
1.7.2.3

--

From: Jonathan Nieder
Date: Wednesday, October 13, 2010 - 2:21 am

A delta in the subversion delta (svndiff0) format consists of the
magic bytes SVN\0 followed by a sequence of windows, each beginning
with a window header consisting of five integers (with variable-length
representation):

	source view offset
	source view length
	output length
	instructions length
	auxiliary data length

Add an svndiff0_apply() function and test-svn-fe -d commandline tool
to parse such a delta in the special case of not including any
instructions or auxiliary data.

Later patches will add features to turn this into a fully functional
delta applier, for use by svn-fe in parsing the streams produced by
"svnrdump dump" and "svnadmin dump --deltas".

Helped-by: Ramkumar Ramachandra <artagnon@gmail.com>
Helped-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Makefile          |    4 +-
 t/t9011-svn-da.sh |   82 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 test-svn-fe.c     |   39 ++++++++++++++++++++----
 vcs-svn/svndiff.c |   64 +++++++++++++++++++++++++++++++++++++++++
 vcs-svn/svndiff.h |    9 ++++++
 5 files changed, 189 insertions(+), 9 deletions(-)
 create mode 100755 t/t9011-svn-da.sh
 create mode 100644 vcs-svn/svndiff.h

diff --git a/Makefile b/Makefile
index d99da33..966f5c7 100644
--- a/Makefile
+++ b/Makefile
@@ -1766,7 +1766,7 @@ XDIFF_OBJS = xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
 	xdiff/xmerge.o xdiff/xpatience.o
 VCSSVN_OBJS = vcs-svn/string_pool.o vcs-svn/line_buffer.o \
 	vcs-svn/repo_tree.o vcs-svn/fast_export.o vcs-svn/svndump.o \
-	vcs-svn/sliding_window.o
+	vcs-svn/sliding_window.o vcs-svn/svndiff.o
 OBJECTS := $(GIT_OBJS) $(XDIFF_OBJS) $(VCSSVN_OBJS)
 
 dep_files := $(foreach f,$(OBJECTS),$(dir $f).depend/$(notdir $f).d)
@@ -1893,7 +1893,7 @@ xdiff-interface.o $(XDIFF_OBJS): \
 $(VCSSVN_OBJS): \
 	vcs-svn/obj_pool.h vcs-svn/trp.h vcs-svn/string_pool.h \
 	vcs-svn/line_buffer.h vcs-svn/repo_tree.h vcs-svn/fast_export.h \
-	vcs-svn/svndump.h ...
From: Jonathan Nieder
Date: Wednesday, October 13, 2010 - 2:30 am

The source view offset heading each svndiff0 window represents a
number of bytes past the beginning of the preimage.  Together with the
source view length, it instructs the delta applier about what portion
of the preimage instructions will refer to.  Read in that data right
away using the sliding window code.

Maybe some day we will mmap() to prepare to read data more lazily.

For compatibility with Subversion's implementation, tolerate source
view offsets pointing past the end of the preimage file (a later
patch will remove this flexibility).  For simplicity, also permit
source views that start within the preimage and end outside of it,
even though Subversion does not.

This does not teach the delta applier to read instructions or copy
data from the source view yet.  Deltas that would produce nonempty
output are still rejected.

Helped-by: Ramkumar Ramachandra <artagnon@gmail.com>
Helped-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
It occurs to me that Sam Vilain may well have something valuable to
say about this series, having implemented something similar[1].

Sam, this series adds an svndiff0 parser for git to use in parsing
v3 dumps (which are way easier to produce with remote access to an
svn repository than v2 dumps).  The beginning of the series is at [2],
though that cover letter is out of date: now, modulo any new bugs
I've introduced with this reroll, it is known to successfully apply
all the deltas involved in a complete dump of the ASF repo.

I am interested in improvements and complaints of all kinds.

[1] http://search.cpan.org/~samv/Parse-SVNDiff-0.03/lib/Parse/SVNDiff.pm
[2] http://thread.gmane.org/gmane.comp.version-control.git/151086/focus=158731

 t/t9011-svn-da.sh |   38 ++++++++++++++++++++++++++++++++++++++
 vcs-svn/svndiff.c |   22 +++++++++++++++-------
 2 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/t/t9011-svn-da.sh b/t/t9011-svn-da.sh
index 8dccd16..b9aad70 100755
--- ...
From: Sam Vilain
Date: Thursday, October 14, 2010 - 2:45 pm

All I did was make the module lazy - version 0.02 was by 唐鳳, any
detailed knowledge I had of the binary format fell out of my head pretty
quickly :-)

Sam


--

From: Jonathan Nieder
Date: Thursday, October 14, 2010 - 4:40 pm

Ah, my bad.  唐鳳, as part of an attempt to natively support fetching
from and pushing to svn repositories, some contributors to the git
project are working on an svndiff0 applier.  If you're interested in
reliving old memories, please feel free to look it over (especially
the test cases).  Thoughts, simplifications, bug reports, improvements
welcome.

http://thread.gmane.org/gmane.comp.version-control.git/151086/focus=158913

Anyone wanting to try it can check out

	git://repo.or.cz/git/jrn.git svn-da

and use the test-svn-fe command:

	make test-svn-fe
	./test-svn-fe -d <preimage> <delta> <delta length>

or the tests:

	make
	cd t && sh t9011-svn-da.sh -v -i

The preimage or delta argument can be /dev/stdin for use in a pipeline.

Thanks for your work on svk and pugs!
Jonathan
--

From: Jonathan Nieder
Date: Wednesday, October 13, 2010 - 2:35 am

Each window of an svndiff0-format delta includes a section for new
data that will be copied into the preimage (in the order it appears in
the window, possibly interspersed with other data).

Read this data when encountering it.  It is not actually necessary to
do so --- it would be just as easy to copy straight from the delta
to output when interpreting the relevant instructions --- but this
way, the code that interprets svndiff0 instructions can proceed more
quickly because it does not require any I/O.

Subversion's implementation rejects deltas that do not consume all
the auxiliary data that is available.  Do not check that for now,
because it would make it impossible to test the function of this
patch until the instructions to consume data are implemented.

Do check for truncated data sections.  Since Subversion's applier
rejects deltas that end before the new-data section is declared to
end, it should be safe for this applier to reject such deltas, too.

Helped-by: Ramkumar Ramachandra <artagnon@gmail.com>
Helped-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 t/t9011-svn-da.sh |   12 ++++++++++++
 vcs-svn/svndiff.c |   27 ++++++++++++++++++++++++---
 2 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/t/t9011-svn-da.sh b/t/t9011-svn-da.sh
index b9aad70..44832b0 100755
--- a/t/t9011-svn-da.sh
+++ b/t/t9011-svn-da.sh
@@ -117,4 +117,16 @@ test_expect_success 'preimage view: accept truncated preimage' '
 	test_cmp empty actual.longread
 '
 
+test_expect_success 'inline data' '
+	printf "SVNQ%b%s%b%s" "QQQQ\003" "bar" "QQQQ\001" "x" |
+		q_to_nul >inline.clear &&
+	test-svn-fe -d preimage inline.clear 18 >actual &&
+	test_cmp empty actual
+'
+
+test_expect_success 'truncated inline data' '
+	printf "SVNQ%b%s" "QQQQ\003" "b" | q_to_nul >inline.trunc &&
+	test_must_fail test-svn-fe -d preimage inline.trunc 10
+'
+
 test_done
diff --git a/vcs-svn/svndiff.c b/vcs-svn/svndiff.c
index f2876b3..c60d732 ...
From: Jonathan Nieder
Date: Wednesday, October 13, 2010 - 2:38 am

Buffer the instruction section upon encountering it for later
interpretation.

An alternative design would involve parsing the instructions
at this point and buffering them in some processed form.  Using
the unprocessed form is simpler.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 t/t9011-svn-da.sh |    5 +++++
 vcs-svn/svndiff.c |   23 ++++++++++++++++++-----
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/t/t9011-svn-da.sh b/t/t9011-svn-da.sh
index 44832b0..1383263 100755
--- a/t/t9011-svn-da.sh
+++ b/t/t9011-svn-da.sh
@@ -129,4 +129,9 @@ test_expect_success 'truncated inline data' '
 	test_must_fail test-svn-fe -d preimage inline.trunc 10
 '
 
+test_expect_success 'truncated inline data (after instruction section)' '
+	printf "SVNQ%b%b%s" "QQ\001\001\003" "\0201" "b" | q_to_nul >insn.trunc &&
+	test_must_fail test-svn-fe -d preimage insn.trunc 11
+'
+
 test_done
diff --git a/vcs-svn/svndiff.c b/vcs-svn/svndiff.c
index c60d732..72fe716 100644
--- a/vcs-svn/svndiff.c
+++ b/vcs-svn/svndiff.c
@@ -24,6 +24,7 @@
 #define VLI_BITS_PER_DIGIT 7
 
 struct window {
+	struct strbuf instructions;
 	struct strbuf data;
 };
 
@@ -119,7 +120,7 @@ static int read_chunk(struct line_buffer *delta, off_t *delta_len,
 
 static int apply_one_window(struct line_buffer *delta, off_t *delta_len)
 {
-	struct window ctx = {STRBUF_INIT};
+	struct window ctx = {STRBUF_INIT, STRBUF_INIT};
 	size_t out_len;
 	size_t instructions_len;
 	size_t data_len;
@@ -131,13 +132,25 @@ static int apply_one_window(struct line_buffer *delta, off_t *delta_len)
 	    read_length(delta, &instructions_len, delta_len) ||
 	    read_length(delta, &data_len, delta_len))
 		return -1;
+	if (read_chunk(delta, delta_len, &ctx.instructions, instructions_len))
+		return error("Invalid delta: incomplete instructions section");
+	if (buffer_ferror(delta)) {
+		rv = error("Cannot read delta: %s", strerror(errno));
+		goto done;
+	}
+	if (read_chunk(delta, delta_len, &ctx.data, ...
From: Jonathan Nieder
Date: Wednesday, October 13, 2010 - 2:39 am

The copyfrom_data instruction copies a few bytes verbatim from the
auxiliary data section of a window to the postimage.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 t/t9011-svn-da.sh |   31 +++++++++++++++++++
 vcs-svn/svndiff.c |   86 +++++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 112 insertions(+), 5 deletions(-)

diff --git a/t/t9011-svn-da.sh b/t/t9011-svn-da.sh
index 1383263..9279924 100755
--- a/t/t9011-svn-da.sh
+++ b/t/t9011-svn-da.sh
@@ -134,4 +134,35 @@ test_expect_success 'truncated inline data (after instruction section)' '
 	test_must_fail test-svn-fe -d preimage insn.trunc 11
 '
 
+test_expect_success 'copyfrom_data' '
+	echo hi >expect &&
+	printf "SVNQ%b%b%b" "QQ\003\001\003" "\0203" "hi\n" | q_to_nul >copydat &&
+	test-svn-fe -d preimage copydat 13 >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'multiple copyfrom_data' '
+	echo hi >expect &&
+	printf "SVNQ%b%b%b%b%b" "QQ\003\002\003" "\0201\0202" "hi\n" \
+		"QQQ\002Q" "\0200Q" | q_to_nul >copy.multi &&
+	len=$(wc -c <copy.multi) &&
+	test-svn-fe -d preimage copy.multi $len >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'incomplete multiple insn' '
+	printf "SVNQ%b%b%b" "QQ\003\002\003" "\0203\0200" "hi\n" |
+		q_to_nul >copy.partial &&
+	len=$(wc -c <copy.partial) &&
+	test_must_fail test-svn-fe -d preimage copy.partial $len
+'
+
+test_expect_success 'catch attempt to copy missing data' '
+	printf "SVNQ%b%b%s%b%s" "QQ\002\002\001" "\0201\0201" "X" \
+			"QQQQ\002" "YZ" |
+		q_to_nul >copy.incomplete &&
+	len=$(wc -c <copy.incomplete) &&
+	test_must_fail test-svn-fe -d preimage copy.incomplete $len
+'
+
 test_done
diff --git a/vcs-svn/svndiff.c b/vcs-svn/svndiff.c
index 72fe716..ac776e0 100644
--- a/vcs-svn/svndiff.c
+++ b/vcs-svn/svndiff.c
@@ -14,20 +14,40 @@
  *
  * svndiff0 ::= 'SVN\0' window window*;
  * window ::= int int int int int instructions inline_data;
+ * instructions ::= instruction*;
+ * instruction ::= ...
From: Jonathan Nieder
Date: Wednesday, October 13, 2010 - 2:41 am

Check that the declared output size for each window is correct, and
reserve that amount of space in the output buffer in advance.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/svndiff.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/vcs-svn/svndiff.c b/vcs-svn/svndiff.c
index ac776e0..c03cd7e 100644
--- a/vcs-svn/svndiff.c
+++ b/vcs-svn/svndiff.c
@@ -220,10 +220,15 @@ static int apply_one_window(struct line_buffer *delta, off_t *delta_len,
 		rv = error("Cannot read delta: %s", strerror(errno));
 		goto done;
 	}
+	strbuf_grow(&ctx.out, out_len);
 	if (apply_window_in_core(&ctx) || write_strbuf(&ctx.out, out)) {
 		rv = -1;
 		goto done;
 	}
+	if (ctx.out.len != out_len) {
+		rv = error("Invalid delta: incorrect postimage length");
+		goto done;
+	}
  done:
 	strbuf_release(&ctx.data);
 	strbuf_release(&ctx.instructions);
-- 
1.7.2.3

--

From: Jonathan Nieder
Date: Wednesday, October 13, 2010 - 2:48 am

The main point is to constrain the format of deltas more,
so corruption and other breakage can be more easily detected.

Requiring deltas not to provide unconsumed data also opens
the possibility of ignoring the declared amount of new data
and simply streaming the data as needed to fulfill
copyfrom_data requests.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 t/t9011-svn-da.sh |   12 +++---------
 vcs-svn/svndiff.c |    2 ++
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/t/t9011-svn-da.sh b/t/t9011-svn-da.sh
index 9279924..c9f4768 100755
--- a/t/t9011-svn-da.sh
+++ b/t/t9011-svn-da.sh
@@ -117,20 +117,14 @@ test_expect_success 'preimage view: accept truncated preimage' '
 	test_cmp empty actual.longread
 '
 
-test_expect_success 'inline data' '
+test_expect_success 'unconsumed inline data' '
 	printf "SVNQ%b%s%b%s" "QQQQ\003" "bar" "QQQQ\001" "x" |
 		q_to_nul >inline.clear &&
-	test-svn-fe -d preimage inline.clear 18 >actual &&
-	test_cmp empty actual
+	test_must_fail test-svn-fe -d preimage inline.clear 18 >actual
 '
 
 test_expect_success 'truncated inline data' '
-	printf "SVNQ%b%s" "QQQQ\003" "b" | q_to_nul >inline.trunc &&
-	test_must_fail test-svn-fe -d preimage inline.trunc 10
-'
-
-test_expect_success 'truncated inline data (after instruction section)' '
-	printf "SVNQ%b%b%s" "QQ\001\001\003" "\0201" "b" | q_to_nul >insn.trunc &&
+	printf "SVNQ%b%b%s" "QQ\003\001\003" "\0203" "b" | q_to_nul >insn.trunc &&
 	test_must_fail test-svn-fe -d preimage insn.trunc 11
 '
 
diff --git a/vcs-svn/svndiff.c b/vcs-svn/svndiff.c
index c03cd7e..8755c83 100644
--- a/vcs-svn/svndiff.c
+++ b/vcs-svn/svndiff.c
@@ -188,6 +188,8 @@ static int apply_window_in_core(struct window *ctx)
 	while (insn != ctx->instructions.buf + ctx->instructions.len)
 		if (step(ctx, &insn, &data_pos))
 			return -1;
+	if (data_pos != ctx->data.len)
+		return error("Invalid delta: does not copy all new data");
 	return 0;
 }
 
-- 
1.7.2.3

--

From: Jonathan Nieder
Date: Wednesday, October 13, 2010 - 2:50 am

The copyfrom_target instruction copies appends data that is already
present in the current output view to the end of output.  (The offset
argument is relative to the beginning of output produced by the
current window.)

The region copied is allowed to run past the end of the existing
output.  To support that case, copy one character at a time
rather than using memcpy() or memmove().  This allows copyfrom_target
to be used once to repeat a string many times.  For example:

	COPYFROM_DATA 2
	COPYFROM_OUTPUT 10, 0
	DATA "ab"

would produce the output "ababababababababababab".

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 t/t9011-svn-da.sh |   42 ++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/svndiff.c |   30 ++++++++++++++++++++++++++++--
 2 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/t/t9011-svn-da.sh b/t/t9011-svn-da.sh
index c9f4768..ccd31e9 100755
--- a/t/t9011-svn-da.sh
+++ b/t/t9011-svn-da.sh
@@ -159,4 +159,46 @@ test_expect_success 'catch attempt to copy missing data' '
 	test_must_fail test-svn-fe -d preimage copy.incomplete $len
 '
 
+test_expect_success 'copyfrom target to repeat data' '
+	printf foofoo >expect &&
+	printf "SVNQ%b%b%s" "QQ\006\004\003" "\0203\0100\003Q" "foo" |
+		q_to_nul >copytarget.repeat &&
+	len=$(wc -c <copytarget.repeat) &&
+	test-svn-fe -d preimage copytarget.repeat $len >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'copyfrom target out of order' '
+	printf foooof >expect &&
+	printf "SVNQ%b%b%s" \
+		"QQ\006\007\003" "\0203\0101\002\0101\001\0101Q" "foo" |
+		q_to_nul >copytarget.reverse &&
+	len=$(wc -c <copytarget.reverse) &&
+	test-svn-fe -d preimage copytarget.reverse $len >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'catch copyfrom future' '
+	printf "SVNQ%b%b%s" "QQ\004\004\003" "\0202\0101\002\0201" "XYZ" |
+		q_to_nul >copytarget.infuture &&
+	len=$(wc -c <copytarget.infuture) &&
+	test_must_fail test-svn-fe -d preimage copytarget.infuture ...
From: Jonathan Nieder
Date: Wednesday, October 13, 2010 - 2:53 am

Some particularly strange deltas of unknown origin were found to
request copies beyond the end of the preimage.  But svn 1.6 never
produces anything like that.

Although Subversion accepts these perverse deltas as input, let's
error out if some future version of subversion starts to actually
produce them.

Without this change, the diff applier would have to separately
keep track of the number of bytes supposedly and actually written out.

Helped-by: Ramkumar Ramachandra <artagnon@gmail.com>
Helped-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 t/t9011-svn-da.sh        |   11 ++++-------
 vcs-svn/sliding_window.c |   10 ++++++----
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/t/t9011-svn-da.sh b/t/t9011-svn-da.sh
index ccd31e9..c4bd1f3 100755
--- a/t/t9011-svn-da.sh
+++ b/t/t9011-svn-da.sh
@@ -105,16 +105,13 @@ test_expect_success 'preimage view: offsets compared by value' '
 	test_cmp empty actual
 '
 
-test_expect_success 'preimage view: accept truncated preimage' '
+test_expect_success 'preimage view: reject truncated preimage' '
 	printf "SVNQ%b" "\010QQQQ" | q_to_nul >clear.lateemptyread &&
 	printf "SVNQ%b" "\010\001QQQ" | q_to_nul >clear.latenonemptyread &&
 	printf "SVNQ%b" "\001\010QQQ" | q_to_nul >clear.longread &&
-	test-svn-fe -d preimage clear.lateemptyread 9 >actual.emptyread &&
-	test-svn-fe -d preimage clear.latenonemptyread 9 >actual.nonemptyread &&
-	test-svn-fe -d preimage clear.longread 9 >actual.longread &&
-	test_cmp empty actual.emptyread &&
-	test_cmp empty actual.nonemptyread &&
-	test_cmp empty actual.longread
+	test_must_fail test-svn-fe -d preimage clear.lateemptyread 9 &&
+	test_must_fail test-svn-fe -d preimage clear.latenonemptyread 9 &&
+	test_must_fail test-svn-fe -d preimage clear.longread 9
 '
 
 test_expect_success 'unconsumed inline data' '
diff --git a/vcs-svn/sliding_window.c b/vcs-svn/sliding_window.c
index 8273970..5c08828 100644
--- ...
From: Jonathan Nieder
Date: Wednesday, October 13, 2010 - 2:58 am

The copyfrom_source instruction appends data from the preimage
buffer to the end of output.  Its arguments are a length and an
offset relative to the beginning of the source view.

Helped-by: Ramkumar Ramachandra <artagnon@gmail.com>
Helped-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
That's the end of the series.  Thanks for reading.  Hopefully this
round did not introduce too many bugs but if it did, I'd be glad to
hear about them.

Good night,
Jonathan

 t/t9011-svn-da.sh |   35 +++++++++++++++++++++++++++++++++++
 vcs-svn/svndiff.c |   27 +++++++++++++++++++++++----
 2 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/t/t9011-svn-da.sh b/t/t9011-svn-da.sh
index c4bd1f3..c8959e2 100755
--- a/t/t9011-svn-da.sh
+++ b/t/t9011-svn-da.sh
@@ -198,4 +198,39 @@ test_expect_success 'catch copy that overflows' '
 	test_must_fail test-svn-fe -d preimage copytarget.overflow $len
 '
 
+test_expect_success 'copyfrom source' '
+	printf foo >expect &&
+	printf "SVNQ%b%b" "Q\003\003\002Q" "\003Q" | q_to_nul >copysource.all &&
+	test-svn-fe -d preimage copysource.all 11 >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'copy backwards' '
+	printf oof >expect &&
+	printf "SVNQ%b%b" "Q\003\003\006Q" "\001\002\001\001\001Q" |
+		q_to_nul >copysource.rev &&
+	test-svn-fe -d preimage copysource.rev 15 >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'offsets are relative to window' '
+	printf fo >expect &&
+	printf "SVNQ%b%b%b%b" "Q\003\001\002Q" "\001Q" \
+		"\002\001\001\002Q" "\001Q" |
+		q_to_nul >copysource.two &&
+	test-svn-fe -d preimage copysource.two 18 >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'example from notes/svndiff' '
+	printf aaaaccccdddddddd >expect &&
+	printf aaaabbbbcccc >source &&
+	printf "SVNQ%b%b%s" "Q\014\020\007\001" \
+		"\004Q\004\010\0201\0107\010" d |
+		q_to_nul >delta.example &&
+	len=$(wc -c <delta.example) &&
+	test-svn-fe -d ...
From: Jonathan Nieder
Date: Wednesday, October 13, 2010 - 3:00 am

The copyfrom_source instruction appends data from the preimage
buffer to the end of output.  Its arguments are a length and an
offset relative to the beginning of the source view.

Helped-by: Ramkumar Ramachandra <artagnon@gmail.com>
Helped-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
[resending with cc: samv]

That's the end of the series.  Thanks for reading.  Hopefully this
round did not introduce too many bugs but if it did, I'd be glad to
hear about them.

Good night,
Jonathan

 t/t9011-svn-da.sh |   35 +++++++++++++++++++++++++++++++++++
 vcs-svn/svndiff.c |   27 +++++++++++++++++++++++----
 2 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/t/t9011-svn-da.sh b/t/t9011-svn-da.sh
index c4bd1f3..c8959e2 100755
--- a/t/t9011-svn-da.sh
+++ b/t/t9011-svn-da.sh
@@ -198,4 +198,39 @@ test_expect_success 'catch copy that overflows' '
 	test_must_fail test-svn-fe -d preimage copytarget.overflow $len
 '
 
+test_expect_success 'copyfrom source' '
+	printf foo >expect &&
+	printf "SVNQ%b%b" "Q\003\003\002Q" "\003Q" | q_to_nul >copysource.all &&
+	test-svn-fe -d preimage copysource.all 11 >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'copy backwards' '
+	printf oof >expect &&
+	printf "SVNQ%b%b" "Q\003\003\006Q" "\001\002\001\001\001Q" |
+		q_to_nul >copysource.rev &&
+	test-svn-fe -d preimage copysource.rev 15 >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'offsets are relative to window' '
+	printf fo >expect &&
+	printf "SVNQ%b%b%b%b" "Q\003\001\002Q" "\001Q" \
+		"\002\001\001\002Q" "\001Q" |
+		q_to_nul >copysource.two &&
+	test-svn-fe -d preimage copysource.two 18 >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'example from notes/svndiff' '
+	printf aaaaccccdddddddd >expect &&
+	printf aaaabbbbcccc >source &&
+	printf "SVNQ%b%b%s" "Q\014\020\007\001" \
+		"\004Q\004\010\0201\0107\010" d |
+		q_to_nul >delta.example &&
+	len=$(wc -c ...
From: Ramkumar Ramachandra
Date: Monday, October 18, 2010 - 10:00 am

Hi Jonathan,


Thanks for this! The code looks really good and the test suite is
quite comprehensive :)

Eager to see this go through to `master`.

-- Ram
--

From: Jonathan Nieder
Date: Monday, October 18, 2010 - 10:03 am

I think the strbuf use breaks the contrib/svn-fe/svn-fe build
somewhere.  Probably something like the following is needed.

(moving the -I flags off of the compilation command line is
justa "while at it" thing, for consistency with the toplevel
Makefile)

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
diff --git a/contrib/svn-fe/Makefile b/contrib/svn-fe/Makefile
index 360d8da..9e8f174 100644
--- a/contrib/svn-fe/Makefile
+++ b/contrib/svn-fe/Makefile
@@ -6,7 +6,7 @@ MV = mv
 
 CFLAGS = -g -O2 -Wall
 LDFLAGS =
-ALL_CFLAGS = $(CFLAGS)
+ALL_CFLAGS = $(CFLAGS) -I../../vcs-svn -I../..
 ALL_LDFLAGS = $(LDFLAGS)
 EXTLIBS =
 
@@ -38,7 +38,7 @@ svn-fe$X: svn-fe.o $(VCSSVN_LIB) $(GIT_LIB)
 		$(ALL_LDFLAGS) $(LIBS)
 
 svn-fe.o: svn-fe.c ../../vcs-svn/svndump.h
-	$(QUIET_CC)$(CC) -I../../vcs-svn -o $*.o -c $(ALL_CFLAGS) $<
+	$(QUIET_CC)$(CC) -o $*.o -c $(ALL_CFLAGS) $<
 
 svn-fe.html: svn-fe.txt
 	$(QUIET_SUBDIR0)../../Documentation $(QUIET_SUBDIR1) \
--

Previous thread: COX.NET SUPPORT by COX.NET WEBMAIL ACCOUNT on Wednesday, July 14, 2010 - 3:08 am. (1 message)

Next thread: [PATCH] Get rr/svn-fe-contrib merged by Ramkumar Ramachandra on Thursday, July 15, 2010 - 9:25 am. (2 messages)