Re: [PATCH 03/21] Refactoring to make verify_tag() and parse_tag_buffer() more similar

From: Johan Herland
Date: Saturday, June 9, 2007 - 3:49 am

On Saturday 09 June 2007, Johannes Schindelin wrote:

Hi. Thanks for taking the time to look at (some of) my patch. Most of your
questions below can be answered with a single answer:

The main purpose of the patch is (as the subject line says) to bring the
two functions more in line with each other. At the time I made the patch,
I had made the observation that these functions were trying to do much the
same thing, albeit in a slightly different form. This patch is therefore
about applying a series of (mostly non-functional) refactorings to make
their diff as small as possible. This involves "stupid" changes such as
renaming variables, tweaking whitespace, reordering the declaration of
variables, etc. It's all to make the functions similar to the point where
I can diff them, get a small and meaningful result, see the remaining
_real_ differences, and in the end, _merge_ them (see patches 7-9).
If this whole exercise didn't end up with merging the two functions into
one, I would _totally_ agree with you that all this refactoring is more
harmful than beneficial.


First, (and you'll see this in the commit message) I'm _moving_ (not
removing) the NUL termination out of verify_tag() and into main() (which I
can be sure is the only caller of verify_tag(), since verify_tag is
declared static, and there is no other call in that file). Two reasons for
doing this:

1. Make verify_tag more similar to parse_tag_buffer() (because
parse_tag_buffer() does not NUL terminate)

2. Do the NUL termination as close to the code that actually populated the
buffer with data (the read_pipe() in main())
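
As a rough sketch of the move described above (this is not the code from
the patch; looks_like_tag() and check_input() are hypothetical stand-ins
for verify_tag() and its caller), the caller terminates the buffer right
after filling it, and the checker simply relies on that:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for verify_tag(): it does no termination of
 * its own and requires NUL-terminated input.
 */
static int looks_like_tag(const char *buffer)
{
	return strncmp(buffer, "object ", 7) == 0;
}

/*
 * Hypothetical caller: 'len' is the number of bytes read into the
 * buffer, 'alloc' is the allocated size. The terminator is written
 * here, next to the code that knows both numbers.
 */
static int check_input(char *buffer, size_t len, size_t alloc)
{
	if (len >= alloc)
		return -1;	/* no room for the terminator */
	buffer[len] = '\0';
	return looks_like_tag(buffer);
}
```

Keeping the write to buffer[len] in the caller means the one place that
knows the allocated size is also the place that depends on it.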


So now you can ask: Why doesn't parse_tag_buffer() NUL terminate its
input? Is _that_ safe? And I ran around checking all the callers of
parse_tag_buffer, and found that all of them use data (most of which
originates from read_sha1_file()) that's already NUL terminated.

In the end, I also put in a comment on the resulting function
(parse_and_verify_tag_buffer()), explicitly saying the given data _must_
be NUL terminated.



Side note: At first I actually thought the manual NUL termination
could cause a buffer overflow (i.e. if the given size was the same as the
allocated size), so I actually have a version of all of this where I
_don't_ assume the buffer is NUL-terminated at all, and put in lots of
bounds checking, replace strchr() with memchr(), etc.
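
To illustrate the kind of change that version involves (a minimal
sketch only; first_line_len() is a hypothetical helper, not a function
from the series), a line search that does not assume NUL termination can
be bounded with memchr():

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the length of the first line in data[0..size), not counting
 * the '\n', or -1 if there is no newline within 'size' bytes.
 * Unlike strchr(), memchr() never reads past 'size', so the input
 * does not have to be NUL-terminated.
 */
static long first_line_len(const char *data, size_t size)
{
	const char *eol = memchr(data, '\n', size);

	return eol ? (long)(eol - data) : -1;
}
```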

I then took a hard look at read_pipe(), and discovered that if you
use it to fill a 4096-byte buffer with 4096 bytes of data, it actually
_will_ reallocate to 8192 bytes and leave room for the NUL termination
(and much more) (I believe this should have been documented in read_pipe).
Thus the NUL termination was safe all along.
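
The behaviour described above can be modelled like this. This is NOT
read_pipe()'s actual code; struct grow_buf and grow_buf_add() are
hypothetical, and the doubling policy is an assumption chosen to match
the 4096 -> 8192 observation. The point is only that growing whenever
the buffer would end up completely full keeps data[len] writable:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer, for illustration only. */
struct grow_buf {
	char *data;
	size_t len;	/* bytes in use */
	size_t alloc;	/* bytes allocated */
};

/*
 * Append n bytes. The buffer is doubled until at least one byte
 * remains free after the append, so a terminator always fits.
 */
static int grow_buf_add(struct grow_buf *b, const char *src, size_t n)
{
	while (b->len + n >= b->alloc) {
		size_t new_alloc = b->alloc ? b->alloc * 2 : 64;
		char *p = realloc(b->data, new_alloc);

		if (!p)
			return -1;
		b->data = p;
		b->alloc = new_alloc;
	}
	memcpy(b->data + b->len, src, n);
	b->len += n;
	return 0;
}
```

With this policy, adding 4096 bytes to an empty 4096-byte buffer ends
with 8192 bytes allocated, and writing data[4096] = '\0' stays in bounds.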



Yes, but it would have made the aforementioned diff to parse_tag_buffer()
larger.


Well, PD_FMT is only used inside the function, so I found it easier to
move the #define of PD_FMT inside the function to indicate the scope
(_perceived_ scope; I know it hasn't any effect on the compiled code).
But since the whole function is going away in a few patches anyway,
I should have probably left it out of this patch entirely.
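
For readers unsure why the scope is only _perceived_: the preprocessor
knows nothing about C blocks, so a macro defined inside one function
stays visible until the end of the file or an #undef. A small sketch
(FMT_SKETCH and both functions are hypothetical names, not PD_FMT or
anything from mktag.c):

```c
#include <stddef.h>
#include <stdio.h>

static int format_in_f(char *out, size_t n, long v)
{
#define FMT_SKETCH "%ld"	/* defined "inside" the function */
	return snprintf(out, n, FMT_SKETCH, v);
}

/* The macro is still visible here: #define ignores block scope. */
static int format_in_g(char *out, size_t n, long v)
{
	return snprintf(out, n, "[" FMT_SKETCH "]", v);
}
#undef FMT_SKETCH	/* this, not the closing brace, ends its lifetime */
```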


But if I leave the NUL termination within the function I would have to
backtrack out of the function to all of its potential callers and check
whether it's safe to write to index size. Since the word "size" could
easily mean "allocated size" I would have the initial feeling that this
might be a buffer overflow, i.e. _not_ safe.

In the end, I think the best solution is to make sure NUL termination
happens before calling the function, and then documenting explicitly
that the function assumes NUL terminated input. Which is exactly what
I end up with at the end of the patch series.


Hmm. The "type" line is found in the variable type_line, the "tag" line is
found in the variable tag_line, and the "tagger" line is found in the
variable ... sig_line? Nope, I don't buy it.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net