O.K., that is nicely defined module, although you would probably
want to add _what_ those functions parse: most of them (or all
of them) parse or help parsing output of git commands.
By the way, in the future we might want to move those subroutines
to Git::Parse (or individual Git::Commit, Git::Tag, etc. modules).
By "in the future" I mean here when we move to using Git.pm in
gitweb... which currently would require some functionality that is
just not present in Git.pm.
Usual notice: Git.pm or Gitweb::Git, etc.?
One of those is not like the other. parsed_difftree_line is really an
utility (helper) function, but I guess it is here and not in Gitweb::Util
because of its dependency on parse_difftree_raw_line, and therefore it
would add circular dependency of Gitweb::Parse to Gitweb::Util.
[...]
Wouldn't "subroutines related to parsing output of git commands, required
by gitweb" be a better description here?
Sidenote: do we want "GPLv2" or "GPLv2 or later", or even "GPLv2 or later,
at discretion of Junio C Hamano"?
Sidenote: in the future commit we might want to not export 'unquote' by
default (but still be able to export it / export it with ':all' tag).
I don't think we use it outside subroutines defined in this module.
But I guess it is better left for a later commit.
While at it, lets review those subroutines.
All those comments are meant for a future commit, and are not in any
way requirement for Pavan. Splitting should in my opinion be just simple
code movement / code reorganization, without any changes.
Sidenote: I wonder if it wouldn't be better to move definition of 'unq'
subroutine to outside of 'unquote', i.e. do not nest subroutines.
Sidenote: hmmm... aren't digit escape sequences always three digits long,
with exception of '\0' for NUL byte? I mean, wouldn't it be better to
use
+ $str =~ s/\\([^0-7]|0|[0-7]{3})/unq($1)/eg;
Sidenote: we would probably want to add description of this subroutine
later (in subsequent commits). This subroutine takes e.g. authortime
as returned by git (epoch + numerical timezone), parses its and
_formats_ to rfc2822, mday-time, ISO 8601 and other formats... hmmm...
perhaps we should separate parsing from formatting...
But all of this should be left for later commit.
In the future, when we move to using Git.pm (the "use Git", not
"use Gitweb::Git"), it could be
+ $repo->get_object_with_type("tag", $tag_id) or return;
Or not. Or something like that.
Sidenote: I wonder if parsing name + email out of author / comitter /
/ tagger info shouldn't be moved to a separate subroutine, or is it not
worth it...
Again, better left for later commit.
Sidenote: WTF is that? The (ASCII armored) signature block can occur
only in the tag message (tag "payload"), and coun't be in heder,
could it?
But fix of this issue should be left for later commit.
+ my @comment = <$fd>;
Again, this is better left for later commit.
+ return;
Fix (I guess), better left for later commit.
+ return %tag;
Same.
Sidenote: we should probably add description of this subroutine in
a later commit.
Sidenote: I wonder if using "named parameters" here wouldn't make
for a better API, i.e.
+ my ($commit_text, %opts) = @_;
+ my $withparents = $opts{'-with-parents'};
Those five lines above very much depend that the output is generated
by 'git rev-list --parents ...", i.e. that there is line with SHA-1
of commit, and then line with _effective_ parents of a commit.
Sidenote: parse_tag 'chomp's each line first.
Sidenote: we would probably want to distinguish between real / original
parents (as recorded in the object) and effective parents (history
simplification e.g. in 'history' view, and grafts).
Sidenote: again, perhaps parsing of aithor info should be refactored
into a separate subroutine, to avoid code duplication.
Same.
This foreach skips initial (leading) empty lines.
Sidenote: this simplification for 'title_short' should probably be
refactored into separate subroutine, and improved. But that's a
matter for a later commit, if any.
Using 'map' would be probably more idiomatic:
+ # remove added spaces
+ map { s/^[ ]{4}// } @commit_lines;
Sidenote: the above block should be probably refactored in a later commit.
With new API (see above) it would be
+ %co = parse_commit_text(<$fd>, -with-parents => 1);
Here a weakness (I think) of Perl way of passing subroutine arguments
is shown: it is impossible to have both slurpy @args and named parameters
i.e. %opts, unless we pass optional / named parameters in hashref, i.e.
+ my ($commit_id, @args) = @_;
+ my %opts = ('-max-count' => 1, '-skip' => 0);
+ %opts = (%opts, %{ shift @args }) if ref($args[ 0]) eq 'HASH';
+ %opts = (%opts, %{ pop @args }) if ref($args[-1]) eq 'HASH';
or something like that, and then using $opts{'-max-count'} and
$opts{'-skip'} in place of $maxcount and $skip, etc.
Errrr... @commits or @commit_list.
Hmmm... should it be called '$line' if it is "\0" separated chunk,
i.e. single commit? $commit_text could be better name, and if we
'chomp'ed it, we wouldn't have to 'pop' trailing "\0" in
parse_commit_text.
Nowadays I really wonder if it is such a good idea, and if it wouldn't
be better to just always return @cos (i.e. @commits / @commit_list).
We are a bit inconsistent WRT. context sensitivity.
Same here.
Hmmm... not all parse_* subroutines that can support '-z' named
option do it.
Same issue about context sensitivity here.
Hmmm... this subroutine is not exactly like the other. I wonder if it
wouldn't be better to leave it together with git_patchset_body. Or
with fill_from_file_info.
--
Jakub Narebski
Poland
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html