Johannes Schindelin wrote:
quoted text > Hi,
>
> I do not want to depend on more than necessary in msysGit, and therefore I
> started to write an asciidoc replacement.
>
> So here it is: a perl script that does a good job on many .txt files in
> Documentation/, although for some it deviates from "make man"'s output,
> and for others it is outright broken. It is meant to be run in
> Documentation/.
>
> My intention is not to fix the script for all cases, but to make patches
> to Documentation/*.txt themselves, so that they are more consistent (and
> incidentally nicer to the script).
>
> Now, I hear you already moan: "But Dscho, you know you suck at Perl!"
>
> Yeah, I know, but maybe instead of bashing on me (pun intended), you may
> want to enlighten me with tips how to make it nicer to read. (Yes, there
> are no comments; yes, I will gladly add them where appropriate; yes, html
> is just a stub.)
>
> So here, without further ado, da script:
It's pretty good, I certainly wouldn't have trouble reading or
maintaining it, but I'll give you suggestions anyway.
nice work, replacing a massive XML/XSL/etc stack with a small Perl
script ;-)
Sam.
quoted text >
> -- snip --
> #!/usr/bin/perl
Add -w for warnings, also use strict;
quoted text > $conv = new man_page();
> $conv->{manual} = 'Git Manual';
> $conv->{git_version} = 'Git ' . `cat ../GIT-VERSION-FILE`;
> $conv->{git_version} =~ s/GIT_VERSION = //;
> $conv->{git_version} =~ s/-/\-/;
> $conv->{git_version} =~ s/\n//;
> $conv->{date} = `date +%m/%d/%Y`;
> $conv->{date} =~ s/\n//;
>
> $par = '';
> handle_file($ARGV[0]);
> $conv->finish();
>
> sub handle_text {
this function acts on globals; make them explicit arguments to the function.
quoted text > if ($par =~ /^\. /s) {
> my @lines = split(/^\. /m, $par);
> shift @lines;
> $conv->enumeration(\@lines);
> } elsif ($par =~ /^\* /s) {
uncuddle your elsif's; also consider making this a "tabular ternary"
with the actions in separate functions.
ie
$result = ( $par =~ /^\. /s ? $conv->do_enum($par) :
$par =~ /^\[verse\]/ ? $conv->do_verse($par) :
... )
However I have a suspicion that your script is doing line-based parsing
instead of recursive descent; I don't know whether that's the right
thing for asciidoc. It's actually fairly easy to convert a grammar to
code blocks using tricks from MJD's _Higher Order Perl_. Is it
necessary for the asciidoc grammar?
quoted text > my @lines = split(/^\* /m, $par);
> shift @lines;
> $conv->enumeration(\@lines, 'unnumbered');
> } elsif ($par =~ /^\[verse\]/) {
> $par =~ s/\[verse\] *\n?//;
> $conv->verse($par);
> } elsif ($par =~ /^(\t| +)/s) {
> $par =~ s/^//mg;
> $par =~ s/^\+$//mg;
> $conv->indent($par);
> } elsif ($par =~ /^([^\n]*)::\n((\t| +).*)$/s) {
> my ($first, $rest, $indent) = (, , );
> $rest =~ s/^\+$//mg;
> while ($rest =~ /^(.*?\n\n)--+\n(.*?\n)--+\n\n(.*)$/s) {
> my ($pre, $verb, $post) = (, , );
>
> $pre =~ s/^(\t|$indent)//mg;
> if ($first ne '') {
> $conv->begin_item($first, $pre);
> $first = '';
> } else {
> $conv->normal($pre);
> }
>
> $conv->verbatim($verb);
> $rest = $post;
> }
> $rest =~ s/^(\t|$indent)//mg;
> if ($first ne '') {
> $conv->begin_item($first, $rest);
> } else {
> $conv->normal($rest);
> }
> $conv->end_item();
> } elsif ($par =~ /^-+\n(.*\n)-+\n$/s) {
> $conv->verbatim();
> } else {
> $conv->normal($par);
> }
> $par = '';
> }
>
> sub handle_file {
> my $in;
> open($in, '<' . $_[0]);
> while (<$in>) {
> if (/^=+$/) {
> if ($par ne '' && length($_) >= length($par)) {
> $conv->header($par);
> $par = '';
> next;
> }
> } elsif (/^-+$/) {
> if ($par ne '' && length($_) >= length($par)) {
> $conv->section($par);
> $par = '';
> next;
> }
> } elsif (/^~+$/) {
> if ($par ne '' && length($_) >= length($par)) {
> $conv->subsection($par);
> $par = '';
> next;
> }
> } elsif (/^\[\[(.*)\]\]$/) {
> handle_text();
> $conv->anchor();
> next;
> } elsif (/^$/) {
> if ($par =~ /^-+\n.*[^-]\n$/s) {
> # fallthru; is verbatim, but needs more.
> } elsif ($par =~ /::\n$/s) {
> # is item, but needs more.
> next;
> } else {
> handle_text();
> next;
> }
> } elsif (/^include::(.*)\[\]$/) {
> handle_text();
> handle_file();
> next;
> }
>
> # convert "\--" to "--"
> s/\--/--/g;
> # convert "\*" to "*"
> s/\\*/*/g;
>
> # handle gitlink:
> s/gitlink:([^\[ ]*)\[(\d+)\]/sprintf "%s",
> $conv->get_link(, )/ge;
> # handle link:
> s/link:([^\[ ]*)\[(.+)\]/sprintf "%s",
> $conv->get_link(, , 'external')/ge;
These REs suffer from LTS (Leaning Toothpick Syndrome). Consider using
s{foo}{bar} and adding the 'x' modifier to space out groups.
quoted text >
> $par .= $_;
> }
> close($in);
> handle_text();
> }
>
> package man_page;
>
> sub new {
> my ($class) = @_;
> my $self = {
> sep => '',
> links => [],
> # generator => 'Home grown git txt2man converter'
> generator => 'DocBook XSL Stylesheets v1.71.1 <http://docbook.sf.net/>'
> };
> bless $self, $class;
> return $self;
> }
>
> sub header {
> my ($self, $text) = @_;
> $text =~ s/-/\-/g;
>
> if ($self->{preamble_shown} == undef) {
> $title = $text;
> $title =~ s/\(\d+\)$//;
> print '.\" Title: ' . $title
> . '.\" Author: ' . "\n"
> . '.\" Generator: ' . $self->{generator} . "\n"
> . '.\" Date: ' . $self->{date} . "\n"
> . '.\" Manual: ' . $self->{manual} . "\n"
> . '.\" Source: ' . $self->{git_version} . "\n"
> . '.\"' . "\n";
> }
I'd consider a HERE-doc, or multi-line qq{ } more readable than this.
quoted text >
> $text =~ tr/a-z/A-Z/;
> my $suffix = "\"$self->{date}\" \"$self->{git_version}\""
> . " \"$self->{manual}\"";
Use qq{} when making strings with lots of embedded double quotes and
interpolation.
quoted text > $text =~ s/^(.*)\((\d+)\)$/.TH "" "" $suffix/;
> print $text;
>
> if ($self->{preamble_shown} == undef) {
> print '.\" disable hyphenation' . "\n"
> . '.nh' . "\n"
> . '.\" disable justification (adjust text to left'
> . ' margin only)' . "\n"
> . '.ad l' . "\n";
Using commas rather than "." will safe you a concat when printing to
filehandles, but that's a very small nit to pick :)
quoted text > $self->{preamble_shown} = 1;
> }
>
> $self->{last_op} = 'header';
> }
>
> sub section {
> my ($self, $text) = @_;
>
> $text =~ tr/a-z/A-Z/;
> $text =~ s/^(.*)$/.SH ""/;
>
> print $text;
>
> $self->{last_op} = 'section';
> }
>
> sub subsection {
> my ($self, $text) = @_;
>
> $text =~ s/^(.*)$/.SS ""/;
>
> print $text;
>
> $self->{last_op} = 'subsection';
> }
>
> sub get_link {
> my ($self, $command, $section, $option) = @_;
>
> if ($option eq 'external') {
> my $links = $self->{links};
> push(@$links, $command);
> $command =~ s/\.html$//;
> $command =~ s/-/ /g;
> push(@$links, $command);
> return '\fI' . $command . '\fR\&[1]';
> } else {
> return '\fB' . $command . '\fR(' . $section . ')';
> }
> }
>
> sub common {
> my ($self, $text, $option) = @_;
>
> # escape backslashes, but not in "\n", "\&" or "\fB"
> $text =~ s/\(?!n|f[A-Z]|&)/\\/g;
> # escape "-"
> $text =~ s/-/\-/g;
> # handle ...
> $text =~ s/(\.\.\.)/\&/g;
> # remove double space after full stop or comma
> $text =~ s/([\.,]) / /g;
>
> if ($option ne 'no-markup') {
> # make 'italic'
> $text =~ s/'([^'\n]*)'/\fI\fR/g;
> # ignore `
> $text =~ s/`//g;
> # make *bold*
> $text =~ s/\*([^\*\n]*)\*/\fB\fR/g;
> # handle <<sections>
> $text =~ s/<<([^>]*)>>/the section called \(lq\(rq/g;
Hmm, that regex would not match for <<foo > bar>>, if you care you'd
need to write something like <<((?:[^>]+|>[^>])*)>>
quoted text > }
>
> return $text;
> }
>
> sub normal {
> my ($self, $text) = @_;
>
> if ($text eq "") {
> return;
> }
>
> $text = $self->common($text);
>
> $text =~ s/ *\n(.)/ /g;
>
> if ($self->{last_op} eq 'normal') {
> print "\n";
> }
>
> print $text;
>
> $self->{last_op} = 'normal';
> }
>
> sub verse {
> my ($self, $text) = @_;
>
> $text = $self->common($text);
> $text =~ s/^\t/ /mg;
>
> print ".sp\n.RS 4\n.nf\n" . $text . ".fi\n.RE\n";
>
> $self->{last_op} = 'verse';
> }
>
> sub enumeration {
> my ($self, $text, $option) = @_;
>
> my $counter = 0;
> foreach $line (@$text) {
> $counter++;
> print ".TP 4\n"
> . ($option eq 'unnumbered' ? '\(bu' : $counter . '.')
> . "\n"
> . $self->common($line);
> }
>
> $self->{last_op} = 'enumeration';
> }
>
> sub begin_item {
> my ($self, $item, $text) = @_;
>
> $item = $self->common($item);
> $text = $self->common($text);
>
> $text =~ s/([^\n]) *\n([^\n])/ /g;
"." is the same as [^\n] (without the 's' modifier).
quoted text >
> print ".PP\n" . $item . "\n.RS 4\n" . $text;
>
> $self->{last_op} = 'item';
> }
>
> sub end_item {
> my ($self) = @_;
>
> print ".RE\n";
>
> $self->{last_op} = 'end_item';
> }
>
> sub indent {
> my ($self, $text) = @_;
>
> $text = $self->common($text, 'no-markup');
> $text =~ s/^\t/ /mg;
>
> if ($self->{last_op} eq 'normal') {
> print "\n";
> }
>
> print ".sp\n.RS 4\n.nf\n" . $text . ".fi\n.RE\n";
>
> $self->{last_op} = 'indent';
> }
>
> sub verbatim {
> my ($self, $text) = @_;
>
> $text = $self->common($text, 'no-markup');
>
> # convert tabs to spaces
> $text =~ s/^\t/ /mg;
> # remove trailing empty lines
> $text =~ s/\n\n*$/\n/;
>
> if ($self->{last_op} eq 'normal') {
> print "\n";
> }
>
> print ".sp\n.RS 4\n.nf\n.ft C\n" . $text . ".ft\n\n.fi\n.RE\n";
>
> $self->{last_op} = 'verbatim';
> }
>
> sub anchor {
> my ($self, $text) = @_;
>
> $self->{last_op} = 'anchor';
> }
>
> sub finish {
> my ($self) = @_;
> my $links = $self->{links};
>
> if ($#$links >= 0) {
> print '.SH "REFERENCES"' . "\n";
> my $i = 1;
> while ($#$links >= 0) {
just use if (@$links) and while (@$links)
quoted text > my $ref = shift(@$links);
> $ref =~ s/-/\-/g;
> my $label = shift(@$links);
> printf (".IP \"% 2d.\" 4\n%s\n.RS 4\n\%%%s\n.RE\n",
> $i++, $label, $ref);
> }
> } else {
> print "\n";
> }
> }
>
> package html_page;
>
> sub new {
> my ($class) = @_;
> my $self = {};
> bless $self, $class;
> return $self;
> }
>
> -- snap --
>
> Ciao,
> Dscho
>
> P.S.: I need to catch some Zs, and do some real work, so do not be
> surprised if I do not respond within the next 24 hours.
> -
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to
majordomo@vger.kernel.org
> More majordomo info at
http://vger.kernel.org/majordomo-info.html
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to
majordomo@vger.kernel.org
More majordomo info at
http://vger.kernel.org/majordomo-info.html