Hi, I wanted to ask what the general stance towards shell script cleanups and simplifications would be. For example, I find the expr usage quite inscrutable in commit, and there is no necessity of putting "shift" in every case branch instead of once behind it, and a lot of conditionals and other manipulations can be made much easier on the eye by using parameter expansion patterns that are, as far as I can see, available with every reasonable Bourne Shell and clones. Here is an example context diff (in this case, I find it more readable than unified) to illustrate (untested!, please don't apply without a regular formatted git patch). Should I bother doing such cleanups as I read up on code, or should I just leave things alone?
You can't do something like that on /bin/sh on many systems (for instance Solaris). Robert --=20 Robert Schiele Dipl.-Wirtsch.informatiker mailto:rschiele@gmail.com "Quidquid latine dictum sit, altum sonatur."
I have no authority over the git project, but please consider this argument: Every time you submit a patch there are three costs: 1. The time you put into making the patch. 2. The time required for the maintainer to review the patch and possibly merge it into the code base. 3. The risk that you may have accidentally broken something. Obviously, you aren't too concerned about 1 (the cost to you), because you're willing to do that work. However, if I were Junio, I wouldn't be willing to "spend" costs 2 and 3 on a patch that didn't either fix a problem or provide a new feature. So, I recommend you do the clean-up that you want to do on your own local branch. This will no doubt be fun and educational for you. I know I've learned a lot in the past by experimentally "cleaning up" old ugly code on other projects, even though the result never made it into the official code base. Along the way, you will probably find real bugs. When you do, submit patches for them based on the current master branch. You can probably manage to sneak a bit of clean-up into those bug-fixing patches, as long as you make sure it is all relevant to fixing the bugs and you keep the patches readable. Best Wishes, Bradford C Smith p.s. I should also point out that writing portable shell scripts is far from trivial, so it is very difficult to be certain that what works for you will work for someone with a different shell. -
For any decent codebase there is a need to keep the code clean. Being part of the linux-kernel community we see clean-up patches each day and a lot are applied. Even spelling errors in comments are sometimes applied. Do not underestimate the value of a clean codebase. Sam -
The shift in parameter parsing case arms were originally
generated by an automated tool. If it bothers you, feel free to
move them at the end, I would not mind. In fact, handcrafted
parameter parser in other scripts do use shift-at-the-end.
As to Bourne-ness of the shell script, please realize that your
maintainer is very old fashioned ;-), but is willing to be
taught new tricks within reason.
We try to limit ourselves to -, =, ?, + (and their colon "if
empty" variants when it really make sense) in parameter
expansion of shell variables. We also use % and # (and their
"match largest" variants).
Non POSIX substitions such as ${parameter/pattern/string} and
${parameter:offset} are not to be used. We do not want to
depend on bash.
We try to avoid [ ] and instead spell "test" explicitly; this is
just a personal taste, and not about portability but more about
readability.
After 1.5.3 git-commit.sh will hopefully become built-in, so I
would rather not touch the script. Certainly, the kind of
change that is "intended to be style-only but somebody needs to
make sure it does not introduce regression to everybody's shell"
is very unwelcome at this point.
-
There is in a test (t5300-pack-objects.sh) but I guess the restrictions do not apply on tests. -- Duy -
Oh, but they definitely should. Precisely on those platforms with shells not generally in use by the developers it becomes _most_ important to reliably be able to trace failed tests to problems with the _commands_, not problems with the tests. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum -
That would have been an earlier mistake. Keeping tests portable is important, perhaps it might be of lessor importance but still. -
You do?
Indeed:
-*- mode: grep; default-directory: "/home/tmp/git/" -*-
Grep started at Thu Aug 2 22:47:30
grep -nH -e '\${[a-zA-Z0-9_]*[#%]' *.sh
git-am.sh:146: resolvemsg=${1#--resolvemsg=}; shift ;;
git-clone.sh:358: destname="refs/$branch_top/${name#refs/heads/}" ;;
git-clone.sh:360: destname="refs/$tag_top/${name#refs/tags/}" ;;
git-filter-branch.sh:361: ref="${ref#refs/tags/}"
git-pull.sh:98: curr_branch=${curr_branch#refs/heads/}
git-rebase.sh:93: eval GITHEAD_$cmt='"${cmt_name##refs/heads/}~$(($end - $msgnum))"'
git-stash.sh:52: branch=${branch#refs/heads/}
Grep finished (matches found) at Thu Aug 2 22:47:31
I am confused now: a different poster adamantly stated that /bin/sh on
Solaris did not support those constructs, and that every functionality
Sure. What about the git-rebase line using $(($end - $msgnum)) ?
Too bad: this should mean that $EDITOR can get called from C... I've
Understood. But using ${...#...} and ${...:+...} does not exactly
seem to be news in the git code base. Even though we have the claim
that Solaris' sh won't deal with the former.
--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-
Is it really risque? I do not think we have heard trouble with
the arith expansion from anybody. A few mistakes in the past
made that said things like:
$((end - 1)) ;# wrong... say "$end" if you mean variable
$((cd ...; pwd)) ;# wrong... say $( (...)) if command substitution
# that involves subshell
I do not think we have trouble with ${parameter#word}. Much
less with ${parameter+word}; it has been in /bin/sh forever.
-
You might find this thread amusing.
http://thread.gmane.org/gmane.comp.version-control.git/7116/focus=7136
Historically, I have even avoided accepting ${var#word}, ${var%word},
and arithmetic expansions.
I would still reject shell arrays.
-
I learnt Unix with a Banaham/Rutter primer in the early eighties. I got hit so often by the "this is now supposed to work in Bourne shells?" surprise it wasn't funny. The first time I saw "for ((i=0; i<$NR; i++)) ..." I thought the author had been smoking too much C and got things confused. It still creeps me out. I am more comfortable doing arithmetic with dc rather than sh. Employing the existing globbing machinery for # and %, on the other hand, seems quite bournesque (still-bourne sounds so ugly) to me. And it is certainly quite more readable than the regexp/expr stuff. If it works. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum -
Basically this should mean that the proposed cleanups (apart from a
forgotten shift I had to add) are tenable.
Given that another poster claimed that Solaris /bin/sh does not
support ${parameter#word}, making the suggested changes to git-commit
might actually be a good idea: ${parameter#word} is used in half a
dozen other (likely less used) utilities in various other places. If
this is an overlooked regression, we want to make it non-overlookable
while we are still in testing, and git-commit would appear to be the
perfect candidate for that...
Depending on the feedback, we can either replace _all_ uses
everywhere, or accept it for good.
While I would think it perfectly understandable if you wanted to avoid
making an infamous "breaks all of Solaris release", _if_ ${...#...}
would indeed be fishy (and I somewhat doubt it), we are already there.
I have this cleaned-up version of git-commit.sh on a computer I can't
access right now. I'll post the patch tomorrow. Whether you want to
apply it to git.git remains at your discretion. I would, however,
strongly urge Solaris and potentially other POSIXly impaired users to
aplly and test this patch: if it breaks (and it will do so pretty
obviously, pretty much being unable to parse any option), then this is
_quite_ alarming with regard to existing uses of ${...#...} and would
need to get addressed _very_ soon.
Frankly, I doubt that this would have escaped notice so far, however.
--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-
Well, those are not really the most challenging one. Thus you should either
test on more or just believe those people that have other shells that it do=
es
No, you should read the mails you are refering to. I said that the most
important stuff does work. Apparently this did not yet hurt me on the
platform. Thus we have to decide whether we want some textbook example code
and thus break this platform completely or whether we want to fix the issues
Bad on Solaris:
$ uname -a
SunOS solaris10-x64 5.10 Generic i86pc i386 i86pc
$ end=3D1
$ msgnum=3D5
$ echo $(($end - $msgnum))
syntax error: `(' unexpected
Why is it bad to call the editor from C?
$ uname -a
SunOS solaris10-x64 5.10 Generic i86pc i386 i86pc
$ parameter=3Dbla
$ echo ${parameter#word}
bad substitution
That one is ok for Solaris.
Robert
--=20
Robert Schiele
Dipl.-Wirtsch.informatiker mailto:rschiele@gmail.com
"Quidquid latine dictum sit, altum sonatur."
The "issues" are with Solaris, apparently. There is always a price
for portability. If Solaris users can fix their problems with a
global search and replace of the first line in *.sh, the question is
whether it is worth the hassle of having unreadable but "portable"
You are missing the line
$ echo $0
which is probably the most interesting one... we don't need to be
compatible with everything having a "$ " prompt, just with everything
See the rationale in my recently posted patch for implementing
EDITOR/VISUAL support. One needs to shell-quote stuff properly, and
If you prepare a patch replacing all existing ${parameter#word} uses
and get it accepted, I will not push for inclusion of my cleanup.
But you _really_ should go for it _now_.
--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-
I am in the camp of avoiding "it is even in POSIX so it's your fault if your shell does not support it". We do not take POSIX too seriously in that way, although we do say "let's not use it, it is not even in POSIX". In other words, I've been trying to be, and as a result of that we are, fairly conservative. However, there is a line we need to draw when bending bacwards for compatibility, and I think a system that does not have a working command substitution $( ... ) is on the other side of that line. -
Not an issue. But apparently, ${parameter#word} is for Solaris. I'd
still like to get confirmation that it is indeed /bin/sh, but if it
is, the current code is not good for Solaris.
So the line of compatibility is much more interesting for this one.
--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-
I happen to feel ${parameter#word} is more esoteric than $(cmd).
If a system does not even do the latter, then avoiding the
former to help such a system is a futile effort.
And /bin/sh my Solaris box does not understand $(cmd) and wants
you to say `cmd`. Of course % and # parameter substition do not
work there.
People could probably say SHELL_PATH=/usr/xpg4/bin/sh there to
get a saner shell, even if they do not like bash, though.
-
The situation is that we currently don't avoid the former. Robert
said that he had prepared a patch that would do so.
It would make sense to either encourage him to present his patch
(though we probably don't know for sure that there are indeed shells
for which the former works worse than the latter), or permit further
use of ${parameter#word} where it makes things more readable.
But "only a little bit of ${parameter#word}, please" seems pointless.
--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-
Absolutely. And we started to adopt #/% substititions some time
ago. Undoing them just feels going backwards, and we need to
judge what the merits of going backwards are.
For that discussion, /bin/sh on Solaris does not count. There
are huge downside of rewriting scripts to work with stock
Solaris /bin/sh:
(1) that shell does not even grok $(cmd) substitution.
I won't accept a half-baked patch that replaces "$(" with a
backtick and matching ")" with another backtick. You need
to at least make sure your interpolated variables within
the backtick pair work sensibly, and you haven't broken
existing nesting of command interpolations, if any. I do
not even want to inspect, comment on and reject that kind
of changes. Quite frankly, it's not worth my time.
(2) Rewriting $(cmd) to `cmd`, and ${parameter#word} with sed
or expr would reduce readability, at least to other people.
Remember, I was the one who originally avoided modern
${parameter#word} substitutions, and older scripts had many
more invocations of expr than we currently have. Reading
such a backward rewrite would not be too much of a problem
for *me*, but other people also need to read and understand
scripts, if only to be able to rewrite them in C.
There may still be many old parts of the scripts that could
be made more readable and efficient using ${parameter#word}
substitutions. If we were to rewrite scripts, more use of
them could be a good thing, not the other way around.
Besides, on that platform there are more reasonable shells
available via SHELL_PATH, and it is not limited to going to
bash.
-
Ok, seems like the sort of cleanups I proposed would not clash with current git policies. I'll readily agree that the timing of their adoption might not really fit with a rc4, but posting them for the queue does not seem outrageous. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum -
And that's actually not even _half_ of the deal: we are talking about pandering to legacy shells here, and the amount of variance of just what level of quoting/backslashing is needed on the inside of `...` in order to get stuff through with just the right level of quoting is actually stunning. I've had my fair share of bad surprises with portable scripts. Getting a backquote mechanism running on one shell does not mean it will work on another. Basically, you have to forego nesting stuff and split it out into small units in separate commands. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum -
Well, if you want to see it, just tell me. Do you want to have it with or without the arithmetic replacements I did as well? Robert --=20 Robert Schiele Dipl.-Wirtsch.informatiker mailto:rschiele@gmail.com "Quidquid latine dictum sit, altum sonatur."
From Junio's answer, I gather that indeed this would seem pointless.
$(...) is not going away anytime soon, and I have seen no evidence
that there is a shell in widespread use that supports it, but doesn't
support ${...#...}.
--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-
Well, I have now a patch ready for submission that would replace all
occurences of ${PARAMETER#WORD}, ${PARAMETER%WORD}, and $(( EXPRESSION )).
But if you say that you won't accept replacement of $( ... ) then this is n=
ot
worth the effort since this one isn't accepted as well.
Robert
--=20
Robert Schiele
Dipl.-Wirtsch.informatiker mailto:rschiele@gmail.com
"Quidquid latine dictum sit, altum sonatur."
I absolutely can't understand your claim that most things work for
you, then. Are you _really_, _really_ sure you are talking about
/bin/sh here?
Look:
-*- mode: grep; default-directory: "/home/tmp/git/" -*-
Grep started at Fri Aug 3 01:29:41
fgrep -nH -e '$(' *.sh
check-builtins.sh:6: $(foreach b,$(BUILT_INS),echo XXX $b YYY;)
git-am.sh:24: cmdline=$(basename $0)
git-am.sh:83: his_tree=$(GIT_INDEX_FILE="$dotest/patch-merge-index" git write-tree) &&
git-am.sh:84: orig_tree=$(cat "$dotest/patch-merge-base") &&
git-am.sh:160: last=$(cat "$dotest/last") &&
git-am.sh:161: next=$(cat "$dotest/next") &&
git-am.sh:216: files=$(git diff-index --cached --name-only HEAD) || exit
git-am.sh:223:if test "$(cat "$dotest/binary")" = t
git-am.sh:227:if test "$(cat "$dotest/utf8")" = t
git-am.sh:233:if test "$(cat "$dotest/keep")" = t
git-am.sh:238:if test "$(cat "$dotest/sign")" = t
git-am.sh:301: GIT_AUTHOR_NAME="$(sed -n '/^Author/ s/Author: //p' "$dotest/info")"
git-am.sh:302: GIT_AUTHOR_EMAIL="$(sed -n '/^Email/ s/Email: //p' "$dotest/info")"
git-am.sh:303: GIT_AUTHOR_DATE="$(sed -n '/^Date/ s/Date: //p' "$dotest/info")"
git-am.sh:313: SUBJECT="$(sed -n '/^Subject/ s/Subject: //p' "$dotest/info")"
git-am.sh:415: unmerged=$(git ls-files -u)
git-am.sh:453: tree=$(git write-tree) &&
git-am.sh:455: parent=$(git rev-parse --verify HEAD) &&
git-am.sh:456: commit=$(git commit-tree $tree -p $parent <"$dotest/final-commit") &&
git-bisect.sh:59: head=$(GIT_DIR="$GIT_DIR" git symbolic-ref HEAD) ||
git-bisect.sh:92: orig_args=$(sq "$@")
git-bisect.sh:102: rev=$(git rev-parse --verify "$arg^{commit}" 2>/dev/null) || {
git-bisect.sh:127: rev=$(git rev-parse --verify HEAD) ;;
git-bisect.sh:129: rev=$(git rev-parse --verify "$1^{commit}") ;;
git-bisect.sh:141: echo "# bad: "$(git show-branch $rev) >>"$GIT_DIR/BISECT_LOG"
git-bisect.sh:147: 0) revs=$(git rev-parse --verify HEAD) || exit ;;
git-bisect.sh:148: *) revs=$(git rev-parse --revs-only --no-flags "$@") ...I started wondering myself and it turned out that we just didn't look in the right place. Actually we _have_ an infrastructure in place to replace the shell. (SHELL_PATH in the Makefile) In that case I would not consider this an issue and you might go on with the cleanup from my point of view. Robert --=20 Robert Schiele Dipl.-Wirtsch.informatiker mailto:rschiele@gmail.com "Quidquid latine dictum sit, altum sonatur."
I was starting to doubt my sanity here. Sorry that this thread resulted in wasted work for you. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum -
That's ok. I mean I didn't spot the SHELL_PATH thing in the first place although I already had seen that one before (what I remember now). At least in that case I don't have to worry any longer since I always can p= ut some compliant shell at some random place and use that one. But at least you have learned now that not everything in the real world loo= ks like as it is written in some standard books. ;-) Robert --=20 Robert Schiele Dipl.-Wirtsch.informatiker mailto:rschiele@gmail.com "Quidquid latine dictum sit, altum sonatur."
You are aware that I am the maintainer of AUCTeX? Which has an autoconf-based setup that works under AIX, Solaris, MinGW, Cygwin, HP/UX and a few other oddities, all with the respective native tools? I know more about stupid shells than I really want to. But that does not change that I could not imagine a shell such as yours to work "mostly" with the current git code base. For autoconf, it is fine to call sed (and you would probably not believe how small the portable language subset for sed is) all the time. Performance is not an issue. For normal user commands, this is different. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum -
Wow, you made me recall a painful past for myself. AIX's sed was so bad that I had to send in a few bugfixes to autoconf to work it around. I do not know if I should feel happy to find somebody to commiserate with... -
It depends on how you work. I for example always just push any change to a linux machine where I do the actual integration work. Thus I don't need mu= ch Ok, but then we needed an infrastructure to replace the shell with a Your way of telling people that you are considering everybody besides you to be a moron is somehow insulting. You might wish to change that. I have it ready now. Just waiting for the answer of Junio to my last mail. Robert --=20 Robert Schiele Dipl.-Wirtsch.informatiker mailto:rschiele@gmail.com "Quidquid latine dictum sit, altum sonatur."
Is this with /usr/xpg4/bin/sh or /bin/sh? The latter is not POSIX and should not be used by GIT, IMHO, otherwise there will be endless issues in less-well-tested code paths. Is rewriting the shebang lines to use the POSIX shell an option for GIT? -
Hi Florian, I recommend you read the other mails in this thread. This issue is already completely resolved. Robert --=20 Robert Schiele Dipl.-Wirtsch.informatiker mailto:rschiele@gmail.com "Quidquid latine dictum sit, altum sonatur."
To be a bit more helpful, a short summary is:
* SHELL_PATH in Makefile lets you munge installed scripts;
* /bin/ksh is usable on Solaris and xpg4 is fine too;
* We need to draw a line somewhere, and the current rule is
that we cannot afford to support a shell that does not
understand $(cmd) substitution, as we haven't seen a shell
that supports ${parameter#word} but not $(cmd).
-
