Ahh... that is what I thought.
The problem that we have to solve to fix this bug is twofold:
* By default CGI.pm does light escaping (simple_escape from CGI::Util)
of _attribute_ values, but for obvious reasons it cannot
unconditionally escape tag _contents_ (because they can be HTML
themselves).
This escaping, at least in CGI.pm version 3.10 (the most current version
on CPAN is 3.43), is minimal: only '"', '&', '<' and '>' are escaped
using named HTML entity references (&quot;, &amp;, &lt; and &gt;
respectively). simple_escape does not escape control
characters such as ^X, which are invalid in XHTML (in strict mode).
Note that IIRC escaping '<' and '>' in attributes is not strictly
necessary.
Gitweb relies on the fact that CGI.pm does escaping of attribute
values. We cannot escape attributes (e.g. "title" attribute with
(almost) full commit subject) as it is now, because it would lead
to double escaping. Fortunately it is possible to turn off
autoescaping by using $cgi->autoEscape(undef); note however that
we would have to do attribute escaping ourselves in the scope of
this declaration.
* The rules for escaping attribute values are slightly different from the
rules for escaping HTML. For attribute values we have to escape '"'
because it is the attribute delimiter, and '&' because it is the escape
character; escaping '<' and '>' is not strictly necessary. For
escaping HTML we need to escape '<' and '>' because they introduce
tags, and '&' because it is the escape character; escaping '"' is not
strictly necessary. It does not make sense to replace spaces with
&nbsp; in attribute values, although it shouldn't do any harm. OTOH we
should perhaps escape newlines in attribute values.
For esc_html and esc_path we (currently) replace control characters
with character escape codes (e.g. "\f" for form feed, "\0" for NUL,
hexadecimal escapes for 'other' control characters). But that is not
the only possible solution. We could use the Unicode printable
representation of control characters instead (the Control Pictures
block starting at U+2400). Or we could use control key sequence / caret
notation, e.g. ^X for "\x18" or ^L for "\f". We should probably discuss
this in more detail.
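
To make the differences above concrete, here is a rough sketch of the two
escaping rule sets and of the two alternative control character
representations. The sketch_* names are made up for this mail and are not
gitweb's actual esc_html / esc_path; leaving TAB and LF untouched is also
just an assumption of the sketch, not a proposal.

```perl
use strict;
use warnings;

# Hypothetical helpers, for illustration only (not gitweb code).

# Attribute values: '&' is the escape character, '"' is the delimiter.
sub sketch_esc_attr {
    my ($str) = @_;
    $str =~ s/&/&amp;/g;    # must come first, to avoid double escaping
    $str =~ s/"/&quot;/g;
    return $str;
}

# Tag contents: '&' is the escape character, '<' and '>' introduce tags.
sub sketch_esc_html {
    my ($str) = @_;
    $str =~ s/&/&amp;/g;
    $str =~ s/</&lt;/g;
    $str =~ s/>/&gt;/g;
    return $str;
}

# Caret notation: a control character becomes '^' followed by the
# character whose code is the original code XOR 0x40.
# TAB (0x09) and LF (0x0a) are left alone (assumption of this sketch).
sub sketch_caret {
    my ($str) = @_;
    $str =~ s/([\x00-\x08\x0b-\x1f])/'^' . chr(ord($1) ^ 0x40)/ge;
    return $str;
}

# Unicode Control Pictures: code point 0x2400 + code of the control
# character. The result contains wide characters, so the output
# filehandle needs a UTF-8 layer before printing it.
sub sketch_control_picture {
    my ($str) = @_;
    $str =~ s/([\x00-\x08\x0b-\x1f])/chr(0x2400 + ord($1))/ge;
    return $str;
}

print sketch_esc_attr('say "a & b" <now>'), "\n";  # say &quot;a &amp; b&quot; <now>
print sketch_esc_html('say "a & b" <now>'), "\n";  # say "a &amp; b" &lt;now&gt;
print sketch_caret("a\x18b\fc"), "\n";             # a^Xb^Lc
```

Whichever representation we pick, it would have to be applied in addition
to the '&' / '"' escaping for attributes, not instead of it.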
So it is not that simple...
P.S. The subject (the one-line summary of this change) should also be
changed to, for example, "gitweb: escape control characters in attributes",
and in the commit message itself you should explain that control characters
break rendering in Firefox in strict XML compliance mode... or something
like that.
--
Jakub Narebski
Poland