Re: [PATCH] gitweb: filter escapes from longer commit titles that break firefox

From: Jakub Narebski
Date: Friday, April 24, 2009 - 3:10 pm

On Fri, 24 April 2009, Paul Gortmaker wrote:



Ahh... that is what I thought.

The problem that we have to solve to fix this bug is twofold:

 * CGI.pm by default does slight escaping (simple_escape from CGI::Util)
   of _attribute_ values, but for obvious reasons it cannot do
   unconditional escaping of tag _contents_ (because the contents can
   be HTML itself).

   This escaping, at least in CGI.pm version 3.10 (most current version
   at CPAN is 3.43), is minimal: only '"', '&', '<' and '>' are escaped
   using named HTML entity references (&quot;, &amp;, &lt; and &gt;
   respectively).  simple_escape does not do escaping of control
   characters such as ^X which are invalid in XHTML (in strict mode).
   Note that IIRC escaping '<' and '>' in attributes is not strictly
   necessary.

   Gitweb relies on the fact that CGI.pm does escaping of attribute
   values.  We cannot simply add our own escaping of attributes (e.g.
   the "title" attribute with the (almost) full commit subject) as
   things stand now, because it would lead to double escaping.
   Fortunately it is possible to turn off autoescaping by using
   $cgi->autoEscape(undef); note however that we would then have to do
   attribute escaping ourselves in the scope of this declaration.

 * Rules for escaping attribute values are slightly different from the
   rules for escaping HTML.  For attribute values we have to escape '"'
   because it is the attribute delimiter, and '&' because it is the
   escape character; escaping '<' and '>' is not strictly necessary.
   For escaping HTML we need to escape '<' and '>' because they
   introduce tags, and '&' because it is the escape character; escaping
   '"' is not strictly necessary.  It does not make sense to replace
   spaces by &nbsp; in attribute values, although it shouldn't do any
   harm.  OTOH we should perhaps escape newlines in attribute values.

   For esc_html and esc_path we (currently) replace control characters
   by character escape codes (e.g. "\f" for form-feed, "\0" for NUL,
   hexadecimal escapes for 'other' control characters).  But it is not
   the only possible solution.  We can use the Unicode printable
   representation of control characters instead (the Control Pictures
   block starting at U+2400).  Or we can use control key sequence /
   caret notation, e.g. ^X for "\x18", or ^L for "\f".  We probably
   should discuss this in more detail.
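
To make the three candidate representations of control characters
concrete, here is a rough sketch.  It is in Python rather than gitweb's
Perl, purely for illustration; all function names are made up, and the
exact hexadecimal format ("\x18") is an assumption, not necessarily what
gitweb emits.

```python
import re

def c_style(ch):
    """C-like escape codes: named escapes where they exist, hex otherwise."""
    named = {"\0": "\\0", "\a": "\\a", "\b": "\\b", "\t": "\\t",
             "\n": "\\n", "\v": "\\v", "\f": "\\f", "\r": "\\r"}
    # Assumption: two-digit hex with an 'x' prefix for the 'other' controls.
    return named.get(ch, "\\x%02x" % ord(ch))

def control_picture(ch):
    """Unicode Control Pictures: U+2400..U+241F mirror 0x00..0x1F, U+2421 is DEL."""
    code = ord(ch)
    return chr(0x2421) if code == 0x7F else chr(0x2400 + code)

def caret(ch):
    """Caret notation: ^@ .. ^_ for 0x00..0x1F, and ^? for DEL."""
    return "^" + chr(ord(ch) ^ 0x40)

def quote_controls(text, represent):
    """Replace every C0 control character (and DEL) using 'represent'."""
    return re.sub(r"[\x00-\x1f\x7f]", lambda m: represent(m.group()), text)

subject = "bad\x18title\f"
as_c_style = quote_controls(subject, c_style)
as_picture = quote_controls(subject, control_picture)
as_caret = quote_controls(subject, caret)
```

Whichever representation is chosen, the replacement is done before
HTML/attribute escaping of the result, so that a backslash or caret in
the output is ordinary text.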

So it is not that simple...
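
The difference between the two sets of rules, and the double escaping
problem, can be sketched as follows.  Again this is Python for
illustration only, not gitweb code; esc_content and esc_attr are
hypothetical names, and turning a newline into &#10; is just one
possible way to "escape newlines" in attributes.

```python
def esc_content(text):
    """Escape text used as tag contents: '&' first, then '<' and '>'."""
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;"))

def esc_attr(text):
    """Escape text used inside a double-quoted attribute value."""
    return (text.replace("&", "&amp;")      # escape character, must go first
                .replace('"', "&quot;")     # attribute delimiter
                .replace("\n", "&#10;"))    # assumption: optional newline escape

subject = 'Fix "a && b" check'

# Escaped once: what should end up in the title attribute.
once = esc_attr(subject)

# Escaped twice: what happens if we escape ourselves and the CGI layer
# then escapes the already-escaped value again.
twice = esc_attr(once)

content = esc_content('<b> & "q"')
```

Here `once` is `Fix &quot;a &amp;&amp; b&quot; check`, while `twice`
contains `&amp;quot;`, which a browser would display literally as
`&quot;`.  That is why escaping has to happen in exactly one place.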


P.S. The subject (one line summary of this change) should also be
changed, for example to "gitweb: escape control characters in
attributes", and in the commit message itself you should explain that
control characters break rendering in Firefox in strict XML compliance
mode... or something like that.
-- 
Jakub Narebski
Poland