Would it work to strip out every character that exists only in the
disallowed character set, relabel the message as ASCII, and run the usual
checks on the result? (Of course, messages that end up empty or
substantially reduced should be discarded instead.) It looks like the
non-spam in these character sets has only a few non-ASCII characters, and
those are transliterated into ASCII nearby anyway. I also doubt there is
much spam labelled with a non-ASCII character set that uses only a few
characters from it and would not be obvious in some other way once those
characters were removed.
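
Roughly what I have in mind, as an untested-in-production sketch rather
than a patch for the actual list filter. The function name
reduce_to_ascii, the min_keep threshold of 0.5, and the choice to decode
with Python's codecs before stripping are all my own assumptions; the
real filter may work on raw bytes and would need its own notion of
"substantially reduced".

```python
from typing import Optional


def reduce_to_ascii(body: bytes, charset: str,
                    min_keep: float = 0.5) -> Optional[str]:
    """Strip non-ASCII characters; return None if the message should be dropped.

    min_keep is a hypothetical threshold: the fraction of non-whitespace
    characters that must survive for the message to be worth checking.
    """
    try:
        # Undecodable bytes become U+FFFD, which is non-ASCII and gets stripped.
        text = body.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset label: nothing sensible to check, so discard.
        return None

    visible = [c for c in text if not c.isspace()]
    kept = [c for c in visible if c.isascii()]

    # Empty or substantially reduced messages are discarded, not checked.
    if not visible or len(kept) / len(visible) < min_keep:
        return None

    # What remains is plain ASCII and can go through the normal checks.
    return "".join(c for c in text if c.isascii())
```

A message that is mostly ASCII with a stray accented name comes back
with just that character dropped, while a body that is mostly in the
other character set comes back as None and never reaches the checks.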
-Daniel
*This .sig left intentionally blank*