Devs,
I recently received a bulk e-mail from an event organizer that displayed in RoundCube (using Firefox 3) with the little square hex-code glyphs in place of some of the punctuation marks. I researched why this was happening, and tracked it down to an encoding issue.
The text/html message part in the e-mail source specified ISO-8859-1 encoding. After RoundCube converted the message part to UTF-8, there were still non-UTF8 characters in the resulting text. One such character was 0x92, which is not a printable ISO-8859-1 character (the range 0x80-0x9F is reserved for control codes). It turns out that the message originator must have been using Windows-1252 encoding, in which 0x92 is the right single quotation mark. That was the correct character in the context in which it appeared, where it served as an apostrophe. The originator simply specified ISO-8859-1 in the MIME message when it should have specified Windows-1252.
The Windows-1252 character set is effectively a superset of the ISO-8859-1 character set: it replaces most of the seldom-used control character code points (0x80-0x9F) with additional punctuation and accented letters. Some mail agents incorrectly blur the line between these two encodings, and send Windows-1252 characters in messages labeled ISO-8859-1.
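To make the difference concrete, here is a small stand-alone sketch (not RoundCube code) that decodes the same bytes both ways. It assumes PHP's iconv extension is available and that the underlying iconv library accepts the name WINDOWS-1252; the sample string is made up for illustration.

```php
<?php
// Made-up sample: "It's" with the apostrophe sent as the single byte 0x92.
$raw = "It\x92s";

// Decoded as ISO-8859-1, byte 0x92 becomes the C1 control character U+0092.
$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $raw);

// Decoded as Windows-1252, byte 0x92 becomes U+2019 (right single quotation mark).
$asCp1252 = iconv('WINDOWS-1252', 'UTF-8', $raw);

// Print the UTF-8 bytes in hex so the two results can be compared.
echo bin2hex($asLatin1), "\n"; // 4974c29273   (c2 92 = U+0092)
echo bin2hex($asCp1252), "\n"; // 4974e2809973 (e2 80 99 = U+2019)
```

The first result is the kind of character a browser has no glyph for, which would match the hex-code boxes I saw. Note that Windows-1252 leaves five bytes in this range unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D), and converters may differ in how they treat them.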
The following workaround (in rcube_charset_convert()) corrects the issue (at least for my one test case):
    // Workaround for mail agents that include Windows-1252 characters
    // in text advertised as ISO-8859-1
    if ($from == "ISO-8859-1" && preg_match("/[\x80-\x9F]/", $str))
        $from = "WINDOWS-1252";
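For anyone who wants to try the check outside RoundCube, here is the same test wrapped in a stand-alone function. The name guess_source_charset() is hypothetical and does not exist in RoundCube; the sketch assumes PHP 7 or later for the type declarations, and it compares the charset name case-insensitively, which the snippet above does not do.

```php
<?php
// Hypothetical helper for illustration only; not part of RoundCube.
// Returns the charset that should be used to decode $str, given the
// charset $from that the message advertised.
function guess_source_charset(string $from, string $str): string
{
    // Bytes 0x80-0x9F are control codes in ISO-8859-1 and practically never
    // appear in real mail text, so their presence suggests Windows-1252.
    if (strcasecmp($from, 'ISO-8859-1') === 0
        && preg_match('/[\x80-\x9F]/', $str) === 1) {
        return 'WINDOWS-1252';
    }

    // Any other advertised charset is trusted as-is.
    return $from;
}

echo guess_source_charset('ISO-8859-1', "It\x92s"), "\n";         // WINDOWS-1252
echo guess_source_charset('ISO-8859-1', "caf\xE9"), "\n";         // ISO-8859-1
echo guess_source_charset('UTF-8', "It\xE2\x80\x99s"), "\n";      // UTF-8
```

The last call shows why the check on $from matters: valid UTF-8 text routinely contains bytes in the 0x80-0x9F range, so the byte test alone would misfire on it.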
What does everyone think of including a workaround like this? I'm generally reluctant to work around improper behavior from other software, but this particular kind of relaxed interpretation seems common (check out the ISO-8859-1 page on Wikipedia).