Eric Stadtherr wrote:
It is a very specific problem, but a common problem nonetheless. For example, HTML 5 *requires* this misinterpretation:
http://dev.w3.org/html5/spec/Overview.html#character-encodings-0
To keep interpreting the encoding label as what it says, keep the behavior clearly defined, and avoid the performance penalty of a regex search on properly labelled messages, I have the following suggestion:
On iso-8859-1-labelled messages, provide a "fix encoding" button in an unobtrusive place, like the lower edge of the message. Users can then click this button when they see an encoding problem.
When the button is clicked, RC would re-read the message, interpreting the iso-8859-1 part as windows-1252.
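To illustrate what the button would do, here is a rough sketch in Python rather than RC's own code. The names fix_encoding, decode_as_windows_1252 and LATIN1_LABELS are made up for this example, and the set of labels is only an assumption. Python's cp1252 codec leaves five byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) unassigned, so the sketch falls back to latin-1 for those bytes instead of failing.

```python
# Hypothetical sketch of the "fix encoding" action; not actual RC code.
# Assumed set of labels that the button would apply to.
LATIN1_LABELS = {"iso-8859-1", "iso8859-1", "latin1"}


def decode_as_windows_1252(raw: bytes) -> str:
    """Decode bytes as windows-1252, byte by byte.

    Bytes that Python's cp1252 codec leaves unassigned are decoded
    as latin-1, so the function never raises on arbitrary input.
    """
    chars = []
    for value in raw:
        byte = bytes([value])
        try:
            chars.append(byte.decode("cp1252"))
        except UnicodeDecodeError:
            chars.append(byte.decode("latin-1"))
    return "".join(chars)


def fix_encoding(raw: bytes, label: str) -> str:
    """Re-read a message part after the user clicks the button.

    Only parts labelled iso-8859-1 are reinterpreted; any other
    label is decoded exactly as declared.
    """
    if label.strip().lower() in LATIN1_LABELS:
        return decode_as_windows_1252(raw)
    return raw.decode(label)


if __name__ == "__main__":
    # 0x93 and 0x94 are curly quotes in windows-1252, control codes in iso-8859-1.
    print(fix_encoding(b"\x93hi\x94", "ISO-8859-1"))  # prints “hi”
```

Since the re-decoding only runs when the user asks for it, properly labelled messages pay no extra cost.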
Sincerely, Sebastian