Håkan Lindqvist wrote:
On Tue, 2006-03-14 at 16:05 +0100, Thomas Bruederli wrote:
If a message specifies its charset in the Content-Type header, RC will attempt to convert it to UTF-8. This does not work for HTML messages that have chars encoded as HTML entities. A decoding function that handles HTML entities has to be written for that. Anyone?
But aren't HTML entities already charset agnostic?!
I guess they aren't. An entity like &#252; represents a single-byte char (ASCII 252; "ü" in ISO-8859-1). As far as I know, the browser will not display this entity correctly because it expects double-byte characters.
Please correct me if I'm wrong...
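For reference, here is a minimal sketch of the question being debated, written in Python rather than RC's PHP. The helper name body_to_utf8 and the sample strings are made up for illustration and are not part of RC. It assumes that numeric and named character references resolve to Unicode code points regardless of the charset the message declares, which is how Python's standard html.unescape treats them.

```python
import html


def body_to_utf8(raw: bytes, declared_charset: str) -> bytes:
    """Hypothetical helper: decode with the declared charset,
    resolve HTML entities, then re-encode as UTF-8."""
    # Step 1: raw bytes -> Unicode text, using the Content-Type charset.
    text = raw.decode(declared_charset, errors="replace")
    # Step 2: entities such as &#252; or &uuml; become code points (U+00FC here).
    text = html.unescape(text)
    # Step 3: Unicode text -> UTF-8 bytes.
    return text.encode("utf-8")


# The same character written three ways: a raw ISO-8859-1 byte,
# a numeric entity, and a named entity.
for sample in (b"\xfc", b"&#252;", b"&uuml;"):
    print(sample, "->", body_to_utf8(sample, "iso-8859-1"))
```

All three inputs come out as the same two UTF-8 bytes, so in this sketch the entity itself does not depend on the declared charset. Note that html.unescape also decodes &lt; and &amp;, so applying it to a full HTML body would alter the markup; a real implementation would need to leave those alone.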
Do HTML entities really have to be transformed in any way?
(Sorry that I jump into this discussion without fully reading up on what has been said before, but it just sounds so weird to me.)
/Håkan
~Thomas