Re: UTF problems (bugs)

14 Mar 2006


      Håkan Lindqvist wrote:
...
On Tue, 2006-03-14 at 16:05 +0100, Thomas Bruederli wrote:
...
If a message specifies its charset in the Content-Type header, RC will
attempt to convert it to UTF-8. This does not work for HTML messages
that have characters encoded as HTML entities. A decoding function that
handles HTML entities has to be written for that. Anyone?
But aren't HTML entities already charset agnostic?!
I guess they aren't. An entity like &#252; represents a single byte char
(ASCII 252; "ü" in ISO-8859-1). As far as I know the browser will not
display this entity correctly because it expects double-byte characters.
Please correct me if I'm wrong...
...
Do HTML entities really have to be transformed in any way?
(Sorry that I jump into this discussion without fully reading up on what
has been said before, but it just sounds so weird to me.)
/Håkan
~Thomas
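
To make the question above concrete, here is a minimal sketch in Python (not RC's actual code, which is not shown in this thread). It relies on the HTML specification's rule that a numeric character reference such as &#252; names a Unicode code point (U+00FC) rather than a byte in the message's charset, so the reference means the same thing whatever the Content-Type says. The function name to_utf8 and the sample body are hypothetical and chosen only for illustration.

```python
import html


def to_utf8(raw: bytes, declared_charset: str) -> bytes:
    """Hypothetical helper: decode a body with its declared charset,
    resolve HTML character references, and re-encode as UTF-8."""
    # Step 1: interpret the raw bytes using the charset from Content-Type.
    text = raw.decode(declared_charset, errors="replace")
    # Step 2: character references are plain ASCII in the source, so they
    # survive step 1 unchanged; html.unescape maps them to code points.
    text = html.unescape(text)
    # Step 3: emit UTF-8.
    return text.encode("utf-8")


# Sample body: "ü" written as a numeric reference, "ß" as the raw byte 0xDF.
body = "Gr&#252;\xdfe".encode("iso-8859-1")
result = to_utf8(body, "iso-8859-1")
```

Because the reference itself consists only of ASCII characters, step 2 is optional for display purposes: a browser resolves &#252; to the same character whether the page is served as ISO-8859-1 or UTF-8. Decoding entities mainly matters when the text is processed outside a browser, for example when searching or producing a plain-text version.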
