Hello,
I think the functions for UTF7-UTF8 encoding/decoding are not performing well, especially on the folders (view/create/rename folder).
This happens mostly on non ISO-8859-1 charsets (in the source it shows that this function is not yet done).
Is there a chance we get IMP's enc/decoding functions and use those instead? They work nicely in any system I tested IMP on.
If someone wants to work with me privately on this issue, please contact me directly. I am willing to spend a lot of my time on testing and debugging. I tried to fix this myself but I'm not good enough: even though I managed to decode folders fine (these folders contained foreign characters and were created by other programs), I messed things up with other functions.
So basically I need someone who knows what he's doing in order to fix this together.
Also, I propose that wherever needed, we should make the charset conversion assumptions configurable by the admin and not have them hardcoded to ISO-8859-1.
If someone wants to work on this with me (I have a test system showcasing the issues which can be used for testing the fixes as well) please contact me directly.
Thanks, Eric
I don't know if anyone received this previously... it does not show in the web archive of the list so I'm resending.
Apologies if you already received it.
Eric
---------- Forwarded message ----------
From: Eric Liang ericliang2@gmail.com
Date: Aug 8, 2006 1:29 PM
Subject: UTF7 - UTF8 folder encoding errors
To: dev@lists.roundcube.net
Eric Liang wrote:
I don't know if anyone received this previously... it does not show in the web archive of the list so I'm resending.
People, if you're having trouble getting messages through to the list (or any other list problems for that matter), please contact postmaster@lists.roundcube.net. Do *not* post (or in this case, repost) to the mailing list itself.
Eric, I don't know what web archive you looked at, but both your messages are there:
http://lists.roundcube.net/mail-archive/roundcube.dev/2006/08/59/
http://lists.roundcube.net/mail-archive/roundcube.dev/2006/08/64/
Bob (list-mom)
2006/8/8, Eric Liang ericliang2@gmail.com:
Hello,
I think the functions for UTF7-UTF8 encoding/decoding are not performing well, especially on the folders (view/create/rename folder).
This happens mostly on non ISO-8859-1 charsets (in the source it shows that this function is not yet done).
You're right, I forgot about that.
Is there a chance we get IMP's enc/decoding functions and use those instead? They work nicely in any system I tested IMP on.
IMP uses PHP's built-in IMAP functions, and as far as I could see they handle the charset conversion internally.
If someone wants to work with me privately on this issue, please contact me directly. I am willing to spend a lot of my time on testing and debugging. I tried to fix this myself but I'm not good enough: even though I managed to decode folders fine (these folders contained foreign characters and were created by other programs), I messed things up with other functions.
So basically I need someone who knows what he's doing in order to fix this together.
I just committed some changes that should solve these problems. It works well with my mailbox, but I only use ISO characters. Please check out the latest revision and test it with your environment.
~Thomas
First of all, thank you for this. It made things better. New comments are inline.
On 8/10/06, Thomas Bruederli roundcube@gmail.com wrote:
IMP uses PHP's built-in IMAP functions, and as far as I could see they handle the charset conversion internally.
I don't know - I have just tested IMP and Group-Office lately and they both seem to treat folders nicely from a user's point of view. I am not that familiar with how they do it.
I just committed some changes that should solve these problems. It works well with my mailbox, but I only use ISO characters. Please check out the latest revision and test it with your environment.
These are the results:
- In the folder list, multibyte names are cut incorrectly (the ... is put in the middle of the string). This happened before; I just didn't mention it to the dev list until now. Currently I have folders shown as AAA?...AAAAA (where AAA = multibyte chars). The question mark (?) appears because the second byte of a character is cut off, so the broken character is replaced by ?. I therefore assume that the PHP functions do not treat the string as multibyte before checking and shortening its length; the function should measure the string length in characters and cut it accordingly. There are multibyte folder names with a real length of 8 characters that are cut, while English folder names of 10 or more characters are not. So perhaps this behaviour should be examined.
- When I use the English (GB) language the folders work nicely. They used to work nicely in the folder list (i.e. show); now create/rename folder works flawlessly too (big thanks).
But when I use another language, for example Spanish, things get messed up, just like in the past. Let me know if I am allowed to send you screenshots (via personal email) to show what happens when charset is not ISO-8859-1, or perhaps a login account on such a mailbox.
Results (for non English charset):
Create folder -> "error occured while creating folder" or similar error in the translated language
View folder -> strange charset conversion (or no conversion at all?) is shown instead of the normal folder
Rename folder -> "error occured while creating folder" or similar error in the translated language
I have noticed that the erratic folder behavior happens when I use specific languages like Slovak, Polski, Greek, Espanol, Arabic etc but not on Japanese, Russian, English (it works fine on those showing always the correct folder names as intended)
I can only assume it has to do with internal PHP charset conversions that only support some charsets and not others? Like what happens with html_entity_decode that supports only _some_ charsets: http://nl2.php.net/manual/en/function.html-entity-decode.php
Finally, I am using PHP 4.4.x branch to test.
Let me know how I can help further to solve this.
Your support is very much appreciated,
Eric
Also, browsing the SVN changes I noticed you fixed the search issue (file: program/steps/mail/search.inc, http://trac.roundcube.net/trac.cgi/browser/trunk/roundcubemail/program/steps/mail/search.inc?rev=305, $imap_charset = 'UTF-8';). I spotted it 3-4 days ago but did not report it, as I considered it minor in comparison to the folder issue.
I can now report that searching works nicely with the non-EN charsets I've tested (rev 303 did not). I would advise other people to check with their own non-EN charsets as well.
Thanks again, Eric
Eric Liang wrote:
First of all, thank you for this. It made things better. New comments are inline.
On 8/10/06, Thomas Bruederli roundcube@gmail.com wrote:
IMP uses PHP's built-in IMAP functions, and as far as I could see they handle the charset conversion internally.
I don't know - I have just tested IMP and Group-Office lately and they both seem to treat folders nicely from a user's point of view. I am not that familiar with how they do it.
It uses a PHP extension that RoundCube does not require, which means that we cannot do the same here.
I just committed some changes that should solve these problems. It works well with my mailbox, but I only use ISO characters. Please check out the latest revision and test it with your environment.
These are the results:
- In the folder list, multibyte names are cut incorrectly (the ... is put in the middle of the string). This happened before; I just didn't mention it to the dev list until now. Currently I have folders shown as AAA?...AAAAA (where AAA = multibyte chars). The question mark (?) appears because the second byte of a character is cut off, so the broken character is replaced by ?. I therefore assume that the PHP functions do not treat the string as multibyte before checking and shortening its length; the function should measure the string length in characters and cut it accordingly. There are multibyte folder names with a real length of 8 characters that are cut, while English folder names of 10 or more characters are not. So perhaps this behaviour should be examined.
OK, this is correct; some PHP string functions are not multibyte-safe. I'll try to sort that out.
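The truncation problem described above can be illustrated with a small sketch. It is written in Python only for brevity (RoundCube itself is PHP), and the function names `abbreviate_bytes` and `abbreviate_chars` are hypothetical, not taken from the RoundCube source. It assumes the folder name is UTF-8 and contrasts cutting by bytes, which can split a character, with cutting by characters.

```python
def abbreviate_bytes(name: str, keep: int = 3, encoding: str = "utf-8") -> str:
    """Byte-oriented middle truncation (the faulty approach).

    Slicing the encoded bytes can cut a multibyte character in half;
    the damaged sequence then shows up as a replacement character.
    """
    raw = name.encode(encoding)
    if len(raw) <= 2 * keep + 3:
        return name
    cut = raw[:keep] + b"..." + raw[-keep:]
    # errors="replace" makes the damage visible instead of raising
    return cut.decode(encoding, errors="replace")


def abbreviate_chars(name: str, keep: int = 4) -> str:
    """Character-oriented middle truncation.

    Python strings are sequences of code points, so slicing never
    splits an encoded character.
    """
    if len(name) <= 2 * keep + 3:
        return name
    return name[:keep] + "..." + name[-keep:]


if __name__ == "__main__":
    folder = "Διαγραμμένα μηνύματα"  # example Greek folder name
    print(len(folder), len(folder.encode("utf-8")))  # characters vs bytes
    print(abbreviate_bytes(folder))  # contains broken characters
    print(abbreviate_chars(folder))  # intact characters on both sides
```

The byte count is also why an 8-character multibyte name can exceed a length limit that a 10-character English name stays under. Note that this sketch works on code points; names using combining marks would need grapheme-aware handling, which is not shown here.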
- When I use the English (GB) language the folders work nicely. They used to work nicely in the folder list (i.e. show); now create/rename folder works flawlessly too (big thanks).
But when I use another language, for example Spanish, things get messed up, just like in the past. Let me know if I am allowed to send you screenshots (via personal email) to show what happens when charset is not ISO-8859-1, or perhaps a login account on such a mailbox.
Yes, please do. I cannot reproduce any different behavior when changing the language.
Results (for non English charset):
Create folder -> "error occured while creating folder" or similar error in the translated language
View folder -> strange charset conversion (or no conversion at all?) is shown instead of the normal folder
Rename folder -> "error occured while creating folder" or similar error in the translated language
Do you have the iconv or mbstring modules installed with your PHP? There's a ticket that complains about a buggy mbstring implementation.
I have noticed that the erratic folder behavior happens when I use specific languages like Slovak, Polski, Greek, Espanol, Arabic etc but not on Japanese, Russian, English (it works fine on those showing always the correct folder names as intended)
I can only assume it has to do with internal PHP charset conversions that only support some charsets and not others? Like what happens with html_entity_decode that supports only _some_ charsets: http://nl2.php.net/manual/en/function.html-entity-decode.php
Well, it has nothing to do with html entities. The problem occurs when talking to the IMAP server.
Finally, I am using PHP 4.4.x branch to test.
As far as I could see in the SquirrelMail code, they use mbstring, and if that is not available, UTF-7 conversion only works for ISO-8859-1. It is the same in RoundCube. I have found a C file that converts UTF-8 to UTF-7 and vice versa, but I only have basic knowledge of C and it would take me many hours to rewrite it in PHP. Anybody out there with good C skills?
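As a rough idea of what such a converter involves, here is a minimal sketch. It assumes the folder names use the modified UTF-7 encoding that IMAP defines for mailbox names (RFC 3501, section 5.1.3), which the thread does not state explicitly. It is written in Python rather than PHP purely for illustration, it is not a port of the C file mentioned above, and the names `encode_mutf7` and `decode_mutf7` are hypothetical.

```python
import base64


def _shift(run: str) -> str:
    """Encode a run of non-ASCII characters as '&<modified base64>-'."""
    b64 = base64.b64encode(run.encode("utf-16-be")).decode("ascii")
    # Modified base64: no '=' padding, ',' instead of '/'
    return "&" + b64.rstrip("=").replace("/", ",") + "-"


def encode_mutf7(name: str) -> str:
    """Unicode folder name -> IMAP modified UTF-7."""
    out, run = [], []
    for ch in name:
        if 0x20 <= ord(ch) <= 0x7E:  # printable ASCII passes through
            if run:
                out.append(_shift("".join(run)))
                run = []
            out.append("&-" if ch == "&" else ch)  # literal '&' is escaped
        else:
            run.append(ch)  # collect characters that need base64
    if run:
        out.append(_shift("".join(run)))
    return "".join(out)


def decode_mutf7(name: str) -> str:
    """IMAP modified UTF-7 -> Unicode folder name."""
    out, i = [], 0
    while i < len(name):
        if name[i] != "&":
            out.append(name[i])
            i += 1
            continue
        end = name.index("-", i)  # ValueError if the shift is unterminated
        b64 = name[i + 1:end]
        if not b64:
            out.append("&")  # '&-' stands for a literal '&'
        else:
            b64 = b64.replace(",", "/")
            b64 += "=" * (-len(b64) % 4)  # restore the stripped padding
            out.append(base64.b64decode(b64).decode("utf-16-be"))
        i = end + 1
    return "".join(out)


if __name__ == "__main__":
    # Example mailbox name taken from RFC 3501, section 5.1.3
    wire = "~peter/mail/&U,BTFw-/&ZeVnLIqe-"
    print(decode_mutf7(wire))
    print(encode_mutf7(decode_mutf7(wire)) == wire)
```

Because the base64 payload is UTF-16BE, this approach is independent of ISO-8859-1 and covers any script; converting the resulting Unicode string to or from UTF-8 bytes is then a separate, standard step.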
Let me know how I can help further to solve this.
Check whether mbstring is installed in your PHP.
Your support is very much appreciated,
Eric
Regards, Thomas