Re: [RCD] Ticket #1484429 - Dev

11 Sep 2007

Hi.
I thought the mbstring functions could handle chr(160) in UTF-8.
But when I tested my patch, mb_strpos() returned false and reported the error
"mb_strpos(): Unknown encoding or conversion error."
For example:
mb_internal_encoding("UTF-8");
$test_str = "abcd".chr(160)."efg";
mb_strpos($test_str, chr(160)) ==> false
So I searched the web.

A lone chr(160) byte (0xA0) is not valid UTF-8.
(http://en.wikipedia.org/wiki/UTF-8)
In UTF-8, the no-break space (nbsp, U+00A0) is encoded as 0xC2 0xA0.
(http://www.fileformat.info/info/unicode/char/00a0/index.htm)
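
To make the point above concrete, here is a minimal sketch (assuming the mbstring extension is loaded; the variable names are only for illustration). It converts the ISO-8859-1 nbsp byte to UTF-8 and checks which of the two test strings is valid UTF-8:

```php
<?php
// chr(160) is nbsp in ISO-8859-1; on its own it is not a valid UTF-8 sequence.
$latin1_nbsp = chr(160);

// Converting from ISO-8859-1 yields the two-byte UTF-8 form of U+00A0.
$utf8_nbsp = mb_convert_encoding($latin1_nbsp, 'UTF-8', 'ISO-8859-1');

// Validity of the test string with the lone byte vs. the two-byte sequence.
$is_valid_lone = mb_check_encoding("abcd" . $latin1_nbsp . "efg", 'UTF-8');
$is_valid_pair = mb_check_encoding("abcd" . $utf8_nbsp . "efg", 'UTF-8');

// Searching for the two-byte form works and returns a character offset.
$pos = mb_strpos("abcd" . $utf8_nbsp . "efg", $utf8_nbsp, 0, 'UTF-8');

echo bin2hex($utf8_nbsp), "\n";   // c2a0
var_dump($is_valid_lone);         // bool(false)
var_dump($is_valid_pair);         // bool(true)
var_dump($pos);                   // int(4)
```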

So, how about this?
===================================

--- main.inc_   2007-09-03 16:10:32.000000000 +0900 (rev 774)
+++ main.inc    2007-09-12 01:05:31.000000000 +0900 (changed)
@@ -1103,6 +1103,17 @@
   return $str;
   }
+function mb_str_replace($search_str, $replace_str, $str)
+{
+  $current_pos = 0;
+  while (($found_pos = mb_strpos($str, $search_str, $current_pos)) !== false)
+  {
+    $str = mb_substr($str, 0, $found_pos).$replace_str.mb_substr($str, $found_pos + mb_strlen($search_str));
+    $current_pos = $found_pos + mb_strlen($replace_str);
+  }
+  return $str;
+}
+
 /**
  * Replacing specials characters to a specific encoding type
@@ -1123,7 +1134,12 @@
   // convert nbsps back to normal spaces if not html
   if ($enctype!='html')
-    $str = str_replace(chr(160), ' ', $str);
+  {
+    if ($OUTPUT->get_charset()=='UTF-8')
+      $str = mb_str_replace(chr(194).chr(160), ' ', $str);
+    else
+      $str = str_replace(chr(160), ' ', $str);
+  }
   // encode for plaintext
   if ($enctype=='text')
===================================
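
For trying the helper outside of main.inc, the following is a standalone sketch of the same loop. The name mb_str_replace_sketch and the empty-needle guard are my own additions for illustration and are not part of the patch; it assumes the mbstring extension and UTF-8 as the internal encoding.

```php
<?php
mb_internal_encoding('UTF-8');

// Hypothetical standalone variant of the helper from the patch above.
function mb_str_replace_sketch($search_str, $replace_str, $str)
{
    // Guard added for this sketch: an empty needle would never advance.
    if ($search_str === '') {
        return $str;
    }
    $current_pos = 0;
    while (($found_pos = mb_strpos($str, $search_str, $current_pos)) !== false) {
        $str = mb_substr($str, 0, $found_pos)
             . $replace_str
             . mb_substr($str, $found_pos + mb_strlen($search_str));
        // Advance in characters (not bytes) so the offset matches mb_strpos().
        $current_pos = $found_pos + mb_strlen($replace_str);
    }
    return $str;
}

$nbsp   = "\xC2\xA0";                             // U+00A0 in UTF-8
$input  = "\xE3\x81\xA0" . $nbsp . "abc" . $nbsp; // U+3060, nbsp, "abc", nbsp
$output = mb_str_replace_sketch($nbsp, ' ', $input);

// For comparison: a byte-wise replace of the complete two-byte sequence.
$same_as_bytes = ($output === str_replace($nbsp, ' ', $input));

echo bin2hex($output), "\n"; // e381a02061626320
var_dump($same_as_bytes);    // bool(true)
```

On valid UTF-8 input, a plain str_replace() with the full two-byte needle should give the same result, because a complete UTF-8 sequence cannot match in the middle of another character. Whether that simpler form is preferable here is a design choice for the maintainers.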
Yoshikazu.
On Wed, 5 Sep 2007 03:05:21 +0200, till klimpong@gmail.com wrote:
...
On 9/3/07, Yoshikazu Tsuji yskzt@church.ne.jp wrote:
...
Hi.
The following code causes Ticket #1484429
(http://trac.roundcube.net/trac.cgi/ticket/1484429).
=============================================================
= "program/include/main.inc" function rep_specialchars_output
// convert nbsps back to normal spaces if not html
  if ($enctype!='html')
    $str = str_replace(chr(160), ' ', $str);
=============================================================
This problem occurs in multibyte environments (including Japanese).
In the message list, the function rep_specialchars_output garbles
UTF-8 message subjects.
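
A small sketch of how the garbling can happen, assuming the mbstring extension for the validity check. The sample character U+3060 is my own choice for illustration; its UTF-8 encoding happens to end in the byte 0xA0, which the byte-wise str_replace() then overwrites:

```php
<?php
// U+3060 (a hiragana letter) is encoded in UTF-8 as the bytes E3 81 A0.
$subject = "\xE3\x81\xA0";

// Byte-oriented replacement, as in the excerpt from main.inc.
$replaced = str_replace(chr(160), ' ', $subject);

echo bin2hex($replaced), "\n";                    // e38120
var_dump(mb_check_encoding($subject, 'UTF-8'));   // bool(true)
var_dump(mb_check_encoding($replaced, 'UTF-8'));  // bool(false)
```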
Is converting chr(160) to a space really necessary?
Here is a patch that uses the multibyte functions.
===============================================================
--- main.inc_   2007-09-03 16:10:32.000000000 +0900
+++ main.inc    2007-09-03 16:22:59.000000000 +0900
@@ -1122,8 +1122,17 @@
     $enctype = $GLOBALS['OUTPUT_TYPE'];
   // convert nbsps back to normal spaces if not html
-  if ($enctype!='html')
-    $str = str_replace(chr(160), ' ', $str);
+  if ($enctype!='html') {
+    $current_pos = 0;
+    while(true) {
+      $found_pos = mb_strpos($str, chr(160), $current_pos);
+      if($found_pos == false)
+        break;
+      $str = mb_substr($str, 0, $found_pos)." ".mb_substr($str, $found_pos
...
1, mb_strlen($str));
+      $currentpos += 1;
+    }
+  }
   // encode for plaintext
   if ($enctype=='text')
===============================================================
Multibyte looks like the better alternative, especially since we are
dealing with people from different countries. And since we are using
mb already, I have no issues with this.
Just one thing, can you add this to the trac? Please? :)
Thanks,
Till

List info: http://lists.roundcube.net/dev/