[RCD] URLs with 8bit chars?

Rimas Kudelis rq at akl.lt
Sat Feb 22 17:51:19 CET 2014


2014.02.22 17:02, Thomas Bruederli wrote:
> On Sat, Feb 22, 2014 at 3:47 PM, Rimas Kudelis <rq at akl.lt> wrote:
>> Regarding difficulty of detection, I would dare to disagree with you as
>> well. Since PHP 5.1, PCRE has had support for Unicode character properties,
>> so I'm pretty sure that it must be possible to add all alphanumeric
>> characters to your regex easily.
> I certainly agree with this. And we'd very much appreciate any
> contribution for this, preferably in the form of a regex that detects
> Unicode URLs or, even better, a set of test cases that demonstrate
> the correct detection of real and false URLs within plain text.

I could take a look if you point me to the right file to edit.
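
To illustrate the idea, here is a minimal sketch in Python rather than
PHP. It is not Roundcube code: the names URL_RE and find_urls are made up
for the example, and the pattern is deliberately simple. Python's \w is
Unicode-aware for str patterns, which plays roughly the role that
\p{L}\p{N} with the /u modifier would play in PCRE.

```python
import re

# Hypothetical pattern, for illustration only.
# In a Python 3 str pattern, \w matches letters and digits of any script
# (plus underscore), so non-ASCII hosts and paths are covered.
URL_RE = re.compile(r"https?://[\w.-]+(?:/[\w\-./%~]*)?")


def find_urls(text):
    """Return the http(s) URLs found in plain text."""
    urls = []
    for match in URL_RE.finditer(text):
        # Trailing punctuation usually belongs to the sentence, not the URL.
        urls.append(match.group(0).rstrip(".,;:!?"))
    return urls


if __name__ == "__main__":
    sample = "compared to http://en.wikipedia.org/wiki/.рф . Note the dot."
    print(find_urls(sample))
```

A real implementation would also need to handle ports, query strings and
the false positives mentioned above, so this is only a starting point for
test cases.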

>> [1] http://en.wikipedia.org/wiki/.%D1%80%D1%84 . Note how hard this is
>> to read compared to http://en.wikipedia.org/wiki/.рф .
> A possible optimization on our side could be to decode the URL
> encoding (and punycode) when displaying links in message view. This,
> however, alters the actual message content, which might be undesirable.

I don't think there's any need for that. Especially if the assumption was
that you can just write URLs as you see them.
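
For reference, the display-side decoding described in the quoted text
would amount to roughly the following. This is a sketch in Python using
only the standard library; display_form is a hypothetical helper name,
and the result is meant for display only, since unquoting can change what
a URL means (for example an encoded & in a query string).

```python
from urllib.parse import unquote, urlsplit, urlunsplit


def display_form(url):
    """Return a human-readable form of a URL, for display only."""
    parts = urlsplit(url)
    try:
        # Turn punycode labels (xn--...) back into Unicode.
        netloc = parts.netloc.encode("ascii").decode("idna")
    except UnicodeError:
        # Host is already non-ASCII or not valid IDNA: leave it alone.
        netloc = parts.netloc
    # Undo percent-encoding in the remaining components.
    return urlunsplit((
        parts.scheme,
        netloc,
        unquote(parts.path),
        unquote(parts.query),
        unquote(parts.fragment),
    ))


if __name__ == "__main__":
    print(display_form("http://en.wikipedia.org/wiki/.%D1%80%D1%84"))
```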


