Hello,
Has a fulltext message searching feature been considered? I think it would be very handy. It could use a MySQL table to store hashes of the message text (with all HTML, non-text characters, and other formatting removed to reduce size). When a search is performed, it could use MySQL's FULLTEXT feature to deliver lightning-fast search results similar to Gmail, Hotmail, etc.
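A rough sketch of the stripping step and the MySQL side of this idea, in Python purely for illustration. The table name, column names and helper names are hypothetical and not part of Roundcube. One assumption to flag loudly: the sketch stores the stripped text itself rather than a hash, because MATCH ... AGAINST matches words in the indexed column. The `%s` placeholder style depends on the database driver in use.

```python
import re
from html.parser import HTMLParser

# Hypothetical schema; names are illustrative only.
# FULLTEXT required MyISAM at the time of this thread
# (InnoDB gained FULLTEXT support in MySQL 5.6).
CREATE_INDEX_TABLE = """
CREATE TABLE message_index (
    user_id INT UNSIGNED NOT NULL,
    mailbox VARCHAR(128) NOT NULL,
    uid     INT UNSIGNED NOT NULL,
    body    TEXT NOT NULL,
    PRIMARY KEY (user_id, mailbox, uid),
    FULLTEXT KEY ft_body (body)
) ENGINE=MyISAM
"""

# Natural-language fulltext query against the indexed column.
SEARCH_QUERY = """
SELECT uid FROM message_index
WHERE user_id = %s AND mailbox = %s AND MATCH (body) AGAINST (%s)
"""


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def to_index_text(html_body: str) -> str:
    """Reduce an HTML message body to lowercase words separated by spaces."""
    parser = _TextExtractor()
    parser.feed(html_body)
    parser.close()
    text = " ".join(parser.parts)
    # Replace every run of non-word characters with a single space.
    text = re.sub(r"[^\w]+", " ", text)
    return text.strip().lower()
```

The output of `to_index_text` is what would go into the `body` column; the message itself stays on the IMAP server.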
It could maybe have an automated indexer that works in the background, indexing one or two unindexed messages at a time on each AJAX request from the browser (after the data is returned to the browser, to prevent lag).
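The "one or two messages per request" idea might look roughly like this. It is a minimal in-memory sketch in Python: `index_batch`, `fetch_text` and the dict-based index are made-up names standing in for whatever the real backend would use.

```python
from collections import deque
from typing import Callable, Deque, Dict


def index_batch(
    pending: Deque[int],
    fetch_text: Callable[[int], str],
    index: Dict[int, str],
    batch_size: int = 2,
) -> int:
    """Index up to batch_size pending message UIDs; return how many were done."""
    done = 0
    while pending and done < batch_size:
        uid = pending.popleft()        # oldest unindexed message first
        index[uid] = fetch_text(uid)   # fetch, strip and store the text
        done += 1
    return done
```

The request handler would call `index_batch` only after the response has been sent to the browser, so the user never waits on it.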
The feature could of course be turned off (maybe off by default?) if the user didn't want to spend the extra disk space and processor cycles.
Does this sound like something you'd like to see in RC? If so, I can develop the backend for this system if somebody else can write the interface/ajax integration.
Thanks, Ben
List info: http://lists.roundcube.net/dev/
I recommend trying Sphinx. Most of the hard work has been taken care of already.
-----Original Message-----
From: Ben [mailto:ben@forlent.com]
Sent: Wednesday, October 17, 2007 12:16 PM
To: dev@lists.roundcube.net
Subject: [RCD] Fulltext message searching
Ben wrote:
Does this sound like something you'd like to see in RC? If so, I can develop the backend for this system if somebody else can write the interface/ajax integration.
I think it would be a much appreciated feature. And with the backend in place, I think one could easily extend it to search other information in the headers (e.g. explicitly search for To, Mozilla flags, etc.).
Though I have no time to help you implement it, I would gladly test it once I have some time after Nov. 19th.
cheers, raoul
Wouldn't we need another "driver" for this? I mean, I know Sphinx is great but it's nowhere near as popular as MyISAM. ;) Even InnoDB is not as common in MySQL installations today.
If anyone wants to code this up, I have no objections. But including it by default wouldn't work out right now. :)
Till
On 10/17/07, Ethan Erchinger ethan@plaxo.com wrote:
I recommend trying Sphinx. Most of the hard work has been taken care of already.
Ben wrote:
Hello,
Has a fulltext message searching feature been considered? I think it would be very handy. It could use a MySQL table to store hashes of the message text (with all HTML, nontext characters, and other formatting removed to reduce size). When a search is performed, it could utilize MySQL's FULLTEXT feature to deliver lightning fast search results similar to gmail, hotmail, etc.
Hi Ben,
Actually, fulltext search is already available, but it is currently done by the IMAP server. If you precede your search term with body:, a corresponding request is sent to the mail server. How fast this is depends on the IMAP software.
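To illustrate how a body: prefix can map onto an IMAP SEARCH key, here is a small Python sketch. It is not Roundcube's actual code. The function name is made up, and using SUBJECT as the key for unprefixed terms is an assumption, since the thread does not say what the default search covers. BODY and SUBJECT are search keys defined in RFC 3501.

```python
def build_search_criteria(term: str, default_key: str = "SUBJECT") -> str:
    """Turn a user search term into an IMAP SEARCH criteria string."""
    term = term.strip()
    key = default_key  # assumption: unprefixed terms search the subject
    if term.lower().startswith("body:"):
        key = "BODY"
        term = term[len("body:"):].strip()
    # Quote the term as an IMAP string: escape backslashes and double quotes.
    quoted = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'{key} "{quoted}"'
```

With Python's `imaplib`, the result could be passed as the criterion to `IMAP4.search(None, ...)`. Non-ASCII terms would additionally need a charset, which this sketch leaves out.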
I agree that searching should be done as close to the client as possible. Using the database we already have would be the best way I guess (apart from building a proprietary fulltext index), but can we make sure the index is complete right after the first login? This is what the user expects. But we don't have any passwords nor does RoundCube know about all users (we only know the ones who already logged in once).
Indexing a mailbox is something that requires communication between RC and the IMAP server in advance and overnight. To achieve this we have to change the basics of how RC manages user accounts.
Good idea but there are a few things to consider before we can start.
~Thomas
Thomas Bruederli wrote:
Good idea but there are a few things to consider before we can start.
Please don't solve problems that don't exist. Yes, IMAP search is considerably slower than a MySQL fulltext search, but the question is whether Roundcube wants to be Google. If that's the case, it would be fine to focus on one single mail server, optimize for it, and maybe store all mails directly in an SQL database anyway...
I have a lot of mails and usually Courier is fast enough to deliver results on time. Yes, it would be nice to have results within 0.2 seconds, but I think it's not worth the effort of having all these tables, duplicating all data in MySQL (no, I don't wanna do that!) and having a complicated setup.
Instead, focus on the important stuff and just do it the smart way. What bugs me in SquirrelMail search is that the search *results* are not cached, so any time I view a mail and go back to view the next one, it performs the search again. This is the real problem there, not the speed of the initial search.
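Caching the search results, as suggested above, could be as simple as the following Python sketch. The class and its `run_search` callback are hypothetical; in practice the cache would probably live in the user's session and be invalidated when the mailbox changes.

```python
from typing import Callable, Dict, List, Tuple


class SearchResultCache:
    """Remembers the UID list of each (mailbox, query) search."""

    def __init__(self, run_search: Callable[[str, str], List[int]]):
        self._run_search = run_search  # performs the real (slow) IMAP search
        self._results: Dict[Tuple[str, str], List[int]] = {}

    def search(self, mailbox: str, query: str) -> List[int]:
        key = (mailbox, query)
        if key not in self._results:
            # Only the first request for this query hits the IMAP server.
            self._results[key] = self._run_search(mailbox, query)
        return self._results[key]

    def invalidate(self, mailbox: str) -> None:
        """Drop cached results for a mailbox, e.g. after new mail arrives."""
        for key in [k for k in self._results if k[0] == mailbox]:
            del self._results[key]
```

Going back from a message to the result list then reuses the stored UIDs instead of searching again.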
And honestly, the IMAP server holds the data and is also able to search it as defined in the standard, so the webmail system should not go ahead and try to duplicate this functionality. The same applies to filters etc.
My 2 cents,
Michael Baierl mbaierl.com http://mbaierl.com/
"The vast majority of our imports come from outside the country." George W. Bush
Thomas Bruederli wrote:
Actually, fulltext search is already available, but it is currently done by the IMAP server. If you precede your search term with body:, a corresponding request is sent to the mail server. How fast this is depends on the IMAP software.
Thank you for the "body:" trick. Is there a similar "hidden" way to get Roundcube to display only unread emails?
Thomas
On Thu, 18 Oct 2007 09:08:34 +0200, Thomas Bruederli roundcube@gmail.com wrote:
Actually, fulltext search is already available, but it is currently done by the IMAP server. If you precede your search term with body:, a corresponding request is sent to the mail server. How fast this is depends on the IMAP software.
As a workaround, would it be reasonable to pull "foo" and "body:foo" queries in parallel and return the unique messages?
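The merging half of this workaround could look like the Python sketch below; `merge_unique` is a made-up helper, and the sketch does not cover issuing the two searches in parallel.

```python
from typing import Iterable, List


def merge_unique(*result_lists: Iterable[int]) -> List[int]:
    """Combine several UID lists, keeping first-seen order and dropping repeats."""
    seen = set()
    merged = []
    for results in result_lists:
        for uid in results:
            if uid not in seen:
                seen.add(uid)
                merged.append(uid)
    return merged
```

IMAP's SEARCH command also has an OR key (RFC 3501), so a single request combining both criteria may be an alternative to two separate queries.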
Enabling and using server-side search should be a config flag. Don't just hit both and thrash everyone and everything. That'll burn CPU and, worse, I/O capacity for no good reason.
BTW, if you want fast IMAP server-side search, look at Cyrus with daily runs of the Cyrus squatter (fulltext indexing) utility. Cyrus is just a bit much to set up and maintain. The server-side search with squatter databases built is wicked fast even on huge mailboxes. When I last ran Cyrus I was regularly opening/searching/closing mailboxes with 30,000 messages in them.
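The config-flag idea from the start of this message might be sketched like this in Python. The flag name `search_body_by_default` and the SUBJECT default are hypothetical, not existing Roundcube options.

```python
from typing import Dict, List, Tuple


def planned_searches(term: str, config: Dict[str, bool]) -> List[Tuple[str, str]]:
    """Return the (IMAP search key, term) pairs to run for a plain search term."""
    keys = ["SUBJECT"]  # assumption: the cheap header search always runs
    # The expensive body search is opt-in, off unless the admin enables it.
    if config.get("search_body_by_default", False):
        keys.append("BODY")
    return [(key, term) for key in keys]
```

With the flag left unset, a plain search stays a single cheap query and the server's CPU and I/O are not touched by body searches nobody asked for.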
On Thu, 18 Oct 2007 08:16:56 -0700 (PDT), Jason Fesler jfesler@gigo.com wrote:
BTW, if you want fast IMAP server-side search, look at Cyrus with daily runs of the Cyrus squatter (fulltext indexing) utility. Cyrus is just a bit much to set up and maintain. The server-side search with squatter databases built is wicked fast even on huge mailboxes. When I last ran Cyrus I was regularly opening/searching/closing mailboxes with 30,000 messages in them.
I use Courier and am very happy with the speed of it. I'm just recommending the option for ease-of-use. I've been following RC since damn near its inception, and this is the first I've heard of "body:" snatch^H^H^H^H^H^H searching. :)