What does the messages table exactly store? I'm looking at it's contents and it looks like it puts there good part of the mail headers, plus some other information.
-- 18:02:25 up 4 days, 9:57, 5 users, load average: 1.59, 1.57, 1.62
Lic. Martín Marqués | SELECT 'mmarques' || Centro de Telemática | '@' || 'unl.edu.ar'; Universidad Nacional | DBA, Programador, del Litoral | Administrador
On Tue, 20 Dec 2005, Lic. Martin Marques wrote:
What does the messages table exactly store? I'm looking at it's contents and it looks like it puts there good part of the mail headers, plus some other information.
Adding somethings.
I have a bunch of error messages related to DB errors when trying to make an INSERT to the messages table (that's why I was asking about it). Basically, the "to" field is to small (varchar(128)) to handly the data that the program has to put there. Thinking better, I'm not sure if where is a "n" such that we will never have problems with that insert.
I happens very offen with mails that come with lots of recipients.
Should we doble the space to varchar(256), triple it, or put an unlimited field type like TEXT?
-- 17:40:01 up 2:22, 3 users, load average: 0.00, 0.00, 0.00
Lic. Martín Marqués | SELECT 'mmarques' || Centro de Telemática | '@' || 'unl.edu.ar'; Universidad Nacional | DBA, Programador, del Litoral | Administrador
The from and to fields in the message table are primarily used for sorting when taking all message headers from the cache. The "real" content is delivered by the serialized content. Therefore I decided to not use more than 128 chars for these headers because just for sorting it should be enough. In MySQL, the content is automatically chopped if it's longer than the specified field length and I assumes that every database acts this way. If it's causing errors, the PHP script should chop those strings before inserting.
Other thoughts?
Thomas
Lic. Martin Marques wrote:
On Tue, 20 Dec 2005, Lic. Martin Marques wrote:
What does the messages table exactly store? I'm looking at it's contents and it looks like it puts there good part of the mail headers, plus some other information.
Adding somethings.
I have a bunch of error messages related to DB errors when trying to make an INSERT to the messages table (that's why I was asking about it). Basically, the "to" field is to small (varchar(128)) to handly the data that the program has to put there. Thinking better, I'm not sure if where is a "n" such that we will never have problems with that insert.
I happens very offen with mails that come with lots of recipients.
Should we doble the space to varchar(256), triple it, or put an unlimited field type like TEXT?
-- 17:40:01 up 2:22, 3 users, load average: 0.00, 0.00, 0.00
Lic. Martín Marqués | SELECT 'mmarques' || Centro de Telemática | '@' || 'unl.edu.ar'; Universidad Nacional | DBA, Programador, del Litoral | Administrador
Lic. Martin Marques wrote:
What does the messages table exactly store? I'm looking at it's contents and it looks like it puts there good part of the mail headers, plus some other information.
This table is used to store message headers locally for caching purpose. It will only be used if caching is enabled which is not necessary if the IMAP server runs on the same machine.
I'm not sure yet, how to keep this table clean to avoid an exhausting amount of cached messages here. One possibility is to store the message headers just for the current session and remove them when logging out. But for performance issues it would be better to keep the cached messages over multiple sessions...
Other thought was to add a last_view_date which could allow a garbage collection routine to remove headers that haven't been listed over a certain amount of time. The disadvantage of this behavior would be that on every view (SELECT from DB) the last_view_date has to be updated what will cause the database to rewrite it's index and make it slower again.
I'm curious, what you guys think of that problem? Anybody out there experienced with caching mechanisms?
Regards, Thomas
I'm definatly not an expert on software caching (I know more about the
basics of hardware caches, but that doesn't do a whole lot of good
here), but it seems like one option would be to cache only the 1st page
of mesages, so the maximum number of cached messages is the number they
show per page (ex: 40), multiplied by the number of folders they have.
Whenever a new message comes in, bump the message that is last on the
list. I know I very rarely go beyond the first page of emails, although
I can't speak for the others.
Just seems like one way to go about it, and it leads to predictable cache sizes, which is useful for planning server loads. Rob
Thomas Bruederli wrote:
Lic. Martin Marques wrote:
What does the messages table exactly store? I'm looking at it's contents and it looks like it puts there good part of the mail headers, plus some other information.
This table is used to store message headers locally for caching purpose. It will only be used if caching is enabled which is not necessary if the IMAP server runs on the same machine.
I'm not sure yet, how to keep this table clean to avoid an exhausting amount of cached messages here. One possibility is to store the message headers just for the current session and remove them when logging out. But for performance issues it would be better to keep the cached messages over multiple sessions...
Other thought was to add a last_view_date which could allow a garbage collection routine to remove headers that haven't been listed over a certain amount of time. The disadvantage of this behavior would be that on every view (SELECT from DB) the last_view_date has to be updated what will cause the database to rewrite it's index and make it slower again.
I'm curious, what you guys think of that problem? Anybody out there experienced with caching mechanisms?
Regards, Thomas
!DSPAM:43a907c652052175210217!
On Wed, 21 Dec 2005, Thomas Bruederli wrote:
The from and to fields in the message table are primarily used for sorting when taking all message headers from the cache. The "real" content is delivered by the serialized content. Therefore I decided to not use more than 128 chars for these headers because just for sorting it should be enough. In MySQL, the content is automatically chopped if it's longer than the specified field length and I assumes that every database acts this way. If it's causing errors, the PHP script should chop those strings before inserting.
Normaly a database engine shouldn't change the data that is been inserted, but it's well known that MySQL does that.
Other thoughts?
substr($string,0,128) in all the fields would be OK.
-- 07:10:01 up 15:52, 0 users, load average: 0.00, 0.01, 0.06
Lic. Martín Marqués | SELECT 'mmarques' || Centro de Telemática | '@' || 'unl.edu.ar'; Universidad Nacional | DBA, Programador, del Litoral | Administrador
On Wed, 21 Dec 2005 08:44:23 +0100, Thomas Bruederli roundcube@gmail.com wrote:
Lic. Martin Marques wrote:
What does the messages table exactly store?
I'm looking at it's contents and it looks like it puts there good part
of the mail headers, plus some other information.
This table is used to store message headers locally for caching purpose.
It will only be used if caching is enabled which is not necessary if the
IMAP server runs on the same machine.
I'm not sure yet, how to keep this table clean to avoid an exhausting
amount of cached messages here. One possibility is to store the message
headers just for the current session and remove them when logging out.
But for performance issues it would be better to keep the cached
messages over multiple sessions...
Other thought was to add a last_view_date which could allow a garbage
collection routine to remove headers that haven't been listed over a
certain amount of time. The disadvantage of this behavior would be that
on every view (SELECT from DB) the last_view_date has to be updated what
will cause the database to rewrite it's index and make it slower again.
Definately not a caching expert, however i have worked with several DBs that did some basic caching. One thought would be to add a date_inserted (can be added via a trigger on insert in postgres, not sure that mysql supports those) and clean out after a pre-determined (user configurable) amount of time.
Updating on every SELECT pretty much defeats the point of having it in a DB, no? since you'd incur a write op for every read op?
I also liked the idea of only caching the first (n) pages of the mail, however, this doesn't work for sorting. In fact, any type of "expiry" of the cache would make re-sorting MUCH slower since we'd have to re-download the message list.
Perhaps there should be a sort key first, ie default view of inbox, then cache a predefined number of messages and keep that across sessions.
Now if someone wants to resort everything, there's a penalty, unless one guesses which messages to pre-cache for this.. a header index would help since the content of the mail would only need to be retrieved if it's ever clicked on, but quck sorting on date/arrival/from/to etc woudln't use so much mem.
all of these shoudl be config file parameters so users can tweak it according to their usage requirements. a nice default set is fine.
other than that, you can always look at caching packages and see what they do and how they deal with thresholds.. shrug.
-- Robi
Robi wrote:
Perhaps there should be a sort key first, ie default view of inbox, then cache a predefined number of messages and keep that across sessions.
RoundCube already has config parameters for that. These could be used to keep the first couple of pages.
Now if someone wants to resort everything, there's a penalty, unless one guesses which messages to pre-cache for this.. a header index would help since the content of the mail would only need to be retrieved if it's ever clicked on, but quck sorting on date/arrival/from/to etc woudln't use so much mem.
This is what the (new) caching actually does. We have cols for the most common headers for sorting and all headers are stored as a serialized string. The message body will be added to the cache when the message is shown.
all of these shoudl be config file parameters so users can tweak it according to their usage requirements. a nice default set is fine.
I don't think that all users will understand what they can configure here. This should be something that is configured by the system administrator.
other than that, you can always look at caching packages and see what they do and how they deal with thresholds.. shrug.
-- Robi
Regards, Thomas