Hi,
I've been reading the IMAP specification to find ways to improve current rcube cache algorithms. The following is from the IMAPv4 specs:
2.3.1.2. Message Sequence Number Message Attribute
A relative position from 1 to the number of messages in the mailbox. This position MUST be ordered by ascending unique identifier. As each new message is added, it is assigned a message sequence number that is 1 higher than the number of messages in the mailbox before that new message was added.
Currently rcube stores the whole header list for each mailbox in the DB. If the current message count is bigger than the message count of the cache then it fetches the whole header list again and stores it in the cache on DB.
I think this could be improved by redesigning the caching algorithm, since each new message in a mailbox is given a sequence number higher than the other ones, we can just fetch the new messages. The following code should clarify the idea:
$mbox_count = messagecount($mbox); $cache_count = count($cache); if ($mbox_count > $cache_count) { $newMessages = iil_C_FetchHeaders( $conn, $mbox, "$cache_count:$mbox_count" ); $cache = array_merge( $cache, $newMessages ); }
Unfortunatly the specs also say:
Message sequence numbers can be reassigned during the session. For example, when a message is permanently removed (expunged) from the mailbox, the message sequence number for all subsequent messages is decremented. The number of messages in the mailbox is also decremented. Similarly, a new message can be assigned a message sequence number that was once held by some other message prior to an expunge.
We can solve this however since we have the UID of each header in the cache. Instead of fetching the (mbox_count - cache_count) headers, we can fetch (mbox_count-cache_count)+1, with this we should get as the first header the last one in the cache. We then compare its UID, so if we have in the cache the last header with id=269 and uid=468 but in the new list it has the id=269 but the uid=493 it means that there have been deleted messages in that folder and that we must revalidate the whole list. This could also be optimized but it'll require a quite complex algorithm which I guess is not worth it for most cases. The following pseudo-code should show the idea good enough:
$mbox_count = messagecount($mbox); $cache_count = count($cache); if ($mbox_count > $cache_count) { $newMessages = iil_C_FetchHeaders( $conn, $mbox, ($cache_count-1).":$mbox_count" ); if ($newMessages[0]->uid != end($cache)->uid) { //revalidate the cache $cache = iil_C_FetchHeaders( $conn, $mbox, "0:".($cache_count-1)); } $cache = array_merge( $cache, $newMessages ); }
The code I think is simple enough although I would like to discuss this before trying its implementation, I'm worried about the specs support in different servers.
ciao, ivan
Hi again,
I had some spare time this evening so I decided to check the idea I posted previously. The result is the following code, to try it just replace the _list_headers() function in /program/include/rcube_imap.inc with this one.
It implements a somewhat better caching algorithm by taking advantage of the spec's incremental sequences for messages. If a mailbox is cached and there is a new message received, when loading the mailbox again we will just get two headers: the last one already in the cache and the new one. If the last one's UID is the same as the cached one then nothing else is done, otherwise (perhaps a message was deleted using another client) an intelligent cache revalidation is performed. The cache revalidation fetches from older to newer all the messages in blocks of N headers. It keeps fetching blocks until we find a header with same ID and UID in the cache and in the block. So it gets much faster than the current code, specially for huge mailboxes.
The original code has a 'bug' in which it uses the cache if the number of messages is the same in the mailbox and in the cache. In the case that a message is deleted and a new one stored in the mailbox, this comparison will show up the cache which is not up to date.
ciao, ivan
// private method for listing message header function _list_headers($mailbox='', $page=NULL, $sort_field='date', $sort_order='DESC') { $a_out = array(); $cached_count = 0;
if (!strlen($mailbox))
return $a_out;
$mbox_count = $this->_messagecount($mailbox /*, 'ALL', TRUE*/);
$revalidate = false;
if ($mbox_count) {
// get cached headers
$a_out = $this->get_cache($mailbox.'.msg');
$a_out = is_array($a_out)?$a_out:array(); //make sure we get an
array $cached_count = count($a_out);
$a_new = array();
$revalidate = true; //revalidate by default
//if the cache count is greater then there have been changes for
sure if ($cached_count <= $mbox_count) { $from = $cached_count?$cached_count:1;
//get new headers (at least one is returned)
$a_temp = iil_C_FetchHeaders($this->conn, $mailbox, $from .
':' . $mbox_count);
$duplicated = $cached_count?true:false;
foreach ($a_temp as $hdr) {
//skip the first one if duplicated
if ($duplicated) {
//check for changes using the UID
if ($hdr->uid === end($a_out)->uid)
$revalidate = false;
$duplicated = false;
continue;
}
//skip deleted ones
if (! $hdr->deleted)
$a_new[ $hdr->uid ] = $hdr;
}
}
//revalidate cache if needed
$to = $mbox_count - count($a_new);
if ($revalidate && $to !== 0) {
//we'll needt o reindex the array so we have to make a copy
$a_dirty = $a_out;
$a_out = array();
$a_buffers = array();
//fetch chunks of 20 headers
$step = 20;
$found = false;
//fetch headers in blocks starting from new to old
do {
$from = $to-$step;
if ($from < 1) $from = 1;
//store the block in a temporal buffer
$a_buffers[$from] = iil_C_FetchHeaders( $this->conn,
$mailbox, $from . ':' . $to);
//compare the fetched headers with the ones in the cache
$idx = 0;
foreach ($a_buffers[$from] as $k=>$hdr) {
//if it's different the comparison ends
if (!isset($a_dirty[$hdr->uid]) ||
$a_dirty[$hdr->uid]->id !== $hdr->id) break;
//if we arrive here then we know that the older
messages in cache are ok $found = $hdr->id; $idx++; }
//remove from the buffer the headers which are already
cached if ($found) $a_buffers[$from] = array_splice($a_buffers[$from], 0, $idx );
$to = $from-1;
} while ($found===false && $from > 1);
//just keep the headers we are certain that didn't change in
the cache if ($found !== false) { foreach ($a_dirty as $hdr) { if ($hdr->id > $found) break; $a_out[$hdr->uid] = $hdr; } }
//we builded the block buffers from new to older, we process
them in reverse order ksort($a_buffers, SORT_NUMERIC); foreach ($a_buffers as $a_buff) { foreach ($a_buff as $hdr) { if (! $hdr->deleted) $a_out[$hdr->uid] = $hdr; } } }
//array_merge() would reindex the keys, so we use this 'hack'
$a_out += $a_new;
}
//write headers list to cache if needed
if ($revalidate || count($a_out)!=$cached_count) {
$this->update_cache($mailbox.'.msg', $a_out);
}
//sort headers by a specific col
$a_out = iil_SortHeaders( $a_out, $sort_field, $sort_order );
// return complete list of messages
if (strtolower($page)=='all')
return $a_out;
$start_msg = ($this->list_page-1) * $this->page_size;
return array_slice($a_out, $start_msg, $this->page_size);
}