We are currently running a pilot Roundcube service and are expecting to go live a a couple of weeks time. Something that my users have picked up on is that idle sessions appear to time out in unpredictable ways despite the keep_alive requests fired off by the Web browser.
A bit of digging over the weekend revealed a potential race condition in program/lib/Roundcube/rcube_session.php : mc_write(). (Roundcube 0.9.2).
I'm happy to write the following up as a ticket on trac.roundcube.net. I wanted to make sure my analysis and proposed solution was correct first.
Given the 10 minute (600 second) default session_lifetime, keep_alive requests are fired off every 300 seconds. However these may not arrive immediately, particularly if the Web browser, network or server is busy.
Sessions stored in memcached expire after precisely 600 seconds unless they are refreshed.
mc_write() only refreshes session entries if the following is true:
($newvars !== $oldvars || $ts - $this->changed > $this->lifetime / 2)
Most of the time with a idle login session: ($this->lifetime / 2) will be 300 ($ts - $this->changed) will be a tiny fraction more than 300.
However if I add some logging it becomes clear that ($ts - $this->changed) is occasionally less than 300. I imagine this happens when the previous keep_alive was stuck in transit for a fraction of a second. It is an asynchronous request and the timer for the next keep_alive doesn't wait.
The mc_write() refresh on the session data is skipped when this happens.
The following keep_alive is scheduled to arrive precisely the same second that the session is due to expire in memcached. If you are unlucky the keep_alive arrives just after the session has been expired, which leads a "Session is invalid or has expired" error page.
This is using memcached for session storage: the problem appears to go away if I switch to using a database backend. I believe that this is because session entries stored in memcached expire after precisely 600 seconds. With a database backend sessions are expired by PHP garbage collection later on. This will add a few seconds delay, at least on a inactive system.
The attached patch appears to fix the problem, using two separate approaches:
A lower threshold for deciding when to refresh the session entry. (I am tempted to remove the guard condition altogether and always update. There isn't any I/O cost associated with a memcached back end).
A slightly longer expiry time for entries stored in memcached.
However I may be missing some subtlety about the refresh process.