Brennan Stehling wrote:
I have been using SpamAssassin, but I have had problems where incoming spam causes the server to become unresponsive for long periods of time. This is obviously unacceptable. I am pretty sure the biggest part of the problem is the fact that it is running with Perl. I have had problems with Perl before when I wrote CGI applications, where it can lock up a server if you are handling a lot of data.
I specifically have it set to only handle messages under a certain size, but I still have problems.
There must be something else going on there. Perl isn't necessarily slow, and spamassassin is quite fast on its own, especially if you're using spamc/spamd. Perl has a slower start-up time because it's interpreted, but the actual execution isn't all that slow.
No one process should bring the entire server to its knees. Even if it's getting bombarded with spam, it should still do something. That is, unless someone is targeting you with a DoS attack.
Our main mail hub processes ~60,000 msgs/day through spamassassin (well, spamc/spamd) without any real delay. If there is a delay, it's usually due to a DNS/RBL timeout, or an image being processed through FuzzyOCR. This is on a dual-CPU 2GHz Xeon. Previously, we had a 1.3GHz Athlon in that position and it also handled the load quite well.
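To put that figure in perspective, here is a rough back-of-the-envelope calculation. It assumes the ~60,000 messages are spread evenly over the day, which real mail traffic is not, so treat the result as an average and expect peaks to be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(messages_per_day):
    """Average messages per second, assuming an even spread over the day."""
    return messages_per_day / SECONDS_PER_DAY


scanned_per_day = 60_000  # approximate figure quoted above
rate = average_rate(scanned_per_day)

print(f"{rate:.2f} msgs/s")                 # 0.69 msgs/s
print(f"{1 / rate:.2f} s between messages")  # 1.44 s between messages
```

At well under one message per second on average, a scan that takes a second or two per message still keeps up without a backlog, as long as the daemon can run a few scans in parallel during bursts.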
You can help it out more by using RBLs. We rejected 230,000 messages yesterday by RBL alone before they ever hit spamassassin.
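For anyone unfamiliar with how an RBL check works: the connecting IPv4 address is reversed octet by octet, the blocklist's zone is appended, and the resulting name is looked up in DNS. If an A record comes back, the address is listed. The sketch below only illustrates that mechanism. The function names are made up for this example, `dnsbl.example.org` is a placeholder zone rather than a real list, and in practice the mail server does this check itself at connection time instead of a script.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name for an RBL lookup: reversed IPv4 octets + zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone):
    """Return True if the zone answers with an A record for this address.

    Any lookup failure (NXDOMAIN, timeout, ...) is treated as "not listed",
    which is a simplification made for this sketch.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        return False


# Placeholder zone, not a real blocklist.
print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))
# 1.2.0.192.dnsbl.example.org
```

The point of doing this before the message is accepted is cost: one DNS lookup is far cheaper than a full content scan, so every connection rejected here is a message the spam filter never has to touch.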
I host other things on the same server, like my DNS and Web servers, so I cannot allow the spam filter to kill the performance of all applications. Is there something better that I could do? I am seriously considering having all of my mail aliased to my Gmail account and not allowing incoming mail to be stored on this server. If I do that I will not be using RC, which I would like to continue using while helping with the development effort.
If you really do want Gmail to handle your mail, you could always get it back to yourself by using fetchmail to bring it back into your own server.
On my personal server, which also has web, mail, etc., I use amavisd-new to process mail and scan for viruses. It's nowhere near the load of the main hub (and it's a dual-CPU PIII-800), and it's also in Perl, but it runs as a daemon so there is no start-up time on a per-message basis.
If you don't already, you may want to consider graphing network usage, processor load, etc. with SNMP (and perhaps a graphing package like Cacti). It can help a lot when tracking down an issue like this.
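If setting up SNMP and Cacti is more than you want to do right away, even a crude log of the load average can show whether the lock-ups line up with mail bursts. The following is a minimal stand-in, not a replacement for proper graphing: it uses Python's `os.getloadavg()` (Unix only), and the function names and the CSV layout are just choices made for this sketch.

```python
import os
import time


def load_per_cpu(load_1min, cpu_count):
    """Normalise the 1-minute load average by the number of CPUs."""
    return load_1min / max(cpu_count, 1)


def sample_line(now=None):
    """Return one CSV line: timestamp, 1/5/15-minute load, load per CPU."""
    now = time.time() if now is None else now
    one, five, fifteen = os.getloadavg()  # Unix only
    cpus = os.cpu_count() or 1
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return (f"{stamp},{one:.2f},{five:.2f},{fifteen:.2f},"
            f"{load_per_cpu(one, cpus):.2f}")


# Run this from cron once a minute and append the output to a file, e.g.
#   print(sample_line())
```

A load per CPU that stays well above 1.0 while mail is arriving suggests the box is short on CPU. If the load is low while the server is unresponsive, the bottleneck is more likely memory (swapping) or disk I/O, which this sketch does not measure.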
Jim