[RCD] junk processing flow needs two Junk folders, not just one

Rimas Kudelis rq at akl.lt
Sun Feb 26 14:32:24 CET 2012


2012.02.26 00:26, Brian J. Murrell wrote:
> But that does not recognize that there are two types of junk: the mail
> that the mail system determined is spam (let's call this tagged spam)
> and wants to quarantine for the user to sift through for false
> positives[1].  The other type of "Junk" is the spam that the mail system
> did not determine for the user (let's call this untagged spam) and that
> the user wants to tell the mail system is spam so that it can learn.
> So the user needs two folders for these different types of messages, for
> a couple of reasons.  First reason is that it's a waste of the user's
> time to put the untagged spam into the same folder that is meant to be
> the folder that the user sifts through to find falsely tagged spam.
> Secondly, the user does need a folder to put untagged spam so that the
> mail system has somewhere it can go get messages that the user wants it
> to use to learn about what spam is.  And this folder shouldn't be the same
> folder that the tagged spam has gotten put into since we don't want/need
> the mail system to learn from messages it's already tagged as spam.

It's probably a matter of personal taste, but I feel OK with the mail 
system simply not accepting messages that are clearly spam and 
prepending "***SPAM***" to the subjects of messages that look very much 
like spam, but could possibly be legitimate. None of that requires a 
separate folder. On the other hand, if you don't delete positives, there 
are two options:
1) your filter is suspicious and marks most spam as spam. In this case, 
there probably aren't that many messages for the user to mark as spam 
manually, so even if you put them all in one place, the extra work 
needed to find that important false positive won't be noticeable.
2) the filter is relaxed and the user marks more spam manually than the 
system does automatically. In this case, false positives are so 
unlikely that they're not even worth considering.

And even if you don't fall into one of those categories, there's still 
another argument against bothering with two folders: the search field. The 
user can always use it to look for that particular ham message in the 
spam folder. All she has to know is (a part of) the sender address or 
the subject.

As for the second reason (learning), I think you could easily teach the 
system not to learn from messages that already have "X-Spam-Flag: Yes" 
or a similar header that it has set itself. In any case, with your 
setup, additional measures are needed to prevent the system from 
learning from the same message again: it has to either delete the 
messages it has learned from, or move them to another (third?) folder, 
or keep track of them, or add a header to them and look for it next 
time. Whichever way you choose, it's troublesome.
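
To illustrate the header check and the bookkeeping mentioned above, here 
is a rough Python sketch. It is not how any particular mail system does 
it: the function names and the in-memory set of Message-IDs are made up 
for the example, and a real setup would need persistent storage.

```python
from email import message_from_string


def already_tagged(raw_message):
    """Return True if the message carries "X-Spam-Flag: Yes" (any case)."""
    msg = message_from_string(raw_message)
    return msg.get("X-Spam-Flag", "").strip().lower() == "yes"


def messages_to_learn(raw_messages, seen_ids):
    """Yield messages the learner has not dealt with yet.

    seen_ids is a hypothetical set of Message-ID values that were
    already learned from; it stands in for the "keep track of them"
    option. Messages without a Message-ID are yielded every time.
    """
    for raw in raw_messages:
        msg_id = message_from_string(raw).get("Message-ID")
        # Skip what the filter tagged itself and what was learned before.
        if already_tagged(raw) or msg_id in seen_ids:
            continue
        if msg_id:
            seen_ids.add(msg_id)
        yield raw
```

With a check like this, tagged and untagged spam could share one folder 
without the system learning from its own verdicts.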

Add to that the fact that what one user considers spam might be ham for 
another (and vice versa), and such learning becomes even more complicated.

There is an alternative to having the system learn from a folder of 
untagged spam. The Mark as Junk plugin allows you to specify commands to 
run on a message to teach SpamAssassin about (non-)spam. Why not use these?
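
As an illustration of what such a command might look like, here is a 
small Python sketch that builds an sa-learn invocation for one message 
file. The helper names and the idea of handing over a file path are 
assumptions for the example, not the plugin's actual interface; only the 
sa-learn options --spam and --ham are taken from SpamAssassin itself.

```python
import subprocess


def sa_learn_command(path, is_spam):
    """Build the sa-learn command line for a single message file."""
    kind = "--spam" if is_spam else "--ham"
    return ["sa-learn", kind, path]


def learn(path, is_spam):
    """Run sa-learn on the file; raises CalledProcessError on failure."""
    subprocess.run(sa_learn_command(path, is_spam), check=True)
```

Calling learn() when the user marks a message would train the filter 
right away, with no second folder to scan later.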

> On my system here those two folders are "Junk" and "spam" (respectively).

No offence, but to me, this distinction in names looks rather lame. If I 
saw those two folders next to each other, my first idea would be that 
there is/was a glitch/misconfiguration somewhere which resulted in this 
situation. On the other hand, "Junk (auto)" and "Junk (manual)" would be 
obvious enough.

> Mail that has the X-Spam-Flag header set to "YES" is
> put into "Junk" (and does not need to be used to learn about spam from)
> and messages that are in the user's INBOX that are actually spam should
> be moved to "spam".  A process on the mail system goes through the
> "spam" folders of all of the users and pushes those messages through the
> spam-learning process.
> Am I going about this all wrong?  Does anyone else see the need for two
> different folders (three if you bring the "ham" into the discussion) for
> spam processing?

Again, I think it's a matter of personal taste and perspective, but to 
me, your proposed setup looks more confusing than useful.

Best regards,
List info: http://lists.roundcube.net/dev/
