My Spam Record

In March of 2003 I started keeping track of (roughly) how many spam messages I get a day. I use SpamAssassin to filter my email, and I clean out the spam folder every morning at around 9 AM. Since I'm already spending the time checking for false positives before deleting the spam, it's almost no extra work to record the number of spams I actually received. Below, I've plotted the data to date.

Spams per Day

Note that there are a couple of caveats on these statistics:

  1. I don't always check at exactly the same time. When I'm away, I often make a rough cut where the 9 AM cutoff belonged.
  2. These "spams" technically include a fair number of virus messages. Neither SpamAssassin nor I pay much attention to the difference when filtering. However, experience suggests that the majority of the messages are commercial and not virus-generated. (This rule is periodically violated during peak virus times, and those stand out rather clearly in the data.)

I've also plotted two smoothed curves. (The smoothing was done with "running box" averages of two different widths.)
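For anyone curious what a "running box" average looks like in practice, here is a minimal sketch in Python. It is not the script used for the plots above. The function name `running_box_average`, the centered window, and the choice to shrink the window at the ends of the series are all assumptions made for illustration.

```python
def running_box_average(counts, width):
    """Smooth a list of daily counts with a centered box of `width` days.

    Assumes `width` is odd so the box is symmetric around each day.
    Near the ends of the series the box is simply truncated, so the
    first and last few values are averages over fewer days.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    half = width // 2
    smoothed = []
    for i in range(len(counts)):
        lo = max(0, i - half)                 # first index inside the box
        hi = min(len(counts), i + half + 1)   # one past the last index
        window = counts[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

Calling this twice on the same daily counts, once with a narrow box and once with a wide one, gives two curves of the kind described: the narrow one follows short bursts, the wide one shows the long-term trend.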

Additional note (1 Sept. 2004) — You'll notice that starting this past June, the spam rate dropped considerably. This is because CU's ITS started doing extra filtering on incoming mail with spam-like characteristics. As you can see, they were eminently successful.

I moved to CICLOPS in May of 2005. For quite a while afterward, my CU account was still my main source of spam. That's dying down, however. Most of my spam comes in through Gmail now, although it almost all gets filtered into my spam folder correctly.

A lot of the spikes in the late 2000s are due to times when my email address got posted publicly to sites like Slashdot, by the way. It was almost like clockwork, in fact. What interests me there is that the spam rate declines rapidly after each spike. That is, the address isn't being retained by everyone, even for spamming purposes. That, or the anti-spam systems are learning to recognize the spam and cut it off before it gets into my spam folders. Both are hopeful thoughts, in different ways.

Days of the Week

How does this spam break down by, say, day of the week, you ask? Good question!

Spams per Day

As you can see, the spammers take the weekend off, but not by a whole lot. (It's statistically significant, I think, but the effect isn't exactly dramatic.) It's pretty clear that a lot of spam is sent by automated jobs or we have some very dedicated spammers.
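The day-of-week breakdown is just a matter of grouping the daily counts by weekday and averaging each group. A rough sketch in Python, assuming the record is kept as a dictionary mapping `datetime.date` to that day's count (the data layout and the name `mean_by_weekday` are hypothetical, not what was used for the plot):

```python
from collections import defaultdict
from datetime import date

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def mean_by_weekday(daily_counts):
    """Return the average spam count for each weekday present in the data.

    `daily_counts` maps datetime.date -> number of spams recorded that day.
    """
    totals = defaultdict(int)   # sum of counts per weekday index
    n_days = defaultdict(int)   # number of recorded days per weekday index
    for day, count in daily_counts.items():
        index = day.weekday()   # Monday is 0, Sunday is 6
        totals[index] += count
        n_days[index] += 1
    return {WEEKDAY_NAMES[i]: totals[i] / n_days[i] for i in sorted(totals)}
```

Comparing the Saturday and Sunday averages against the weekday ones is then a one-liner on the returned dictionary.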

Days of the Year

Now that I have 10 years' worth of data, it's fair to ask how the spam seems to break down by day of the year. Here you go:

Spams per Day of Year

The uncertainties are a lot larger here than in the day-of-week case, of course, and I'm not sure how much of the trend is real. That said, there are some interesting trends in these data, including the dips around the holidays. I wish I had a proper explanation for the higher rate at the start of the year, though.
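The larger uncertainties follow from the sample sizes: ten years of data gives only about ten samples for each day of the year, versus several hundred for each day of the week. One way to put a number on that is the standard error of the mean for each day. The sketch below assumes the same hypothetical layout of a dictionary mapping `datetime.date` to a count, and the name `stats_by_day_of_year` is made up for illustration.

```python
from collections import defaultdict
from datetime import date
from math import sqrt
from statistics import mean, stdev


def stats_by_day_of_year(daily_counts):
    """Return {day_of_year: (mean, standard_error)} for the recorded days.

    Day 1 is January 1. Day 366 only occurs in leap years, so it has
    fewer samples than the rest. The standard error is None when a day
    has a single sample, since the spread cannot be estimated from one
    value.
    """
    groups = defaultdict(list)
    for day, count in daily_counts.items():
        groups[day.timetuple().tm_yday].append(count)

    result = {}
    for day_of_year, values in sorted(groups.items()):
        if len(values) > 1:
            error = stdev(values) / sqrt(len(values))
        else:
            error = None
        result[day_of_year] = (mean(values), error)
    return result
```

A dip or bump that is smaller than a couple of standard errors is probably not worth reading much into.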
