stillhq.com : Mikal, a geek from Canberra living in Silicon Valley
http://www.stillhq.com
The life, times, travel and software of Michael Still
Copyright (c) Michael Still 2000 - 2006

An occasional rant about spam
/diary/spam
Wed, 28 Mar 2007 00:38:00 GMT
<pre>
mikal@daedalus:~/blog-comments$ find . -type f -name "*.yes" | wc -l
664
mikal@daedalus:~/blog-comments$ find . -type f -name "*.no" | wc -l
18361
mikal@daedalus:~/blog-comments$ find . -type f -name "*.blocked" | wc -l
32111
mikal@daedalus:~/blog-comments$ find . -type f -name "*.badword" | wc -l
5007
mikal@daedalus:~/blog-comments$ du -sh
506M    .
</pre>
<br/><br/>
That's 664 real comments on this site, 18361 that I manually said no to, 32111 that were blocked based on originating IP, and 5007 that contained a bad word. <a href="http://blog.andrew.net.au">Andrew currently donates 506 MB of disk just to hosting comments</a>. That seems excessive to me. I'll take the time to clean up the disk usage in the next couple of days.
<br/><br/><i>Tags for this post: blog(<a href="http://www.stillhq.com/diary"><img src="http://www.stillhq.com/favicon.png" border="0" alt="S"></a>) spam(<a href="http://www.stillhq.com/spam"><img src="http://www.stillhq.com/favicon.png" border="0" alt="S"></a>) </i>
<a href="http://www.stillhq.com/diary/spam/000007.commentform.html">Comment</a>
http://www.stillhq.com/diary/spam/000007.html

Email subscription to comments
/diary/spam
Sun, 26 Nov 2006 04:28:00 GMT
Hey all. Yesterday I finally got around to implementing email subscriptions to comments on posts in my custom comment module code for Blosxom. I run a custom comment module because of the static generation mode I use for the site, which helps reduce load on <a href="http://blog.andrew.net.au">Andrew</a>'s server.
<br/><br/>
Email subscription to comments on a post that you have commented on is the default, but it is easy for the user to turn it off. If you post and opt for email, you'll also get an email when your own post survives moderation, which might be useful for some people.
<br/><br/>
It will be interesting to see whether willingness to be emailed comments is an effective spam signal. So far, with a sample of six spam comments, it is evenly split between the two options, which is interesting because it means some spam bots are smart enough to turn the check box off. Or are they using a POST without using my form at all?
<br/><br/>
(That makes me wonder if moving the URL for the submission CGI might reduce spam...)
<br/><br/>
If there is any interest in a public release of my uber crap Perl code, let me know, and I might try to find the time to clean it up.
http://www.stillhq.com/diary/spam/000006.html

Black listing words in comments
/diary/spam
Mon, 06 Nov 2006 04:30:00 GMT
I had yet another flood of comment spam over the last couple of days, so I spent some time last night writing some code to add word blacklisting to my IP-based spam filtering. I also block over 100 IPs from posting to this blog now. It's helped a lot -- I am only blacklisting one word so far, and it's already blocked 200 spam posts...
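<br/><br/>
For the curious, the combined check can be sketched roughly as below. This is a minimal Python illustration, not the Perl code running on this site: the function name, the "moderate" label, and the sample banned word are all made up for the example, while the "blocked" and "badword" labels simply mirror the file suffixes used in the comment directory, and the two sample addresses come from the block list published in an earlier post.

```python
# Hypothetical sketch only; the real filter on this blog is Perl and is not shown.
import re

# Sample entries taken from the block list published in an earlier post.
BLOCKED_IPS = {"193.87.17.120", "202.75.49.133"}
# Placeholder word; the post does not say which word is actually banned.
BAD_WORDS = {"casino"}


def classify_comment(remote_addr, body, blocked_ips=BLOCKED_IPS, bad_words=BAD_WORDS):
    """Return 'blocked', 'badword', or 'moderate' for a submitted comment."""
    # Cheapest check first: reject known-bad source addresses outright.
    if remote_addr in blocked_ips:
        return "blocked"
    # Compare whole lowercase words so 'CASINO!' still matches 'casino'.
    words = set(re.findall(r"[a-z0-9']+", body.lower()))
    if words & bad_words:
        return "badword"
    # Everything else still goes to a human for a yes/no decision.
    return "moderate"
```

Checking the address before the body keeps the common case cheap, since most spam here arrives from a small set of addresses.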
<br/><br/>
Some totals since the dawn of time: 559 posts which survived moderation, 12,665 posts which were manually blocked, 20,119 posts which were <a href="http://www.stillhq.com/diary/spam/000004.html">automatically blocked based on the submitter's IP</a>, and 256 blocked because of use of a banned word (since last night!). My blog comments now take 258 megabytes on disk.
http://www.stillhq.com/diary/spam/000005.html

Comment spam again
/diary/spam
Tue, 04 Jul 2006 05:37:00 GMT
As <a href="http://blog.subverted.net/?p=615">In Search of L33t</a> says, comment spam can be a large-scale annoyance. In L33t's words:
<br/><br/>
<blockquote> I am promising myself I am going to start blogging more. The main problem is that I am so tired of blog spam. Even with the comments turned off I am still getting blog spam. It depresses me a little to see so many blog comments that have absolutely nothing to do with my topics. </blockquote>
<br/><br/>
I have similar blog spam levels:
<br/><br/>
<ul><pre>
mikal@daedalus:~/blog-comments$ du -sh
79M     .
mikal@daedalus:~/blog-comments$ find . -type f -name "*.no" | wc -l
6064
mikal@daedalus:~/blog-comments$ find . -type f -name "*.blocked" | wc -l
4778
mikal@daedalus:~/blog-comments$ find . -type f -name "*.yes" | wc -l
483
</pre></ul>
<br/><br/>
Yes, that really is 79 meg of blog comments (admittedly including the metadata for recent comments). The most interesting bit is that blocked line.
That's the number of posts which have been automatically blocked <a href="http://www.stillhq.com/diary/spam/000001.html">since I started automatically blocking some posters</a>. It's been really effective: I now get only one or two comment spams a day in my email for moderation. The super secret algorithm? I block these IP addresses:
<br/><br/>
<ul><pre>
84.19.184.26
85.255.117.250
203.142.1.182
202.71.106.121
85.249.136.194
202.76.235.6
202.75.62.79
202.75.49.130
202.75.49.134
202.75.49.133
202.75.49.131
193.87.17.120
</pre></ul>
<br/><br/>
I recommend others give it a try, as it's eliminated basically all of my comment spam. That's right, it appears to me that almost all comment spam comes from these few IPs.
http://www.stillhq.com/diary/spam/000004.html

We get E-3 visa spam now?
/diary/spam
Tue, 20 Jun 2006 08:44:00 GMT
Is it just me, or are other people getting spam offering them jobs with E-3 sponsor companies in the US? What a strange trend.
http://www.stillhq.com/diary/spam/000003.html

Hmmm, me no likee spam
/diary/spam
Tue, 13 Jun 2006 12:34:00 GMT
I like that I have spam blocking. Check this out:
<br/><br/>
<pre>
mikal@daedalus:~/blog-comments$ find . -type f -name "*.no" -mtime -1 | wc -l
1468
</pre>
<br/><br/>
That's all comment spam since 5:30pm (ish) today. All from one IP address: 193.87.17.120
http://www.stillhq.com/diary/spam/000002.html

Blog comment spam
/diary/spam
Sun, 04 Jun 2006 12:36:00 GMT
<a href="http://www.stillhq.com/diary/001037.html">I occasionally</a> <a href="http://www.stillhq.com/diary/001039.html">comment on</a> <a href="http://www.stillhq.com/diary/001041.html">the amount of</a> <a href="http://www.stillhq.com/diary/001043.html">comment spam I get here</a>. But I felt further analysis might be a good idea, so I am now logging as much information as possible about the commenter when they submit a comment. I find the dump below fairly interesting (it's for approximately the last 24 hours).
<br/><br/>
<ul><pre>
mikal@daedalus:~/blog-comments$ find . -type f -name "*.info" -exec cat {} \; | \
grep REMOTE_ADDR | sort | uniq -c | sort -n
      2 REMOTE_ADDR = 85.255.117.250
      3 REMOTE_ADDR = 203.142.1.182
      5 REMOTE_ADDR = 202.71.106.121
      8 REMOTE_ADDR = 202.75.62.79
      9 REMOTE_ADDR = 202.75.49.130
     11 REMOTE_ADDR = 202.76.235.6
     12 REMOTE_ADDR = 202.75.49.131
     13 REMOTE_ADDR = 202.75.49.134
     16 REMOTE_ADDR = 202.75.49.133
mikal@daedalus:~/blog-comments$
</pre></ul>
<br/><br/>
I wonder if blocking specific IPs would help the spam level, or if stopping comments on some posts would help?
There certainly seem to be some "hot spot" posts:
<br/><br/>
<ul><pre>
264: /home/mikal/blog-comments/travel/usa/california/santaclara/000003
179: /home/mikal/blog-comments/diary/lca2005/000029
170: /home/mikal/blog-comments/linux/000038
158: /home/mikal/blog-comments/diary/000796
134: /home/mikal/blog-comments/diary/000795
92: /home/mikal/blog-comments/pdfdb/000001
87: /home/mikal/blog-comments/link/000065
81: /home/mikal/blog-comments/diary/toys/000001
79: /home/mikal/blog-comments/travel/usa/000006
70: /home/mikal/blog-comments/diary/toys/mp101/pymediaserver/000001
</pre></ul>
<br/><br/>
I think I will ponder more.
http://www.stillhq.com/diary/spam/000001.html