
There are many things you can do to help manage WordPress comment spam. Third-party services such as Akismet have proven to be quite effective. However, if you want to do it yourself and feel comfortable around a little PHP programming, there are a number of strategies that can also be quite effective. We’ll start with blocking comments from IP subnets known to be sources of comment spam and place this in a code framework which we’ll add to in later posts.

There are a number of characteristics of spam that we can use to identify it and deal with it programmatically. One is the source (IP address) of the comment: certain subnet ranges keep cropping up in spammed comments. We’ll develop a way to block comments from IP addresses within those subnet ranges now and, in a later post, we’ll program a way to dynamically add offending IP addresses to a blacklist database and age them out after a period of good behavior.

Investigation

Let’s start with an example. A while back one of our WordPress sites started getting spam originating from 69.147.240.89. Click the “Whois” link in the WordPress moderation email and it leads you to the arin.net entry for that IP address (http://whois.arin.net/rest/ip/69.147.240.89). It is listed under Nobis Technology Group, LLC. A little poking around reveals they own Ubiquity Servers, whose IP addresses frequently show up in our spam. Let’s block all comments coming from any IP address owned by Ubiquity Servers. On the ARIN page, click the “Organization” link for Nobis Technology Group, “NTGL”, and you’re brought to another page about the organization. Near the bottom is a link called “Related networks”. Click that and you’re brought to a page that lists all the IP address ranges owned by Nobis Technology (Ubiquity Servers).

The ARIN page listing the subnets owned by Nobis shows the address range for each one. We can use either the range specification (e.g. “70.32.32.0 - 70.32.47.255”) or the CIDR specification of the same subnet (“70.32.32.0/20”). We want to collect all these subnets in our code and work out a way to test whether an IP address falls in any of them (in which case we’ll block the comment).
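To see that the two notations really describe the same block, here is a minimal sketch that expands a CIDR string into its first and last address. The function name cidrToRange is our own invention for illustration, and the sketch assumes IPv4 input and a 64-bit PHP build.

```php
<?php
// Hypothetical helper (not from the original post): expand a CIDR block
// into its first and last address. Assumes IPv4 and 64-bit PHP integers.
function cidrToRange(string $cidr): array {
    [$base, $bits] = explode('/', $cidr);
    $bits = (int) $bits;
    // Build the netmask: the top $bits bits set, the rest clear.
    $mask = $bits === 0 ? 0 : (~0 << (32 - $bits)) & 0xFFFFFFFF;
    $first = ip2long($base) & $mask;          // network address
    $last  = $first | (~$mask & 0xFFFFFFFF);  // all host bits set
    return [long2ip($first), long2ip($last)];
}

[$first, $last] = cidrToRange('70.32.32.0/20');
echo "$first - $last\n"; // 70.32.32.0 - 70.32.47.255
```

A /20 leaves 12 host bits, so the block covers 4,096 addresses, which is why the third octet runs from 32 to 47.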

Developing The Code

Although it may seem simple to test an IP address to see if it’s in a subnet, it turns out it’s quite difficult to do correctly. Fortunately, somebody has already worked through all this and provided a fully-functional flexible PHP subroutine that does the trick nicely. Read Paul Gregg’s explanation at http://pgregg.com/blog/2009/04/php-algorithms-determining-if-an-ip-is-within-a-specific-range/ and then fetch his PHP subroutine from http://pgregg.com/projects/php/ip_in_range/ip_in_range.phps.
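To get a feel for what such a test involves, here is a minimal sketch of the CIDR case only. The name ipInCidr is hypothetical and this is not Paul Gregg’s ip_in_range, which is the function the rest of this post relies on; the sketch assumes well-formed IPv4 input and a 64-bit PHP build, and skips the validation a production routine would need.

```php
<?php
// Hypothetical stand-in for illustration only; this is NOT Paul Gregg's
// ip_in_range(). Assumes well-formed IPv4 input and 64-bit PHP integers.
function ipInCidr(string $ip, string $cidr): bool {
    [$base, $bits] = explode('/', $cidr);
    $bits = (int) $bits;
    // Netmask with the top $bits bits set.
    $mask = $bits === 0 ? 0 : (~0 << (32 - $bits)) & 0xFFFFFFFF;
    // An address is in the subnet when its network bits match the subnet's.
    return (ip2long($ip) & $mask) === (ip2long($base) & $mask);
}

var_dump(ipInCidr('69.147.240.89', '69.147.224.0/19')); // bool(true)
var_dump(ipInCidr('69.147.200.1', '69.147.224.0/19'));  // bool(false)
```

The spammer’s address from the example above lands inside 69.147.224.0/19, one of the subnets we block below.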

The tough part already done, we now just need to code up our list of bad subnets and a way to test an IP address against them. Here’s one way to do it:

function ipInBadSubnet($ip) {
     $ipnetworks = array (
          "23" => array ("23.19.0.0/16"),
          "46" => array ("46.109.0.0/16"),
          "64" => array ("64.120.0.0/17"),
          "67" => array ("67.201.0.0/21","67.201.40.0/24","67.201.48.0/23"),
          "69" => array ("69.147.224.0/19","69.174.60.0/22"),
          "70" => array ("70.32.32.0/20"),
          "72" => array ("72.37.145.0/24","72.37.204.0/24","72.37.218.0/23","72.37.221.0/24","72.37.222.0/23","72.37.224.0/21","72.37.237.0/24","72.37.242.0/23","72.37.246.0/23"),
          "108" => array ("108.62.0.0/16"),
          "173" => array ("173.208.0.0/17","173.234.0.0/16"),
          "174" => array ("174.34.128.0/18"),
          "216" => array ("216.6.224.0/20")
          );
     $prefix = substr($ip, 0, strpos($ip, '.'));
     $isinnetwork = false;

     if (isset($ipnetworks[$prefix])) {
          foreach ($ipnetworks[$prefix] as $network) {
               if (ip_in_range($ip, $network) == true) {
                    $isinnetwork = true;
                    break;
               }
          }
     }
     return $isinnetwork;
}

As you can see, we list all the subnets in CIDR format as an array of arrays: each key is a first octet, and its value is the list of subnets that begin with that octet. This slightly confusing layout gave us more than a 25X performance improvement over simply running through every subnet in a linear fashion. In the example above, we have flagged more than 330,000 IP addresses and yet we can check an IP address for inclusion in this set very quickly and efficiently.

So, all this function does is use the first octet of the IP address in question to look up the list of subnets that begin with that same octet, and check each of the subnets to see if the IP address is in the range. We do that test by simply calling Paul Gregg’s function, “ip_in_range”. Our function returns the status, true if the IP address is in one of our bad subnets, false otherwise.

Hooking In To WordPress

Now all that’s left is to hook this in to WordPress so that each incoming comment’s IP address is checked and the bad ones are rejected. The place we’ll do this is WordPress’s “pre_comment_on_post” action hook, which fires before a comment is posted:

function my_pre_comment_check() {
     $ip = $_SERVER["REMOTE_ADDR"];

     // Check if IP address is blocked for comments
     if (ipInBadSubnet($ip)) {
          // Give bad guys a "Not Found" response
          header('Status: 404 Not Found');
          header($_SERVER["SERVER_PROTOCOL"]. ' 404 Not Found');

          //wp_redirect( home_url() );
          //wp_die("Comment rejected.");
          exit; // Stop here so WordPress never processes the comment
     }
}

add_action('pre_comment_on_post', 'my_pre_comment_check');

This code, used in a plugin or in your functions.php, will cause comment processing to be aborted for any IP address within any of the subnets we coded into our function. You have a choice of what to do in response to the attempted comment: give them a “404 Not Found” as the code does here, redirect them to another URL, or die with some explanation. The latter two options (lines 10 and 11) are commented out in the code above. Pick any of these you like or even a different response.

You can add other subnets to your hard-coded blacklist any time you want; just be careful with the indirection if you use the technique here: each subnet has to be filed under the key matching its first octet, or the lookup will never find it. In the example above, we blocked not only all the Ubiquity Servers IP addresses, but all those from Latvia as well. (It’s amazing how many comments from Latvia we were getting on a Vestal, NY home-for-sale site.)
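If filing subnets by hand feels error-prone, one option is to keep a flat list and build the indexed array from it. The sketch below is only an illustration: groupByFirstOctet is a hypothetical helper of ours, not part of the code above, and it assumes every entry is an IPv4 CIDR string with a prefix length of /8 or longer (a shorter prefix would span several first octets and would need an entry under each one).

```php
<?php
// Hypothetical helper (not part of the original post): build the
// first-octet index from a flat list of CIDR strings.
function groupByFirstOctet(array $cidrs): array {
    $index = array();
    foreach ($cidrs as $cidr) {
        $cidr = trim($cidr);
        // Same prefix extraction the lookup function uses on the IP address.
        $prefix = substr($cidr, 0, strpos($cidr, '.'));
        $index[$prefix][] = $cidr;
    }
    return $index;
}

$ipnetworks = groupByFirstOctet(array(
    '69.147.224.0/19',
    '70.32.32.0/20',
    '69.174.60.0/22',
));
print_r($ipnetworks);
```

PHP stores numeric string keys such as "69" as integers, so the isset($ipnetworks[$prefix]) check in the lookup function works the same way on an array built like this.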

Going Forward

This technique is not a total solution by any means, just a blunt-instrument way of dealing with the most persistent sources of spam. I would recommend limiting its use to the most egregious subnets; you would go crazy trying to code in all the subnets of all the nasty IP addresses you’ll come across. In later posts we’ll show several other things that can be done to contribute to a more comprehensive solution. We’ll be adding several of these other methods into the same very useful “pre_comment_on_post” WordPress action hook. None of the fun things we’ll be doing put any additional onus on users the way, for example, CAPTCHA does. We shouldn’t be making innocent users pay the price for the evil spamming parasites’ bad behavior.