About two weeks ago, it came to my attention that a different site of mine was receiving a significant amount of non-human traffic, in other words, bot traffic. This was according to data from my Google Analytics property.
While I had noticed that my bounce rate had spiked, I had quickly brushed off this anomaly as merely a symptom of the new permalink structure the site had after migrating it from Blogger to WordPress.
However, after digging through the Analytics data on my own, I couldn’t help but agree with the assessment. The data revealed that shortly after the migration, traffic from the service provider microsoft corporation had skyrocketed to account for approximately 10% of my daily traffic.
This was a significant amount considering the site was now receiving fewer than 600 daily views (from highs of 1.5k) thanks to the recent bouts of Google algorithm updates. And to think that my US traffic was scaling up despite this being a blog geared towards local traffic.
The worst part about this traffic was that it had completely skewed the site’s metrics for the worse. For instance, its average bounce rate was now hovering around 84%. This is because the traffic had an Average Session Duration of 0-2 seconds and a 100% bounce rate.
From this alone you can tell this has all the telltale signs of a bot and not a human. Interestingly, my site is just one of many sites that have fallen victim to this traffic going by this thread on the Analytics Help forum.
But what bot? It’s easy to surmise that this is a Bing or MSN bot from the tracked service provider, but that may not be the case. Here’s why.
Data from my analytics indicates that all the traffic:
- originates from Chicago, Illinois
- uses Internet Explorer 9.0 on Windows Vista/7
- uses a screen resolution of 1024×768.
In addition to this, one poster on the aforementioned thread claims to have contacted Microsoft, who denied having any links to that traffic.
Besides, this traffic is tracked even when the option to Exclude all hits from known bots and spiders in the Property’s View setting is enabled, lending more credence to the likelihood that it’s a bad bot and not the Bing/MSN Bot.
Nevertheless, while Microsoft may have no knowledge of this traffic, what I know for sure (as we’ll see later) is that it’s no doubt originating from their network.
How to Block the Bad Bot
Unfortunately, Google Analytics doesn’t tell us the IP address of the traffic it tracks, so we need to find out this info through some other means before we can block the bot.
In my case I started by checking my server access logs, which are available for download in the panel provided by my host. The log, however, may contain thousands of IP addresses and user agents, far too many to analyze with the naked eye.
Fortunately we have the user agents and thanks to Analytics we precisely know the browser and OS the bot is using: Internet Explorer 9.0 on Windows Vista/7 whose default user agent string according to Microsoft should be:
On Windows Vista:
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)
On Windows 7:
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)
Sure enough, when I searched the logs for this string I did get a lot of matches. The user agent, however, was slightly altered (note the repeated Trident token) into:
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; Trident/5.0)
On each line where you find this user agent you should be able to find the associated IP address(es) that you can block.
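The search described above can be scripted rather than done by eye. Below is a minimal sketch, assuming the log uses the common Apache/Nginx "combined" layout (client IP first, user agent as the last quoted field); your host's format may differ. The function name and the sample log lines are made up for illustration.

```python
import re
from collections import Counter

# The altered user agent found in the logs (note the repeated Trident token).
BOT_UA = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; Trident/5.0)"

# Assumption: "combined" log format, where the client IP is the first field
# and the user agent is the last double-quoted field on the line.
LINE_RE = re.compile(r'^(?P<ip>\S+) .*"(?P<ua>[^"]*)"\s*$')


def count_ips_for_user_agent(lines, user_agent=BOT_UA):
    """Return a Counter of client IPs whose user agent equals `user_agent`."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("ua") == user_agent:
            hits[match.group("ip")] += 1
    return hits


# Made-up sample lines using documentation-only IP addresses.
sample_lines = [
    '203.0.113.7 - - [01/Jan/2020:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "' + BOT_UA + '"',
    '198.51.100.9 - - [01/Jan/2020:00:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '203.0.113.7 - - [01/Jan/2020:00:00:03 +0000] "GET /about HTTP/1.1" 200 734 "-" "' + BOT_UA + '"',
]

for ip, count in count_ips_for_user_agent(sample_lines).most_common():
    print(ip, count)
```

To run it against a real log, pass an open file object instead of `sample_lines`; the function only needs an iterable of lines.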
To my dismay, however, I later found out that all these IPs were associated with Cloudflare after I did a WHOIS lookup on them. This shouldn’t have come as a surprise had I recalled my site was on Cloudflare and all the traffic was being proxied via their Edge servers before reaching the Origin server of my host.
If you’re not on a proxy, however, you should get the correct IP addresses. So take note of this if you happen to use a similar service. Still, all is not lost; we can still find the IP addresses using some of Cloudflare’s apps.
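Before blocking anything pulled from the logs, it may help to check whether an address is really the proxy's rather than the visitor's. The sketch below uses only a partial sample of Cloudflare's published IPv4 ranges, which is an assumption on my part; refresh it from Cloudflare's official list before relying on it. The function name is hypothetical.

```python
import ipaddress

# Partial sample of Cloudflare's published IPv4 ranges (an assumption:
# verify against Cloudflare's current official list before use).
CLOUDFLARE_SAMPLE_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "173.245.48.0/20",
        "103.21.244.0/22",
        "141.101.64.0/18",
        "108.162.192.0/18",
        "162.158.0.0/15",
        "172.64.0.0/13",
    )
]


def is_proxy_address(ip, ranges=CLOUDFLARE_SAMPLE_RANGES):
    """Return True if `ip` falls inside any of the given proxy ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in ranges)


print(is_proxy_address("162.158.10.20"))  # inside 162.158.0.0/15
print(is_proxy_address("203.0.113.7"))    # documentation address, not in the sample
```

Any log entry for which this returns True points at an edge server, so blocking it would not touch the bot.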
Finding the IP Address on Cloudflare
We can use Cloudflare Firewall Rules to find the real IP address associated with the bot’s user agent. To do this go to the Firewall app and set up a new rule that blocks the user agent you found in the previous section.
Wait a few hours, then come back and check the 24 hour log to see which IP addresses have been blocked by that rule.
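If you prefer typing the rule as an expression instead of clicking through the rule builder, the user agent match can be written with the `http.user_agent` field and the `eq` operator from Cloudflare's rules language. The helper below is only a sketch that assembles that string; the function name is hypothetical and the backslash escaping of quotes is my assumption about how quoted string literals are written there.

```python
BOT_UA = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; Trident/5.0)"


def user_agent_rule(user_agent):
    """Build an expression that matches one exact user agent string."""
    # Escape backslashes and double quotes so the value stays a single
    # quoted string literal (assumed escaping convention).
    escaped = user_agent.replace("\\", "\\\\").replace('"', '\\"')
    return f'(http.user_agent eq "{escaped}")'


print(user_agent_rule(BOT_UA))
```

Pair the resulting expression with the Block action when saving the rule.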
As you can see below, on my site all the traffic that matched that user agent was originating from one lone IP Address: 18.104.22.168, a very abusive IP going by the reports here.
An IP lookup indicates this address belongs to none other than: Microsoft.
Alternatively you can install a Cloudflare app called DataDome. This premium third-party app will track all the bot traffic your site is receiving then categorize it for you into good, bad and commercial bots. It will also give you the IP addresses, the owner of the IP, user agent and many other metrics for each event it tracks.
Under the free account, however, you only get a 30 day trial of the service, and you cannot block any of the bad bots it logs. The trial period should nonetheless be more than enough for our task.
Pay close attention to what DataDome labels as bad bots, as they may not fit that description. For instance, it detects WordPress cron jobs from one of my plugins as a bad bot and on another occasion has flagged my IP address as doing a DDoS attack.
Blocking the Bot IP Address
Once you’ve found the rogue IP addresses you just have to block them from accessing your site. To do this, we’ll have to make use of a firewall. The firewall can be implemented in one of the following ways:
- Talk to your host and have them block the IPs from their end.
- Use a firewall plugin. If you’re using WordPress you’ve plenty to pick from.
- Use a Web Application Firewall (WAF).
Of the three options the first should be the easiest route to go. But should it not work out for you, as was the case for me, then you’ve no choice but to take matters into your own hands by using one of the other two options.
Of the two, a WAF would be the better choice as it is not as resource intensive as a security plugin running on your server (think WordFence and the like). This is because the IPs are blocked before reaching your server whereas a plugin does all the heavy lifting using your server’s limited resources. Some cloud based WAFs include the likes of Cloudflare and Sucuri.
Block IP Address or IP Ranges on Cloudflare
Cloudflare’s free account allows you to block IP addresses, albeit in a limited sense. To do this, set up a new firewall rule to block the IP in question. This is however only ideal if the bot traffic is originating from one or a few IP addresses (fewer than 5 since you only get 5 firewall rules with the free account).
If you had set up a rule to block the bot user agent, do also block the IP address since the bot may be using different fake user agents. After setting up the IP rule, move it to the top so that it takes precedence when Cloudflare applies the rules.
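One way to stretch a limited number of rules is to list several addresses in one expression. The sketch below assumes the rule editor accepts a hand-written expression using the `ip.src` field with the `in { ... }` set syntax from Cloudflare's rules language; the function name is hypothetical and the addresses are documentation-only placeholders, not the bot's.

```python
import ipaddress


def ip_block_rule(addresses):
    """Build one expression that matches any of the given source IPs."""
    # ip_address() raises ValueError on malformed input; the set removes duplicates.
    unique = {ipaddress.ip_address(a) for a in addresses}
    if not unique:
        raise ValueError("at least one IP address is required")
    # Sort by IP version first so IPv4 and IPv6 values are never compared directly.
    ordered = sorted(unique, key=lambda a: (a.version, int(a)))
    return "(ip.src in {" + " ".join(str(a) for a in ordered) + "})"


print(ip_block_rule(["192.0.2.11", "192.0.2.10", "192.0.2.11"]))
```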
To block traffic from several IP addresses, in the Firewall app go to the Tools page and set up IP Access Rules. You can block specific IPs, an IP address range, ASN or all traffic originating from a particular country.
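To decide between blocking individual IPs and blocking a whole range, it can help to see how the logged addresses cluster. The sketch below groups addresses by their /24 network; the function name is hypothetical, the sample addresses are documentation-only placeholders, and /24 is just an illustrative prefix length.

```python
import ipaddress
from collections import Counter


def group_by_prefix(addresses, prefix_len=24):
    """Count how many addresses fall into each network of the given prefix length."""
    counts = Counter()
    for addr in addresses:
        # strict=False lets us pass a host address and get its enclosing network.
        network = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        counts[str(network)] += 1
    return counts


logged = ["192.0.2.10", "192.0.2.77", "198.51.100.5"]
for network, count in group_by_prefix(logged).most_common():
    print(network, count)
```

If most hits land in one or two networks, a range rule is likely simpler to maintain than a long list of single addresses.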
The Autonomous System Number (ASN) allows you to block traffic for all IPs belonging to a specific network operator, such as an ISP. This is especially useful if you want to block traffic originating from a specific ISP in a country without having to block traffic from the entire country.
To find what ASN a particular IP address belongs to, search the IP address on this page.
Still on the Tools page, there’s User Agent Blocking, which allows you to block up to 10 user agents.
If you need to block several user agents, use this option instead of exhausting your limited 5 firewall rules.
Firewall Plugin for WordPress
If for some reason you cannot use a WAF, I’d recommend you check out the NinjaFirewall and Blackhole for Bad Bots plugins. The former is the closest thing to a true WAF that you can get as a plugin on WordPress, though the free version doesn’t include IP Access control.
Nevertheless, in the few days I’ve been using it, it has foiled a couple of other attacks that Cloudflare misses or doesn’t protect you against (e.g. Brute Force attacks on the login page).
It’s certainly better placed for this work compared to the likes of JetPack Protect which gives you no logs of the type of attacks or IP addresses of the attackers.
Anyway, I hope this will help you block that bot traffic for good and at least return some sanity to your Google Analytics data. Meanwhile, stay on guard for any new rogue IP addresses and promptly block them. Cheers!