The WordPress core and a good number of its plugins rely on a cron system called WP-Cron to schedule and execute tasks at specified times. This system, however, is not foolproof and often results in delayed or failed cron events.
The most common sign of a failing cron system, one that most WordPress users will run into at some point, is scheduled posts that fail to publish at the designated time. Other common symptoms include missing updates, newsletters that never go out, scheduled plugin reports that never arrive and failed scheduled backups.
Recently I noticed that one of my sites was showing clear signs of the last of these symptoms. A month had passed and I had yet to receive a successful backup report from the UpdraftPlus Backup plugin. When I checked the plugin settings my suspicions were confirmed: the remote backups had been failing for the past month.
At this point I installed the nifty WP Crontrol plugin to find out what was going on with the WP-Cron system. When I viewed the Cron schedules page I was greeted with the following error:
There was a problem spawning a call to the WP-Cron system on your site. This means WP-Cron events on your site may not work. The problem was: Unexpected HTTP response code: 503
According to WP Crontrol, all the scheduled cron events on this site had been failing for the past month. Before then, the cron events had run just fine.
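WP Crontrol's check essentially makes the same kind of loopback request WordPress uses to trigger its own cron: a POST to wp-cron.php?doing_wp_cron=<timestamp> sent with the WordPress HTTP API's default "WordPress/<version>; <site url>" user agent, as in core's spawn_cron(). If you want to reproduce that request outside WordPress, here is a minimal Python sketch under those assumptions. The function names are hypothetical, and since a firewall may judge a request partly by its source IP, it is most telling when run from the origin server itself.

```python
"""Minimal sketch: reproduce a WP-Cron spawn request outside WordPress.

Assumption: modeled on WordPress core's spawn_cron(), which POSTs to
wp-cron.php?doing_wp_cron=<gmt timestamp> with the default
"WordPress/<version>; <site url>" user agent. All function names here
are hypothetical helpers, not part of any WordPress API.
"""
import time
import urllib.error
import urllib.parse
import urllib.request


def build_cron_url(site_url: str, gmt_time: float) -> str:
    """Build the loopback URL, formatting the key with 22 decimal places."""
    key = f"{gmt_time:.22f}"
    query = urllib.parse.urlencode({"doing_wp_cron": key})
    return f"{site_url.rstrip('/')}/wp-cron.php?{query}"


def build_user_agent(version: str, site_url: str) -> str:
    """Default user agent string of the WordPress HTTP API."""
    return f"WordPress/{version}; {site_url}"


def probe_cron_spawn(site_url: str, version: str, timeout: float = 10.0) -> int:
    """Send a spawn-style POST and return the HTTP status code."""
    req = urllib.request.Request(
        build_cron_url(site_url, time.time()),
        data=b"",  # empty body; together with method="POST" this sends a POST
        headers={"User-Agent": build_user_agent(version, site_url)},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (such as the 503 WP Crontrol reported) land here
        return err.code
```

For example, probe_cron_spawn("https://example.com", "5.7.2") (substitute your own site URL and WordPress version) returning 503 would match what WP Crontrol reported, while a 200 means the spawn request itself got through, even if individual events might still fail afterwards.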
Troubleshooting Error Code 503
Coincidentally, at the beginning of the month I had received a Cloudflare Server Error Report that indicated a high percentage of 503 errors for that domain. I didn’t make much of it and quickly brushed it off as Cloudflare’s sly attempt to upsell me its Load Balancing product, which the report proffered as a solution. It turns out I was mistaken.
After reading up on the 503 error code online, everything pointed to the server side of things rather than WordPress. Still, that didn’t stop me from troubleshooting WordPress by disabling all plugins, switching to the default theme and even enabling debug mode (WP_DEBUG).
None of that worked, and the debug log surprisingly turned up nothing related to the cron job failures. On the recommendation of the LiteSpeed documentation I also tried disabling the opcode cache, but that didn’t resolve the issue either.
Having exhausted my options on the WordPress side, I turned to the server, and the first place I checked was the server error log. Unfortunately, I didn’t find anything pertinent there other than some ModSecurity entries and PHP notices.
Next I turned to my host for assistance. After checking the server logs they turned off ModSecurity, but of course that didn’t help either.
This was proving futile, so in a last-ditch effort I decided to just run the cron jobs manually. The cron jobs I was most concerned about were the LiteSpeed Cache ones, as they were essential for converting my images to WebP and generating Critical CSS via the QUIC.cloud service.
Fortunately, the LiteSpeed Cache plugin has a Force cron button that manually triggers its scheduled cron events. Hitting that button, however, only loaded a 503 error page from the QUIC.cloud servers. I was stumped.
Cloudflare Behind WP-Cron Job Failures
Assuming WordPress and the server had nothing to do with the failures, what else could possibly be causing them? The missing link was Cloudflare, which up to this point I had conveniently dismissed as an unlikely culprit.
So I paused Cloudflare for the site, and immediately the rather obstinate 503 error in WP Crontrol disappeared. Shortly afterwards the cron events started working again.
There was no doubt at this point that Cloudflare was behind the failures; I just didn’t know exactly what was causing them. The most likely culprit, however, was the Firewall app.
Fortunately, Cloudflare provides a log of firewall events for the past 24 hours. In this particular case the log was replete with JS Challenges issued by the Bot Fight Mode service.
However, with the default columns it was difficult to make sense of what these events really meant. I was on the verge of attributing them to the usual bad bots when I noticed that all of them came from the same IP address in the UK.
Now, what you don’t know is that this site’s server (the origin server, not Cloudflare’s) is located in the UK. Furthermore, in my limited experience, bad bots mostly come from either the United States or Eastern European countries such as Ukraine and Russia.
In a bid to find out more about these bots, I toggled on some additional columns in the firewall log to reveal their paths, query strings and user agent strings.
The results were telling: the path for all the events was /wp-cron.php, the query string ?doing_wp_cron=… and the user agent WordPress/5.7.2;.... An expanded log revealed the following details:
Based on the query string, I believe these cron events were triggered by the LiteSpeed Cache plugin, which at the time had 11 delayed requests.
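Toggling columns works for a handful of events, but for a longer log a short script can flag events matching the signature found here: path /wp-cron.php, a doing_wp_cron query string, a WordPress/ user agent and the origin server’s own IP. This is a minimal Python sketch that assumes the events were exported as a list of dictionaries whose keys mirror Cloudflare’s firewall event field names (clientIP, clientRequestPath, clientRequestQuery, userAgent, action, source). Those keys, the sample IPs and values such as "jschallenge" and "botFight" are assumptions, so adjust them to whatever your export actually contains.

```python
"""Minimal sketch: spot a site's own WP-Cron requests in firewall events.

Assumption: each event is a dict whose keys mirror Cloudflare firewall
event fields (clientIP, clientRequestPath, clientRequestQuery, userAgent,
action, source). Rename the keys to match your actual export.
"""
from collections import Counter
from typing import Iterable


def is_wp_cron_loopback(event: dict, origin_ip: str) -> bool:
    """True if the event looks like the site calling its own wp-cron.php."""
    query = event.get("clientRequestQuery", "").lstrip("?")
    return (
        event.get("clientRequestPath") == "/wp-cron.php"
        and query.startswith("doing_wp_cron=")
        and event.get("userAgent", "").startswith("WordPress/")
        and event.get("clientIP") == origin_ip
    )


def summarize(events: Iterable[dict], origin_ip: str) -> Counter:
    """Count (action, source) pairs among the suspected cron loopback events."""
    return Counter(
        (e.get("action"), e.get("source"))
        for e in events
        if is_wp_cron_loopback(e, origin_ip)
    )
```

A large count for your own origin IP is a strong hint that the firewall is challenging the site’s own cron requests rather than outside bots.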
Enter Cloudflare’s Bot Fight Mode
So Cloudflare, for whatever reason, was putting the WordPress cron requests on this site through a JS Challenge, which by all indications they failed. This, however, wasn’t the default setting, and I was actually partly to blame for this mess.
You see, about a month ago I had innocently activated the Bot Fight Mode option in the Firewall app settings after observing that this relatively new site was getting a lot of non-human traffic that I couldn’t account for.
Bot Fight Mode is a relatively new product available to Cloudflare users on the Free plan, while Pro and Business users get it as Super Bot Fight Mode. On the Free plan the mode cannot be configured, which means Cloudflare handles bot detection entirely on its own.
According to its documentation, Cloudflare doesn’t outright block what it deems to be a bad bot but instead puts it through a “computationally expensive challenge”. This technique is apparently called tarpitting: it makes automated bots perform CPU-intensive tasks to increase the operating costs for their owners.
This is far more effective at stopping bad bots than simply blocking them. I suppose this is also why the cron events were returning a 5xx server error (503) instead of the 4xx client error codes (such as 403) one would more likely expect from a block.
What I don’t quite understand, however, is how Cloudflare could flag WordPress cron events as potentially bad bots. It seems like too big an oversight on their part, seeing as a great deal of the internet runs on WordPress, not to mention that they offer a WordPress plugin and a WordPress optimization product.
Bad IP or False Positives
For this reason, my guess is that the IP address of the server hosting the site is what was tripping the bot detection. My reasoning is that I have Bot Fight Mode activated on another site on a completely different server and haven’t experienced such a problem so far.
Nevertheless, Cloudflare does point out that although the bot fight mode products are “designed to fight malicious actors on the Internet, they may challenge API or mobile app traffic.” No wonder they recommend upgrading to the Enterprise plan for fine-tuned control.
That said, it’s equally possible that the server’s IP reputation is in perfectly good standing and this is entirely a false positive on Cloudflare’s part. Whatever the case, Bot Fight Mode is a very useful product, but one with potentially far-reaching consequences whenever it produces false positives.
Evidently, its intended target market is Enterprise users, who get some level of control over it. Users on the Free plan should therefore consider activating the mode only if they:
- notice they’re getting significant traffic from bad bots; and
- are prepared to monitor the firewall log daily for false positives.
Otherwise, one risks challenging not only good bots but also, as Cloudflare admits, legitimate API and mobile app traffic, as was the case here.