Load Balanced Bot-split Approach to Counter Excessive Bot Traffic. When Search Engines Work Against You!

Websites with a large number of links tend to get hammered by crawlers like Googlebot. This crawler traffic often causes major slowdowns and can even knock your server offline. Short of upgrading your hardware, there isn’t much you can do about it. You could block all search engine traffic and be done with it, but that would be a death knell for your online business, because you depend on indexing and ranking to get noticed online. It’s a lose-lose situation. There is, however, a workaround to this dilemma. We’re happy to share it with you here.




Instead of growing your hardware fleet to several times the capacity you actually need just to absorb search engine traffic, you can use a dirt-cheap, high-density RAM server to cache as much of your content as possible on a separate machine where all bot traffic is “trapped.” We already know how to identify most search engine traffic. This triage is the key to routing bots and visitors, logically and physically, exactly where we want them to go.





More concretely, we run Varnish on the load balancer node to programmatically split traffic based on the visitor’s user-agent string (Nginx is a possible option, but some LB features are only available in the commercial version, Nginx Plus). Bots have well-known and documented user-agent strings. It takes only a few lines of VCL to switch traffic to the respective backends (read: servers).

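# Placed in vcl_recv. Varnish 3 syntax; Varnish 4+ uses req.backend_hint.
if (req.http.User-Agent ~ "(?i)bing|(?i)googlebot") {
    # Known crawlers go to the cache box.
    set req.backend = bot;
} else {
    # Everyone else goes to the realtime box.
    set req.backend = default;
}
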
Now that we have bot traffic routed to where we want it to go, let’s go over what the “cache box” should be and do.

First of all, it’s a high-RAM node (commensurate with the size of your content) with a one- or two-core CPU running Varnish Cache. In terms of software configuration, it needs to be near-identical to the “realtime” box because it has to run the very same website (same software requirements). The “realtime” box regularly pushes the latest copy of its DB and synchronizes its files. “Regularly” can be any interval you like, but one hour is reasonable for most implementations. A couple of shell scripts should do it for both the DB dump/pump and the file sync. Be sure to avoid table locks if you’re not running InnoDB.
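The dump/pump and file sync described above can be sketched as a small command builder. This is only an illustration under stated assumptions: the host name `bot-box`, the database name, and the paths are hypothetical, and the choice of `mysqldump` and `rsync` is ours rather than something the setup requires. The functions only build the command lines so they can be reviewed before being scheduled (for example, hourly from cron).

```python
import shlex


def dump_command(db_name, innodb=True):
    """Build a mysqldump command line that does not hold table locks."""
    cmd = ["mysqldump", "--quick"]
    # --single-transaction takes a consistent InnoDB snapshot without locking.
    # --skip-lock-tables avoids locks on other engines, at the cost of a
    # possibly inconsistent dump if tables change while it runs.
    cmd.append("--single-transaction" if innodb else "--skip-lock-tables")
    cmd.append(db_name)
    return cmd


def sync_command(src_dir, dest_host, dest_dir):
    """Build an rsync command line that mirrors src_dir onto the bot box."""
    # A trailing slash on the source copies the directory's contents.
    source = src_dir.rstrip("/") + "/"
    return ["rsync", "-az", "--delete", source, f"{dest_host}:{dest_dir}"]


def push_pipeline(db_name, dest_host, innodb=True):
    """Return a shell pipeline that streams the dump into the remote MySQL."""
    dump = " ".join(shlex.quote(part) for part in dump_command(db_name, innodb))
    load = shlex.quote("mysql " + shlex.quote(db_name))
    return f"{dump} | ssh {shlex.quote(dest_host)} {load}"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(push_pipeline("shop", "bot-box"))
    print(" ".join(sync_command("/var/www/site", "bot-box", "/var/www/site/")))
```

Streaming the dump over SSH avoids writing a temporary file on either box. Whether that suits your database size is something to test on your own setup.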

It’s important to set a reasonably high TTL if you’re going to deploy a low-powered CPU on the bot box. A 2-day TTL for both static assets and pages wouldn’t be unusual. Otherwise your cache hit rate will be poor, which defeats the purpose of the whole setup. A simple cache warmer (for example, one that runs on sitemap changes) can make the whole setup super efficient.
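A cache warmer along those lines might look like the sketch below. It is a minimal illustration, not the warmer we run. The function names are made up for this example, and the `User-Agent` value is an assumption chosen only because it contains "googlebot", so the user-agent rule on the load balancer routes the request to the bot box.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_urls(sitemap_xml):
    """Return every <loc> URL found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def fetch_as_bot(url, timeout=10):
    """Request a URL with a bot-like user agent and return the HTTP status."""
    # Hypothetical user agent: it only needs to match the bot-split rule.
    req = urllib.request.Request(url, headers={"User-Agent": "googlebot-cache-warmer"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        resp.read()  # Read the body so the whole object is fetched and cached.
        return resp.status


def warm(urls, fetch=fetch_as_bot):
    """Fetch each URL once and report the status (or the error) per URL."""
    results = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except OSError as exc:  # URLError and HTTPError are OSError subclasses.
            results[url] = exc
    return results
```

To use it, read your sitemap file, pass its contents to `extract_urls`, and hand the resulting list to `warm` whenever the sitemap changes.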

That’s all!

Always Up Feature

Note: Always Up is a feature that UNIXy designed and engineered back in 2011 and has deployed on some of our clients’ high-traffic servers and clusters since. A detailed article was written on this technology: http://blog.unixy.net/2010/11/3-state-throttle-web-server/

Always Up is a new feature in the cPanel Varnish Plugin starting from version 2.4.0. It addresses a serious issue that affects most busy Web servers, especially at times when you are unavailable or away from your desk. As you know, Web traffic is highly erratic. A well-behaved server with normal traffic can become unresponsive and crash within minutes due to a traffic spike (slashdot, reddit, viral, news, events, etc) or a runaway process within the system itself (backup, batch job, etc). Always Up is highly dynamic in the sense that it doesn’t kick in until server vitals hit specific programmable thresholds, which you get to set (in WHM -> Varnish -> Always Up). It’s dynamic because it’s a throttle-based caching mechanism that gets more aggressive as the load climbs.


Once the system triggers Always Up, it sets in motion aggressive caching measures so the server can cope with the extra traffic as gracefully as possible. As soon as traffic subsides, it reverts to its original pristine state. Always Up requires three pairs of values as input (ex: 30:1800, 60:3600, 180:7400).

Always Up Screen


Each pair (ex: 30:1800) contains two numbers separated by a colon. The first number is the 5-minute load average threshold at which Always Up intervenes to change caching aggressiveness. The second number is the caching TTL, in seconds, that applies at that threshold. So as soon as the 5-minute load average reaches the threshold, the corresponding system-wide caching TTL is set.
Always Up Threshold

So in the example above of 30:1800, 60:3600, 180:7400, a 5-minute load average of 30 triggers a system-wide caching TTL of 1800 seconds. If the load keeps climbing and reaches 60, the TTL becomes 3600 seconds. And so on. As the load decreases, the TTL steps back down in lockstep.
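The threshold logic can be sketched in a few lines. This is only an illustration of the mapping described above, not the plugin’s actual code. The function names are made up for this example, and returning `None` below the first threshold (meaning “leave normal caching alone”) is our assumption.

```python
def parse_thresholds(spec):
    """Parse '30:1800, 60:3600, 180:7400' into sorted (load, ttl) pairs."""
    pairs = []
    for item in spec.split(","):
        load, ttl = item.strip().split(":")
        pairs.append((float(load), int(ttl)))
    return sorted(pairs)


def ttl_for_load(load_avg, thresholds):
    """Return the TTL of the highest threshold reached, or None below the first."""
    ttl = None
    for threshold, seconds in thresholds:
        if load_avg >= threshold:
            ttl = seconds  # Keep climbing while the load meets each threshold.
    return ttl


if __name__ == "__main__":
    thresholds = parse_thresholds("30:1800, 60:3600, 180:7400")
    for load in (10, 30, 75, 200):
        print(load, ttl_for_load(load, thresholds))
```

On a Unix system, `os.getloadavg()[1]` returns the 5-minute load average, so a monitoring loop could feed that value to `ttl_for_load` at a fixed interval. Because the function is stateless, the TTL steps back down on its own as the load drops.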

Has your own firewall blocked you? We have an app for that!

Server owners tend to be power users: they access multiple services (Web, control panel, email, SSH, etc) on their server within a very short time span. This can trigger an IP block at the firewall. If you happen to be on the go and haven’t had a chance to whitelist your mobile IP, you’re out of luck (well, not really, because we’re still here to help 24/7)! But power users are POWER users! So we empower them to unblock themselves without having to file a single request with us.


Firewall Unblock Tool

All you need to do from now on is access http://unblock.unixy.net/ and be unblocked from your server! And… done!



cPanel Varnish LiteSpeed Plugin v2.2.0. Simply Amazing!

The cPanel Varnish LiteSpeed Plugin has received a brand new update with features that make your websites load 250% faster than with the previous plugin release. Yes, that’s a 250% speed improvement on top of any LiteSpeed cache option. The Scripts Store now includes Varnish Warp add-ons for WordPress, Magento, and Drupal. This is the smartest acceleration available anywhere on the Web.

Knowledge Base: http://www.unixy.net/secure/knowledgebase/37/Scripts-Store