UNIXy Next – Refocus

I wish a very happy new year to everyone reading this blog entry, and especially to our clients, to whom I give my most heartfelt thanks for continuing to place their trust in us.

The start of 2017 is an excellent time to reflect upon the past few years and set a clear direction forward. But first, we must reflect deeply on who we are as an organization, what got us here, and why we’re doing what we’re doing. The work we do at UNIXy is central to individuals, organizations, upstarts, and businesses needing to manage the increasingly complex world that is online presence. Simply put, we help people.

This reductionist thinking is important in that it externalizes itself in ways and actions that permeate our culture. Our culture lives on helping our clients meet their online needs. This is often embodied in our slogan, Truly Fully Managed Servers. But as we zoom into our culture and out to what we actually do, we must be very specific about what it is that we do or don’t do for our clients. The minute we begin to outline the dos and don’ts, we’ve already failed them. And this is where the reductionist stance of helping people collapses.

 

In light of a fluid and changing online world where multiple services are needed to bring an online presence to life, I’ve sat down to refocus our mission in a crystal-clear and all-encompassing way. The tenets of our managed services are enumerated in the above illustration so as to leave not an iota of doubt as to what you should expect from us.

The four tenets are: support, security, performance, and outreach. Outreach fills the gap and ties everything together so our clients succeed online. Outreach is the part where UNIXy goes above and beyond to meet our clients’ needs, whether all of their services reside with us or some reside elsewhere in the cloud.

I’m looking forward to the coming months as we execute on this mission.

Yours sincerely,

Joe Hmamouche

Load Balanced Bot-split Approach to Counter Excessive Bot Traffic. When Search Engines Work Against You!

Websites with a large number of links tend to get a good share of hammering from crawlers like Googlebot. This crawler traffic often leads to major slowdowns and can even knock your server offline. And there isn’t much you can do besides upgrading your hardware. You could easily block all search engine traffic and be done with it. But that would be a death knell for your online business, because you depend on indexing and ranking to get noticed online. It’s a lose-lose situation. There is, however, a workaround to this dilemma. We’re happy to share it with you here.

 

Atrapitis

 

Instead of growing your hardware fleet to several times the capacity you currently need for the sole purpose of absorbing search engine traffic, you can leverage a dirt-cheap high-density RAM server to cache as much of your content as possible on a separate server where all bot traffic will be “trapped.” See, we already know how to identify most search engine traffic. This triage of traffic is key to logically and physically routing bots and visitors exactly where we want them to go.

 

 


 

More concretely, we run Varnish on the load balancer node to programmatically split traffic based on the visitor’s user-agent string (Nginx is a possible option, but some load-balancing features are only available in the commercial version, Nginx Plus). Bots have well-known and documented user-agent strings. It takes a few lines of VCL to switch traffic to the respective backends (read: servers).

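# In vcl_recv: send known crawlers to the "bot" backend, everyone else to "default".
# On Varnish 4 and later, use req.backend_hint instead of req.backend.
if (req.http.User-Agent ~ "(?i)bing|googlebot") {
    set req.backend = bot;
} else {
    set req.backend = default;
}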
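Before deploying the rule, it can help to check offline which user agents the pattern would catch, for example against strings pulled from your access logs. The following is a minimal Python sketch of the same match, not part of the Varnish setup itself. The function name `pick_backend` and the sample user agents are illustrative assumptions.

```python
import re

# Same pattern as the VCL rule, matched case-insensitively.
BOT_PATTERN = re.compile(r"bing|googlebot", re.IGNORECASE)


def pick_backend(user_agent: str) -> str:
    """Return "bot" for crawler user agents, "default" for everyone else."""
    return "bot" if BOT_PATTERN.search(user_agent or "") else "default"


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
        "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0",
    ]
    for ua in samples:
        print(pick_backend(ua), "<-", ua)
```

Because the pattern is a plain substring match, any user agent containing "bing" is routed to the bot box, so it is worth running a sample of real traffic through a check like this first.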

Now that we have bot traffic routed to where we want it to go, let’s go over what the “cache box” should be and do.

First of all, it’s a high-RAM node (with memory commensurate with the size of your content) with a one- or two-core CPU running Varnish Cache. In terms of software configuration, it needs to be near-identical to the “realtime” box because it needs to be able to run the very same website (same software requirements). The “realtime” box will regularly push the latest copy of its DB and synchronize files. “Regularly” can be any interval you like, but one hour is reasonable for most implementations. A couple of shell scripts should do it for both the DB dump/pump and the file sync. Be sure to avoid table locks if you’re not running InnoDB.
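As a rough illustration of the dump and sync steps, here is a minimal Python sketch that builds and runs the two commands. It is an assumption-laden outline, not the scripts we run: the function names, paths, and destination are hypothetical, and `--single-transaction` only avoids table locks on InnoDB tables. Loading the dump on the cache box (for example with the `mysql` client) is left out.

```python
import subprocess


def dump_command(database: str) -> list[str]:
    """mysqldump invocation; --single-transaction avoids table locks on InnoDB only."""
    return ["mysqldump", "--single-transaction", "--quick", database]


def sync_command(source_dir: str, destination: str) -> list[str]:
    """rsync invocation that mirrors source_dir to destination, deleting stale files."""
    return ["rsync", "-az", "--delete", source_dir.rstrip("/") + "/", destination]


def push_snapshot(database: str, dump_path: str, source_dir: str, destination: str) -> None:
    """Dump the database into dump_path, then sync the site files to the cache box."""
    with open(dump_path, "wb") as out:
        subprocess.run(dump_command(database), stdout=out, check=True)
    subprocess.run(sync_command(source_dir, destination), check=True)
```

A call such as `push_snapshot("shop", "/var/www/site/db.sql", "/var/www/site", "cachebox:/var/www/site/")`, scheduled hourly from cron, would match the interval suggested above. All four arguments here are placeholder values.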

It’s important to set a reasonably high TTL if you’re going to deploy a low-powered CPU on the bot box. A 2-day TTL for both static assets and pages wouldn’t be unusual. Otherwise your cache hit rate will be poor, which defeats the purpose of the whole setup. A simple cache warmer (ex: one that runs on sitemap changes) can render the whole setup super efficient.
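A cache warmer of this kind can be quite small. The sketch below is one possible shape, assuming a standard sitemaps.org XML sitemap: it extracts every `<loc>` URL and requests each one once so the cache stores the response. The function names are illustrative, and the requests must actually reach the bot box (for example by pointing the warmer directly at it) for the warm-up to have any effect.

```python
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def fetch(url: str) -> int:
    """Request one URL so the cache in front of it stores the response."""
    with urllib.request.urlopen(url, timeout=10) as response:
        response.read()
        return response.status


def warm(urls, fetch_one=fetch) -> dict:
    """Fetch each URL once; map URL -> HTTP status, or the error message on failure."""
    results = {}
    for url in urls:
        try:
            results[url] = fetch_one(url)
        except OSError as exc:  # urllib.error.URLError is a subclass of OSError
            results[url] = str(exc)
    return results
```

Passing the fetch function as a parameter keeps the warmer easy to dry-run: `warm(urls, fetch_one=lambda url: 200)` exercises the loop without touching the network.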

That’s all!

cPanel Varnish Plugin release 2.4.0

This new release sports new features and several bug fixes, notably the Always Up feature. cPanel’s webmail and WHM redirection and/or proxying have been fixed. Support for cPanel’s new Paper Lantern theme has been implemented. Varnish has been upgraded to the latest version. The release is available in the download area.

Enjoy the new release and happy New Year!

Always Up Feature

Note: Always Up is a feature that UNIXy designed and engineered back in 2011 and that has been deployed on some of our clients’ high-traffic servers and clusters since. A detailed article was written on this technology: http://blog.unixy.net/2010/11/3-state-throttle-web-server/

Always Up is a new feature in the cPanel Varnish Plugin starting from version 2.4.0. This feature addresses a serious issue that affects most busy Web servers, especially at times when you are unavailable or away from your desk. As you know, Web traffic is highly erratic. A well-behaved server with normal traffic could become unresponsive and crash within minutes due to a traffic spike (slashdot, reddit, viral, news, events, etc.) or a runaway process within the system itself (backup, batch job, etc.). Always Up is highly dynamic in the sense that it doesn’t kick in until server vitals hit specific programmable thresholds, which you get to set (in WHM -> Varnish -> Always Up). It’s dynamic because it’s a throttle-based caching mechanism that gets more aggressive as the load climbs.

Always Up

Once the system triggers Always Up, it sets in motion aggressive caching measures so the server is able to cope with the extra traffic as gracefully as possible. And as soon as traffic subsides, it reverts to its original pristine state. Always Up requires three pairs of values as input (ex: 30:1800, 60:3600, 180:7400).

Always Up Screen


Each pair (ex: 30:1800) contains two numbers separated by a colon. The first number is the 5-minute load average threshold at which Always Up intervenes to change how aggressively the system caches. The second number is the caching TTL, in seconds, that goes with that threshold. So as soon as the 5-minute load average reaches a threshold, the corresponding system-wide caching TTL is applied.
Always Up Threshold

So in the example above of 30:1800, 60:3600, 180:7400, a 5-minute load average of 30 will trigger a system-wide caching TTL of 1800 seconds. If the load keeps climbing and reaches 60, the TTL becomes 3600 seconds. And so on. As the load decreases, the TTL steps back down in lockstep.
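The threshold logic described above can be sketched in a few lines of Python. This is only an illustration of the behavior, not the plugin’s actual implementation. The function names are assumptions, and it assumes that a load below the first threshold leaves caching untouched (represented here as `None`).

```python
import os


def parse_thresholds(spec: str) -> list[tuple[float, int]]:
    """Parse "30:1800, 60:3600" into [(30.0, 1800), (60.0, 3600)], sorted by load."""
    pairs = []
    for item in spec.split(","):
        load, ttl = item.strip().split(":")
        pairs.append((float(load), int(ttl)))
    return sorted(pairs)


def ttl_for_load(load: float, thresholds: list[tuple[float, int]]):
    """Return the TTL of the highest threshold reached, or None below the first one."""
    ttl = None
    for limit, seconds in thresholds:
        if load >= limit:
            ttl = seconds
    return ttl


if __name__ == "__main__":
    thresholds = parse_thresholds("30:1800, 60:3600, 180:7400")
    five_minute_load = os.getloadavg()[1]  # (1-min, 5-min, 15-min) load averages
    print(five_minute_load, ttl_for_load(five_minute_load, thresholds))
```

With the example values, a load of 75 maps to a TTL of 3600 seconds. Once the load drops back under 60 but stays at or above 30, the same lookup returns 1800 again, which is the lockstep behavior described above.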