Disclaimer: this post covers Linux only. The installation steps are written for Red Hat and Debian derivatives, though they can be adapted to other Linux and Unix flavors.
What This Post Is About
There are times when you can’t tell what is causing your server (as in VPS or dedicated server) to perform poorly. This post will hopefully help you make an educated decision on whether to proceed with an upgrade. At the very least, it should help you find the culprit and act accordingly.
The Basics First
There are mainly four components that can make or break server responsiveness: hard disk, memory, CPU, and bandwidth. It only takes one weak component to make a server unresponsive. Linux provides tools that help visualize and pinpoint the weakest component(s). They work somewhat like a car dashboard: you can tell whether the car is running out of gas, whether the engine is running hotter than usual, and so on. Ultimately, you can correlate a weak component with poor server responsiveness.
So, what makes a server sluggish? Is it a lack of memory or a lack of CPU cycles? It could be either, neither, or both. It’s essential to realize that a CPU can only run so many jobs before it’s pegged, and a hard disk can only take so many requests before it saturates. When components are fed more work than they can handle, the system starts queueing jobs. Queueing is one of the earliest signs that a server has reached capacity.
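One quick way to spot queueing from the shell is to compare the load average against the number of CPUs. On Linux, the load average counts runnable tasks plus tasks in uninterruptible sleep (usually waiting on disk). The sketch below assumes a Linux box with /proc/loadavg. The helper name queue_check and the rule "load above CPU count means work is queueing" are a hypothetical simplification, not a SAR feature.

```shell
#!/bin/sh
# queue_check LOAD CPUS
# Hypothetical helper: prints "queueing" when the load average exceeds
# the number of CPUs (more runnable work than CPUs to run it), else "ok".
queue_check() {
    awk -v load="$1" -v cpus="$2" 'BEGIN {
        if (load + 0 > cpus + 0) print "queueing"; else print "ok"
    }'
}

# Only run the live check where /proc/loadavg exists (Linux).
if [ -r /proc/loadavg ]; then
    # The first field of /proc/loadavg is the 1-minute load average.
    load1=$(cut -d ' ' -f 1 /proc/loadavg)
    cpus=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
    echo "1-min load $load1 on $cpus CPU(s): $(queue_check "$load1" "$cpus")"
fi
```

A single reading proves little. What matters is whether the load stays above the CPU count over time, which is exactly the kind of history SAR records.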
Introducing The Dashboard
How do we catch the early signs of a busy server? Unfortunately, for an unmonitored server, user complaints are often the earliest sign. We need to be proactive and on the lookout. The top command is great, but it only shows a live snapshot and doesn’t give a holistic view of what’s been cooking on the server over time. The good news is that there’s a tool that will do all the mundane work for us, all day long, unattended. Then, once a day or once a week, you can go over the collected information from the comfort of your home.
My favorite tool is SAR, which stands for System Activity Report. SAR is part of the sysstat package, a collection of programs that run in the background on minimal resources and silently collect vital system data: hard disk, CPU, memory, network information, and much more. The sar command line tool is all you need to display the data, but there’s a bit of a learning curve to it. Don’t be discouraged, though; there are other ways to display the data besides the command line. We’ll go over an example later on.
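By default, the collected data lands in one binary file per day of the month, named saDD (for example sa15 for the 15th). These files live in /var/log/sa on Red Hat derivatives and in /var/log/sysstat on Debian derivatives. Passing one of them to sar with -f replays that day’s history. The helper below is a hypothetical convenience: the name sa_file and the SA_DIR override are assumptions of this sketch. It builds the path for a given day, and it will need adjusting on systems configured for other file names.

```shell
#!/bin/sh
# sa_file [DAY]
# Hypothetical helper: prints the path of SAR's binary data file for DAY
# (day of month, default today). Set SA_DIR to override the directory;
# otherwise the Red Hat or Debian default location is used.
sa_file() {
    day=${1:-$(date +%d)}
    # Strip one leading zero so printf does not read "08" as octal,
    # then zero-pad again to match the saDD naming.
    day=$(printf '%02d' "${day#0}")
    if [ -n "$SA_DIR" ]; then
        dir=$SA_DIR
    elif [ -d /var/log/sa ]; then
        dir=/var/log/sa          # Red Hat derivatives
    else
        dir=/var/log/sysstat     # Debian derivatives
    fi
    echo "$dir/sa$day"
}
```

For example, `sar -u -f "$(sa_file 15)"` would show CPU usage recorded on the 15th.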
So, let’s get SAR up and running. For those who skipped the disclaimer: this tutorial covers Red Hat and Debian compatible installs only, so it may not work as-is on other Linux distributions. The following instructions require that you log into the server remotely (for example over SSH), so have a shell open and ready for your input.
On Debian derivatives, install the sysstat package by running the following command as root:
|apt-get install sysstat|
Edit the file /etc/default/sysstat and change the line |ENABLED="false"| to |ENABLED="true"|.
No action is required on your part if it’s already set to true. Also, edit the file /etc/sysstat/sysstat and set the variable HISTORY to the number of days of data you want SAR to keep (replace N with a number of days):
|HISTORY=N|
And last, kick off the process:
|service sysstat restart|
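If you’d rather script these edits than open an editor, one sed substitution per setting does the job. The function below is a sketch. The name set_var is hypothetical, and it assumes each setting appears as a plain VAR=value line at the start of a line. It rewrites an existing assignment in place. The example values, including HISTORY=30, are purely illustrative.

```shell
#!/bin/sh
# set_var FILE VAR VALUE
# Hypothetical helper: replaces an existing "VAR=..." line in FILE with
# "VAR=VALUE" (GNU sed -i edits the file in place). It does not add the
# variable if no such line exists, and commented-out lines are left alone.
set_var() {
    sed -i "s|^$2=.*|$2=$3|" "$1"
}

# Example (run as root on a Debian derivative; values are illustrative):
#   set_var /etc/default/sysstat ENABLED '"true"'
#   set_var /etc/sysstat/sysstat HISTORY 30
```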
On Red Hat derivatives, install the sysstat tools by running the following command as root:
|yum install sysstat|
Edit the /etc/sysconfig/sysstat file and set the variable HISTORY to the number of days of data you want SAR to keep (replace N with a number of days):
|HISTORY=N|
Then restart sysstat:
|service sysstat restart|
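To confirm the new value took effect, you can read the file back. The sysstat configuration files are written as shell variable assignments. One way to read a value, sketched below with a hypothetical get_var helper, is to source the file in a subshell and print the variable. This assumes the file contains only shell-compatible assignments and comments, and that VAR is a plain variable name. Don’t source files you don’t trust.

```shell
#!/bin/sh
# get_var FILE VAR
# Hypothetical helper: sources FILE (an absolute path) in a subshell, so
# nothing leaks into the current shell, and prints the value of VAR.
get_var() {
    ( . "$1" && eval "printf '%s\n' \"\${$2}\"" )
}

# Example on a Red Hat derivative:
#   get_var /etc/sysconfig/sysstat HISTORY
```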
That’s all you have to do on the server side. Now let SAR run for a few hours so it can collect data.
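Before moving on to the graphs, it’s worth checking that data is actually being written. By default the collector adds a sample every 10 minutes, via cron or, on newer systems, a systemd timer. A data file modified within the last half hour is therefore a reasonable sign that collection is working. The helper below uses find’s -mmin test on the data directory: /var/log/sa on Red Hat derivatives, /var/log/sysstat on Debian derivatives. The name recent_data and the 30-minute window are assumptions of this sketch.

```shell
#!/bin/sh
# recent_data DIR [MINUTES]
# Hypothetical helper: prints the saDD data files in DIR modified within
# the last MINUTES minutes (default 30); returns non-zero if there are none.
recent_data() {
    files=$(find "$1" -maxdepth 1 -name 'sa[0-9]*' -mmin "-${2:-30}")
    [ -n "$files" ] && printf '%s\n' "$files"
}
```

For example, `recent_data /var/log/sa || echo "no recent SAR data"` on a Red Hat derivative.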
The Fun Part
This is the part where we get to see the data in pretty graphs and charts! A free tool called kSar will connect to your server over SSH, run sar, and neatly graph the output for you. This is the best tool I have found so far. Simply point your browser to http://sourceforge.net/project/showf…ease_id=645912 and hit the download link. Unzip the file and double-click the JAR file. Once the application has launched, select the “Data” option in the menu, then “Launch SSH Command”. Enter your server’s connection details (e.g. firstname.lastname@example.org:22). Finally, select “Yes” when prompted to run the command “sar -A”. I’ve attached some screenshots of kSar in action. kSar has a shortcoming, however: it only displays one day’s worth of data at a time. But you can always save the daily reports as PDF files and view them at a later date.
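Another way around the one-day limit is to dump each day’s data to a text file you can keep, archive, or open later. The loop below is a sketch under these assumptions:

- sar is installed.
- The daily files use the default saDD naming.
- Your kSar build can load a saved text file (check its Data menu).

It runs sar -A against each file with LC_ALL=C, which keeps timestamps in a 24-hour format that parsers such as kSar tend to handle better. The name export_reports is hypothetical, and DRY_RUN=1 only prints the commands.

```shell
#!/bin/sh
# export_reports SRC_DIR DEST_DIR
# Hypothetical helper: writes one text report per saDD file in SRC_DIR
# into DEST_DIR using "sar -A -f FILE" with the C locale.
# With DRY_RUN=1 it only prints the commands it would run.
export_reports() {
    mkdir -p "$2" || return 1
    for f in "$1"/sa[0-9][0-9]; do
        [ -e "$f" ] || continue          # the glob matched nothing
        out="$2/$(basename "$f").txt"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "LC_ALL=C sar -A -f $f > $out"
        else
            LC_ALL=C sar -A -f "$f" > "$out"
        fi
    done
}
```

For example, `export_reports /var/log/sa ~/sar-reports` on a Red Hat derivative.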
Back To Work!
Let’s make some sense of those graphs. The panel tabs of interest are CPU all, Swapping, Memory Usage, Load, Processes, Paging Activity, Sockets, and Interface traffic (under the Interface tab). Swapping, Memory Usage, and Paging Activity can be grouped into one category, as they’re correlated to an extent. CPU all is an independent indicator, and Sockets and Interface traffic together form another.
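If you’d rather stay on the command line, most of these panels have a sar counterpart. The mapping below follows sar’s documented options, but options have shifted between sysstat releases, so check sar(1) on your version. The helper name sar_flags and its panel keywords are hypothetical.

```shell
#!/bin/sh
# sar_flags PANEL
# Hypothetical helper: prints the sar options that report roughly the
# same data as the named kSar panel; unknown names return non-zero.
sar_flags() {
    case "$1" in
        cpu)      printf '%s\n' "-u" ;;        # CPU utilization (CPU all)
        swapping) printf '%s\n' "-W" ;;        # pages swapped in/out per second
        memory)   printf '%s\n' "-r" ;;        # memory utilization
        paging)   printf '%s\n' "-B" ;;        # paging statistics, incl. faults
        load)     printf '%s\n' "-q" ;;        # run queue length and load averages
        sockets)  printf '%s\n' "-n SOCK" ;;   # sockets in use
        traffic)  printf '%s\n' "-n DEV" ;;    # per-interface traffic
        errors)   printf '%s\n' "-n EDEV" ;;   # per-interface errors
        *)        return 1 ;;
    esac
}
```

Leave the substitution unquoted so "-n DEV" splits into two arguments, for example `sar $(sar_flags traffic) -f /var/log/sa/sa15`.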
In nearly all cases, Sockets and Interface traffic are leading indicators: the number of open sockets and the amount of traffic rise as soon as the server sees more user visits. CPU all shows CPU usage over time. You can ignore short, occasional spikes, but sustained usage of 90% or more for extended periods is usually a sign of pegged CPUs. At that point it’s most likely time to upgrade the CPUs or dig further into what’s hogging them. Swapping is the next most interesting indicator to look at. Any spikes here are usually a sign of a lack of free memory. Select Memory Usage to confirm. If free memory stays below roughly 4MB for extended periods and the Swapping graph shows activity, the box is begging for a memory upgrade. Paging Activity, in particular a spike in page faults, can further confirm the lack of memory.
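The “sustained 90% and above” rule can also be checked from the command line. The awk script below reads sar -u output produced under LC_ALL=C, so that each timestamp is a single HH:MM:SS field. It treats 100 minus %idle as CPU usage and reports every run of at least MIN consecutive samples at or above the threshold. The 90% threshold comes from the text above. The default of 3 samples (30 minutes at the default 10-minute interval) is an assumption you should tune, and cpu_sustained is a hypothetical name.

```shell
#!/bin/sh
# cpu_sustained [THRESHOLD] [MIN_SAMPLES] < sar-u-output
# Hypothetical helper: reads "LC_ALL=C sar -u" output on stdin and prints
# each run of at least MIN_SAMPLES consecutive samples whose usage
# (100 - %idle) is at or above THRESHOLD percent.
cpu_sustained() {
    awk -v thr="${1:-90}" -v min="${2:-3}" '
        # Data rows look like: 10:20:01 all %user ... %idle
        # (the Average line and header rows do not match).
        $2 == "all" && $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
            if (100 - $NF >= thr) {
                if (run == 0) start = $1
                run++
                last = $1
            } else {
                if (run >= min) print start " to " last ": " run " samples"
                run = 0
            }
        }
        END { if (run >= min) print start " to " last ": " run " samples" }
    '
}
```

For example, `LC_ALL=C sar -u -f /var/log/sa/sa15 | cpu_sustained 90 3`.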
To recap: in general, rising interface traffic brings more open sockets, and serving that traffic consumes resources. Memory, CPU, or both then spike up, which may in turn cause swapping (disk I/O). One might suggest improving disk I/O, but swapping is only a symptom. In this scenario, the lack of memory is the root cause. Keep in mind, though, that there are scenarios where disk I/O spikes while plenty of memory is available. This case is usually easy to resolve, as a rogue process is most likely hitting the disks more than it should. One last scenario is when CPU, memory, and disk I/O are all nearly flat but the server is still sluggish. In this case, pay close attention to the Interface traffic and Interface errors graphs for signs that you’re saturating your allocated port speed or seeing errors.
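The reasoning above boils down to a short decision list. The function below encodes it as a sketch. It takes yes/no observations you’ve made from the graphs rather than reading sar data itself. The name triage, its argument order, and the order in which the checks are applied are all hypothetical choices.

```shell
#!/bin/sh
# triage CPU_PEGGED LOW_MEMORY SWAPPING DISK_BUSY   (each argument: y or n)
# Hypothetical helper encoding the rules of thumb above; prints the
# component most likely worth investigating first.
triage() {
    cpu=$1 mem=$2 swap=$3 disk=$4
    if [ "$mem" = y ] && [ "$swap" = y ]; then
        echo "memory"     # swapping is the symptom, lack of RAM the cause
    elif [ "$cpu" = y ]; then
        echo "cpu"        # sustained usage of 90% or more
    elif [ "$disk" = y ]; then
        echo "disk"       # heavy I/O with memory to spare: rogue process?
    else
        echo "network"    # all flat: check interface traffic and errors
    fi
}
```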
That’s all, folks. Feel free to reply with questions, comments, or corrections.