Dec 5 12:31:31 server varnishd: Child (28447) not responding to CLI, killing it.
Dec 5 12:31:59 server varnishd: Child (28447) died signal=3
Dec 5 12:32:03 server varnishd: child (22431) Started
Dec 5 12:32:04 server varnishd: Child (22431) said Child starts
Dec 5 12:32:04 server varnishd: Child (22431) said SMF.s0 mmap'ed 268435456 bytes of 268435456
The messages above come from the Varnish Manager. Varnish runs as two processes, and the Manager is the one that monitors and manages the other, the Varnish caching process. Here it’s telling us that the caching process has stopped answering its PING with a PONG. The Manager pings the Cache process every 3 seconds (at least in this configuration), and the exchange looks like this:
0 CLI - Rd ping
0 CLI - Wr 200 19 PONG 1323181633 1.0
0 CLI - Rd ping
0 CLI - Wr 200 19 PONG 1323181636 1.0
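If you want to confirm the ping interval on your own server, you can compare the epoch timestamps in the PONG replies. Here is a minimal Python sketch of that check. It is not part of Varnish: the function name `pong_intervals` is made up for illustration, and it assumes the CLI records look like the ones above.

```python
import re

# Matches the epoch timestamp that follows "PONG" in a CLI log record.
PONG_RE = re.compile(r"PONG\s+(\d+)")

def pong_intervals(log_text):
    """Return the gaps, in seconds, between consecutive PONG replies."""
    stamps = [int(m.group(1)) for m in PONG_RE.finditer(log_text)]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]

# The exchange quoted in this post.
sample = """\
0 CLI - Rd ping
0 CLI - Wr 200 19 PONG 1323181633 1.0
0 CLI - Rd ping
0 CLI - Wr 200 19 PONG 1323181636 1.0
"""

print(pong_intervals(sample))  # prints [3]
```

A gap much larger than the usual interval suggests the Cache process was slow to answer around that time.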
So going back to the error message, the Manager is telling us that it hasn’t received a “PONG” back from the Cache process in a while. How long a “while” is depends on the parameter cli_timeout, which has a default of 10 seconds in Varnish 3.x and 20 seconds in Varnish 2.x. So here the Manager waits for about 10 seconds before it kills the child with SIGQUIT (signal=3). The Manager then starts a fresh child, which is why the log shows a new PID (22431).
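To see how often this kill-and-restart cycle happens, you can scan syslog for the three Manager messages. The following is a rough Python sketch under a couple of assumptions: the message wording matches the excerpt at the top of this post, and the helper name `summarize_restarts` is hypothetical, not something Varnish ships.

```python
import re

# Patterns for the Manager messages shown in the syslog excerpt.
KILL_RE = re.compile(r"Child \((\d+)\) not responding to CLI, killing it")
DIED_RE = re.compile(r"Child \((\d+)\) died signal=(\d+)")
# The excerpt logs this one with a lowercase "child", so match either case.
START_RE = re.compile(r"child \((\d+)\) Started", re.IGNORECASE)

def summarize_restarts(lines):
    """Collect the PIDs the Manager killed, saw die, and started."""
    summary = {"killed": [], "died": [], "started": []}
    for line in lines:
        m = KILL_RE.search(line)
        if m:
            summary["killed"].append(int(m.group(1)))
            continue
        m = DIED_RE.search(line)
        if m:
            # Store (pid, signal number) for each death.
            summary["died"].append((int(m.group(1)), int(m.group(2))))
            continue
        m = START_RE.search(line)
        if m:
            summary["started"].append(int(m.group(1)))
    return summary

sample_log = [
    "Dec 5 12:31:31 server varnishd: Child (28447) not responding to CLI, killing it.",
    "Dec 5 12:31:59 server varnishd: Child (28447) died signal=3",
    "Dec 5 12:32:03 server varnishd: child (22431) Started",
    "Dec 5 12:32:04 server varnishd: Child (22431) said Child starts",
]

print(summarize_restarts(sample_log))
```

On a real server you would pass in the lines of your syslog file instead of `sample_log`. The file's location depends on your distribution.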
You may be asking yourself: why is the Cache process not responding? There are a few possible causes. Either the cache process is deadlocked (a deep code issue), or the server is under very heavy disk IO load or short on memory. So how do you determine which category you fall under? I recommend checking the load average and memory utilization levels on your server using the Sysstat tools.
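If Sysstat isn’t installed yet, a quick snapshot of the same two numbers can be taken with a few lines of Python. This is only a sketch and assumes a Linux server with /proc/meminfo. The function names are made up for illustration, and counting free memory, buffers and page cache as "available" is a rough approximation rather than what Sysstat reports.

```python
import os

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info

def memory_used_fraction(info):
    """Fraction of RAM in use, treating free, buffers and page cache as available."""
    available = info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)
    return 1.0 - available / info["MemTotal"]

def report(meminfo_path="/proc/meminfo"):
    """Print the load averages and an approximate memory utilization."""
    load1, load5, load15 = os.getloadavg()
    cpus = os.cpu_count() or 1
    print(f"load average: {load1:.2f} {load5:.2f} {load15:.2f} ({cpus} CPUs)")
    with open(meminfo_path) as fh:
        info = parse_meminfo(fh.read())
    print(f"memory in use: {memory_used_fraction(info):.0%}")
```

Calling `report()` on the affected server prints one line for load and one for memory. A load average well above the CPU count, or memory use close to 100%, points towards the resource-pressure explanation rather than a deadlock. It is a single snapshot, though, so Sysstat’s history is still the better tool for looking at what happened around the time of the kill.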
That’s all folks.