Freeky Bug

Ever have one of those bugs that customers complain about, but you just cannot reproduce it? Here is a good one…

Customers were complaining about being logged out when clicking a download link.

This particular setup is a Cisco CSS 11501 series load balancer with 2 Dell Poweredge web servers sitting behind it.  Each webserver is running apache, as well as an application server (python) which handles authentication and processing for THAT server.

For weeks, I could not reproduce this bug.  So tonight when I finally got bit by it (at home), I was clueless for a while.  The code is so simple.  A simple key lookup in a simple dictionary, yet it just was not making sense.

Here is the story:

A while ago, we were having problems with Internet Explorer downloading content over SSL.  This turns out to be a common problem with IE, so to fix it, I caused the downloads to not use SSL, which is more efficient anyway.

We use a cisco hardware load balancer which balances incoming requests to different backend servers.  It has a feature called STICKY SOURCE IP, which means that any connections routed from the same IP to the same site will be delivered to the same backend server.  This is nice, because you are always visiting the same server.

So as it turns out, by turning the download SSL off, the load balancer was using another “site” definition to handle the DOWNLOAD request.  STICKY SOURCE IP was out the window, and the request was being passed back to a “random” webserver.

About 50% of the time, users (like me tonight) were tossed to the other server, which knew nothing about the user login. That is why it was complaining about the “WB4_App::$DSEG and/or WB4_App::$AuthToken must be set in order to contact the     applications server.” error message, which is not one that should normally be shown.

To make matters worse, our IP address at work was apparently always using the same server, so I could not reproduce the problem.  I’m lucky that it happened to me at home, or I would still be banging my head against the desk…

Interesting Thoughts on Cloud Server Performance

Apache load testing on a Cloud Server – Jason – 7/31/2009

I recently created a cloud server for a wordpress blog, and configured it to the point that the blog was working OK.  Then I decided to check the performance aspects of the server, as it was a small 256 MB + 10GB machine.
Using apachebench (ab), I ran some load tests on the blog home page.  The server choked to death. It was swapping so bad, that RackSpace Cloud sent me this email:

This is an automatic notification to let you know that your Cloud Server,, is showing a considerable amount of consistent swapping activity. Quite often this is an indicator that your application or database are not as efficient as they could be. It also may indicate that you need to upgrade your Cloud Server for more RAM.

That’s strange…
I found that the response rate was:

4 requests per second, 10 concurrent connections

When the concurrency was raised to 50, the server died.  It took 10 minutes for it to calm down enough that I could LOG IN and KILL apache.
So upon further investingation, I found that the default httpd.conf configuration was WAY TOO LARGE:
We’re only working with 256 MB ram here, so if each apache process takes up any amount of memory at all, we have a low limit.

<IfModule prefork.c>
StartServers       8
MinSpareServers    5
MaxSpareServers   20
ServerLimit      256
MaxClients       256
MaxRequestsPerChild  4000

Only after drastically reducing the configuration to the following, did we get reasonable performance:

<IfModule prefork.c>
StartServers       4
MinSpareServers    2
MaxSpareServers   4
ServerLimit      4
MaxClients       4
MaxRequestsPerChild  4000

As it turns out, the performance went up considerably:

16 requests per second, 50 concurrent connections

Still, I thought that it could get better.  So I looked into installing some PHP opcode caching software.

The Alternative PHP Cache (APC) is a free and open opcode cache for PHP. Its goal is to provide a free, open, and robust framework for caching and optimizing PHP intermediate code.

As it turns out, it was easy to install.

# yum install php-pecl-apc

And after restarting apache:

47 requests per second, 50 concurrent connections

Even during this load test, the site was still responsive from a web browser.
Not bad for a cheap little Cloud Server, eh?