Category Archives: Server Admin

New Equipment Is Here

The blade chassis arrived on a pallet today (and boy, was that an awesome time getting it upstairs), so that means all the new server equipment is here (except for additional RAM).

We got 1 blade chassis, 10 loaded blades (except for RAM right now), 2 load balancers (1 is a hot spare) and a managed 48-port gigabit switch (along with a redundant power supply).

Now I need to go to Home Depot and get an extension cord and plug adapter so I can plug the chassis into a 220V outlet.

Until the equipment is ready to be installed in the data center, we have Yashi (and his camera shy brother, Wiggly) standing guard over it all.


Just so I don’t forget, the chassis has an L6-30P NEMA plug and the dryer has a 14-30R NEMA receptacle (for my adventure to Home Depot later).

Dell Blade Servers Are Here

The new blade servers were delivered this morning (10 of them). Hopefully I should get the chassis tomorrow (it shipped separately), and the new switch and load balancers later in the week.

Wiggly (one of my cats) is terrified of change, and now he won’t leave my office because there are big boxes in the hallway. heh


I just remembered the chassis runs at 220V instead of 120V. I know that’s fine for the data center, but I just realized I don’t have 220V in my place to configure them (maybe my dryer or stove is 220V… I better check).

Multiple Instances Of mysqld

My primary MySQL server has been VERY overloaded lately (which is the main reason new blades are on the way), but today I decided to see what I can do about it in the meantime (the parameters have already been tuned as much as possible).

First I toyed around with a single node MySQL Cluster… it didn’t work terribly well. I think you really need 2 or 4 nodes for it to be effective.

Then I decided to run two different copies of mysqld on the same machine. Dude, this works *so* well under a high load that it’s almost unbelievable. While the memory fragmentation issues are still there, it’s 20x better (really). I should have done this a long time ago… 🙂

APC Datastore Class For vBulletin

On one of my ultra-high traffic web servers, I switched from eAccelerator to APC (an opcode caching system for PHP) today. So far it seems pretty nice… especially the ability to disable the stat call for each PHP request.

I ended up making a datastore class for vBulletin also so I could use it for the forum, so if anyone else is using vBulletin on a server with APC, here you go (if you know what this is for, you will know where it goes :)).

[code=php]// #############################################################################
// APC

/**
* Class for fetching and initializing the vBulletin datastore from APC
*
* @package vBulletin
* @version $Revision: $
* @date $Date: 2006/05/08 16:51:06 $
*/
class vB_Datastore_APC extends vB_Datastore
{
    /**
    * Fetches the contents of the datastore from APC
    *
    * @param array Array of items to fetch from the datastore
    *
    * @return void
    */
    function fetch($itemarray)
    {
        if (!function_exists('apc_fetch'))
        {
            trigger_error('APC not installed', E_USER_ERROR);
        }

        // always load the default items first
        foreach ($this->defaultitems AS $item)
        {
            $this->do_fetch($item);
        }

        // then load whatever extra items were requested
        if (is_array($itemarray))
        {
            foreach ($itemarray AS $item)
            {
                $this->do_fetch($item);
            }
        }

        // set the version number variable
        $this->registry->versionnumber =& $this->registry->options['templateversion'];
    }

    /**
    * Fetches the data from shared memory and detects errors
    *
    * @param string title of the datastore item
    *
    * @return void
    */
    function do_fetch($title)
    {
        $ptitle = $this->prefix . $title;

        if (($data = apc_fetch($ptitle)) === false)
        { // appears it's not there, let's grab the data and put it in memory
            $data = '';
            if ($dataitem = $this->dbobject->query_first("
                SELECT title, data FROM " . TABLE_PREFIX . "datastore
                WHERE title = '" . $this->dbobject->escape_string($title) . "'
            "))
            {
                $data =& $dataitem['data'];
                $this->build($title, $data);
            }
        }
        $this->register($title, $data);
    }

    /**
    * Updates the appropriate cache entry in APC
    *
    * @param string title of the datastore item
    * @param mixed data to store for the item
    *
    * @return void
    */
    function build($title, $data)
    {
        $title = $this->prefix . $title;

        if (!function_exists('apc_store'))
        {
            trigger_error('APC not installed', E_USER_ERROR);
        }
        $check = apc_store($title, $data);
    }
}[/code]

Update

I just found out APC datastore support was added to the as-yet-unreleased vBulletin 3.6. Nice!

Update 2

I’ve since switched back to eAccelerator. APC was causing Apache segfaults under ultra-high loads.

Getting Around Dell’s Whacky Pricing

As I mentioned previously, I’m looking to get a bunch of Dell blade servers, but their pricing system (seemingly random pricing changes every day) is really irritating me. So I think I may have come up with a solution… Just buy stripped down blades and add the RAM, hard drives (and maybe even 2nd CPU) yourself.

As of right now, a single loaded blade configured as I would want it is $9,062 (that’s 2 dual core CPUs, 12GB RAM, 2 146GB 15k rpm drives, 3 gigabit ethernet ports, SuSE Linux Enterprise 9, etc.)

But if I strip the CPU, RAM and hard drives down to a minimum (1 dual core CPU, 1GB RAM, 1 36GB 15k rpm drive), the cost is $3,025.

Dell doesn’t offer 4GB DIMM modules, but they do say the blades support them. It’s actually cheaper to use 4GB DIMMs instead of 2GB DIMMs because you can use dual-rank modules for the 4GB vs. single-rank for the 2GB. This also means by using 4GB DIMMs you can max out at 16GB of memory instead of 12GB. 4GB DDR2 DIMMs are $581 each.

146GB 15k rpm U320 SCSI drives are $275 each (would need to find out if Dell sells blank drive carriers since they are hot swappable.. if not, I found them on eBay for $8.95).

A 2nd processor is $935 (the user’s guide for the blades actually has instructions for replacing a CPU, so maybe you can add one yourself too).

So if we add it all together we could have an identically configured blade (except we would have 4GB MORE RAM) for $6,834 instead of Dell’s $9,062 price. Also, I would probably just end up adding 8GB RAM for now (9GB total), which would bring the per-blade cost down to $5,672.
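Here is the math behind those two totals as a small sketch. The prices are the ones quoted above; the function name is made up for illustration, and drive carriers are left out (an assumption, since I don't know yet whether they cost extra).

```python
# Prices quoted in this post (USD).
BASE_BLADE = 3025    # stripped blade: 1 dual-core CPU, 1GB RAM, 1 36GB drive
DIMM_4GB = 581       # per 4GB DDR2 DIMM
DRIVE_146GB = 275    # per 146GB 15k rpm U320 SCSI drive
SECOND_CPU = 935
DELL_LOADED = 9062   # Dell's price for the loaded blade


def diy_blade_cost(dimms: int, drives: int = 2, extra_cpus: int = 1) -> int:
    """Cost of a stripped blade plus self-installed parts (hypothetical helper)."""
    return BASE_BLADE + dimms * DIMM_4GB + drives * DRIVE_146GB + extra_cpus * SECOND_CPU


full = diy_blade_cost(dimms=4)     # 16GB of self-installed RAM
partial = diy_blade_cost(dimms=2)  # 8GB added for now
print(full, DELL_LOADED - full)    # 6834 2228
print(partial)                     # 5672
```

So the do-it-yourself route saves a bit over $2,200 per blade even with the extra 4GB of RAM.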

Now if they would just use AMD Opteron processors instead of Intel Xeon…….. 🙂

Dell Pricing Fluctuations

Can I just tell everyone how annoying Dell’s price fluctuations are? I’m trying to purchase a blade chassis and 10 loaded blades. One day the blades are $66,030, then they are $105,400, then $66,030 again, now they are $88,040 (all pricing for identically configured blades of course). Finally I got pissed and called Dell, and their response was, “Well, we change our pricing every week.”

Gee, really??

What, do you have to roll the dice and try to guess when they will be a “normal” price again and buy them in that 15 second window? Really f’ing annoying…

WordPress Is NOT Scalable

The core of WordPress (this blog software) is pretty much a piece of crap as far as its “guts” are concerned (although I knew this already, I just didn’t care because my blog doesn’t get enough visitors to really make that fact matter much).

Anyway, I woke up this morning to my servers being thrashed (the database server was hitting its max limit of 2,500 concurrent connections). Turns out it was because of a front-page Digg (the 3rd one in the last 60 days, but the 1st one for my blog with the crappy WordPress backend). That didn’t hold up to the “digg effect” for very long.

I ended up cobbling together a caching mechanism for WordPress real quick that actually made everything okay, but what I really want to know is if anyone knows of any blog software out there that doesn’t have a crap backend? One that can hold up under load if need be. Sure would be nice if there is one out there already so I don’t have to do it myself.

This is the digg in case anyone is curious. It was dugg by the same person that got a front-page Digg previously. Digg is powerful… a crazy amount of traffic at once. It’s also what spawned the server fundraiser going on now. I think TOPS30 needs to be stabbed.

Installing APC On BSD Variants

Alternative PHP Cache (APC) is a PHP caching mechanism (like Turck mmCache, eAccelerator, etc.) that is being developed directly by PHP developers. In fact, PHP 6.0 is going to include APC in its core framework, so it’s certainly something PHP developers/admins should start looking at.

Anyway, if you install APC Cache and Apache fails to start afterwards, check your Apache error log to see if you get something like this:

[Thu Mar 23 15:18:28 2006] [apc-error] apc_shm_create: shmget(0, 67108864,914) failed: Invalid argument. It is possible that the chosen SHM segment size is higher than the operation system allows. Linux has usually a default limit of 32MB per segment.
PHP Warning: Unknown(): Unable to load dynamic library './/usr/local/lib/php/' - (null) in Unknown on line 0

Most BSD variants (including Mac OS X Server in my case) don’t allow much shared memory to be allocated by default. Luckily, it’s an easy fix…

My OS had a default allowance of 4MB max for shared memory. You can alter that by adding this to your /etc/sysctl.conf file (or creating it if it doesn’t exist):

My new /etc/sysctl.conf file…


shmall should be shmmax/4096

So the above config will let you use up to 128MB for shared memory.
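To show how the shmmax/4096 rule works out, here is a small sketch that computes both values for a given amount of shared memory. The key names kern.sysv.shmmax and kern.sysv.shmall are assumed from the kern.sysv.shm* family, and the helper name is made up.

```python
PAGE_SIZE = 4096  # bytes; the divisor used in the shmall rule above


def shm_settings(megabytes: int) -> dict:
    """Return shmmax (bytes) and shmall (pages) for the requested size."""
    shmmax = megabytes * 1024 * 1024
    return {
        "kern.sysv.shmmax": shmmax,
        "kern.sysv.shmall": shmmax // PAGE_SIZE,
    }


for key, value in shm_settings(128).items():
    print(f"{key}={value}")
```

For 128MB that comes to 134217728 bytes for shmmax and 32768 pages for shmall. For comparison, the shmget call in the error log above was asking for 67108864 bytes (64MB), well over the 4MB default.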

You can’t dynamically set the shared memory kernel variables with the sysctl command because once it’s set, it can’t be altered. Because of that, you must reboot your server after you edit the sysctl.conf file…


I just realized that *only* editing sysctl.conf works on Mac OS X. For Mac OS X Server, you need to comment out the kern.sysv.shm* lines in /etc/rc (in Mac OS X Server those commands are called before sysctl.conf is read for some reason).

Google Not Interpreting robots.txt Consistently

I had an issue where Googlebot was spidering parts of my site that were not allowed in the robots.txt file…

My old robots.txt file…

User-agent: *
Disallow: /tools/suggestion/?
Disallow: /search.php
Disallow: /go.php
Disallow: /scripts/
Disallow: /ads/

User-agent: Googlebot
Disallow: /ebay_

Hmmmm… that’s weird… Googlebot is still spidering stuff it shouldn’t be…

- - [14/Mar/2006:06:21:07 -0800] "GET /ads/ HTTP/1.1" 302 38 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +//"
- - [14/Mar/2006:10:26:18 -0800] "GET /ads/ HTTP/1.1" 302 38 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +//"
- - [14/Mar/2006:14:29:35 -0800] "GET /ads/ HTTP/1.1" 302 38 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +//"
- - [14/Mar/2006:17:47:21 -0800] "GET /ads/ HTTP/1.1" 302 38 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +//"

So I made an inquiry to Google about this, and I actually heard back (nice!)…

While we normally don’t review individual sites,
we did examine your robots.txt file. Please be advised that it appears
your Googlebot entry in your robots.txt file is overriding your generic
User-Agent listing. We suggest you alter your robots.txt file by
duplicating the forbidden paths under your Googlebot entry:

User-agent: *
Disallow: /tools/suggestion/?
Disallow: /search.php
Disallow: /go.php
Disallow: /scripts/
Disallow: /ads/

User-agent: Googlebot
Disallow: /ebay_
Disallow: /tools/suggestion/?
Disallow: /search.php
Disallow: /go.php
Disallow: /scripts/
Disallow: /ads/

Once you’ve altered your robots.txt file, Google will find it
automatically after we next crawl your site.

Okay… I can live with that… easy fix. But check this out… Google’s own robots.txt testing tool within Google Sitemaps shows the old robots.txt as being able to block Googlebot as expected.
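The override behavior described in that email can be reproduced with the robots.txt parser in Python's standard library, which also lets a specific User-agent group take precedence over the generic one. This is only a sketch of that one parser's behavior (the helper name and the "SomeOtherBot" agent are made up), not a statement about how Googlebot is implemented.

```python
from urllib.robotparser import RobotFileParser

OLD_ROBOTS = """\
User-agent: *
Disallow: /tools/suggestion/?
Disallow: /search.php
Disallow: /go.php
Disallow: /scripts/
Disallow: /ads/

User-agent: Googlebot
Disallow: /ebay_
"""

# The suggested fix: repeat the generic paths inside the Googlebot group.
FIXED_ROBOTS = OLD_ROBOTS + """\
Disallow: /tools/suggestion/?
Disallow: /search.php
Disallow: /go.php
Disallow: /scripts/
Disallow: /ads/
"""


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Parse robots.txt text and report whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


print(allowed(OLD_ROBOTS, "Googlebot", "/ads/"))     # True: only the Googlebot group applies
print(allowed(OLD_ROBOTS, "SomeOtherBot", "/ads/"))  # False: falls back to the * group
print(allowed(FIXED_ROBOTS, "Googlebot", "/ads/"))   # False: path now listed for Googlebot
```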

So how about some consistency here? And more importantly, if anyone at Google is reading this, how about someone tell me why my blog is banned in your index… 🙂

MySQL 5.1 Out Of Alpha

With version 5.1.7, MySQL 5.1 (which is something I’m [not] patiently waiting for) has gone from alpha to beta status.

The stuff I’m really looking forward to is its improvements to the NDB Cluster engine…

  • Disk Data tables (before NDB Cluster required everything to be memory-resident)
  • Integration of MySQL Cluster and MySQL replication
  • Variable sized records

Now hurry up through the beta phases! 🙂

The only thing (that I can think of) that MySQL Cluster will be lacking for 5.1 is support for FULLTEXT indexes… not a deal breaker though, since you will be able to replicate to a MyISAM table to get FULLTEXT indexing. Should see it in MySQL 5.2 though (hopefully). Oh, and the ability to alter live tables sure would be nice.

Dell PowerEdge 1855 Blade Server

I’m looking at the possibility of moving some stuff I’m doing to blade servers. Anyone have any experience with blade servers in general? And if so, now is the time to gimme some input! 🙂

This is what I’ve been looking at lately…


Then I could just pop in a new blade (computer) when I need more power. A 7U blade chassis will hold 10 blades/computers, so a 7U chassis could hold 40 Xeon cores (2 dual-core processors per blade), 120GB RAM, 2.9TB drive space (15k rpm Ultra 320 SCSI).
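A quick sanity check on those chassis totals. The per-blade configuration here (2 dual-core Xeons, 12GB RAM, 2 146GB drives) is an assumption for the sake of the arithmetic, and note that the 40 only works out if you count cores rather than physical processors.

```python
BLADES_PER_CHASSIS = 10

# Assumed per-blade configuration (not a Dell spec sheet).
CPUS_PER_BLADE = 2
CORES_PER_CPU = 2
RAM_GB_PER_BLADE = 12
DRIVES_PER_BLADE = 2
DRIVE_GB = 146

cores = BLADES_PER_CHASSIS * CPUS_PER_BLADE * CORES_PER_CPU
ram_gb = BLADES_PER_CHASSIS * RAM_GB_PER_BLADE
disk_tb = BLADES_PER_CHASSIS * DRIVES_PER_BLADE * DRIVE_GB / 1000

print(cores, ram_gb, disk_tb)  # 40 120 2.92
```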

Prevent DoS Attacks Via DNS (BIND)

A spoofed UDP packet sent to your DNS server can cause it to respond to an IP address that never made the request (with the response being more bytes than the request). So someone malicious could use one of your name servers to throw unwanted traffic at a 3rd IP address. Annoying… but pretty easy to solve. You can set up BIND to only answer DNS queries that it’s authoritative for, except for specific blocks of IPs which it will do recursive lookups for (basically internal IPs that could use the DNS server as their name server for lookups).

Besides your server becoming part of a DoS attack, it can suck a ton of your own bandwidth (I was seeing cases where short-lived attacks were saturating 3Mbit worth of my bandwidth). Not any more! 🙂 I figured out what was causing the bandwidth spikes with my friend, tcpdump.

I’m too tired to get into more details (that’s what Google is good for), but you can basically add something along these lines to your /etc/named.conf file:

allow-recursion {;;};

That will ignore DNS requests from any IP (except those 2 subnets) when the IP makes a request about any domain that the DNS server is not an authoritative server for.


If you ever need to figure out what is eating bandwidth on a server, tcpdump comes in handy…

tcpdump -n -i any

That will spew out everything, so you might be able to find anything that looks suspicious in there. Say you find the IP address of doing something suspicious, you can zero in on them to see if they are doing anything naughty like so:

tcpdump -n -i any host

In my case, someone was utilizing one of my DNS servers for about 200 lookups per second (I wasn’t logging DNS lookups, and it’s UDP traffic, so it was hard to figure out where the bandwidth was going).

Once you find a naughty IP address, now just block them like so:

route add -host reject (Linux)


route add -host -reject (Mac OS X/BSD)

MySQL Memory Fragmentation

Okay, I *thought* it wasn’t a problem (at the time), but it turns out, it just made it “less” of a problem.

After watching the server closely for a month or two I think I finally figured out what the hell is going on. mysqld is not really efficient with its memory (I finally figured it out with vmmap). To the point that I start getting 10,000+ different memory segments for mysqld after about 24 hours of use. On a server that has many gigs of unused/free memory (with MySQL only using about 200MB), that’s pretty sucky memory management if you ask me.

That explains why it gets slower and slower over time and also explains why pretty much removing the query cache (memory) made it less of a problem.

I opened a bug report with MySQL about it, so *hopefully* we can finally get it taken care of and I don’t have to restart the mysqld daemon every 24-48 hours.

What Google Analytics Is Missing

Google Analytics lets you toggle between pie and bar chart, but it really seems logical to have a 3rd chart option that makes a line graph comparing the top 10 items from any report section. For example, for web browsers you could see historically how the percentage of Firefox users goes up and Internet Explorer users goes down. This type of chart would be useful for virtually any existing report… Keyword Considerations, Browser Versions, Platform Versions, Referring Source, etc. WebTrends can generate multi-dimensional bar charts which yield the same results.

Basically it would be the “Data Over Time” chart that Google Analytics generates already, but for more than one item.

Here’s a couple examples to illustrate:

Google Broke Up With Me

It seems Google has broken up with me. Not only did they not do it in person, they didn’t call or even bother to email me to let me know they have moved on from our relationship. This blog (not all of, only this blog) has some sort of weird partial ban from the Google index. I noticed it after seeing this thread, but was reminded about it again from this thread in my forum.

Googlebot (not Mediapartners) still spiders this blog at the rate of 400-500 pages per day, but the spidered pages never make it to the Google search index. It makes me wonder if Google is looking for something specific to be “fixed” and not finding whatever it’s looking for.

Here’s what makes me think it has a partial ban…

I only have supplemental results in the Google index:

The supplemental results show snippets from January 23, 2005 and before, yet sometimes the cache shows more up to date content (sometimes not though):

No pages from 2005 are in the index (yet they are spidered daily).

This page has no PageRank (last I looked it was PageRank 6 in December, 2004).

I’ve contacted Google a couple times about it… usually I don’t get a response, but a couple times I got a generic response telling me to read their webmaster guidelines (which I have). Since they tell me to read their guidelines, I’m taking that as some sort of affirmation that I am banned. I know it wouldn’t be good for Google as a whole to give specific reasons about why a site/blog is banned, but in my case I honestly don’t have the faintest idea of what it could possibly be. So it makes whatever “it” is impossible to correct. The best guess I can come up with (which obviously isn’t the case, I’m just making a point that I have no idea) is that I switched from Blogger to WordPress for this blog.

Maybe I’m just getting old. Google has found someone prettier that they like better now that I turned 30. 🙁

Oh well… at least I have the fond memories of when we were in love and frolicked through the fields. 🙂


It looks like my mom might have the same problem:

Maybe it’s anything within with a tilde in the URL {shrug}

MySQL Problems On Mac OS X Server

For the last two months or so, I’ve been having a strange problem with my primary MySQL Server that required that the mysqld process (not the server itself) be restarted. The first image shows the CPU usage of the server, with the red arrows being the points I restarted the mysqld process (queries per second and types of queries do not change over time).

The server itself is a dual processor Xserve G5 with 5GB RAM, 1.2TB drive space, etc. so resources really are not an issue. Also, I should point out that no other services (like a web server) are running on the machine (it’s strictly MySQL only). Basically the longer the database server ran, the slower it would become (there are no “bad” queries anywhere either). After 24 hours it was slow enough that it needed to be restarted. I tried tweaking MySQL config options, throwing more memory at various aspects of MySQL.

While watching the processlist, I noticed something strange a few times. Something that should be blazingly fast (like an insert into a tiny HEAP table) was taking a VERY long time (sometimes 90 seconds), and not only that, it was hanging all the query threads (even ones hitting unrelated databases). So then I had the idea to look at the query cache, since that’s shared by all databases, and when you update a table, it needs to flush the queries from the cache for that table (which would explain why unrelated databases’ queries would hang while it was flushing the query cache for the HEAP table being updated). So to test this theory I rolled the query cache memory allocation back from 256MB to 2MB. Lo and behold… it worked! In my situation, the less memory I allocated to the query cache, the faster MySQL was.

Then I remembered something that was added to Mac OS X 10.3 (I believe), and that’s the ulimit functionality. It’s actually a nice feature and can prevent a runaway process from taking down a machine. But in my case it was preventing MySQL (which really is the only process) from using very many resources. Still not sure why it would slowly get worse, but maybe it has to do with the more memory MySQL wanted to use over time, the more swap disk memory it was forced to use. Who knows… and to be honest, I don’t care, I’m just glad it works now.

Once I got that squared away, this is what my MySQL server CPU usage looks like:

Now, hopefully someone out there who runs into the same problem will find this and it will be your solution, rather than spending months screwing with it (this ulimit stuff would apply to other BSD variants like FreeBSD, not just Mac OS X).

Run this from the shell (as root):

sysctl -w kern.maxfiles=122880
sysctl -w kern.maxfilesperproc=102400

This allows a single process (mysqld in my case) to have up to 102,400 files open at once at the kernel level (more than enough).

Add this to your /etc/sysctl.conf file (or create it if needed):

kern.maxfiles=122880
kern.maxfilesperproc=102400

This makes the settings work when you reboot the machine.

Edit the /Library/StartupItems/MySQLCOM/MySQLCOM script that MySQL installs (it’s part of the automatic startup package), and somewhere near the beginning, add this:

ulimit -n 7000
ulimit -c unlimited
ulimit -d unlimited
ulimit -s 65536

You could put it somewhere else, but I chose this file since it doesn’t get overwritten when you upgrade MySQL. This lets the mysqld process use more than the default per user resources that Mac OS X Server allows.

You may want to adjust your settings as you see fit for your situation, but that’s what I used.
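If you want to check what limit a process actually ended up with, Python's standard resource module reports the same open-file limit that ulimit -n controls. This is just a sketch for checking the current process (the helper names are made up); it says nothing about what mysqld itself was started with unless you run it from the same startup environment.

```python
import resource


def open_file_limits() -> tuple[int, int]:
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def meets_target(target: int) -> bool:
    """True if the soft limit is unlimited or at least the target value."""
    soft, _hard = open_file_limits()
    return soft == resource.RLIM_INFINITY or soft >= target


soft, hard = open_file_limits()
print(f"soft={soft} hard={hard} enough for 7000 files: {meets_target(7000)}")
```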

I’m just happy it’s no longer a problem for me!!!!! Yaaaayyy!!!!

MySQL Is F’ing Fast

I’ve been doing a little tuning with one of my MySQL database servers today, and it’s pretty amazing how fast it is. Just ran a benchmarking thing on it before I applied some optimizations (so this is with no tuning), and this is what it looks like:

Uptime: 747 Threads: 1 Questions: 1409737 Slow queries: 0 Opens: 41055 Flush tables: 1 Open tables: 197 Queries per second avg: 1887.198

Notice the queries per second. And even more impressive is this is for a single threaded benchmark app.
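That average is simply Questions divided by Uptime (in seconds), which is easy to verify from the status line itself. A small sketch, with a made-up helper name, that parses the line and redoes the division:

```python
import re

STATUS = ("Uptime: 747 Threads: 1 Questions: 1409737 Slow queries: 0 "
          "Opens: 41055 Flush tables: 1 Open tables: 197 "
          "Queries per second avg: 1887.198")


def parse_status(line: str) -> dict:
    """Split a status line into {label: number} pairs."""
    pairs = re.findall(r"([A-Z][A-Za-z ]*?):\s*([\d.]+)", line)
    return {label: float(value) for label, value in pairs}


stats = parse_status(STATUS)
qps = stats["Questions"] / stats["Uptime"]
print(round(qps, 3))  # 1887.198
```

So that is about 1.4 million queries in under 12.5 minutes.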

vBulletin 3.5

The Digital Point forum is now live with vBulletin 3.5 beta 1. I spent some time today and last night recoding all the little hacks I made on vB 3.0.x, and turning them into plug-ins for 3.5 (yay for the new plug-in architecture).

Along the way, I even found some bugs (some of which I fixed on my own). Hopefully it will get fixed in the distribution though so I don’t have to fix it every update. 🙂

Most of the changes are for admins and moderators (inline moderation, plug-in system, template history system with diff, etc.), but man it’s nice!

There are some cool things for end users too though, like AJAX stuff… you can double click a post to edit it “live”, pretty tricky. Also dumped vBulletin’s internal search engine for MySQL’s fulltext search engine. It also supports persistent marking of read threads, which is so much nicer.

Uhm… does anyone care? If you read this far, you have less of a life than me. heh

Mac OS X Server Wish List

I’ve spent a few weeks now with the new Xserves (I’m going to put them physically into the data center tomorrow BTW).

For the most part, Mac OS X Server 10.3 is a very nice server platform (especially when you are using Xserves). But maybe someone at Apple will read my little wish list, so here goes…

  • Make a system-wide shutdown script option. There is a bug in MySQL that prevents the mysqld process from shutting down with the normal kill function, so you end up with corrupt databases on a reboot or shutdown. This could be prevented with a shutdown script that shuts down the mysqld process with mysqladmin.
  • Actually USE the shutdown scripts in /Library/StartupItems/ (this could also solve the mysqld problem).
  • Server Monitor application should be able to monitor the health of the disk RAID when using hardware RAID (not just software RAID). I wrote a cron job that runs every 5 minutes to do this, but still… it should be standard I think.
  • Something should notify me about high loads on the server. I wrote a cron job that runs every 5 minutes to do this as well, but still… it should also be standard I think.
  • IP Failover is nice, but if the backup machine doesn’t give back the primary IP fast enough when it comes back online, you end up with NEITHER server utilizing that IP address. I run a script to do some clean-up before it relinquishes the IP address, which can take a little bit. The primary machine thinks another machine has the IP, and stops checking to see if it will give it up at some point. I solved this by making a script that executes after the IP is relinquished, that essentially shuts off the Ethernet port on the master, then turns it back on. This one could be catastrophic if you just assume it took back its IP address.
  • Give me an API to the Server Admin application. Would be nice to get the nifty graphs and stuff for MySQL transactions for example.

My Fake MySQL Cluster

Okay, so I have two new loaded (each has dual 2.3GHz G5s, 8GB RAM, hardware RAID-5 w/ 1.2TB drive space, dual gigabit ethernet, etc.) Xserves thanks to those that donated, and I’ve been mucking around with them for a few days now.

One is going to be primarily a web server, and the other will be primarily a MySQL database server. Mac OS X Server makes IP fail-over really easy, so I set up rsync to keep a copy of the web site content on the MySQL server. If one server goes down (or maybe is taken down for an upgrade), the other acquires its IP address and can continue to serve the requests. The tricky part is the MySQL database and keeping it in sync between the two machines in realtime.

The “ideal” option would be MySQL Cluster, but it requires your databases to be 100% memory resident, and that just didn’t appeal to me (too much memory and scary if someone decided to pull the plug on all the servers). Although I do know that MySQL is working on a disk based option for MySQL Cluster as well as asynchronous replication (that may turn out to be the same thing).

So instead what I did was set up both servers to be masters and both servers to be slaves to each other. This was in fact VERY easy. Add this to your /etc/my.cnf file on the servers (be sure to create a replication slave user named replicate):

Server 1
server-id = 1
master-host =
master-user = replicate
Server 2
server-id = 2
master-host =
master-user = replicate

So now I can send all SQL queries to the primary database server’s IP address. If that machine is down, the other server acquires its IP address. Once the server comes back online, it will automatically get back in sync.

The thing I didn’t like is that a double master setup does not always execute the SQL statements in the same order, so there is a (slight) chance of the databases getting out of sync when the server comes back online and there is a lot of catching up to do. So what I did was set up a simple script to check every 5 seconds to see if the servers are in sync. Once they are in sync again, it will relinquish the public IP address back to the rightful server (the MySQL servers talk on the private 10.0.0.x network, which does not change).
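The idea behind that kind of check can be sketched roughly like this. It is not the actual script, just one way to do it under some assumptions: the status rows come from SHOW MASTER STATUS and SHOW SLAVE STATUS as dictionaries, and the two fetch callables (hypothetical, you would supply them with whatever MySQL client you use) return those rows. For a double master setup you would run the comparison in both directions.

```python
import time
from typing import Callable, Mapping


def slave_caught_up(master_status: Mapping, slave_status: Mapping) -> bool:
    """True when the slave has executed everything the master has logged."""
    return (
        slave_status.get("Slave_IO_Running") == "Yes"
        and slave_status.get("Slave_SQL_Running") == "Yes"
        and slave_status.get("Relay_Master_Log_File") == master_status.get("File")
        and int(slave_status.get("Exec_Master_Log_Pos", -1))
        == int(master_status.get("Position", -2))
    )


def wait_until_in_sync(
    fetch_master: Callable[[], Mapping],
    fetch_slave: Callable[[], Mapping],
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll every `interval` seconds; return the number of polls it took."""
    polls = 0
    while True:
        polls += 1
        if slave_caught_up(fetch_master(), fetch_slave()):
            return polls
        sleep(interval)
```

Only once wait_until_in_sync returns would the IP address be handed back. Passing sleep in as a parameter is just there so the loop can be tested without actually waiting.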

My setup is for redundancy/fault tolerance, not load balancing. Although you could send SELECT statements to the slave db server without any problems (since there is nothing to replicate from that).

This will work for now, until MySQL finishes their native async replication. 🙂

9 + 15% = 35?

Apple put out a minor speed-bump on their G5 Xserve machines (they went from 2.0 GHz to 2.3 GHz and nothing else in the system or architecture changed).

The dual 2.0 GHz Xserve could yield 9 gigaflops of raw processing power (which is really fast to begin with). The dual 2.3 GHz Xserve yields 35 gigaflops of power.

Can someone please explain to me how a 15% increase in CPU speed gives you 4 times more processing power? Seems a little odd if you ask me.
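Just to put numbers on it, here is the arithmetic, using nothing but the figures quoted above and assuming throughput scales linearly with clock speed when nothing else changes:

```python
old_ghz, new_ghz = 2.0, 2.3
old_gflops, new_gflops = 9, 35

clock_gain = new_ghz / old_ghz            # how much faster the clock is
claimed_gain = new_gflops / old_gflops    # how much faster Apple says it is
expected_gflops = old_gflops * clock_gain # what linear scaling would give

print(round(clock_gain, 2))       # 1.15
print(round(claimed_gain, 2))     # 3.89
print(round(expected_gflops, 2))  # 10.35
```

So a straight clock bump would put the new machine at roughly 10 gigaflops, not 35.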