Seagate is still not producing the hard drives I want for new servers in sufficient quantity, but they do seem to be actually producing them now. I’ve seen places where I can get 30 of them lately… but I need 72.
I’m assuming I’ll be able to get them relatively soon and have started to think about how I want to lay the servers out architecturally.
As far as hardware goes, gigabit ethernet just really isn’t that fast these days… In real-world use you can pass about 80MB/sec over it (the theoretical maximum is about 110MB/sec). When you have a cluster of servers passing massive amounts of data amongst themselves, 80MB/sec just doesn’t cut it. Which is why the servers have InfiniBand QDR (40Gbit interconnects); QDR’s 8b/10b encoding turns that 40Gbit signaling rate into 32Gbit/sec (4GB/sec) of usable bandwidth, so after protocol overhead they should be able to pass 3.5GB/sec (ish) between themselves.
For software/services, I’m thinking maybe 12 identical servers that are more or less service agnostic… with each server acting as a web server, a MySQL Cluster data node and a MySQL Cluster SQL node. Each server could then probably be set up to handle 10,000 concurrent HTTP connections as well as 10,000 DB connections (1 per HTTP connection). With 12 servers, that gives you capacity for 120k concurrent web connections and 120k DB connections, capable of doing millions of SQL queries/sec. If you need more capacity for anything, you just bring additional agnostic servers online.
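Just to sketch out what that layout might look like, here’s a rough MySQL Cluster config.ini for it (the hostnames and memory value are made up, and I’m only showing the first couple of the 12 servers):

    # Hypothetical sketch -- hostnames and values are examples only
    [ndbd default]
    NoOfReplicas=2          # each table fragment lives on 2 data nodes
    DataMemory=4G           # example value; tune to the real hardware

    [ndb_mgmd]
    HostName=mgmt1          # management node

    # every server is both a data node and a SQL node
    [ndbd]
    HostName=server1
    [ndbd]
    HostName=server2

    [mysqld]
    HostName=server1
    [mysqld]
    HostName=server2

    # ...and so on through server12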
This of course is just in theory… who knows how it will test out in actual use, but it does seem like something worth testing at least.
Why don’t you just outsource all this, it isn’t 2003 anymore!
Because if you want something done right, you need to do it yourself, regardless of year. 🙂
Lies, you love all this! It does sound like fun though.
Why not use some sort of DB connection pooling? A new DB connection per HTTP connection seems like unnecessary overkill unless your database handles the authentication of the HTTP connection.
Well of course there would be DB connection pooling. But 90% of the HTTP requests *also* require a DB connection, so…
The only stuff that doesn’t require DB connections is static content (images, JS, CSS, etc.), and that’s cached in the browser, so subsequent requests won’t be made anyway (not even to check if it’s been modified).
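That “don’t even check” behavior just comes from sending far-future cache headers with the static content. In PHP terms it’d be something like this (though in practice you’d more likely set it in the web server config; the one-year lifetime is just an example):

    <?php
    // Hypothetical example: far-future headers so the browser reuses its
    // cached copy without even sending a conditional (If-Modified-Since) request.
    $oneYear = 31536000; // seconds
    header('Cache-Control: public, max-age=' . $oneYear);
    header('Expires: ' . gmdate('D, d M Y H:i:s', time() + $oneYear) . ' GMT');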
Really? Most of my programming is done with .NET and SQL Server. Each web application uses one connection to the database. All the different sessions are funneled through one DB connection. I’m not sure if there is a corresponding mechanism in the PHP world.
Yeah there is… it has some drawbacks though. Like what happens if something goes wrong with the one connection everything is sharing?
You also have an issue if you ever need to run slow queries, since those slow queries can eat up query threads for as long as they run.
I am totally going to come off as “That Guy” by continuing the conversation. I’m sure that you’re doing it the most efficient way possible and my “Mental Model” must be getting in the way of comprehending.
With ADO.NET, as long as you specify the same connection string, it will attempt to reuse an existing connection, and 99.9% of the time it does. However, if another request is already using the connection, it creates a new one. “Closing” a connection really just leaves it in a state to be reused.
The equivalent in PHP (I’m assuming you’re using PHP) would be to use mysql_pconnect, I believe. http://php.net/manual/en/function.mysql-pconnect.php
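If I’m reading the docs right, it’d be something like this (the credentials are placeholders):

    <?php
    // mysql_pconnect() looks for an existing persistent connection with the
    // same host/user/password and reuses it instead of opening a new one.
    $db = mysql_pconnect('localhost', 'webuser', 'secret');
    mysql_select_db('app', $db);
    $result = mysql_query('SELECT 1', $db);
    // mysql_close() is a no-op on persistent links; the connection stays
    // open for the next request this process handles.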
The sensible guy in me knows you are doing everything right… but the “Cliff Clavin” in me can’t let the topic go.
Yep… for the connections being used, they are of course persistent connections (exactly as you described).
My whole point originally was that I want the servers to be able to support the same number of persistent connections (if necessary) as concurrent web requests, since the vast majority of the web requests need database access. There is no point in being able to serve 10,000 concurrent web requests if you can only have 100 database connections.
And yes, we always try to do things as efficiently as possible. For example, there are a couple of backend routines that read from the database occasionally, process some stuff for 10 seconds, and then write to the DB. For those cases where we know there’s going to be a huge gap between queries, we actually release the persistent connection for another process to use and then reacquire it when we actually need it.
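Roughly like this, where pool_acquire()/pool_release() (and the work functions) are hypothetical stand-ins for our actual wrapper code:

    <?php
    // Sketch of the release/reacquire pattern -- every function name here
    // is a made-up placeholder, not the real code.
    $db = pool_acquire();                // grab a persistent connection
    $rows = fetch_work_items($db);       // quick read
    pool_release($db);                   // hand the connection back during the gap

    $results = process_items($rows);     // ~10 seconds of processing, no DB needed

    $db = pool_acquire();                // reacquire only when it's time to write
    write_results($db, $results);
    pool_release($db);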