HP has just announced its forthcoming Gen 8 servers. Rather than go on about the usual stuff (CPUs, I/O bandwidth and the like), or even the trend to hang solid state storage off the PCIe bus, what's interesting to me is this: the servers are explicitly designed to be part of a larger system, a datacentre.
The existing products are individual servers that you just happen to put into racks and just happen to hook up to a switch stuck at the top of the same rack. The racks may be laid out hot aisle/cold aisle, but that's mostly a deployment detail the servers don't care about, except for ensuring airflow is good.
This has now changed.
The Datacenter as a Computer argued that software developers need to recognise that a datacentre is the new execution platform, one with mixed availability, limited bandwidth and other concerns that could previously be ignored, or at least treated as the special case of "distributed systems" rather than what we have now: "systems". Everything is distributed.
These hardware changes mirror that. Here are some of the new concerns for both the ops team and the applications themselves:
- Re-integration of Storage and Computation.
- Availability through replication: less RAID-style hardware, more replication across machines.
- Inventory tracking, especially for identifying failure points, such as monitoring the history of specific batches of disks. If some appear particularly unreliable, you want to find all of them.
- Networking: 10 GbE is still a luxury; bonded 2x1 GbE is good for availability too. Understanding network failures in data centers shows why ToR switches become the dominant network failure point in a cluster, and from a re-replication perspective that's not ideal.
- Power management: beyond just PUE, the metric of datacentre overhead, power consumption in the servers themselves is a big concern.
Inventory. The servers work out from the rack (don't ask me how, I don't know these things) where they are in it, information that can be propagated to the management tools so it can be used for inventory tracking.
Networking. Lots of Ethernet ports: some slow and inexpensive for management, faster ones for the application.
Power. This is something you can point to Chandrakant Patel at HP Labs for. A lot of his published work is about airflow and cooling in a datacentre. If you can improve that, as the container-hosted datacentre pods can, then your PUE is better. Why instrument the inside of the servers? Because a better idea of what is going on inside lets you keep the hardware within its limits. Every extra degree F, C or K you can take the air up saves a lot of money over time, yet the risk of overheating, and the cost of doing so, makes this dangerous. Knowing what is happening inside the servers gives you the confidence to run closer to those limits.
This is what the new servers enable: we are going from servers that you stack to servers that are designed to locate themselves in the racks, ideally hosted within a datacentre container that is optimised for airflow and designed to run as close to the temperature limits as is considered safe, based on the information coming out of the servers themselves.
Which is very close to what a laptop does: a box with optimised airflow and fans that come on when they feel it is important, and with a power budget that the system is designed to optimise. The datacentre is the new laptop, at least from a power and cooling perspective.
Now, what about the software? If the datacentre-level application infrastructure can get at the power, topology and network information, it could adapt itself better.
The topology information that the servers can determine could be used to dynamically generate the topology map for the cluster. It is entirely coincidental that I'm typing this while my new topology patches are being tested in an adjacent console, but those changes (better support for topology sources other than the script runner, the ability to dump the current topology) are effectively a precursor. I wouldn't do some fancy integrated Java module though; better to have a topology source that just reads a Java properties file and, by polling for changes, can react to moving topologies. Let the management tooling generate that file and it would propagate into HDFS and the RM/MR layer.
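To make that concrete, here is a minimal sketch of what such a properties-file-backed topology source could look like. The file format (hostname=/rack path), the class name and the polling logic are all my own assumptions for illustration, not anything shipping in Hadoop today; a real version would be wrapped up as a Hadoop topology plugin rather than stand alone.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical topology source that maps hostnames to rack paths from a
 * properties file (e.g. "worker03=/dc1/rack12") written by the management
 * tooling. It re-reads the file whenever its modification time changes,
 * so servers that move racks are picked up without a restart.
 */
public class PropertiesFileTopologySource {

  private static final String DEFAULT_RACK = "/default-rack";

  private final Path mappingFile;
  private final Map<String, String> hostToRack = new ConcurrentHashMap<>();
  private volatile long lastLoaded = -1;

  public PropertiesFileTopologySource(String filename) {
    this.mappingFile = Paths.get(filename);
  }

  /** Resolve a list of hostnames to rack paths, reloading if the file changed. */
  public List<String> resolve(List<String> hosts) {
    maybeReload();
    List<String> racks = new ArrayList<>(hosts.size());
    for (String host : hosts) {
      racks.add(hostToRack.getOrDefault(host, DEFAULT_RACK));
    }
    return racks;
  }

  /** Reload the mapping if the file's timestamp has moved on. */
  private synchronized void maybeReload() {
    long modified = mappingFile.toFile().lastModified();
    if (modified == lastLoaded) {
      return;
    }
    Properties props = new Properties();
    try (FileInputStream in = new FileInputStream(mappingFile.toFile())) {
      props.load(in);
    } catch (IOException e) {
      // Keep the previous mapping rather than fail the cluster on a bad read.
      return;
    }
    hostToRack.clear();
    for (String host : props.stringPropertyNames()) {
      hostToRack.put(host, props.getProperty(host));
    }
    lastLoaded = modified;
  }
}
```

Polling a flat file keeps the coupling loose: the management tooling owns the file, and the HDFS/MR side only ever reads it.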
Power? If overheating is a problem, a server can be clocked back, which makes it slower. It may be better to tell the resource manager that there are fewer slots on that box, reducing its actual workload. That way the work running in the remaining slots doesn't take longer than normal to complete.
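As a purely hypothetical illustration of that idea (none of these names exist in any scheduler I know of), the slot count a node advertises could be derated as its inlet temperature climbs towards the limit, instead of letting the clock-back silently slow every slot:

```java
/**
 * Hypothetical helper: derate the task slots a node advertises to the
 * resource manager as its temperature approaches the thermal limit.
 */
public final class ThermalSlotPolicy {

  private final double warnTempC;   // start shedding slots above this
  private final double limitTempC;  // advertise the minimum at or above this

  public ThermalSlotPolicy(double warnTempC, double limitTempC) {
    this.warnTempC = warnTempC;
    this.limitTempC = limitTempC;
  }

  /**
   * @param configuredSlots slots the node offers when cool
   * @param currentTempC    current inlet temperature reading
   * @return slots to advertise; never below 1, so the node stays usable
   */
  public int slotsToAdvertise(int configuredSlots, double currentTempC) {
    if (currentTempC <= warnTempC) {
      return configuredSlots;
    }
    if (currentTempC >= limitTempC) {
      return 1;
    }
    // Linear derating between the warning and limit temperatures.
    double fraction = (limitTempC - currentTempC) / (limitTempC - warnTempC);
    return Math.max(1, (int) Math.round(configuredSlots * fraction));
  }
}
```

With a warning threshold of 30C and a limit of 40C, a 16-slot box running at 35C inlet would offer 8 slots: half the work, rather than all of it running slow.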
Networking? We really need a way to get more information about the network backplane into the application, including the amount of bandwidth currently allocated to applications. Bandwidth can be a precious resource, but right now there is better tooling to manage it in a BitTorrent client than there is between applications in a datacentre.
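For what that tooling might look like at the very bottom, here is a minimal token-bucket style throttle of the kind a per-application bandwidth policy could hand out, sized from whatever share of the backbone the application has been allocated. The class is purely illustrative, not part of any existing datacentre stack, and it assumes callers chunk their writes into pieces no larger than one second's allowance.

```java
/**
 * Hypothetical token-bucket throttle: a policy service could hand one of
 * these to each application, sized from its allocated share of the backbone.
 */
public final class BandwidthThrottle {

  private final long bytesPerSecond;
  private double availableBytes;
  private long lastRefillNanos;

  public BandwidthThrottle(long bytesPerSecond) {
    this.bytesPerSecond = bytesPerSecond;
    this.availableBytes = bytesPerSecond;
    this.lastRefillNanos = System.nanoTime();
  }

  /** Block until the caller is allowed to send the given number of bytes. */
  public synchronized void acquire(long bytes) throws InterruptedException {
    refill();
    while (availableBytes < bytes) {
      // Sleep roughly long enough for the deficit to be topped back up.
      long deficit = bytes - (long) availableBytes;
      long sleepMillis = Math.max(1, (deficit * 1000) / bytesPerSecond);
      wait(sleepMillis);
      refill();
    }
    availableBytes -= bytes;
  }

  /** Top the bucket up in proportion to elapsed time, capped at one second's worth. */
  private void refill() {
    long now = System.nanoTime();
    double elapsedSeconds = (now - lastRefillNanos) / 1e9;
    availableBytes = Math.min(bytesPerSecond,
        availableBytes + elapsedSeconds * bytesPerSecond);
    lastRefillNanos = now;
  }
}
```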
This is a challenge and an opportunity. A challenge: this information needs to be extracted and forwarded to the applications, which then need to act on it. An opportunity: it will make the applications and datacentres work better. Wave goodbye to writing topology scripts that don't work; say hello to being able to move servers around and have the application infrastructure work out where they are. Worry less about uncontrolled backbone bandwidth use in a shared datacentre; have some policy tooling to manage it across applications. As for power, hope to see the electricity bills decrease.
[Artwork: Sepr on Jamaica Street, Stokes Croft]