Green IT made easy

Talk of “green IT” is everywhere: no IT magazine without an article, no trade fair without new products, and advertising all over the place. But I bet there are quite a few small and medium-sized businesses around the world asking themselves: “How are we supposed to participate in this movement?” Typically being the more responsible kind of entrepreneurs, I’m sure they don’t put profit first, but they simply cannot afford (or at least economically deploy) all those new & nifty water-cooled racks, nor set up “separated hot and cold zones in your data center”. Mostly because there simply is no data center.

We were facing exactly that situation at a small office with a single-rack computing room. The office is in a zone of moderate climate (around -5 to 5 °C in winter, with peaks down to -15 °C in tough years, and 20 to 30 °C in summer, apart from the occasional heat wave), the computing room faces north (no sunshine heating things up) and has a window front directly to the outside. The servers in the rack currently draw around 3.2 kW when under load.

In the first years, running with an open rack and fresh outside air was sufficient. But over time came an increase in computing power and more servers in the rack. During an especially hot period, a typical mobile air conditioner was set up (you know the type: everything in a single unit, with a nozzle to direct the hot air straight out of the window), but even with a permanently installed air outlet, that solution soon reached its limits.

Split air conditioning was out of the question (too much noise at night, disturbing all the neighbors in the mixed office/living quarters), so it was time for some new ideas. I’d like to share with you what was created – you can not only cut your A/C energy costs in half, but do so while spending only a few hundred Euros!

First and most important was the realization that in that computing room, only the servers need cooling. Not the whole room, not the back of the servers, nothing else – cool air only needs to reach the servers’ front air intakes to keep them cool.

It was helpful that the servers are in a closed 19″ rack – that’s different from many computing centers, where the rack fronts and backs are mostly open to allow free ambient air flow.

Secondly, it was decided that the servers would have to run with an intake air temperature of up to 27 °C – but you can vary that number to your liking (at least downwards; raising it depends on your willingness to take extra risks). No special attention was paid to humidity – the servers had run on unconditioned outside air for years without noticeable ill effects, so there was no motivation to spend money on that now.

Thirdly, the influence of 1980s television series took its toll: rather than spending big bucks on an industrial-grade, super-duper, custom-made air conditioning system, MacGyvering a solution on a limited budget was requested. When it’s cold enough outside, use that air to cool the servers, and mix in some A/C-cooled air when necessary.

Key to success was the strict separation of hot & cool air zones inside the 19″ rack: the front of the rack, where the servers have their air intakes, was mostly sealed off from the area behind the front plates, making the front the “cool zone” and all of the back the “hot zone”. Fresh air is fed directly to the front area through an air duct installed in the base socket of the rack, sucked in by the servers’ fans and pushed out at the back of the servers as warm (or even hot) air. That warm air (typically 30 to 33 °C, but I’ve seen peaks of up to 40 °C) is picked up by fans in the rack dome, partially transported out of the room by an extra fan and some air ducts, partially fed into the office rooms for heating during cold winter nights.

This hot/cold separation inside the rack alone cut energy costs by 25 %, as the air conditioning had much less running time than before, when cold air was simply fed into the rack (as done in typical data center installations with raised floors). It’s amazing: after some server work, a single height unit between two servers was left open – and the temperature at the servers’ intakes immediately rose by several degrees, simply because the warm air could (and would) flow back to the front side. Looked at from the theoretical side, it’s all about the volume of air you need to cool down: when everything is open, you have to add much more cold air to bring the resulting mix down to the desired intake temperature. When you separate properly, only the small volume between the server fronts and the front rack door needs to be cooled. That makes a huge difference. (And you don’t need a coat to work in the computer room 😀 )
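
For a rough feel of the numbers (a back-of-the-envelope estimate of my own, not a measurement): the airflow needed to carry away a heat load P at an intake-to-exhaust temperature rise ΔT is

  V̇ = P / (ρ · c_p · ΔT) ≈ 3200 W / (1.2 kg/m³ · 1005 J/(kg·K) · 5 K) ≈ 0.53 m³/s ≈ 1900 m³/h

so with the rack’s roughly 3.2 kW and the temperatures quoted here (intake up to 27 °C, exhaust around 32 °C), something in the order of 2000 m³ of air per hour has to pass the servers – and every bit of warm air leaking back to the front adds to the volume the A/C has to cool.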

The other cost decrease came when a controlled fresh-air intake was added to the picture: an industrial-quality fan (the same build as the extra fan that transports hot air out of the building) was added to bring air from outside the building to the server rack, and both fans were hooked up to a custom-built, temperature-controlled fan controller (which is also capable of switching the A/C on and off). This way, rather than only cooling down (warm) room air via the A/C, outside air is used to cool the servers, and the A/C only kicks in when that air isn’t cold enough. For about two thirds of the year, the outside air is cool enough to do the job on its own. If, in winter, the outside air gets really cold, the intake fan reduces its speed so the servers won’t be cooled down too much (and possibly attract condensation or suffer from overly wide temperature swings). If the outside air gets too warm to help with cooling (less than approx. 5 °C below the desired intake temperature), the fan is switched off so the A/C doesn’t have to fight it, too.
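
To illustrate the control logic: the actual controller is a custom hardware build, so the following is only a minimal sketch of the decision rules described above, written in Python, with made-up temperature thresholds and without the real sensor or relay interfaces.

  TARGET_INTAKE = 27.0   # desired maximum intake temperature (°C)
  OUTSIDE_MARGIN = 5.0   # outside air must be at least this much cooler to be useful
  MIN_INTAKE = 15.0      # below this, throttle the intake fan (condensation risk)

  def control_step(t_outside, t_intake):
      """Decide intake fan speed (0..1) and A/C state for one control cycle."""
      if t_outside > TARGET_INTAKE - OUTSIDE_MARGIN:
          # Outside air is too warm to help: stop feeding it in,
          # let the A/C handle the cooling alone.
          fan_speed = 0.0
          ac_on = t_intake > TARGET_INTAKE
      elif t_intake < MIN_INTAKE:
          # Outside air is very cold: slow the fan down so the servers
          # don't see condensation or large temperature swings.
          fan_speed = 0.2
          ac_on = False
      else:
          # Normal case: cool with outside air, A/C only as a backup.
          fan_speed = 1.0
          ac_on = t_intake > TARGET_INTAKE
      return fan_speed, ac_on

In a software variant, something like this would run periodically against the two temperature sensors (outside air and rack front) and drive the fan and A/C relays.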

So the complete solution consists of regulated fresh air ventilation plus an A/C that kicks in when things get too warm, fed into a climate-separated, closed 19″ rack. All ventilation parts are industrial-grade and geared towards low noise; the A/C is a standard mobile indoor unit, duct-taped to direct its air flow to the rack front, with its temperature sensor relocated to the rack front (rather than sensing room temperature).

Both fans, noise-reducing air ducts, some installation material and the “mobile A/C” add up to about 600 Euros. Add another 800 Euros for the custom controller and you are still several times below the cost you’d face trying to meet these requirements with standard components from typical data center suppliers. And, almost unbelievably, the savings on the energy needed to drive the A/C more than compensated for these costs within the first 12 months. Now that’s what I call “SMB-friendly green IT”.


Buying by brand name doesn’t save you from cheap products…

We’re working on customer premises quite regularly, carrying lots of documents and code with us on “portable media”. Of course all that is encrypted and we’ve set things up to decrypt & automount those file systems when logging on to the Linux hosts on-site.

Recently, we’ve switched from “USB sticks” to SSDs – 1.8 inch drives are small enough to carry around, and Delock has a nice 1.8″ USB 3.0 external disk case with a pouch to go with our SSDs. Hopefully the SSDs will prove to be less limiting and more reliable than the thumb drives – first tests have been quite promising, as was to be expected considering the much higher costs.

But when we ordered new units to complement our initial test device, we noticed something that we had so far only suffered from when buying rather cheap USB sticks: all units resolved to the same, udev-generated entry in /dev/disk/by-id!

Quick tests showed that udev uses the model/type information of the disk itself, but takes the serial number as reported at the USB link layer – so it’s the serial number of the USB enclosure, not that of the disk inside.
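
If you want to verify this on your own hardware, the following commands illustrate the difference (the device name /dev/sdb is just an example, and reading the SSD’s own identity only works if the enclosure passes ATA commands through):

  # what udev derived the by-id name from (the USB bridge's serial number):
  udevadm info --query=property --name=/dev/sdb | grep ID_SERIAL

  # what the SSD itself reports, provided the enclosure supports pass-through:
  smartctl -i /dev/sdb

  # the resulting symlinks:
  ls -l /dev/disk/by-id/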

Of course, we’ve contacted Delock’s support staff – and were told that “the devices are of course using cheapo USB chipsets” and that they see no way to change the behavior. They’d report it to their development department, but we shouldn’t hold our breath. Of course we won’t, there’s no use in turning blue from suffocation, is there?

Update: We’ve received another response from DeLock, in the form of a tool to update the serial number reported by the disk enclosure! Once we’ve completed our tests, I’ll report the results.
Update 2: While we haven’t been able to find out whether we may redistribute the tool, I can happily confirm that it does its job – we’ve been able to change the serial number of the enclosure, making each and every one unique. The program is for “MS Windows” only (not Linux), and is much more than a simple “change the serial number” tool. One of its side effects is that the resulting serial number will be one higher than the start value you enter.

(By the way, getting in contact with Delock wasn’t as easy as I made it sound: the message sent via the feedback form on their site got no timely response, and calling the phone number resulted in some “unavailable” message. But our admin found a number to call that at last got him through to their support staff.)

Why is this important to us anyhow? The technical background is quite simple: every “traveling developer” has their own personal disk. The encrypted partition is presented to the rest of the system as a (usually uniquely named) /dev/disk/by-id/…-partX device, so that pam_mount can pick it up (via per-developer configuration) and use an encryption key from some other source to decrypt that specific partition.
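
Just for illustration, a per-developer volume entry in /etc/security/pam_mount.conf.xml might look roughly like the following – user name, mount point and serial number are placeholders, and the attributes for fetching the encryption key from its separate source are left out, since they depend on the local setup:

  <!-- hypothetical example: mount one developer's encrypted partition -->
  <volume user="alice"
          fstype="crypt"
          path="/dev/disk/by-id/usb-TS32GSSD_18S-M_0000000000000042-0:0-part7"
          mountpoint="/home/alice/secure" />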

At many sites, only a single developer is logging on to a development machine at any point in time. No problem there.

But at the more sophisticated installations, the USB devices are simply attached to a central development super-host, where *all* developers log in concurrently. Without changing the distribution’s udev ruleset, all encrypted partitions would resolve to a single, common id (constructed from the SSD’s device description, the USB enclosure’s serial number and the partition number, like “/dev/disk/by-id/usb-TS32GSSD_18S-M_0000000000000033-0:0-part7”), making it impossible for pam_mount to decrypt different devices for different developers. Bummer.

How to handle the situation? We cannot return the devices, so we’ll have to stick with them. We’ll have to see whether we can create a logistical work-around (handing these disks out only to developers who don’t work on a common development server) or convince all target admins to change their udev rules.

If someone from Delock stumbles over this message: hey, you have done well in earning a positive reputation, even in the professional market. Don’t spoil it so easily… those few cents you save manufacturing such devices may cost you big money when you lose customers!

See the update a few paragraphs above: we’ve received another response from DeLock, in the form of a tool to update the serial number reported by the disk enclosure! My trust in DeLock is re-established!


Linux Pacemaker, dependent resources and live migration

It’s been a while and I’ve been thinking about new topics for our blog… it’s not that we wouldn’t have enough to lament about, but that’s more for a cool beer and some chit-chat, rather than to annoy you.

Nevertheless, a subject resurfaced over the last days that had me wondering and doubting my skills for quite some time: running a Linux Pacemaker cluster to provide higher availability for our Xen VMs, I’d like to combine two basic features:

  1. Create resource dependencies between VMs, and
  2. use “live migration” of VMs.

“1.” is really helpful if you separate services between multiple VMs, like running MySQL in one VM and the database-using applications on separate VMs. We have a complete infrastructure of VMs running in our cluster, with user machines, development servers, compiler platforms, multi-machine test beds… not to forget the virtual base infrastructure like databases, DNS, DHCP, LDAP, proxies, code repositories, issue tracking and so on.

“2.” is the way to go if you want to send a Xen machine (Dom0) into maintenance mode without interrupting the services. This is “standard practice” and supported by Xen, once you’ve enabled it, and Pacemaker claims to support it, too. (It’s noteworthy that, from our experience, live migration of Xen DomUs is indeed not for the faint of heart: for one, the actual migration may take up to a couple of minutes. And even when it completes, not all services inside the VM may have survived the move – MySQL is actually one of the candidates that seems particularly picky. But that’s a separate topic for a separate entry.)
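
(For completeness: on the Xen side, enabling live migration typically means switching on the relocation server in /etc/xen/xend-config.sxp, roughly as shown below. Port, address and the host regex are examples only and depend on your network and Xen version.)

  (xend-relocation-server yes)
  (xend-relocation-port 8002)
  (xend-relocation-address '')
  (xend-relocation-hosts-allow '^localhost$ ^xen.*$')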

Since all this is said to be supported, why am I taking your time with stating the well-known? Because the combination of these two features won’t work the way you may have expected.

Take the following scenario:

  • two Xen servers (xen1 and xen2), both running Pacemaker and joined in a cluster
  • two VMs (domU1 and domU2) up & running, say domU1 on xen1 and domU2 on xen2
  • both VMs defined as cluster resources with live migration enabled
  • domU1 defined as required by domU2 (technically speaking, an order constraint like <rsc_order first="domU1" id="someid" kind="Mandatory" symmetrical="true" then="domU2"/>; see the configuration sketch right after this list)
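
In crm shell terms, a minimal sketch of such a setup might look like this – the resource names match the scenario above, but config file paths, timeouts and the monitor interval are invented for the example:

  primitive domU1 ocf:heartbeat:Xen \
      params xmfile="/etc/xen/vm/domU1" \
      meta allow-migrate="true" \
      op monitor interval="30s" \
      op migrate_to timeout="300s" op migrate_from timeout="300s"
  primitive domU2 ocf:heartbeat:Xen \
      params xmfile="/etc/xen/vm/domU2" \
      meta allow-migrate="true" \
      op monitor interval="30s"
  # domU1 must be running before domU2 (the rsc_order constraint from above)
  order domU1-before-domU2 inf: domU1 domU2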

Now let’s assume we want to update something on our cluster nodes and, for the sake of demonstration, do things manually. We’ll start with xen2 by migrating domU2 to xen1, take xen2 offline and do our thing. Once xen2 is back online, we migrate domU2 back to xen2, then migrate domU1 to xen2 and take xen1 offline. The last step would be to migrate domU1 back to xen1. Or so one might think.
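
Expressed as crm commands, the intended round-trip would be something along these lines (again just a sketch, using the names from the scenario above):

  crm resource migrate domU2 xen1    # live-migrate domU2 away from xen2
  crm node standby xen2              # take xen2 out of the cluster, do the maintenance
  crm node online xen2               # maintenance done, bring it back
  crm resource migrate domU2 xen2    # move domU2 back home
  crm resource migrate domU1 xen2    # free xen1 the same way (this is where things go awkward, see below)
  crm node standby xen1
  crm node online xen1
  crm resource migrate domU1 xen1
  crm resource unmigrate domU1       # clean up the location constraints
  crm resource unmigrate domU2       # that "migrate" leaves behind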

Live migrating domU2 is no problem at all, everything works as expected, so we can update xen2, bring it back online and restore the initial situation. But when migrating domU1, things get awkward (at first making me believe I had somehow goofed in setting up xend or configuring the cluster): instead of just invoking xend to migrate the DomU to the other node, Pacemaker will first stop domU2! And only once it’s stopped will domU1 be live migrated and domU2 brought back up.

This is completely against the intended design of our cluster… after all, the whole point of live migration is uninterrupted operation of all resources. So why this counter-productive behaviour? At first I believed this to be a bug in Pacemaker and hoped for fixes in upcoming releases, but then I found a thread from the beginning of this year. In its conclusion, David Vossel points out that a more or less complete re-engineering of the corresponding part of Pacemaker is required to fix this… something that surely isn’t going to happen very soon.

So for those of you in the same pit as we are: there’s an open enhancement request in ClusterLabs’ Bugzilla where you can follow the progress on this subject. Or rather, since no progress has been recorded after the initial analysis in April this year, you’d better somehow voice your opinion…
