So it’s that time of the decade again, when everyone gets their hands dirty and starts preparing for the arrival of a new supercomputer.
She has a name
The new supercomputer is named after the first Norwegian woman with a Ph.D. in mathematics, Mary Ann Elizabeth (Betzy). She was a popular and academically strong woman. Read the whole story about Betzy (in Norwegian). The employees at the Norwegian Metacenter were the ones who decided on her name.
Preparations before the new machine arrives
You might imagine that it is a relatively easy task to replace one supercomputer with a new one. After all, the infrastructure for serving the one we already have (Vilje) is working quite well.
We have cooling, we have power, and we have a machine room where we could roll out the old equipment and connect the new. Unfortunately, it is not that simple. The preparations for the previous machine, our beloved Vilje, started nine years ago, and in that time a lot has changed when it comes to technology. We need to start from scratch.
Two computers at the same time
While we were preparing for Betzy's arrival, we got a request from Sigma2: Will you be able to keep Vilje running during the installation phase of the new supercomputer? A tricky question, as it means we need to accommodate two supercomputers at the same time. Our current supercomputer is running without support, which means that if it goes down, deliberately or not, it could be difficult to power it on again. If someone said that Vilje is being kept together with chewing gum, gaffer tape and affectionate love, it would not be far from the truth.
All kinds of questions needed answers, and problems demanded solutions:
Do we have enough power for two machines? No, we need to install an additional transformer to supply at least one extra megawatt.
Do we have enough cooling? No, we need to expand our current facility and do some serious plumbing, tearing down walls, etc.
How about data center space? No, we need to find another room, since we are going to have Vilje running during the installation phase.
Refurbishment of the room
Everything has to happen in the right order. We have found a machine room. We chose the room where we already have lots of Sigma2 equipment and other technical stuff.
This means that we need to remove our own equipment and the Idun Linux/GPU cluster. But we cannot empty the room before our autumn courses are done. At the same time, we cannot stop building the cooling and power infrastructure, so we end up doing everything at once: laying pipes, installing power cables and tearing down the ceiling, while still operating the infrastructure in the room without interruption. This is where we are right now: a machine room that looks more like a construction site struck by a tornado than a clean data center for supercomputing.
Right now it seems like chaos, and some of us are starting to wonder how it can possibly come together. Then you remember that we have been here before, eight years ago, sixteen years ago and so on, and every time all the pieces have been put in the right spot at the right time, thanks to dedicated people who really care about the end result. It’s a kind of magic.
This blog entry was written by senior engineer Einar Næss Jensen at NTNU, The IT Department.