Every silver lining has a cloud

“What’s that, there?” I ask Philippe, pointing to one of the black boxes, its LEDs flashing in some unholy semaphore.

“That’s a bit of iTunes,” he says.

Computer rooms have come a long way from the Stonehenge-like mainframe environments of the Sixties and Seventies. Enter a data centre today and you will see rack upon rack of black frames, measured in U's - a U, or rack unit, is 1.75 inches, the vertical space allotted to each piece of equipment bolted to the frame. False floors and rising columns contain kilometres of copper and fibre optic wiring; air conditioning ducts run in every direction, their efforts resulting in a constant, all-pervading noise.

Heating engineers are the masters here, designers of alternating warm/cold corridor set-ups that keep the ambient temperature at 16 degrees Celsius. Given the sheer amount of electricity such facilities draw, power is the big bottleneck. Some have been looking into ways of dealing with this, not least using the heat generated by computers to warm houses, an idea being developed by German startup[1] Cloud and Heat. In London and New York, data centres are situated in the basements of tower blocks, their heat routed to upstairs offices.

There’s hardly a person to be seen: such environments, once built, need little manual intervention. Access is restricted to a few deeply technical, security-vetted staff, whose entry to the facility — with hand and retina scanners, glass-walled acclimatisation chambers, lightless lobbies and serious-looking security personnel — looks like it is modelled on some evil megalomaniac’s underground headquarters. In reality the kingpins of these empires are a bit more mundane. The companies running many such environments, such as colocation company Equinix, often have no idea what is running in their facilities, while their tenants, people like Philippe, the young, smiling, bespectacled French CEO of BSO Network Solutions, look more like they have just walked out of accountancy school than like masterminds plotting to take over the planet.

The black box containing “a bit of iTunes” could just as easily have been running the web site for the BBC, or Marks and Spencer's, or some program running on behalf of an unnamed customer. Tomorrow, the same computer server may be allocated to a completely different set of tasks, or even a different client. What makes it all possible is one of the oldest tricks in the computer engineer’s book. It’s worth explaining this as it helps everything else make sense.

When we think about computers, we tend to consider them in terms of a single ‘stack’ of hardware and software: at the bottom are the computer chips and electronics; on top of this runs an operating system (such as UNIX or Microsoft Windows). On top of that we have our software — databases and sales management software, spreadsheets, word processors and so on. So far so good — but in practice, things are done a little more cleverly. Inside the software that makes computers tick, an imaginary world is constructed which allows for far more flexibility in how the computer is used.

The reasoning behind creating this imaginary world came relatively early in the history of computing. The original computers that emerged post-war were geared up to run ‘batch’ jobs — literally, batches of punched cards were fed into a reader, the program was run and the results generated. The problem was that only one job could be run at a time, leaving a queue of frustrated people waiting to have their programs carried out. One can only imagine the frustration should there be a bug in a program, as a failure meant going to the back of the queue!

After a decade or so, computer technicians were starting to consider how to resolve this issue. Two solutions were proposed: the first in 1957 by IBM engineer Bob Bemer[2], who suggested that time could be shared between different programs, each being switched in and out of memory so quickly that multiple programs appeared to be running simultaneously. A few years later, also at IBM, came a different idea, recalled[3] by systems programmer Jim Rymarczyk: how about pretending that a single mainframe was actually multiple computers, each running its own operating system and programs?

The two models — time sharing and virtualisation — continued in parallel for several more decades, the former being used on smaller computers to make best use of limited resources, the latter preferred for mainframes so that their massive power could be divided across smaller jobs. As computers became more powerful across the board, by the turn of the millennium both models started to appear on all kinds of computer. Fast forward to the present day and it is possible to run a ‘virtual machine’ on a mobile phone that is itself already making best use of time sharing.

While this may not appear a big deal to anyone outside computing, it has had a profound impact. If what we consider to be a ‘computer’ exists only in software, then it can not only be switched on and off at will, it can also be moved from one real computer to another. If the physical computer breaks, the virtual ‘instance’ can be magically restarted on a different one. Or it can be copied tens or hundreds of times to create an array of imaginary computers. Or it can be backed up — if anything goes wrong, for example if it gets hacked, it becomes genuinely simple to restore. Prior to virtualisation, a constant frustration for data centre managers was ‘utilisation’ — that is, owning a mass of expensive computer kit and only using half of it. With crafty use of virtual machines, a larger number of computing jobs can run on a smaller amount of hardware, dramatically increasing utilisation.
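
To see why utilisation improves so dramatically, consider a back-of-the-envelope sketch in Python; the server counts and load figures below are assumed purely for illustration, not taken from any particular data centre.

```python
# Illustrative consolidation arithmetic, not real data centre figures.
physical_servers = 100        # assumed estate of lightly loaded machines
average_utilisation = 0.15    # each box busy roughly 15% of the time

real_work = physical_servers * average_utilisation   # ~15 'servers' of real work

target_utilisation = 0.75     # how heavily we are happy to load each host
hosts_needed = real_work / target_utilisation        # = 20 hosts

print(f"The same work fits on {hosts_needed:.0f} hosts instead of {physical_servers}.")
```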

Virtualisation removed another distinct bottleneck to how computers were being used. Back in the Sixties and Seventies, computers were far too expensive for many organisations, who tended to make use of computer bureaus to run their occasional programs — one of the reasons why time sharing was seen as such a boon was that it enabled such companies to run more efficiently. As computers became more generally affordable, companies started to buy their own and run most, if not all, of their software ‘in-house’ — a model which prevailed until the advent of the World Wide Web in the mid-Nineties. Back in the day, Web pages were largely static in that they presented pre-formatted text, graphics and other ‘content’. Quite quickly, however, organisations started to realise they could do more with their web sites — sell things, for example. And others worked out they could offer computer software services, such as sales tools, which could be accessed via the Internet. Some of the earliest adopters of this model were existing dial-up data and resource library providers — all they had to do was take their existing offerings and make them accessible via the Web. Startup companies followed suit — Salesforce.com, for example, quickly found it was on to something.

By 2001 a large number of so-called ‘application service providers’ (ASPs) existed. A major issue with the model was scale: a wannabe ASP had to either buy its own hardware or rent it, which meant it needed a pretty good idea of what take-up was going to be or face one of two major headaches. Either it over-estimated demand and was stuck with a huge ongoing bill for computer equipment, or it under-estimated and could not keep up with the number of sign-ups to the service. While the former would be highly discomfiting, the latter could spell disaster. E-commerce companies, such as the rapidly growing online bookseller Amazon, were struggling with the same dilemma of resource management.

For a number of very sensible engineering reasons, Amazon and others were reliant on a ‘lots of smaller computers’ rather than a ‘small number of big computers’ model. Racks of identical, Intel-architecture servers known as blades were being installed as fast as they could be, with resource management software doing its very best to shift processing jobs around to make best use of the hardware. Such software could only take things so far — until, that is, such servers finally became powerful enough for virtualisation to become a viable option. Virtualisation unlocked the power of computer servers, enabling them to be allocated in a far more flexible and responsive fashion than before. As a result, less hardware was needed, bringing costs down considerably.

You might think the story ends there, but in fact this was just the beginning. The real breakthrough came in 2002, when the engineers running Amazon’s South African operation realised that the company’s computers could also host virtual machines belonging to other people. With virtualisation, the model simply became that customers paid for the CPU cycles that they actually used. Almost overnight, the dilemma of ‘how much computer resource should I buy’ was removed from every organisation that ever wanted to build a web site, or indeed, run any kind of program at all. Because the Internet is often represented as a cloud in corporate presentations, the industry called this model ‘cloud computing’.
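
To make the pay-as-you-go arithmetic concrete, here is a minimal illustrative sketch in Python; the instance count, runtime and hourly rate are assumptions for the sake of the example, not real prices.

```python
# Hypothetical pay-as-you-go costing: capacity is rented only while the job runs.
instances = 30        # virtual servers spun up for the job
hours = 4             # how long the job actually takes
hourly_rate = 0.10    # assumed price per instance-hour, in dollars

print(f"Total bill: ${instances * hours * hourly_rate:.2f}")  # $12.00, then nothing
```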

Today we are seeing companies start from scratch with very little equipment, thanks to a pay-as-you-go model which now extends across most of what any company might need. One such business, Netflix, has sent shock waves around the media industry; remarkably, however, the company only has 200 employees. How can this be? Because it runs almost entirely on Amazon’s network of computers — fascinatingly, in direct competition with Amazon’s own LoveFilm hosted film rental service. On the back of such increasingly powerful capabilities come the kinds of online services we are all using day to day – massively parallel search (Google), globally instant microblogging (Twitter), social networking (Facebook), customer databases (Salesforce) and so on. While Twitter’s interface might be simple, for example, the distributed compute models that make possible the parallel, real-time search of petabytes of data by millions of people are nothing short of staggering.

One area in which the cloud shows huge promise is the use of computers for research. Research funding doesn't scale very well - in academia, one of the main functions of professors and department heads is to bid for pockets of finance. Meanwhile, even for the largest companies, the days of unfettered research into whatever takes a scientist or engineer's fancy are long since over.

Much research has a software element - from aerodynamic experiments on Formula 1 rear wings that we look at in the next section, to protein folding and exhaustive antibody comparisons, there's no substitute for dedicating a few dozen servers to the job. Such tasks sometimes fall into the domain of High-Performance Computing but at other times simply having access to hardware resources is enough - as long as the price is right.

For a researcher, the idea of asking for twenty servers, correctly configured, would have been a problem in itself: no budget, no dice. Even if the money were available, however, the kit would have to be correctly specified, sometimes without full knowledge of whether it would be enough. Consider the trade-off between the number and size of processors, coupled with the quantity of RAM: it would be all too easy to find out, in hindsight, that a smaller number of more powerful boxes would have been more appropriate.

Then come the logistical challenges. Lead times are chief among them: even if (and this is a big 'if') central procurement is operating a tight ship, the job of speccing, gaining authorisation and ticking the necessary contractual boxes can take weeks. At which point a purchase order is raised and passed to a supplier, who can take several more weeks to fulfil the order. It is not unknown for new versions of hardware, chipsets and so on to be released in the meantime, sending the whole thing back to the drawing board.

Any alternative to this expensive, drawn-out yet unavoidable process would be attractive. The fact that a number of virtual servers can be allocated, configured and booted up in a matter of minutes (using Amazon Web Services’ Elastic Compute Cloud (EC2)[4], say) can still excite, even though the model, and indeed the service, has existed for a few years. Even better, if the specification proves to be wrong, the whole lot can be taken down and replaced by another set of servers - one can only imagine the political and bureaucratic ramifications of doing the same in the physical world.
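
As a rough sketch of what that allocation looks like in practice, the snippet below uses the boto3 Python library to request, and later tear down, a batch of EC2 virtual servers; the region, machine image and instance type shown are placeholders rather than anything recommended here.

```python
import boto3

# Connect to EC2 in an assumed region.
ec2 = boto3.resource("ec2", region_name="eu-west-1")

# Ask for twenty identical virtual servers. The ImageId and InstanceType
# values are placeholders, to be replaced with whatever the job requires.
instances = ec2.create_instances(
    ImageId="ami-00000000000000000",
    InstanceType="c5.xlarge",
    MinCount=20,
    MaxCount=20,
)

# Wait for each server to boot - typically a matter of minutes...
for instance in instances:
    instance.wait_until_running()

# ...and when the job is done, or the specification proves wrong,
# the whole lot can be taken down again just as quickly.
for instance in instances:
    instance.terminate()
```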

The absolute cherry on the top is the relative cost. As one CTO of a pharmaceutical company put it to me: “And here's the punchline: the whole lot, to run the process, get the answers and move on - cost $87. Eighty-seven dollars,” he said, shaking his head as though he still couldn't believe it. It is unsurprising that a virtuous relationship is evolving between the use of cloud resources and the increasingly collaborative nature of research, spawning shared facilities and tools such as the Galaxy project and Arch2POCM.

Equally, it becomes harder to justify cases where the cloud is not involved. For example, the BBC’s Digital Media Initiative was intended to create a digital archive for raw media footage – audio and video – so that it could be accessed directly from the desktops of editors and production staff. This was planned to save up to 2.5 percent of production costs, worth millions of pounds per year. In practice, the “ambitious” project got out of control. It was originally outsourced to Siemens in 2008 but was brought back in house in 2010. Two years later, in October 2012, the BBC Trust halted the project and kicked off an internal review; subsequently, the corporation’s Director General, Tony Hall, canned the whole thing[5]. In the event, the project cost £98.4 million over the period 2010 – 2012.

Would it be too trite to ask whether things would have been different if the Beeb had been able to benefit from cloud computing? Surely an unfair question, given that five years ago cloud models were still in their relative infancy? While big-budget IT projects may still have been the default course in 2008, by the time the project first hit the ropes in 2010 the potential of the cloud was much clearer. The core features of the cloud would seem to be tailor-made for the needs of broadcast media management – as NBC found when it streamed 70 live feeds of Olympic footage for direct editing by staff based in New York. Meanwhile, at the time of Margaret Thatcher’s funeral, the Beeb was forced to transfer videotapes by hand using that well-known high-speed network – the London Underground.

The overall consequence of such experiences is that the ‘cloud’ has grown enormously, becoming a de facto way of buying computer resource. Processing has by and large become a commodity, or a utility; indeed, if the world’s computer servers together formed a nation, it would be the fifth largest user of electricity on the planet. Cloud computing also has huge potential implications for emerging economies, for whom technology investment is a challenge. However, it is not some magic bullet, and the network is generally still seen as a blocker to progress. The concept of Data Gravity[6] – that the money is where the data happens to be – is predicated on the fact that large quantities of data are so difficult to move.
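
A quick calculation shows why data gravity bites. Assuming a dedicated 1 Gbit/s link (a generous assumption that ignores protocol overheads entirely), moving a single petabyte still takes around three months:

```python
# Why large datasets are hard to move: one petabyte over a 1 Gbit/s link.
petabyte_in_bits = 1e15 * 8      # a petabyte expressed in bits
link_speed_bps = 1e9             # an assumed dedicated 1 Gbit/s connection

transfer_days = petabyte_in_bits / link_speed_bps / 86_400
print(f"About {transfer_days:.0f} days of continuous transfer")   # roughly 93 days
```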

All the same, the cloud should be seen as a major factor in giving the world a platform of massively scalable technology resources which can be used for, well, anything. In 2011 Cycle Computing's Jason Stowe announced the Big Science Challenge, offering eight hours of CPU time on a 30,000-core cluster to whoever could come up with a big question. “We want the runts, the misfits, the crazy ideas that are normally too big or too expensive to ask, but might, just might, help humanity,” said the release. “No idea is too big or crazy.” Which is a suitable motto upon which the future of computing will be built.