Opening the barn doors: open data, commodity code

At 4.53pm on Tuesday, 12 January 2010, an earthquake measuring 7.0 on the Richter scale struck Port-au-Prince, the capital of Haiti. The event was a low blow — the country was already struggling1 with the effects of abject poverty, political unrest and a series of natural disasters. But nothing prepared the islanders for the effect of the earthquake. Thousands of buildings collapsed almost immediately2, roads were damaged beyond repair and electrical power was completely lost, leaving the city and surrounding area vulnerable to the encroaching darkness. Many thousands died and still more were buried; survivors dug at the rubble with their bare hands, in often-vain attempts to free those still trapped. Initial estimates of 50,000 dead continued to rise.

The world community watched from afar as sporadic reports started to reach them, checking phones and social media for updates from loved ones. As so often, many people, including the Haitian diaspora, sat paralysed, in the knowledge that they could do little other than check news feeds and send money to aid agencies. One group in particular realised it didn’t have to sit on its hands in horror, however. The day after the earthquake, members of the OpenStreetMap online community turned their conversations3 from talk about GPS issues, cycleways and multi-polygon tags, towards how their expertise might help those dealing with the earthquake’s aftermath — the truth was, maps of Haiti had never been that good. “You have likely heard about the massive quake that has hit Haiti. OpenStreetMap can contribute map data to help the response,” wrote software developer Mikel Maron4. Geologist Simone Gadenz replied, “Dear all OSMappers interested in the Haiti_EQ_response initiative. Is there any coordination?” Maron responded by starting a conversation on the group’s Internet Relay Chat (IRC) channel.

Over the days that followed, the OSM community got to work, building an increasingly clear picture of the devastation and its consequences. Within three days, hundreds of people had made some 800 changes5 to the mapping data, initially gleaning data from Yahoo! imagery and old CIA maps, and then from newly taken, higher-resolution aerial photos which provided not only better detail on roads and geological features, but also the locations of emerging camps of displaced people — as requested by aid agencies, who were themselves co-ordinating efforts via another online community, the Google group CrisisMappers6.

As the resulting maps were far richer and more generally accessible than those available before the earthquake, they quickly became the main source of mapping information for groups including not only local organisations, aid agencies and search and rescue teams, but also the United Nations and the World Bank. As a consequence, the efforts of distant experts resulted in a number of lives being saved. The mapsters’ efforts did not end there: a year after the earthquake, a local group, the Communauté OpenStreetMap de Haïti, was set up7 to continue the task of developing better maps by, and for, the Haitian people, with a particular focus on aiding the response to the cholera outbreak that followed the earthquake.

It’s not just Haiti that has benefited from the life-saving efforts of groups such as OpenStreetMap. Responses to the crises in Somalia and Gaza, the nuclear disaster at Fukushima and other emergencies have all benefited from similar initiatives. “The incredible efforts following the Haiti earthquake demonstrated a huge potential for the future of humanitarian response,” wrote8 mapper Patrick Meier at National Geographic. But what exactly needed to be in place for a group such as OSM to exist at all? The answer lies in the notion of open data, or more broadly, open computing. This has a long heritage: when maverick engineer Lee Felsenstein, whom we met in the last chapter, first used the term ‘open’, he was reacting to the nature of computing at the time. From his perspective, computers were monolithic and impenetrable, owned and rented out by equally monolithic and impenetrable corporations.

No wonder that he and his colleagues worried about being slaves to the machine. This ethos of openness did, of course, grate with anybody who was quite happily making money from selling software, not least Bill Gates. Microsoft was always different, as illustrated by the letter Gates sent to the Homebrew Computer Club back in 1976. “As the majority of hobbyists must be aware, most of you steal your software. Hardware must be paid for, but software is something to share. Who cares if the people who worked on it get paid?” he wrote.9 But little did Lee or anyone else know at the time that the dominance of grey corporations would wane, in response to a new breed of upstarts who took it upon themselves to take it to the man. His own friend and colleague Steve Wozniak, for example, who co-founded Apple with Steve Jobs; Scott McNealy of Sun Microsystems; and indeed Bill Gates: all decided they could do a better job than the entrenched, incumbent computer companies. “They were just smart kids who came up with an angle that they have exploited to the max,” writes10 Bob Cringely.

By the mid-1980s, a key battleground was Unix, with many computer hardware companies including Sun, Hewlett Packard and IBM offering similar, yet subtly different, proprietary versions of the operating system. For some, including MIT-based programmer Richard Stallman, enough was enough. Deciding that access to software was tantamount to a human right, Stallman set about creating an organisation that could produce a fully free version of Unix. “GNU will remove operating system software from the realm of competition,” he stated in his 1985 manifesto11. “I have decided to put together a sufficient body of free software so that I will be able to get along without any software that is not free.”

Stallman and his merry men’s efforts were licensed under the GNU General Public License, which required12 any software created from it to be licensed in the same way. In addition, any software released to the general population had to be provided as both binaries and source code, so that others could understand and modify the programs. Thus the foundations of what would later be termed ‘open source’ were laid. While progress was slow, due to a lack of funding and of mainstream interest, GNU’s efforts resulted in a nearly complete version of Unix, with all the programs associated with it apart from one: the ‘kernel’, the very heart of the operating system, without which nothing else can function.

In 1991, when Finnish student Linus Torvalds started developing a Unix-like system for the IBM PC as a university project, the potential impact of his efforts eluded both the young programmer and the proprietors of the major computer companies of the time. Neither did Linux appear much of a competitive threat three years later, when the software was first officially released. By coincidence, 1994 was also the year that Michael Widenius and David Axmark kicked off a project to build an open source relational database package, which they called MySQL. And, by another fortuitous coincidence, it was the year in which the World Wide Web Consortium (W3C) was formed. Tim Berners-Lee’s first Web server had been a NeXT machine running NeXTSTEP, a variant of Unix — and Unix was the logical choice for anyone else wanting to create a Web server. Demand for Unix-type capabilities increased rapidly as the Web itself grew, but from a pool of people who were not particularly willing or able to fork out for a proprietary version. Over time, more and more eyes turned towards Linux.

This would have been less of an event had it not been for another, parallel battle going on in the land of the corporates. Intel processors, traditionally used in personal computers, were becoming more powerful, as predicted by Moore’s Law. As a consequence, Sun Microsystems, Hewlett Packard and others were losing out to servers based on Intel or AMD hardware. These were cheaper simply because of the different business models of the companies involved: the former charged as much as they could, whereas the latter looked to sell in greater volume. The overall consequence was that the Web had a much cheaper hardware base and a free, as in both13 speech and beer, operating system. Almost overnight, Linux (and other open source Unix versions) moved from the realm of hobbyists to having genuine mainstream appeal — as did software built on the platform, such as MySQL. The scene was set for more: the Apache Web server followed, with scripting languages such as Perl and PHP completing the picture. And thus, the LAMP stack was born.

Still the corporations fought, but as they did so they unwittingly played into the hands of what was fast becoming a commodity. Microsoft, too, was intent on providing the operating system of choice, developing its ‘embrace, extend and extinguish’ tactics14 to damage the competition. In 1998 Netscape, itself crippled by Microsoft, decided to open source its browser software, forming Mozilla and eventually creating Firefox as a result. It is no surprise that added impetus for the open source movement came from Microsoft, or at least from the reaction to its competitive dominance. The company had been taken to the cleaners by the European Commission and US antitrust authorities, due to repeated abuses of its monopoly position.

But perhaps the biggest damage was self-inflicted: try as it might, Microsoft could not slow the momentum of the growing open source movement. The challenge was exacerbated as Microsoft’s competitive targets — such as IBM, Hewlett Packard and indeed, Sun Microsystems — realised they could use open source not only as a shield, but as a sword. Even as they adopted Linux and other packages to remove proprietary bottlenecks in their own businesses, they used open source to undermine Microsoft’s competitive advantage. The fact that much of the funding for Linux came from IBM bears scrutiny, not least because there was plenty of money to be made from selling consulting services. As Charles Leadbeater wrote in his book We Think15, “Big companies like IBM and Hewlett Packard make money from implementing the open source software programmes that they help to create.”

With every boom there is a bust. And indeed, when the company that had done best out of the dot-com boom became one of the most significant victims of the dot-bomb, it turned to open source for its salvation. Sun Microsystems spent a billion dollars acquiring a number of open source companies, including MySQL, as then-chief executive Jonathan Schwartz threw the company’s hat fully into the ring. It was not enough to save the company, which fell by the wayside as the bottom fell out of the e-commerce market. But still, it helped cement the position of open source as software suitable for mainstream use. To quote Charles Leadbeater: “Google earns vast sums by milking the web’s collective intelligence: never has so much money been made by so few from the selfless, co-operative activities of so many.”

The overall consequence was that software could indeed be delivered free, but this has had commercial ramifications. Open source is no magic bullet, but rather an indicator of how software (and indeed hardware architecture, as illustrated by Facebook’s efforts to open source computer designs) is commoditising. Not every open source project has been a success — indeed, ‘open-sourcing’ code has sometimes been seen as a euphemism for a company offloading a particular software package that it no longer wants to support. But many open source projects endure. As a result of both corporate and community attitudes, of competitive positioning and free thinking, we have seen a broad range of software packages created, then subsumed into a now-familiar platform of services. The LAMP stack is now the basis for much of what we call cloud computing, aiding the explosion in processing that we see today. And when Doug Cutting first conceived Hadoop16, he released it as an open source project, even though he was working for Yahoo! at the time — a decision without which it might never have achieved its global success.

The main lesson we can learn from open source is that proprietary software cannot do everything by itself — and nor should it, if the model creates an unnecessary bottleneck for whatever reason. To understand ‘open’, you first have to grasp what is meant by ‘closed’: proprietary, locked away, restricted in some way — which brings us to another area that has gone from ‘closed’ to ‘open’ models, that of data. As we have already seen, we have been creating too much of the stuff, far more than any one organisation can do anything with. At the same time, data has been subject to the same communitarian impulses as computer hardware and software. In this case, responsibility lies with the scientific community: back in 1942, Robert King Merton developed a number of theories on how science should be undertaken, notably the principle that scientific discoveries (and hence the data that surrounds them) should be made available to others. “The institutional goal of science is the extension of certified knowledge,” he wrote17. “The substantive findings of science are a product of social collaboration and are assigned to the community. They constitute a common heritage in which the equity of the individual producer is severely limited.”

Such an attitude continues to the present day in the scientific community — the UK’s EPSRC, for example, holds that “publicly funded research data should generally be made as widely and freely available as possible in a timely and responsible manner.” However, it has taken a while for broader industry to catch up. Indeed, it wasn’t until 2007 that an activist-led group met in Sebastopol, California to develop ideas specifically aimed at ‘freeing’ government data. Among them were publisher and open source advocate Tim O’Reilly; law professor Lawrence Lessig, who devised the Creative Commons licensing scheme for copyrighted materials; and Aaron Swartz, co-creator of the Really Simple Syndication (RSS) format. Together, they created a set of principles that ‘open’ data should be18: complete, primary, timely, accessible, machine-processable, nondiscriminatory, nonproprietary and license-free. The goal was to influence the next round of presidential candidates as they kicked off their campaigns, and it worked: two years later, during President Obama’s first year in office, the US government announced19 the Open Government Directive and launched the data.gov web site for US government data.

As the closed doors to many data silos were knocked off their hinges, one can imagine the data itself heaving a sigh of relief as, finally, it was released into the wild. Almost immediately, however, the need for a standardised way of accessing such data became apparent. The Extensible Markup Language (XML) was a logical choice; but over time interest moved to another, slightly simpler20 interchange format known as JSON (JavaScript Object Notation), originally used by JavaScript for Web-based data transfers. And so it became understood that anyone wanting to open up access to their data should do so by providing a JSON-based application programming interface, or API. Such interfaces became first the de facto standard, and then de rigueur, for anyone wanting to create an externally accessible data store.
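To make that concrete, here is a minimal sketch of what consuming such a JSON-based open data API might look like, written in Python: the client requests a URL, receives a JSON document and picks out the fields it needs. The endpoint, the 'results' envelope and the record fields used below are hypothetical, invented purely for illustration rather than taken from any real service.

```python
# Minimal sketch of consuming a hypothetical open data API that returns JSON.
# The URL, the "results" envelope and the field names are illustrative only.
import json
import urllib.request

API_URL = "https://data.example.gov/api/bus-stops?borough=Camden"  # hypothetical endpoint


def fetch_records(url: str) -> list:
    """Download the JSON payload and return the list of records it contains."""
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)      # parse the JSON body of the HTTP response
    return payload.get("results", [])      # assumed envelope: {"results": [...]}


if __name__ == "__main__":
    for record in fetch_records(API_URL):
        # Assumed fields on each record: 'name', 'latitude' and 'longitude'
        print(f"{record['name']}: ({record['latitude']}, {record['longitude']})")
```

The same pattern, a well-known URL returning a machine-processable JSON payload, underpins most of the APIs described in the next paragraph, whatever the domain.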

The consequence of doing so has been dramatic. When Transport for London opened up information on its bus routes, app developers were able to create low-cost mobile applications which stimulated use of the buses. As Martin Haigh of the UK Land Registry puts it, “We used to sell data, but now we just make it accessible.” More broadly, this positive feedback loop has led to a fundamental shift in how software creators perceive the worth of their data. Just about every modern web site that has anything to do with data, from sports tracking sites such as Strava or MapMyRun, to social networks like Facebook or Twitter, and indeed resource-sharing services like AirBnB and Uber, offers an API to enable others to interact with its ‘information assets’. Indeed, this business shift even has a name — the API Economy describes not only the trend, but also the opportunity for new business startups that can capture part of what has become an increasingly dynamic market. The expectation is now such that it would be perceived as folly to launch any such online service without providing an ‘open’ API. And we have not seen the end of it: the drive towards increasingly interconnected, ‘smarter’ devices generates still more data, much of which is stored and then made widely accessible to the community — through platforms such as Xively, which maps the use of electricity and other resources.

While neither software nor data asked to be open, each has its reasons to be so. In software’s case, the dual forces of commoditisation and commercialisation required a balance to be struck between communal benefit and corporate gain; in data’s case, the public drive for transparency has been coupled with the business reality that third parties can often exploit data better than any single organisation. The result is an interplay between big-business, top-down approaches and start-up, bottom-up approaches. Creating a startup becomes very easy, as the barrier to entry falls away. Indeed, as the Law of Diminishing Thresholds recognises, it becomes easy to do just about anything you might want.