GENEVA, SWITZERLAND — Twenty years ago this month, anticipating the enormous information-management challenges of projects such as the then-nascent Large Hadron Collider, CERN researcher Tim Berners-Lee submitted a curious paper to his boss: a proposal to use hypertext to connect text files on individual computers, forming an information network through which people could easily navigate between digital documents. “Vague but exciting,” his manager, Mike Sendall, wrote in the margins on first reading it, but he granted Berners-Lee his support to continue exploring. Fortunately, Berners-Lee did: the proposal has had staggering influence well beyond the European physics research center, laying out the core concepts of what we now know as the World Wide Web.
Last Friday, more than a hundred invitees joined CERN notables and alumni of Berners-Lee’s group — including Robert Cailliau, the first web surfer; Ben Segal, Berners-Lee’s mentor; and Jean-François Groff, Berners-Lee’s assistant and collaborator on the first hypertext browser and editor (and the first person ever to earn money on the web) — in celebrating the 20th anniversary of Berners-Lee’s proposal submission.
In the 18 years since a small number of people in the scientific community began using that hyperlink system, scientific research has changed profoundly. With computer networks connecting processors that are faster, more numerous, and cheaper than ever before, science has been decentralized, allowing researchers at small or remote institutions to share and access the data of the largest ones.
Tim Berners-Lee. Credit: CERN
The web has transformed science in two ways. “With the web, there has been a paradigm shift in scientific publishing,” says James Gillies, head of communications at CERN and coauthor of How the Web Was Born. He points to the arXiv database as an example. Anyone with interest, an internet connection, and a browser has access to more than a half million articles in physics, mathematics, computer science, quantitative biology, quantitative finance, and statistics. According to Gillies, the arXiv circumvents the traditional journal approach and has undoubtedly played a role in the burgeoning open-access publishing model.
In the past five years, the web has also democratized information publishing in society, says Gillies. Science used to be conducted without the public’s direct participation. When people called themselves “scientists,” there were standards that the scientific community expected would be upheld. Laypeople did not participate in the discussion. “Today, with the web, everyone has the ability to express themselves and to publish information as if it were truth,” notes Gillies. “It has gotten significantly more difficult to differentiate science from the opinions of others who are writing.
“As the person responsible for CERN’s communication, I’ve been in the front line of this with the mass delusion on the web that CERN’s current project [the LHC] might be dangerous — that’s absolute nonsense, of course, but people have not yet learned how to evaluate the mass of information online,” Gillies observes.
At the celebration, Berners-Lee invited four experts to share the stage with him and describe what the future holds for the web. In essence, that future features vastly more, and more distributed, processing power, and orders of magnitude more data. “We have the potential to do much more than only share documents,” says Berners-Lee. At the very root of science, of insight, and perhaps of what we call “truth,” there is data. It is databases, not documents, that Berners-Lee is now working to interconnect. He wants to use the web to link information he describes as locked away in databases, to make databases more collaborative and, in effect, to break down data “silos.”
Several projects highlight the power of what Berners-Lee is calling “linked data.” One example profiled at the event by Christian Bizer, a researcher at Freie Universität Berlin, is DBpedia, a community effort to extract structured information from Wikipedia (the data that appears in a consistent format in the upper right corner of each entry) and to make that same information available as a database on the web. Another illustration of linked data is the Real Time Monitor, which tracks the processing power used by networked science research around Europe. The visualization, developed as part of Europe’s Enabling Grids for E-sciencE (EGEE) project, lets the user watch processor and network usage on a map of Europe as jobs are sent and performed.
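The core idea behind linked data can be sketched in a few lines: facts are expressed as subject-predicate-object triples whose terms are web identifiers (URIs), so that data published by different sources can point at the same things and be joined by following links. The sketch below is illustrative only and is not from the article or from DBpedia; the URIs and predicate names are hypothetical.

```python
# A minimal sketch of the linked-data idea: facts as
# subject-predicate-object triples whose terms are URIs.
# (Illustrative; these URIs and predicates are made up.)

triples = [
    ("http://example.org/LHC", "operatedBy", "http://example.org/CERN"),
    ("http://example.org/CERN", "locatedIn", "http://example.org/Geneva"),
    ("http://example.org/Geneva", "country", "http://example.org/Switzerland"),
]

def objects_of(subject, predicate):
    """Return every object linked from `subject` via `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# Follow links across datasets: where is the LHC's operator located?
operators = objects_of("http://example.org/LHC", "operatedBy")
locations = [loc for op in operators for loc in objects_of(op, "locatedIn")]
# locations == ["http://example.org/Geneva"]
```

Because every term is a global identifier rather than a local label, a second publisher could add triples about `http://example.org/Geneva` and queries like the one above would traverse both datasets seamlessly; this is the “breaking down of silos” the paragraph above describes.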
Tom Scott, digital editor of BBC Earth, described yet another project indicative of the web’s future course. Instead of making humans more computer-literate, Scott says, we should be working to make “computers more human-literate.” The goal is to develop systems in which everything about a person is linked to everything that person is interested in. The next stage is to expand this capability to all people on the planet, regardless of their geography or device, in a system that becomes an all-inclusive social graph. “We can use the web to not only deliver content,” explains Scott, “but also to let people discover more content, and to mash content together to create new stories.”
At age 20, the web is very young. Much remains to be written and discussed, as well as to be revealed, about its impact on science and society at large. Like the original proposal, Berners-Lee’s vision of the information network of the future will require the development of new protocols and policies (security of information, privacy for users and their data, and scalability of systems processing such massive amounts of data in real time) before it meets the designs of its architects and the needs of users. These, and unanticipated new challenges — scientific, social, and economic — will be the focus of global resources for years to come.
Originally published March 17, 2009