BULLETIN
Berners-Lee: The Semantic Web, a Web of Machine-Processable Information
by Steve Hardin
Steve Hardin is associate librarian in the Cunningham Memorial Library at Indiana State University, Terre Haute, IN 47809; email: email@example.com
Tim Berners-Lee, inventor of the Web and director of the World Wide Web Consortium (W3C), says the Semantic Web represents a way to overcome some of the cultural boundaries on the World Wide Web. Berners-Lee discussed his ideas during the keynote session at the American Society for Information Science and Technology’s Annual Meeting in Providence, Rhode Island, on November 15, 2004.
Berners-Lee began by noting the theme of the conference, "Managing and Enhancing Information: Cultures and Conflicts." To him, cultures and conflicts represent an eternal tension. Building standards is how we create technical infrastructure, and the Semantic Web is a way to ease the conflict over interoperable data: it lets us work in a world we haven't all been to.
He elaborated that culture is the commonality of concepts and languages. Put a boundary around a group of people, and by the end of the day they'll have the beginnings of their own shared language. A group is defined by its languages and boundaries, Berners-Lee said. We hope to be able to move information across barriers in an unprecedented way, but, he said, remember that those barriers are there for a reason: some people need privacy. Boundaries inhibit communication between groups, and conflicts arise between them.
Berners-Lee also discussed what he called "the dangerous myth of top-down design." A modular system is designed by breaking it down into parts, but the system itself should be built to be part of something bigger. Each module should respect other modules, he said.
There are problems with data, he noted: many different applications, with different syntaxes, unknown semantics and different languages. In other words, there are culture gaps. So the challenge is to design systems that interact even though they share only a few concepts. That challenge has led to the Semantic Web. The goal is to do for knowledge retrieval what the WWW did for hypertext: to create a web of machine-processable information.
The Semantic Web (SW) really started with metadata, Berners-Lee said, through the development of the Resource Description Framework (RDF). When you use RDF, you model things and the relationships between them. You say, as people have been saying in XML and other approaches for ages, "my car has a unique license number." Semantic data is much more reusable.
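A minimal Python sketch of the car example, assuming a made-up vocabulary under http://example.org/ (only the rdf:type URI is a real, standard identifier). Each statement is a subject-predicate-object triple, printed here in the N-Triples line format; a real application would use an RDF library rather than these hand-rolled classes.

```python
from typing import NamedTuple, Union


class URI(str):
    """An absolute URI naming a thing, a property or a class."""


class Literal(str):
    """A plain string value, such as a license number."""


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]


def term(x: Union[URI, Literal]) -> str:
    """Render one term: URIs in angle brackets, literals in quotes (minimal escaping)."""
    if isinstance(x, Literal):
        escaped = x.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return f"<{x}>"


def to_ntriples(t: Triple) -> str:
    """Format a triple as one N-Triples line ending in ' .'."""
    return f"{term(t.subject)} {term(t.predicate)} {term(t.obj)} ."


# Hypothetical URIs; only RDF_TYPE is a real, standard identifier.
EX = "http://example.org/"
RDF_TYPE = URI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
my_car = URI(EX + "cars/my-car")

triples = [
    Triple(my_car, RDF_TYPE, URI(EX + "vocab/Car")),
    Triple(my_car, URI(EX + "vocab/licenseNumber"), Literal("ABC-1234")),
]

for t in triples:
    print(to_ntriples(t))
```

Because the car, the class and the property are all URIs, anyone else who uses the same URIs is making statements about the same car.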
On the Semantic Web, everything has a Uniform Resource Identifier (URI). You don't just say "color"; you give a Web address that provides an example of it. You can take RDF data and merge it with other data: because subject and object nodes are identified by URIs, data sets that use the same URI are talking about the same thing, and their graphs join at that node.
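The merging step can be sketched in a few lines of Python. In this simplified model, assuming hypothetical example.org URIs and treating every term as a plain string, a graph is a set of triples and merging is set union; the two sources connect wherever they reuse the same URI. Real RDF merging must also rename blank nodes apart, which this sketch ignores.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Two independently published graphs; all URIs are hypothetical.
EX = "http://example.org/"
dealer: Set[Triple] = {
    (EX + "cars/my-car", EX + "vocab/licenseNumber", "ABC-1234"),
}
insurer: Set[Triple] = {
    (EX + "cars/my-car", EX + "vocab/insuredBy", EX + "companies/acme"),
    (EX + "companies/acme", EX + "vocab/name", "Acme Insurance"),
}


def merge(*graphs: Set[Triple]) -> Set[Triple]:
    """Because nodes are named by global URIs, merging is just set union here."""
    merged: Set[Triple] = set()
    for g in graphs:
        merged |= g
    return merged


def describe(graph: Set[Triple], subject: str) -> List[Tuple[str, str]]:
    """Every (predicate, object) pair stated about one subject, in sorted order."""
    return sorted((p, o) for s, p, o in graph if s == subject)


combined = merge(dealer, insurer)
for predicate, obj in describe(combined, EX + "cars/my-car"):
    print(predicate, obj)
```

The car node now carries facts from both sources, even though neither publisher needed to know about the other's data in advance.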
RDF lets you create semantic links. Once certain concepts are defined, other computers can reference and understand them, so applications become connected by concepts. Our views of everything are constantly changing. He'd like to be able to say, "These photos were taken during this event," and give everyone at that event access to them. But we're not there yet.
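Berners-Lee said we are not there yet, but the photo scenario can be sketched as a single inference rule over shared concepts. The vocabulary (takenDuring, attended, mayView) and all URIs below are hypothetical, invented for illustration; a real system would express such rules in a rule or ontology language and would need proper access control on top.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical vocabulary and URIs, invented for this sketch.
EX = "http://example.org/"
TAKEN_DURING = EX + "vocab/takenDuring"
ATTENDED = EX + "vocab/attended"
MAY_VIEW = EX + "vocab/mayView"

facts: Set[Triple] = {
    (EX + "photos/1", TAKEN_DURING, EX + "events/annual-meeting"),
    (EX + "photos/2", TAKEN_DURING, EX + "events/annual-meeting"),
    (EX + "people/alice", ATTENDED, EX + "events/annual-meeting"),
    (EX + "people/bob", ATTENDED, EX + "events/other-meeting"),
}


def infer_access(graph: Set[Triple]) -> Set[Triple]:
    """Rule: if ?photo takenDuring ?event and ?person attended ?event,
    then ?person mayView ?photo."""
    photos_by_event: Dict[str, Set[str]] = {}
    for s, p, o in graph:
        if p == TAKEN_DURING:
            photos_by_event.setdefault(o, set()).add(s)
    inferred: Set[Triple] = set()
    for s, p, o in graph:
        if p == ATTENDED:
            for photo in photos_by_event.get(o, set()):
                inferred.add((s, MAY_VIEW, photo))
    return inferred


for triple in sorted(infer_access(facts)):
    print(triple)
```

The rule only works because the photo data and the attendance data use the same URI for the event.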
The SW sets up fractal Web concepts by working across boundaries of scale: personal, group and global. Society is a fractal tangle anyway, Berners-Lee said.
The URI is a global identifier. When putting data on the Web, he advised, use Semantic Web standards. Make URIs for things lead to information about them, and URIs for properties and classes lead to ontologies about them. Think about what can be said about things. Provide SPARQL (a query language for RDF) access to large data sources.
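The heart of a SPARQL query is a basic graph pattern: triple patterns containing variables, matched against the data. The Python sketch below imitates that matching using only the standard library; it is not a SPARQL engine, and the weather-station data and example.org URIs are invented for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """As in SPARQL, a leading '?' marks a variable."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None  # variable already bound to something else
            new[p] = t
        elif p != t:
            return None  # constant term does not match
    return new


def select(graph: Iterable[Triple], patterns: List[Triple]) -> List[Binding]:
    """Return every variable binding that satisfies all patterns at once."""
    triples = list(graph)
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for t in triples:
                m = match_pattern(pattern, t, b)
                if m is not None:
                    extended.append(m)
        bindings = extended
    return bindings


# Invented example data with hypothetical example.org URIs.
EX = "http://example.org/"
data = [
    (EX + "stations/1", EX + "vocab/city", "Terre Haute"),
    (EX + "stations/1", EX + "vocab/tempC", "4"),
    (EX + "stations/2", EX + "vocab/city", "Providence"),
    (EX + "stations/2", EX + "vocab/tempC", "7"),
]

# Roughly: SELECT ?city ?t WHERE { ?s ex:city ?city . ?s ex:tempC ?t }
query = [("?s", EX + "vocab/city", "?city"), ("?s", EX + "vocab/tempC", "?t")]
for row in select(data, query):
    print(row["?city"], row["?t"])
```

The shared variable ?s is what joins the two patterns, so each city is paired only with its own station's temperature.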
Will we actually be able to improve on history? Anyone can connect to anyone, but the ability of people to send problems to you more quickly doesn't mean you can solve them more quickly. We're not limited to tree-like systems, he said; multiple connections are possible on the Web, and there's structure on many scales. The social topology is changing. We used to have geographical boundaries, and it was easier to do things at the local level and refer them up to national and world groups. Today, though, you can immediately discuss things at the international level. The stuff you pull out of a clogged drain, for example, is very heterogeneous: it's very mixed, but it is also very effective at blocking your drain. Berners-Lee said that's a good analogy for society.
When processing data, he advised, be aware of where it comes from. It's disgusting, he said, that the information society hasn't solved the problem of spam. There should be an "Oh, yeah?" button on your PC that makes your computer go out and ask why you should believe something included in spam. RDF systems are increasingly being built with an awareness of this problem. Be aware of where information is and isn't going; knowing this helps you determine who should be told what without infringing on anyone's privacy.
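One way to picture the "Oh, yeah?" button is a store that remembers who asserted each statement, so the computer can answer "why should I believe this?" with its sources. The class and method names below (ProvenanceStore, oh_yeah) are hypothetical; a real system would also check signatures and trust policies rather than just listing sources.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

Triple = Tuple[str, str, str]


class ProvenanceStore:
    """Keeps every statement together with the sources that asserted it."""

    def __init__(self) -> None:
        self._sources: DefaultDict[Triple, Set[str]] = defaultdict(set)

    def add(self, triple: Triple, source: str) -> None:
        """Record that `source` asserted `triple`."""
        self._sources[triple].add(source)

    def oh_yeah(self, triple: Triple) -> List[str]:
        """Answer 'why should I believe this?' with the asserting sources."""
        # .get() avoids inserting an empty entry for statements never seen.
        return sorted(self._sources.get(triple, set()))


# Hypothetical statement and source, invented for illustration.
store = ProvenanceStore()
claim = ("http://example.org/products/pill", "http://example.org/vocab/cures", "everything")
store.add(claim, "mailto:unknown-sender@example.net")

print(store.oh_yeah(claim))  # the only backing is an unknown sender
print(store.oh_yeah(("http://example.org/a", "http://example.org/b", "c")))  # no source at all
```

An empty or untrustworthy list of sources is exactly the signal that a spam claim deserves no belief.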
Berners-Lee said society must have a free, open corpus of information, a record, as the basis of civilization. He's concerned because corporations and political parties find it easy to create strangleholds on records and news, and then they can control what people believe. There should be open metadata about articles and, ideally, open access to the articles themselves. There should be multimedia creativity, a "creativity commons."
He recommends keeping the layers independent: separate markets for hardware, software, network and content are more efficient. Nowadays, when you buy hardware, you get software already installed on it, and that software offers you trials of various Web services. When you search the Web, who's controlling what you find? Whoever controls that can make you believe whatever they want. An ISP can control where you buy your shoes by controlling which packets it drops when traffic gets tight. All of us in this country have a duty to watch what goes on, he said, so that we always have a choice when we buy and we have open, honest elections. We need to encourage younger people to watch these things too, he concluded.
In response to a question, Berners-Lee said one of the differences between the World Wide Web and the Semantic Web is that a person using the SW is actually taking part in a human community. Machines can be agents; they can go out and find all the weather information or financial information. You can create rules that tell the computer how to do things. But how, he asked, do you take half-formed ideas and distribute them before you have come to conclusions about them? That's what the SW is about.
Another questioner asked when semantic sharing of information would start. Berners-Lee responded that it's happening now. The "Friend of a Friend" (FOAF) project (http://www.foaf-project.org/), for example, provides an ontology that helps put people in context. Industry people will be wary of such efforts until they see them in use. He said he'd like to see publicly available databases housing information such as census data, the periodic table and lists of chemicals.
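A small Python sketch of the friend-of-a-friend idea. The foaf:name and foaf:knows properties come from the real FOAF vocabulary namespace (http://xmlns.com/foaf/0.1/); the people and their URIs are hypothetical. The code follows foaf:knows links in the direction they are stated, two steps out.

```python
from typing import Set, Tuple

FOAF = "http://xmlns.com/foaf/0.1/"   # the real FOAF namespace
EX = "http://example.org/people/"     # hypothetical person URIs

graph: Set[Tuple[str, str, str]] = {
    (EX + "alice", FOAF + "name", "Alice"),
    (EX + "bob", FOAF + "name", "Bob"),
    (EX + "carol", FOAF + "name", "Carol"),
    (EX + "alice", FOAF + "knows", EX + "bob"),
    (EX + "bob", FOAF + "knows", EX + "carol"),
}


def knows(g: Set[Tuple[str, str, str]], person: str) -> Set[str]:
    """Everyone `person` is stated to know (foaf:knows, in the stated direction)."""
    return {o for s, p, o in g if s == person and p == FOAF + "knows"}


def friends_of_friends(g: Set[Tuple[str, str, str]], person: str) -> Set[str]:
    """People two foaf:knows steps away who are not direct friends or `person`."""
    direct = knows(g, person)
    two_steps: Set[str] = set()
    for friend in direct:
        two_steps |= knows(g, friend)
    return two_steps - direct - {person}


def name_of(g: Set[Tuple[str, str, str]], person: str) -> str:
    """Look up a person's foaf:name (assumes at least one is present)."""
    return next(o for s, p, o in g if s == person and p == FOAF + "name")


print(sorted(name_of(graph, p) for p in friends_of_friends(graph, EX + "alice")))
```

Because each person's FOAF file can be published separately, the same traversal works across documents once their triples are merged.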
Another audience member noted that at the Annual Meeting's previous plenary session, JC Herz had said that any system that requires people to enter metadata will fail. How much metadata will people need to input on the Semantic Web? Berners-Lee responded that most of this data is already in databases and proprietary file formats; the connections just haven't been well made. The Web is about doing your own bit and benefiting from everyone else doing the same, he said.
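The point that the data already sits in databases suggests publishing it by mapping rows to triples rather than asking people to type metadata by hand. This sketch uses Python's built-in sqlite3 module with an in-memory table of two chemical elements; the URI scheme (base/table/key for rows, base/table#column for properties) is a made-up convention for this example, not a standard mapping.

```python
import sqlite3
from typing import List, Tuple

BASE = "http://example.org/"  # hypothetical base URI for the published data


def table_to_triples(conn: sqlite3.Connection, table: str, key: str) -> List[Tuple[str, str, str]]:
    """Turn each row into triples: <BASE table/keyvalue> <BASE table#column> "value"."""
    # Table and column names are interpolated, so they must be trusted identifiers;
    # key values are assumed to be URI-safe.
    cur = conn.execute(f"SELECT * FROM {table} ORDER BY {key}")
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key]}"
        for col, value in record.items():
            if col != key and value is not None:  # skip the key itself and NULLs
                triples.append((subject, f"{BASE}{table}#{col}", str(value)))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE element (symbol TEXT PRIMARY KEY, name TEXT, number INTEGER)")
conn.executemany(
    "INSERT INTO element VALUES (?, ?, ?)",
    [("H", "Hydrogen", 1), ("He", "Helium", 2)],
)

for t in table_to_triples(conn, "element", "symbol"):
    print(t)
```

Nobody enters new metadata here: the column names become properties and the existing rows become statements.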
Slides from Berners-Lee’s presentation may be viewed at www.w3.org/2004/Talks/1115-asis-tbl.
Copyright © 2005, American Society for Information Science and Technology