Bulletin, June/July 2006

Toward Terminology Services: Experiences with a Pilot Web Service Thesaurus Browser

by Douglas Tudhope and Ceri Binding

Both authors are with the Hypermedia Research Unit at the University of Glamorgan, Wales, United Kingdom. Douglas Tudhope can be reached by email at dstudhope<at>glam.ac.uk. Ceri Binding can be reached at cbinding<at>glam.ac.uk

Dublin Core recommends controlled terminology for the subject of a resource. Knowledge organization systems (KOS), such as classifications, gazetteers, taxonomies and thesauri, provide controlled vocabularies that organize and structure concepts for indexing, classifying, browsing and search. For example, a thesaurus employs a set of standard semantic relationships (ISO 2788, ISO 5964), and major thesauri have a large entry vocabulary of terms considered equivalent for retrieval purposes. Many KOS have been made available for Web-based access. However, they are often not fully integrated into indexing and search systems and the full potential for networked and programmatic access remains untapped.

The lack of standardized access and interchange formats impedes wider use of KOS resources. We developed a Web demonstrator (www.comp.glam.ac.uk/~FACET/webdemo/) (Binding & Tudhope, 2004) for the FACET project (www.comp.glam.ac.uk/~facet/facetproject.html) (Tudhope et al., In press) that explored thesaurus-based query expansion with the Getty Art and Architecture Thesaurus. The demonstrator was implemented via Active Server Pages (ASP), with server-side scripting and compiled server-side components for database access and cascading style sheets for presentation. The browser-based interactive interface permits dynamic control of query term expansion (Figure 1). However, because they are based on a custom thesaurus representation and API, the techniques cannot be applied directly to thesauri in other formats on the Web.

General programmatic access requires commonly agreed protocols, for example, building on Web and Grid services. The development of common KOS representation formats and the development of service protocols are closely linked. Linda Hill and colleagues argued in 2002 for a general KOS service protocol from which protocols for specific types of KOS can be derived. Thus, in the future, a combination of thesaurus and query protocols might permit a thesaurus to be used with a choice of search tools on various kinds of databases.

Service-oriented architectures bring an opportunity for moving toward a clearer separation of interface components from the underlying data sources. In our view, basing distributed protocol services on the atomic elements of thesaurus data structures and relationships is not necessarily the best approach because client operations that require multiple client-server calls would carry too much overhead. This would limit the interfaces that could be offered by applications following such a protocol. Advanced interactive interfaces require protocols that group primitive thesaurus data elements (via their relationships) into composites to achieve reasonable response. 
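The overhead argument can be made concrete with a small sketch. The Python below is illustrative only: the in-memory ToyThesaurusServer class, its method names and the sample concepts are assumptions invented for this example and do not belong to any real protocol. It counts the simulated client-server calls needed to display one concept together with the labels of its neighbours, first with atomic calls and then with a single composite call.

```python
class ToyThesaurusServer:
    """Hypothetical in-memory stand-in for a remote thesaurus service."""

    def __init__(self, labels, related):
        self.labels = labels      # concept id -> preferred label
        self.related = related    # concept id -> list of related concept ids
        self.calls = 0            # number of simulated round trips

    # Atomic operations: one primitive data element per call.
    def get_label(self, concept_id):
        self.calls += 1
        return self.labels[concept_id]

    def get_related_ids(self, concept_id):
        self.calls += 1
        return list(self.related.get(concept_id, []))

    # Composite operation: a concept plus its neighbours in one call.
    def get_concept_with_relatives(self, concept_id):
        self.calls += 1
        neighbours = self.related.get(concept_id, [])
        return {
            "label": self.labels[concept_id],
            "relatives": [self.labels[n] for n in neighbours],
        }


def display_atomic(server, concept_id):
    """Build the display data from primitive calls only."""
    label = server.get_label(concept_id)
    relatives = [server.get_label(n) for n in server.get_related_ids(concept_id)]
    return {"label": label, "relatives": relatives}


def display_composite(server, concept_id):
    """Build the same display data from one composite call."""
    return server.get_concept_with_relatives(concept_id)


# Invented sample data, not taken from any published thesaurus.
LABELS = {
    "c1": "pollution",
    "c2": "air pollution",
    "c3": "water pollution",
    "c4": "pollutant",
}
RELATED = {"c1": ["c2", "c3", "c4"]}

atomic_server = ToyThesaurusServer(LABELS, RELATED)
composite_server = ToyThesaurusServer(LABELS, RELATED)
atomic_view = display_atomic(atomic_server, "c1")
composite_view = display_composite(composite_server, "c1")
```

Both views carry the same information, but in this toy setting the atomic route costs one call for the concept label, one for the list of neighbours and one more per neighbour, whereas the composite route costs a single call regardless of how many neighbours there are.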

SKOS API and Schema
The Simple Knowledge Organization System Application Programming Interface (SKOS API) (www.w3.org/2001/sw/Europe/reports/thes/skosapi.html) is a recent development that addresses some of these issues. It defines a core set of methods for programmatically accessing and querying a thesaurus based on the SKOS-Core RDF schema (www.w3.org/2004/02/skos/) (Miles et al., 2005). The API is an interface designed to provide programmatic access to thesauri and other simple knowledge organization systems (SKOS) via the Web, provided they are represented according to the SKOS-Core schema. Although it is invoked through Web service calls, the API itself remains independent of such concrete implementation details.

The SKOS API builds on previous protocols (see Binding & Tudhope, 2004, for a review of CERES, Zthes and ADL). Briefly, one set of SKOS calls returns one or more concepts, with their details, selected by ID, by preferred label, or by matching a keyword or regular expression. Another call returns a list of supported semantic relations for the given thesaurus. Another set of calls returns concepts connected by a specified relation or all immediately connected concepts. Importantly, it is possible to get a set of concepts connected by a relation up to a given path length.
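The first group of calls, lookup by identifier, by preferred label, or by keyword or regular expression match, can be pictured with a minimal in-memory sketch in Python. Everything here is an assumption made for illustration: the function names are hypothetical and are not the method names of the SKOS API, the "ex:" identifiers and labels are invented, and matching against non-preferred labels as well as preferred ones is a design choice of this sketch rather than documented behaviour.

```python
import re

# Invented sample concepts: id -> preferred label and non-preferred labels.
CONCEPTS = {
    "ex:1": {"pref_label": "air pollution", "alt_labels": ["atmospheric pollution"]},
    "ex:2": {"pref_label": "water pollution", "alt_labels": []},
    "ex:3": {"pref_label": "noise", "alt_labels": ["acoustic nuisance"]},
}


def _all_labels(concept):
    """Preferred label followed by any non-preferred labels."""
    return [concept["pref_label"]] + list(concept["alt_labels"])


def concept_by_id(concept_id):
    """Return the concept details for an identifier, or None if unknown."""
    return CONCEPTS.get(concept_id)


def concept_by_pref_label(label):
    """Return the id of the concept whose preferred label equals `label`."""
    wanted = label.lower()
    for concept_id, concept in CONCEPTS.items():
        if concept["pref_label"].lower() == wanted:
            return concept_id
    return None


def concepts_matching_keyword(keyword):
    """Return sorted ids of concepts with any label containing `keyword`."""
    wanted = keyword.lower()
    return sorted(
        concept_id
        for concept_id, concept in CONCEPTS.items()
        if any(wanted in label.lower() for label in _all_labels(concept))
    )


def concepts_matching_regex(pattern):
    """Return sorted ids of concepts with any label matching `pattern`."""
    regex = re.compile(pattern, re.IGNORECASE)
    return sorted(
        concept_id
        for concept_id, concept in CONCEPTS.items()
        if any(regex.search(label) for label in _all_labels(concept))
    )
```

A client could use such lookups to turn a user's search term into one or more controlled concepts before browsing onward through the relation calls.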

Pilot SKOS API Browser
We developed a pilot PC-based (.NET) Web service client application as an initial experiment with the SKOS API, a rich client browser displaying details for thesaurus concepts. It made SKOS API calls to the remote SWAD-Europe DREFT Web services server at the (Bristol University) Institute of Learning and Research Technology (ILRT). The DREFT server was a temporary demonstrator. 

The browser acted on GEMET (GEneral Multilingual Environmental Thesaurus), held on the DREFT server in SKOS format – see www.eionet.eu.int/gemet.

Some SKOS API calls return thesaurus relationship data. These calls would have allowed an interface similar to the FACET Web demonstrator, where hierarchical and associative relationships are visualized and can be navigated. However, due to limitations imposed by local requirements on the DREFT server configuration at the time, these API calls were disabled. SKOS API calls that involved string matching of concept terms also were not available at the time, so matching of user search statements with controlled terminology was not possible. 

Therefore, the browser used only two of the possible SKOS API calls: getConcept and getAllConceptRelatives. These calls do not return relationship information, so the browser could display the immediately related concepts only as a linear list of concepts that have some direct relationship to the displayed concept.

User Interface
The user interface (Figure 2) comprised a single screen. “Back” and “Next” buttons provided functionality similar to Web browser history buttons – they navigate back and forth through a sequence of previously viewed concepts. This was relatively fast since previously retrieved concepts were cached locally during a session.

Directly below these buttons, the “Concept Reference” section held a drop-down list of concept identifiers for all previously viewed concepts and a “Go” button, which displayed details for the concept represented by the selected identifier. Some initial GEMET identifiers were provided at start-up, or an identifier could be typed directly into the box if known. The identifiers formed convenient “jumping in” points to initiate a browsing session, in the absence of any comprehensive concept search facilities. As each new concept was viewed, its identifier was added to the drop down list for future re-selection.

It is important to note that although concept identifiers were structured as uniform resource identifiers (URIs) they were not necessarily real, accessible Web locations – the browser application used Web service API calls to retrieve data using the selected concept identifier as a unique reference to the resource required.

The “Concept Details” section consisted of three separate boxes. The first box displayed the preferred term (in bold) for the currently selected concept. Any non-preferred terms for the currently selected concept were also displayed in square brackets. The middle box displayed any scope notes for the currently selected concept. The lower box listed concepts with a direct relationship to the currently displayed concept. Clicking any concept in this list replaced the “current concept” with the details for the new concept – effectively allowing the user to browse through the concept space. Concepts previously visited were indicated by color.

The status bar at the bottom of the form (blank in Figure 2) gave general feedback on the status of current operations – in particular indicating when a call was being made to the server. In operation we found that although most of the time server response was very good, occasionally a call took some time to return data (10 seconds or more) and the feedback was useful.

Based on early experimentation it was clear that caching (local storage) of concepts would be beneficial to prevent unnecessary repeated server calls. The current design of the SKOS API dictates that two separate server calls are necessary for the display of a concept and its directly related concepts. Bearing in mind the relatively static nature of the data being viewed (the underlying thesaurus data is unlikely to change on the server during a session), local caching of concepts for the lifetime of the user session was viewed as a sensible strategy. The implementation of concept caching made a significant difference to the apparent speed of operation, enhancing the overall user experience.

Concepts are always retrieved from the local cache for display. Any request for a concept first consults the cache. If the concept already exists, it is displayed; otherwise it is retrieved from the server and added to the cache, then displayed. Concepts in the “previous” and “next” browsing history are guaranteed to exist already in the cache.
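The cache-first retrieval and the history buttons described above can be sketched as follows. This is a minimal Python illustration, not the code of the .NET pilot: the ConceptBrowser class, its method names and the fetch_concept callable (standing in for the server calls needed to assemble one concept) are hypothetical, and discarding the forward history when a new concept is visited is an assumption modelled on ordinary Web browser behaviour.

```python
class ConceptBrowser:
    """Session-lifetime concept cache with back/next history (sketch)."""

    def __init__(self, fetch_concept):
        self._fetch = fetch_concept  # callable: concept id -> concept details
        self._cache = {}             # concept id -> concept details
        self._history = []           # visited concept ids, in order
        self._position = -1          # index of the current concept in history
        self.server_calls = 0        # how many times the server was consulted

    def _load(self, concept_id):
        """Consult the cache first; go to the server only on a miss."""
        if concept_id not in self._cache:
            self._cache[concept_id] = self._fetch(concept_id)
            self.server_calls += 1
        return self._cache[concept_id]

    def go(self, concept_id):
        """Display a concept and record it as the newest history entry."""
        concept = self._load(concept_id)
        # Assumption: visiting a new concept drops any forward history.
        del self._history[self._position + 1:]
        self._history.append(concept_id)
        self._position += 1
        return concept

    def back(self):
        """Step back; history entries are always already in the cache."""
        if not self._history:
            return None
        if self._position > 0:
            self._position -= 1
        return self._cache[self._history[self._position]]

    def next_concept(self):
        """Step forward again after one or more back() calls."""
        if not self._history:
            return None
        if self._position < len(self._history) - 1:
            self._position += 1
        return self._cache[self._history[self._position]]


def fake_fetch(concept_id):
    """Stand-in for the remote service, returning invented details."""
    return {"id": concept_id, "label": "label for " + concept_id}


browser = ConceptBrowser(fake_fetch)
browser.go("a")
browser.go("b")
browser.go("a")                        # served from the cache
first_back = browser.back()            # returns concept "b"
second_back = browser.back()           # returns concept "a"
forward_again = browser.next_concept() # returns concept "b"
```

Because every history entry was loaded through the cache when first displayed, back and next never trigger a server call, which matches the guarantee stated above.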

We intend to continue this line of research and explore Web service clients with richer functionality that utilize more API calls. Future issues for thesaurus and KOS protocols include provision of more complex services, such as semantic expansion, more advanced natural language functionality, cross-mapping provisions and data-dependent filters such as the number of postings associated with a concept. 

This small pilot suggests that the SKOS API can provide a reasonable basis for programmatic access to thesauri and related KOS. It builds on work in the NKOS (Networked Knowledge Organization Systems/Services, http://nkos.slis.kent.edu/) and digital library communities on scenarios for KOS-based interactive services, such as attempting to match user vocabulary with controlled terms and browsing or selecting a concept or set of concepts according to KOS relationships. The API could be used in combination with different kinds of search systems. For large-scale, augmented KOS or ontologies, general semantic query languages, such as SPARQL (SPARQL Protocol and RDF Query Language), may be appropriate for applications requiring logic-based reasoning. For more general search-based applications, the SKOS API could be used in combination with query APIs, such as SRW/U (Search/Retrieve via URL/Web Service), Google or Verity. However, we would argue there is also a need for a specialized KOS query API to take full advantage of applications with KOS-indexed metadata.

With regard to evolution of the API, we recommend that such protocols return relationship information with all calls. Those calls that return a list of concepts related by a specified relationship, such as broader, should return a structured list, identifying the level of expansion from the initial concept, rather than an undifferentiated list (as at present). A single call should be available for the display of a concept and its directly related concepts.
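As an illustration of the structured result recommended here, the following Python sketch groups the concepts reached through one relation by their level of expansion from the starting concept. It is a sketch under stated assumptions rather than a proposal for the API itself: the function name expand_by_level, the dictionary representation of the relation and the sample terms are all invented for the example.

```python
from collections import deque


def expand_by_level(relation, start, max_depth):
    """Breadth-first expansion along one relation.

    `relation` maps a concept to the concepts it is directly linked to
    (for example, its broader concepts). Returns a dict mapping each
    level (1 = directly related) to the concepts first reached there.
    """
    levels = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        concept, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested path length
        for neighbour in relation.get(concept, []):
            if neighbour not in seen:
                seen.add(neighbour)
                levels.setdefault(depth + 1, []).append(neighbour)
                queue.append((neighbour, depth + 1))
    return levels


# Invented "broader" links, including a concept with two broader concepts.
BROADER = {
    "acid rain": ["air pollution", "precipitation"],
    "air pollution": ["pollution"],
    "precipitation": ["weather"],
    "pollution": ["environmental problem"],
}

structured = expand_by_level(BROADER, "acid rain", 2)

# The undifferentiated list a client would otherwise receive.
flat = [concept for level in sorted(structured) for concept in structured[level]]
```

With the level retained, a client can indent or rank results by distance from the starting concept; the flat list loses that information and the client cannot rebuild it without further calls.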

As an initial exploration of programmatic access to KOS with the SKOS API, we developed a pilot PC-based (.NET) Web service client demonstrator application. This was tested on a remote server, which used a different technology platform. Supported by concept caching, it generally achieved a fast enough response for reasonable interaction and suggests that the SKOS API can support client applications of this type. However, further refinement of the API is required.

Given foreseeable bandwidth limitations, KOS-specific protocols continue to be needed for efficient services. Future API designs should support the further integration of KOS into rich but responsive mapping and query terminology services.

For Further Reading
Binding, C., & Tudhope, D. (2004). KOS at your service: Programmatic access to knowledge organization systems. Journal of Digital Information, 4 (4). Article No. 265, 2004-02-0. Retrieved April 15, 2006, from http://jodi.tamu.edu/Articles/v04/i04/Binding/.

FACET Project, University of Glamorgan. Retrieved April 15, 2006, from www.comp.glam.ac.uk/~facet/facetproject.html.

FACET Web Demonstrator. Retrieved April 15, 2006, from www.comp.glam.ac.uk/~FACET/webdemo/.

Hill, L., Buchel, O., Janée, G., & Zeng, M. (2002). Integration of knowledge organization systems into digital library architectures. In Proceedings of the 13th ASIS&T SIG/CR Workshop: Reconceptualizing Classification Research. 

Miles, A., Matthews, B., Wilson, M., & Brickley, D. (2005). SKOS Core: Simple knowledge organization for the Web. In Proceedings of the International Conference on Dublin Core and Metadata Applications (DC-2005), Madrid (pp. 5-13).

NKOS Network. Networked Knowledge Organization Systems/Services. Retrieved April 15, 2006, from http://nkos.slis.kent.edu/.

SKOS API. Retrieved April 15, 2006, from www.w3.org/2001/sw/Europe/reports/thes/skosapi.html.

SKOS Core. Retrieved April 15, 2006, from www.w3.org/2004/02/skos/.

Tudhope, D., Binding, C., Blocks, D., & Cunliffe, D. (In press). Query expansion via conceptual distance in thesaurus indexed collections. Journal of Documentation.