GEM: Using Metadata to Enhance Internet Retrieval by K-12 Teachers

by Stuart A. Sutton and Sam G. Oh

The National Library of Education (NLE) Advisory Task Force identified lesson plans and teacher guides as a top priority area in which NLE should apply library and information science expertise to improve the organization and accessibility of the substantial, but uncataloged, collections of such material that are already available on various federal, state, university, non-profit and commercial Internet sites. In association with NLE and the U.S. Department of Education, the ERIC Clearinghouse on Information and Technology at Syracuse University agreed to spearhead a one-year project to develop an operational framework to provide the nation's teachers with "one-stop/any-stop" access to the thousands of lesson plans, curriculum units and other Internet-based educational resources. In general, these valuable resources are difficult for most teachers to find in an efficient, effective manner. The goal of the Gateway to Educational Materials (GEM) project is to alleviate substantially this resource discovery problem.

In November 1996, a stakeholders' meeting was held at Syracuse University to frame the development of a common gateway to lesson plans on the Internet. In the months following the meeting, the participants served as a Working Group to develop and refine the work begun at Syracuse. The plan was for the Working Group to develop the foundation for GEM through Phase 1 of the project. In Phase 2, a broader constituency in the form of the GEM Consortium will be engaged in the full-scale deployment of the GEM standard through its application to educational resources across the Internet.

The project's initial goal was to develop a common gateway to lesson plans. However, one of the early Working Group decisions was to broaden the scope of the project to include not only the description of lesson plans but all Internet-based educational materials.

The five major tasks addressed by the GEM project are to

In the following paragraphs, we will discuss the first three of these tasks.

GEM Profile and Controlled Vocabularies

From the outset, the Working Group wanted to develop the various aspects of GEM around emerging standards for networked information discovery and retrieval (NIDR). The working group decided that the GEM profile would assume as its base referent the Dublin Core element set (DC) and the Warwick Framework due to their growing national and international recognition, acceptance and support.

The general goals of GEM and DC are similar; however, in many ways, they are not congruent. DC is designed to serve NIDR through a fielded surrogate supposedly simple enough to be applied to resources by authors and Internet providers untrained in the complexities of cataloging necessary to the creation of more richly structured surrogates (e.g., the MARC record). While its simplicity serves coarse-grained NIDR across a broad range of networked information, DC is ill-equipped for more fine-grained NIDR of resources necessary to particular discourse or practice communities such as the nation's K-12 teachers. GEM is intended to serve NIDR needs of this constituency along a continuum that begins with what is achievable with a simple, unqualified, fielded surrogate as set out by the DC "minimalists" to a surrogate coming closer to (but never reaching) the richly structured surrogate. In addition, GEM assumes that the profile will be applied by a range of organizations with a higher level of commitment and expertise than that assumed by DC.

Recognizing that the 15 base elements of DC would not serve all purposes, one of the underlying assumptions of its founders was that it would be extensible in two fundamental ways: (1) additional elements could be added to meet the needs of specific domains and (2) its elements could be enriched through the use of qualifying "schemes" and "types." GEM pursues an enriched structure through both of these mechanisms. To the 15-element DC base package, the GEM Working Group added an 8-element, domain-specific GEM package which makes it possible to capture the following domain-specific information about the resource being cataloged: (1) whether the target audience for the resource has special needs or forms a discrete demographic group (Audience), (2) the nature of any identified pedagogical methods employed (Pedagogy), (3) indicators of resource quality (Quality and Quality Indicator), (4) academic standards mapped to the resource (Standards) and (5) other resources necessary to the successful use of the resource by a teacher (Resource Needed). Also, information regarding grade or educational level of the target audience of the resource is captured (Master).

GEM Syntax

To deploy GEM metadata, a cataloging module was developed by the Working Group's project team that gathers GEM profile information for a resource through a simple graphical interface with pull down menus for accessing controlled vocabularies. As the metadata for the resource is gathered, it is prepared to be written out as HTML meta tags in a content-overload syntax of the form

<meta NAME="ElementName" CONTENT="(Scheme=SchemeName) (Type=TypeName) TextOfContent">

Extensions to the module are under development that will make it syntax-independent, allowing GEM metadata to be written out in other forms including PICS. These meta tags may then be embedded directly into the HTML-tagged resource being described or they may reside separately in cases where embedding is either not desired or not possible (e.g., where the resource is being cataloged by an agency other than the one that created the resource or if the resource is a graphic, audio or video file).
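As an illustration of the content-overload form described above, the following minimal Python sketch renders one element as a meta tag. It is not the Working Group's cataloging module: the function name `gem_meta_tag`, the element name "Subject" and the scheme name "GEM" are hypothetical, chosen only to make the pattern concrete.

```python
from html import escape


def gem_meta_tag(element, text, scheme=None, type_=None):
    """Render one metadata element as an HTML meta tag.

    The optional scheme and type qualifiers are packed into the
    CONTENT attribute ahead of the text, following the pattern
    (Scheme=...) (Type=...) TextOfContent.
    """
    parts = []
    if scheme:
        parts.append(f"(Scheme={scheme})")
    if type_:
        parts.append(f"(Type={type_})")
    parts.append(text)
    content = " ".join(parts)
    # Escape quotes and ampersands so the attribute values stay well-formed.
    return (
        f'<meta NAME="{escape(element, quote=True)}" '
        f'CONTENT="{escape(content, quote=True)}">'
    )


# Hypothetical element and scheme names, for illustration only.
print(gem_meta_tag("Subject", "Mathematics", scheme="GEM"))
```

An unqualified element simply omits both qualifiers, which corresponds to the simple, unqualified surrogate at the "minimalist" end of the continuum.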

Prototype Interfaces

While it is assumed that any number of existing and yet-to-be-developed programs will be able to collect and manipulate GEM metadata, the Working Group has developed a harvesting application for building a Union Catalog. The syntax of the Union Catalog is based loosely on the templates developed by the Internet Engineering Task Force (IETF) Working Group on Internet Anonymous FTP Archives (IAFA). In the project's Phase 1, two prototype interfaces to this Union Catalog of GEM metadata have been developed. At Syracuse, a search and browse environment has been built using PLWeb, a full-text search engine by Personal Library Software. At the University of Washington, a relational database driven interface has been developed. Since the database work at the University of Washington will be the basis of the Union Catalog's future development and deployment, the remainder of this article will focus on its theoretical foundations and its current use as a database-driven prototype interface to GEM.
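To make the harvesting step concrete, here is a rough Python sketch of how a program might collect meta tags from an HTML page and separate the scheme and type qualifiers from the text of each element. It is an assumption-laden illustration, not the Working Group's harvester and not the IAFA-based Union Catalog format: the class `MetaHarvester`, the helpers `split_content` and `harvest`, and the sample page are all hypothetical.

```python
import re
from html.parser import HTMLParser

# Matches one leading qualifier such as "(Scheme=GEM)" or "(Type=Keyword)".
QUALIFIER = re.compile(r"^\s*\((Scheme|Type)=([^)]*)\)", re.IGNORECASE)


def split_content(content):
    """Split a content-overload string into (qualifiers, text)."""
    qualifiers = {}
    while True:
        match = QUALIFIER.match(content)
        if not match:
            break
        qualifiers[match.group(1).lower()] = match.group(2).strip()
        content = content[match.end():]
    return qualifiers, content.strip()


class MetaHarvester(HTMLParser):
    """Collect NAME/CONTENT pairs from the meta tags of one page."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)  # the parser lowercases attribute names
        name = attributes.get("name")
        content = attributes.get("content")
        if name is None or content is None:
            return
        qualifiers, text = split_content(content)
        self.records.append({"element": name, "text": text, **qualifiers})


def harvest(html_text):
    """Return one record per meta tag found in html_text."""
    parser = MetaHarvester()
    parser.feed(html_text)
    parser.close()
    return parser.records


# Hypothetical page, for illustration only.
page = (
    '<html><head>'
    '<meta NAME="Subject" CONTENT="(Scheme=GEM) (Type=Keyword) Fractions">'
    '<meta NAME="Title" CONTENT="Pizza Fractions">'
    '</head></html>'
)
for record in harvest(page):
    print(record)
```

Records of this kind could then be written out in whatever syntax a catalog requires; the sketch stops at the point where each element has been separated into its parts.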

Conceptual Schema

The conceptual data modeling approach is taken in implementing the GEM database because it helps us identify important entities and understand relationships among entities. It can also improve communications among various people (e.g., users, managers and system designers). Among many conceptual data models, the Entity-Relationship (ER) model has been used as a standard for the conceptual modeling of industry databases. (See Figure 1.)

A brief explanation of the ER model is in order. An entity is denoted by a rectangular box with its name written inside. The relationship type can be specified either above or below the line linking the two entities. For example, a resource (MASTER) can "be based on" more than one SOURCE. The phrase "be based on" is a description of the relationship type and would appear either above or below the connecting line. We omit relationship type specifications here in order to keep the diagram as simple as possible. The arrows (specifying many) and the straight lines (specifying one) describe the cardinality which expresses the specific number of entity occurrences associated with one occurrence of the related entity. For example, a resource (MASTER) can have minimally one (straight line) and maximally many (arrow) subjects (SUBJECT). Also a related site (RELATION) belongs to minimally one (straight line) and maximally one (straight line) resource (MASTER) which means that a related site belongs to only one particular resource (MASTER).

In addition to basic descriptive data such as the title of the resource and its Web location (MASTER), a wide range of other information deemed useful for NIDR is defined in DC. When cataloging the educational resources at a Web site, a number of relevant dates can be associated with that resource (Date). Information about other resources that bear some significant relationship to the resource being described can be recorded (Relation). Also the resource being cataloged can be derived from one or more sources (Source) and a particular source can be the basis for more than one resource. The resource can also be written in more than one language (Language) and a particular language can be used in many resources. A cataloger can assign many subjects to a resource in the form of keywords or as postings from a controlled subject vocabulary (Subject and Keywords) and either a subject or a keyword can be assigned to more than one resource. The resource may have either, or both, a spatial or temporal aspect (Coverage). Lastly, there can be a number of people or organizations involved in the creation and publication of a particular resource (Responsibility) and a particular person or an organization may create more than one resource.

The logical schema (see the table of attributes at the end of this article) corresponds to the conceptual schema presented above and describes the attributes defined for each entity. When there are many-to-many relationships (e.g., Master and Responsibility; Master and Keyword; Master and Language, etc.) between two entities, a bridge table is necessary to record which occurrences of one entity are linked to which occurrences of the other. For example, a particular keyword in the "Keyword" table can be used to describe many different resources, so it has to be linked to each particular resource. The "MasterKeyword" table provides this bridging. Attribute names are defined in the context of the table. For example, the attribute "type" in the "source" table should be understood as SourceType.
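The bridging idea can be sketched with a small in-memory database. The example below uses Python's built-in sqlite3 module rather than the prototype's own technology, keeps only a few of the attributes listed for Master, Keyword and MasterKeyword, and fills the tables with made-up sample rows; the function names are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create Master, Keyword and the MasterKeyword bridge table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Master (
            MasterID INTEGER PRIMARY KEY,
            Title    TEXT NOT NULL
        );
        CREATE TABLE Keyword (
            Keyword TEXT PRIMARY KEY,
            Scheme  TEXT
        );
        -- One row per (resource, keyword) pairing resolves the
        -- many-to-many relationship between Master and Keyword.
        CREATE TABLE MasterKeyword (
            MasterID INTEGER NOT NULL REFERENCES Master(MasterID),
            Keyword  TEXT    NOT NULL REFERENCES Keyword(Keyword),
            PRIMARY KEY (MasterID, Keyword)
        );
    """)
    # Made-up sample data, for illustration only.
    conn.executemany(
        "INSERT INTO Master (MasterID, Title) VALUES (?, ?)",
        [(1, "Pizza Fractions"), (2, "Measuring the Playground")],
    )
    conn.executemany(
        "INSERT INTO Keyword (Keyword, Scheme) VALUES (?, ?)",
        [("fractions", None), ("measurement", None)],
    )
    conn.executemany(
        "INSERT INTO MasterKeyword (MasterID, Keyword) VALUES (?, ?)",
        [(1, "fractions"), (2, "fractions"), (2, "measurement")],
    )
    return conn


def titles_for_keyword(conn, keyword):
    """Titles of all resources to which the keyword has been assigned."""
    rows = conn.execute(
        "SELECT m.Title FROM Master AS m "
        "JOIN MasterKeyword AS mk ON mk.MasterID = m.MasterID "
        "WHERE mk.Keyword = ? ORDER BY m.Title",
        (keyword,),
    )
    return [title for (title,) in rows]


def keywords_for_resource(conn, master_id):
    """Keywords assigned to one resource."""
    rows = conn.execute(
        "SELECT Keyword FROM MasterKeyword WHERE MasterID = ? ORDER BY Keyword",
        (master_id,),
    )
    return [keyword for (keyword,) in rows]


conn = build_demo_db()
print(titles_for_keyword(conn, "fractions"))
print(keywords_for_resource(conn, 2))
```

Because each pairing is its own row, the same keyword can describe many resources and the same resource can carry many keywords without repeating either record.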

Based on the above conceptual and logical schemas, a prototype system was developed at the University of Washington using Microsoft Active Server Pages (ASP) technology. ASP can work with any Open Database Connectivity (ODBC) compliant database (e.g., Oracle, Sybase, Access).

The search interface of the prototype system has a provision for browsing an alphabetized listing of assigned keywords or subjects. The prototype currently allows users to search by title, broad and narrow subject terms, keywords, quality level and grade, but other search criteria can be easily added as needed. It also allows users to browse by keywords (and subjects) using the right side of the window. The keyword and subject lists are created dynamically from the back-end relational database.
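The search and browse behavior described above might look roughly like the following, again sketched in Python with sqlite3 in place of the prototype's ASP and ODBC setup. Only title and grade criteria are shown; the rule that a resource matches a grade when the grade falls between GradeBegin and GradeEnd is an assumption, as are the function names and the sample rows.

```python
import sqlite3


def build_demo_catalog():
    """Create a tiny catalog with made-up resources and keywords."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Master (
            MasterID   INTEGER PRIMARY KEY,
            Title      TEXT NOT NULL,
            GradeBegin INTEGER,
            GradeEnd   INTEGER
        );
        CREATE TABLE MasterKeyword (
            MasterID INTEGER NOT NULL REFERENCES Master(MasterID),
            Keyword  TEXT    NOT NULL,
            PRIMARY KEY (MasterID, Keyword)
        );
    """)
    conn.executemany(
        "INSERT INTO Master VALUES (?, ?, ?, ?)",
        [
            (1, "Pizza Fractions", 3, 5),
            (2, "Measuring the Playground", 2, 4),
            (3, "Fractions on the Number Line", 4, 6),
        ],
    )
    conn.executemany(
        "INSERT INTO MasterKeyword VALUES (?, ?)",
        [(1, "fractions"), (2, "measurement"),
         (3, "fractions"), (3, "number line")],
    )
    return conn


def search(conn, title_contains=None, grade=None):
    """Return titles matching every criterion that was supplied."""
    clauses, params = [], []
    if title_contains is not None:
        clauses.append("Title LIKE ?")
        params.append(f"%{title_contains}%")
    if grade is not None:
        # Assumed rule: the grade must fall inside the resource's range.
        clauses.append("GradeBegin <= ? AND GradeEnd >= ?")
        params.extend([grade, grade])
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    rows = conn.execute("SELECT Title FROM Master" + where + " ORDER BY Title", params)
    return [title for (title,) in rows]


def browse_keywords(conn):
    """Build the alphabetized browse list from the database on each call."""
    rows = conn.execute(
        "SELECT DISTINCT Keyword FROM MasterKeyword ORDER BY Keyword"
    )
    return [keyword for (keyword,) in rows]


conn = build_demo_catalog()
print(search(conn, title_contains="fractions", grade=5))
print(browse_keywords(conn))
```

Adding another search criterion amounts to appending one more clause and its parameters, and the browse list reflects newly cataloged keywords without any separate maintenance because it is queried each time it is displayed.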

If a user submits a query or browses, output screens show titles and keywords that are instantly clickable. If the user clicks on one of the keywords listed for a resource, he or she is taken to the short display of all titles to which the clicked keyword has been assigned. If a user clicks a title, the system displays more detailed information on the resource. A carefully cataloged resource will provide the user with sufficient information to determine whether or not to visit the actual resource on the remote server. If a user chooses to visit the resource, all he or she needs to do is click on the title in the full display.


The use of metadata to enhance NIDR is an area of growing interest as the World Wide Web grows exponentially and resource discovery and retrieval grow more problematic. At the same time, as President Clinton's policy focus on education exemplifies, the need for access to Web-based materials by the nation's teachers is also growing at a rapid pace. GEM seeks to meet the needs of educators in two primary roles: (1) the development and wide deployment of the GEM standard in the form of a metadata profile, an accompanying set of controlled vocabularies and a well-defined set of practices in their application to Internet-based educational materials and (2) the development of a GEM-based union catalog of educational materials on the Internet. The development of the profile, preliminary vocabularies and the definition of practices were the work of the project's Phase 1. The project is now poised for wide deployment through the GEM Consortium in the coming Phase 2.

Stuart A. Sutton is associate professor in the School of Information Studies at Syracuse University.
Sam G. Oh is assistant professor in the Graduate School of Library and Information Science at the University of Washington, Box 352930, Seattle, WA 98195-2930. He can be reached by phone at 206/543-1889 or by e-mail at
Table Attributes
Master {MasterID, SID, SDN, GEMversion, Duration, Format, Description, ResourceType, RightsURL, Title, GradeBegin, GradeEnd, Compliance}
Responsibility {ResID, ResType, Name, Role, Affiliation, Contact, Email, Postal, Phone, Fax, HomePageURL}
MasterResponsibility {MasterID, ResID}
Coverage {CoverageID, Scheme, Type, Text}
MasterCoverage {MasterID, CoverageID}
Source {SourceID, Type, FormattedData, Text}
MasterSource {MasterID, SourceID}
Language {LanguageCode, Text, Scheme}
MasterLanguage {MasterID, LanguageCode}
Subject {SubjectID, GeneralSubject, SpecificSubject}
MasterSubject {MasterID, SubjectID}
Date {Type, Scheme, Date, MasterID}
Audience {AudienceID, Scheme, Type, Text}
MasterAudience {MasterID, AudienceID}
Pedagogy {PedagogyID, Type, Text}
MasterPedagogy {MasterID, PedagogyID}
Quality {QualityID, Scheme, Authority, Scale, Category, DetailURL}
QualityIndicator {Criteria, Value, QualityID}
MasterQuality {MasterID, QualityID}
Keyword {Keyword, Scheme}
MasterKeyword {MasterID, Keyword}
ResourceNeeded {Resource, MasterID}
Standard {StandardID, Authority, Correlation, Text, Scheme, Code, Topic, Grade, Main, Subordinate, MasterID}
Legend: Primary key is underlined and Foreign Key is boldfaced.