User Evaluation: Summary of the Methodologies and Results for the Alexandria Digital Library, University of California at Santa Barbara


Linda L. Hill, Ron Dolin, James Frew, Randall B. Kemp, Mary Larsgaard, Daniel R. Montello, Mary-Anna Rae, and Jason Simpson


Alexandria Digital Library Project, University of California, Santa Barbara, California


Abstract

The Alexandria Digital Library (ADL) is one of the six digital library projects funded by NSF, DARPA, and NASA. ADL’s collection and services focus on geospatial information: maps, images, georeferenced data sets and text, and other information sources with links to geographic locations. Throughout the project, user feedback has been collected through various formal and informal methods. These include online surveys, beta tester registration, ethnographic studies of ADL users and potential users, target user group focus sessions, and user feedback comments while using the interfaces. This paper briefly describes the evaluation studies conducted and what was learned about user characteristics and about the study approaches themselves. User reactions to the ADL interface and to the functionality and content of ADL are summarized. Finally, the value of these findings to design and implementation decisions is considered.

INTRODUCTION

Work in the area of digital libraries has gained momentum over the last decade (Fox et al., 1995). Much of this work, perhaps related to the term digital, is involved with computer science issues such as databases, networking, storage, scalability, and image processing. However, it is not mere coincidence that the term library is included in the phrase digital library. Digital libraries are expected, eventually, to provide at least those services provided by traditional libraries, and much more. Thus, the study of digital libraries also includes such areas as economics, security, ethnographic and sociological studies, and usability. Regardless of the power of the system, it is successful only to the degree to which the vast majority of its intended users are able to use its intended functionality. Therefore, the six projects in the NSF/DARPA/NASA Digital Library Initiative (Fox et al., 1995, pp. 57-64; National Science Foundation, 1993) have put serious effort into evaluating emerging prototypes in relation to users. This paper discusses the evaluation work within the Alexandria Digital Library (ADL) Project that has involved interaction with users and potential users of ADL.

Beginning with the first Rapid Prototype (Alexandria Digital Library Prototype CD, 1995) and continuing with the current Web Prototype (Andresen et al., 1995), ADL has worked toward providing a system that facilitates the search and retrieval of any information that is geographically referenced. This includes both digital and hardcopy formats and such document types as maps, aerial photographs, satellite images, and texts (e.g., books, articles, government reports, and research papers). Most of the development to date has been driven by the need to demonstrate core functionality.

Figures 1-5 illustrate the system architecture and the user interface for the beta web version of ADL that was the focus of the user evaluation studies discussed here. The user is presented with three approaches to finding information in ADL. He or she can use the Map Browser to zoom the search window over the map into the area of interest. The Gazetteer can be used to find place names or types of features. The footprints of selected places (footprints are latitude and longitude descriptions of location and can be either points, lines, bounding boxes, or polygons) can be displayed in the Map Browser in order to orient the Map Browser search window (or finding the location of a place can itself be the end result of the search). The geographic area of interest (search window in the Map Browser) can then be used to search the catalog to find holdings with overlapping or enclosed geographic footprints. On the Catalog page of the interface, a user can also specify other query parameters using any of the attributes of the ADL metadata schema. The ADL metadata schema incorporates all of the fields in the Federal Geographic Data Committee’s Content Standards for Digital Geospatial Metadata, plus a few fields from USMARC required for describing hardcopy items. In the result sets that are returned, digital items are represented by browse graphics. The metadata for selected items can be viewed in full. All items have geographic footprints that can be displayed in the Map Browser. Footprints from the Gazetteer and from the result sets can be displayed in the Map Browser at the same time so the user can see the spatial relationships.
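To make the footprint-based catalog search concrete, the sketch below shows how a bounding-box footprint and the "overlapping" and "enclosed" tests described above might be expressed. This is a minimal illustration, not ADL's actual implementation; the class, field, and holding names are invented for the example, and real footprints may also be points, lines, or polygons and must handle dateline wraparound.

```python
from dataclasses import dataclass

@dataclass
class BBoxFootprint:
    """A footprint expressed as a latitude/longitude bounding box."""
    west: float   # minimum longitude
    south: float  # minimum latitude
    east: float   # maximum longitude
    north: float  # maximum latitude

    def overlaps(self, other: "BBoxFootprint") -> bool:
        """True if the two boxes share any area (dateline wraparound ignored)."""
        return (self.west <= other.east and other.west <= self.east and
                self.south <= other.north and other.south <= self.north)

    def encloses(self, other: "BBoxFootprint") -> bool:
        """True if 'other' lies entirely within this box."""
        return (self.west <= other.west and other.east <= self.east and
                self.south <= other.south and other.north <= self.north)

# A catalog query: keep holdings whose footprints overlap the Map Browser search window.
search_window = BBoxFootprint(west=-121.0, south=33.5, east=-118.5, north=35.0)
holdings = {
    "aerial photograph, Santa Barbara": BBoxFootprint(-120.0, 34.3, -119.5, 34.6),
    "topographic map, Denver": BBoxFootprint(-105.3, 39.5, -104.6, 40.0),
}
hits = [name for name, fp in holdings.items() if search_window.overlaps(fp)]
print(hits)  # ['aerial photograph, Santa Barbara']
```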

Figure 1 also shows the system view of ADL that may not be apparent to the user. Much of the development work has gone into this underlying architecture and the "glue" between the components. Regardless of the "klunkiness" of the interface or overall system design, this function-driven approach served the purpose of proving that the system could perform requisite tasks and bringing to the surface problematic components or points of integration. Throughout this process, evaluative tasks were being performed. The system is currently being redesigned with the hope of fully incorporating the perspective of the intended users into a "user-centered design" (Sugar, 1995).

Digital libraries go beyond what might be considered standard information retrieval (IR) in that they may include, for example, post-retrieval activities such as long-term storage of user profiles and results, integration of results with processing, and a mixing of collections and processes in a user’s work space and the central library. User-centered design for a digital library must include not only systems evaluation but also an understanding of the process of information seeking and use (Marchionini, 1995).

There are obvious problems with evaluating a system that is under development and changes over time. However, this can be dealt with through version control and careful planning. What is perhaps more interesting is the interplay between the design and the user feedback. Clearly with new software systems there is a degree of education involved; the users must understand the capabilities in order to evaluate them. This would be considered part of the standard "usability techniques." However, as a research project, we are attempting to tailor the design to the information use patterns and feedback of users. Thus the process may be thought of as a cycle in which the implementers build certain functionality, users are introduced to the potential of the new features, the users realize new potential (some of which may not be part of the system), and the implementers modify the design. How do we evaluate a system whose functionality is determined in part by user feedback, while the feedback is used to measure, in part, how well the system performs? We make the comparison to the "self-evident door handle" (Norman, 1988, pp.87-92) which, when correctly designed, makes its use obvious. We are not building something as simple, nor as functionally static, as a door handle. For a digital library – specifically, the interface to a digital library – it is an iterative process of discovering the conceptual models and affordances (Norman, 1988, pp.9-12) that work for users.

Alexandria is not primarily an evaluation research project nor does it necessarily attempt to develop new models of IR behavior. Its focus is more on: 1) incorporation of geospatial searching into existing IR models as an extension of those models; 2) computer science research topics; and 3) the implications of user studies on the design and implementation of spatial digital libraries.

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

WHY WE CHOSE THE STUDIES OF USERS THAT WE DID

There are many evaluation methods available. They range from methods that yield detailed information about a few users, such as videotaped sessions, to methods that yield less detailed information about many users, such as the beta tester registration data.

Our choice of methods reflected the composition and interests of the project's evaluation teams. Members of the User Evaluation and User Needs Analysis Teams come from diverse backgrounds including computer science, information science, education, sociology, and library science. As is common in such a group, it has taken some time for us to understand each other's academic backgrounds and orientations, as well as the problems each is attempting to address. As this understanding has grown, the geographical topics, evaluation methods, specifications and requirements, and even cognitive IR models have grown into an integrated, group-wide conception of the development task. This allows us to do a better job of incorporating the evaluation results into the implementation and of directing evaluation to answer specific questions, such as how to integrate Geographic Information System (GIS) concepts into IR models.

We chose the following methods:

    1. Beta tester registration, with demographic data collected through the Web Access Request Form
    2. An online survey of beta testers
    3. Ethnographic studies, including audiotaped reference interviews in the Map and Imagery Laboratory, videotaped sessions of people using ADL, and a domain analysis of user feedback comments
    4. Target User Groups (focus groups) of potential users
    5. Transaction log analysis

The first four of these studies are being done at the University of California, Santa Barbara and are reported on here. Dr. Barbara Buttenfield (babs@colorado.edu) is doing the log analysis work at the University of Colorado, Boulder. She has nine months of transaction logs, incorporating over 134,000 transactions, and is using them to illustrate graphically the navigation paths users choose through ADL (Buttenfield & Kumler, 1996). We plan to integrate the direct user-involvement studies with the log analysis in the next stage of analysis.
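As a rough illustration of the kind of navigation-path analysis described above, the following sketch counts page-to-page transitions within user sessions. The simplified log format (session id, timestamp, page name) is assumed for the example and is not the format of the actual ADL transaction logs.

```python
from collections import Counter, defaultdict

# Hypothetical, simplified log records: (session_id, timestamp, page).
log = [
    ("s1", 1, "MapBrowser"), ("s1", 2, "Gazetteer"), ("s1", 3, "MapBrowser"),
    ("s1", 4, "Catalog"), ("s2", 1, "Catalog"), ("s2", 2, "Results"),
]

# Group records by session in time order, then count page-to-page transitions.
sessions = defaultdict(list)
for session_id, timestamp, page in sorted(log):
    sessions[session_id].append(page)

transitions = Counter()
for pages in sessions.values():
    for src, dst in zip(pages, pages[1:]):
        transitions[(src, dst)] += 1

for (src, dst), count in transitions.most_common():
    print(f"{src} -> {dst}: {count}")
```

Transition counts like these can then be drawn as a weighted graph to show which paths through the interface are most heavily traveled.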

We view these methods as complementary. The demographics from the beta tester registration provided the background for the online survey data and told us something about the people who showed an interest in using our system. The survey obtained more detailed information from a subset of those users about their reactions to the interface. The analysis of the user comments from the survey and other sources supported the findings of the videotaped and audiotaped sessions. The Target User Groups, which focused on user needs as expressed through user scenarios and user requirements, gave us an idea of how the task environment affects user expectations of such a system.

BRIEF DESCRIPTION OF EACH METHOD

Beta Tester Demographics

The ADL Beta Tester program began during the spring and summer of 1996. At that time, interested persons were able to sign up with ADL to gain access to the library. They had to fill out a short Web Access Request Form in order to receive the username and password that enabled them to access the full functionality of ADL on the Web. The original intent of the Web Access Request Form was to provide the information needed to decide who would be chosen as a beta tester. Therefore, the responses to the questions were free form; that is, not bounded by a limited set of choices and not verified. Subsequently, it was decided to let everyone who requested access become a beta tester.

The Request Form contains five mandatory fields and a set of optional fields. The required fields are Name, Email, Organization, Occupation, and Referral. None of these is controlled in any way by a domain or accuracy check. Everyone who has submitted the form with a valid email address has received a confirmation notice with the username and password.

There were 2,287 beta tester registrations available to us when we analyzed the data. The responses were analyzed manually to see what could be learned about the set of interested users who wanted to test the ADL web interface. An analysis of the access logs indicates that beta testers came from 906 different IP addresses, with a total of 1,340 sessions between August 12, 1996 and February 4, 1997. Because the usage logs from before August 12 were lost, along with the logs for fourteen days after August 12, a full count of the number of accesses by beta testers is not possible. A smaller number of beta testers (109) submitted the online survey form (see the description of the survey below).
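Counts such as "906 different IP addresses" and "1,340 sessions" come from grouping access-log hits by client address and splitting each address's hits into sessions wherever a long gap of inactivity occurs. The sketch below illustrates that bookkeeping under an assumed 30-minute cutoff; the actual cutoff used for the ADL logs is not stated here.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity cutoff, for illustration only

def count_ips_and_sessions(hits):
    """hits: iterable of (ip_address, unix_timestamp) pairs taken from an access log."""
    by_ip = defaultdict(list)
    for ip, timestamp in hits:
        by_ip[ip].append(timestamp)

    sessions = 0
    for timestamps in by_ip.values():
        timestamps.sort()
        sessions += 1  # the first hit from an address starts a session
        for prev, cur in zip(timestamps, timestamps[1:]):
            if cur - prev > SESSION_GAP_SECONDS:
                sessions += 1  # a long silence starts a new session
    return len(by_ip), sessions

hits = [("128.111.1.5", 0), ("128.111.1.5", 600), ("128.111.1.5", 5000),
        ("192.168.0.9", 100)]
print(count_ips_and_sessions(hits))  # (2, 3): two addresses, three sessions
```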

A high-level summary of the analysis of the beta tester data from the Web Access Request Form is that the registrants are a highly educated, information-savvy, worldwide group of Internet users.

Beta Tester Survey

The ADL User Feedback Survey is a web-based online survey. It is a six-part questionnaire that combines open-ended questions with Likert-scale forced-choice questions. Questions about the user's background were also included, such as his or her familiarity with computers and with the type of information ADL is designed to retrieve. Most of the multiple-choice questions offered the five Likert responses (strongly agree, agree, neutral, disagree, strongly disagree) plus a "no opinion" option. The statements were randomly worded in a positive or negative sense to counteract the tendency of respondents to give the same answer to every item and to encourage a more careful reading of the statements. Each major section of the survey included an open-ended question with room for a free-text response.
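Because approval and disapproval statements are mixed, the negatively worded (reverse-sensed) items have to be re-coded before responses can be averaged, so that a high score always indicates approval. A minimal sketch of that scoring step follows; the item names are illustrative, the five agreement levels are assumed to be coded 1-5, and "no opinion" is treated as missing.

```python
# Map response labels to numeric codes; "no opinion" is treated as missing data.
SCALE = {"strongly disagree": 1, "disagree": 2, "neutral": 3,
         "agree": 4, "strongly agree": 5, "no opinion": None}

REVERSE_SENSED = {"q7_hard_to_navigate"}  # negatively worded items (illustrative names)

def score(item, response):
    value = SCALE[response]
    if value is None:
        return None
    # Flip reverse-sensed items so that 5 always means approval of ADL.
    return 6 - value if item in REVERSE_SENSED else value

answers = {"q3_easy_to_use": "agree", "q7_hard_to_navigate": "disagree"}
scores = [s for s in (score(item, resp) for item, resp in answers.items()) if s is not None]
print(sum(scores) / len(scores))  # 4.0: both answers express mild approval
```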

There were three primary goals of the survey. The first was to provide a mechanism for acquiring detailed and directed feedback about users' experiences with the interface. In particular, it provided quantitative data to complement the qualitative data generated by some of our other evaluation methodologies. The second was to learn something about the ADL community of users. Finally, it was hoped that the survey would allow us to study relationships between users' experiences with the system and their background characteristics.

There were 96 usable surveys (completely or nearly completely filled out) from the 109 surveys submitted. Demographically, the group is highly educated and worldwide, ranges in age from 17 to 70 years (with a mean of 35-36 years), and is predominantly male, with almost a 4-to-1 ratio of males to females.

They are, as a group, frequent users of computers, libraries, geographical data, the web, and online catalogs.

Using the incomplete usage logs that are available, it appears that between 40% and 70% of the beta testers actually used ADL. If it is therefore assumed that between 1,000 and 1,600 beta testers actually used the system at least once, the surveys that were submitted represent approximately 6% to 10% of the active beta testers and approximately 4% of the total number who signed up to be beta testers. The low response rate and uncertain extent of any non-response bias suggest caution in generalizing the results. Also, most respondents filled out this survey after no more than one or two sessions of exposure to ADL.
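A quick arithmetic check, assuming the 96 usable surveys as the numerator and the 1,000-1,600 range of active beta testers stated above, reproduces these percentages:

```python
registered = 2287
usable_surveys = 96
active_low, active_high = 1000, 1600  # assumed range of registrants who actually used ADL

share_low = usable_surveys / active_high     # ~0.06  -> roughly 6% of active testers
share_high = usable_surveys / active_low     # ~0.096 -> roughly 10%
share_overall = usable_surveys / registered  # ~0.042 -> roughly 4% of all registrants
print(f"{share_low:.0%} to {share_high:.0%} of active testers; {share_overall:.0%} of all registrants")
```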

Factor analysis using multivariate data reduction techniques (e.g., Mulaik, 1972) produced six factors that account for 58.5% of the variance in the original data. They are listed below in order of the variance each accounts for; a brief sketch of this kind of analysis follows the table.

Factor Label                 Variance accounted for    Statistical strength
1. Overall ease of use       19.7%                     weak but statistically significant approval
2. Overall appeal             9.0%                     strong and statistically significant approval
3. Terminological clarity     8.6%                     strong and statistically significant approval
4. Overall usefulness         7.7%                     not significantly different from neutral
5. Overall performance        6.9%                     not significantly different from neutral
6. Navigational clarity       6.6%                     strong and statistically significant approval
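As a rough sketch of the kind of analysis behind the table (not the actual computation, and using random placeholder data rather than the real survey matrix), a factor analysis of standardized Likert responses and the share of variance each factor accounts for can be obtained as follows:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

# One row per respondent, one column per Likert item (coded 1-5).
# Random data stands in here for the real 96-respondent survey matrix.
rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(96, 20)).astype(float)

X = StandardScaler().fit_transform(responses)  # standardize each item
fa = FactorAnalysis(n_components=6, random_state=0).fit(X)

# Variance accounted for by each factor: the sum of its squared loadings,
# expressed as a share of the total variance of the standardized items.
loadings = fa.components_                      # shape (n_factors, n_items)
explained = (loadings ** 2).sum(axis=1) / X.shape[1]
for i, share in enumerate(explained, start=1):
    print(f"factor {i}: {share:.1%} of variance")
```

Labeling the factors (e.g., "overall ease of use") is then a matter of inspecting which survey items load heavily on each one.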

These interpretations accord quite well with the summary of reactions to the individual items. The results paint a mixed picture: they point to the existence of important difficulties with the current implementation but also provide support for some of the directions that have been taken. Average reaction to ADL may be described as "mildly approving."

Virtually no relationships were found between any user characteristics and reactions to ADL. It is critical to realize, however, that our sample of respondents is very non-representative of the population, whether that is the entire population of people in the country or just the eventual population of users of systems like ADL.

The only significant relationship is with the respondents' sex, with females being less approving of ADL than males. Female respondents were found to have higher rates of library use and fewer computer accounts, and they had used ADL a bit more often before filling out the survey, but none of these background variables was significantly related to their overall approval index. Given the small number of female respondents (19), this finding should be considered very provisional in any case.

The comments to the open-ended questions paint a somewhat more negative picture of the ADL than do the Likert-scale responses; the great majority of the comments express problems or difficulties. This is consistent with the idea that people are more motivated to comment when they encounter problems than when something works. The comments are being evaluated as part of the ethnographic studies.

Ethnographic Studies

A team from the UCSB Graduate School of Education conducted ethnographic studies to inform the ongoing development of the web interface and its underlying library development. These studies describe and analyze user activities and interactions both in the physical workspace of the Map and Imagery Laboratory (MIL) in the UCSB Davidson Library and in the virtual workspace of the ADL web interface. A domain analysis of the user feedback obtained through the web interface and the online survey was also done.

Audiotaped Sessions of People Using the Map and Imagery Laboratory

Given that one goal of the Alexandria Digital Library (ADL) is to create the electronic equivalent of MIL's services and collections, and because the design of ADL originated in the actual practices of MIL, one of the studies is based on audiotaped recordings of reference interviews conducted by MIL staff. Visitors to MIL were informed of the study and invited to participate. If they agreed, the staff person tape-recorded the reference interview; this resulted in thirteen recorded and transcribed sessions.

Two distinct modes of interaction between user and reference staff were discovered. One mode was for the user to pose a question and depend on the reference person to expand and develop the question and recommend appropriate resources. The other mode was for the user to direct the reference interview, tapping the knowledge of the staff person as needed. On analysis, these modes were expanded into four different patterns based on two domains of knowledge: task knowledge (along the continuum of familiarity with spatial information and its uses) and system knowledge (along the continuum of experience with the Map and Imagery Laboratory). The four patterns are:

    1. Limited Proficiency—user depends on reference staff to frame question and guide outcome.
    2. System Proficiency—user knows questions and depends on reference staff to select appropriate resources for desired outcome.
    3. Task Proficiency—user and reference staff frame question together for desired outcome.
    4. Full Proficiency—user directs the framing of the question to procure the desired outcome.
Videotaped Sessions of People Using ADL

Concurrent with the collection and analysis of the MIL reference interviews was the videotaping of users interacting with the Alexandria Digital Library's web interface. As with the users of MIL, there were significant differences in how people interacted with ADL depending on the user's background knowledge in several areas. Unlike the MIL experience, however, users did not have the equivalent of a reference librarian to whom they could turn to reshape a question or be guided toward a successful outcome. In the eight videotaped sessions of people's interactions with ADL it became clear that success with the system is affected by the user's background knowledge. This knowledge falls into two areas: System Knowledge (computer platforms, the WWW and browsers, programming and interface design, previous use of ADL, and library search strategies) and Task Knowledge (maps as representations of geographic information, and other kinds of geospatially referenced data). Users' reactions were related to their backgrounds and previous experience with ADL. Only one group of users (the three people who were responsible for developing the ADL web interface prototype) could be said to have all of the knowledge required for greatest success with the system. These users did not get frustrated; they were able to play with the system and discover alternative methods for reaching the data they knew existed.

Analysis of User Feedback from Beta Testers and Comments Included in Survey Responses

The Education Team also conducted a domain analysis of the feedback comments made by beta testers while they were using the web prototype and of the comments that were submitted as part of the online survey of beta testers. The methodology was first to categorize the comments from the survey and the online feedback (using the categories of the survey itself). Three broad types of comments and requests were found, relating to: the purpose of ADL; the language, functions, and processes that need better identification and explanation; and requests for additional data or additional functionality. Although many problems were identified, users recognized the complexity of the problem that ADL is striving to make understandable (that is, useful, predictable, and efficient) while still providing sophisticated functionality. Many of the problems identified may be seen partly as resulting from ADL's failure to express its purpose and potential adequately in the interface to a community of users with a broad range of expectations. The intended audience of the prototype and the knowledge and skills needed to use ADL successfully are not obvious. It is clear that many of the users studied did not understand what ADL was trying to do and where it is in the development process.

Target User Groups (Focus Groups)

The ADL User Needs Analysis Team decided to use a focus group technique to gather information from potential users of ADL - largely the same group that currently uses the Map and Imagery Laboratory. The scope of the potential user base, according to the NSF Cooperative Agreement, is any user who is searching for georeferenced information. To provide focus to this open-ended charge, three groups were chosen as Target User Groups: earth scientists, information specialists, and educators.

Two Target User Group (TUG) meetings were organized and held during the year, one on August 23, 1996 (a full day, prefaced by a two-hour introductory meeting on August 12) and the second on January 13, 1997 (a half day). At least three people from the local area were recruited for each TUG. They were asked to represent their user groups rather than focus exclusively on their personal task environments. They had varying degrees of acquaintance with the current ADL web interface. The purpose of the sessions was not to critique the current interface but rather to find out what these users would do with a system like ADL if it were all they would like it to be.

Each TUG session began with an overview of what was to be accomplished in the session, followed by discussions in the individual focus groups, and then by a plenary session to compare and contrast the focus groups' work. For the all-day meeting in August, each group identified user requirements in the areas of Content, Search, Retrieval, Processing, and Interface Design. They also selected (or added) user scenarios from a list that best represented the information tasks of their group. Following the meeting, team members categorized the requirement statements into groupings related to implementation. During this process, the statements were edited for clarity, and statements were added to cover more fully the categories that were created. The second TUG meeting, which included many people who were not at the previous meeting, validated many of the points made during the first TUG meeting.

Three major results came from the TUG sessions: (1) characterizations of the three target user groups, (2) design issues, user requirement statements, and scenarios, and (3) a model TUG session plan for others to use.

 

LESSONS LEARNED

The challenge now is to identify both what we have learned about ADL users (potential and current) that is useful for design and implementation and what we have learned about the evaluation techniques themselves.

First, we look at four questions:

  1. What have we learned about our users?
  2. What have we learned about the evaluation and user study approaches?
  3. What have we learned about the ADL interface?
  4. What have we learned about the functionality and content of ADL?

What have we learned about our users?

Most of the "early adopters" (the beta tester group) of ADL on the web are a highly educated, information-savvy, worldwide group of Internet users, as indicated by the beta tester demographics and the survey responses. The survey responses indicate a range of ages from 17 to 70 years, with a mean of 35-36 years, and almost a 4-to-1 ratio of males to females. We are handicapped in drawing any inferences from these survey data to our larger user group, since surveys were returned by only approximately 4% of the beta tester population. Likewise, we cannot find significant correlations between user characteristics and survey responses because (1) the number of surveys returned was small and (2) the survey population lacked sufficient variation in user characteristics such as age, formal education, and library use. The exception to this is that there is a significant difference in the overall negative/positive responses to survey questions based on sex; females were more negative. But this is based on only 19 female respondents, so we can only note it as interesting; we cannot make more general inferences.

Our Target User Group (TUG) activities gave us a better picture of the characteristics of three user groups in relation to the use of geospatial information and data. The three TUGs - earth scientists, information specialists, and educators - indicated through their choice of the user scenarios that best represented them and their statements of user requirements that they have very different task environments and expectations of a system such as ADL in terms of functionality and content.

The groups can be clearly distinguished from one another.

Whereas the TUG activities allowed us to understand user expectations, the ethnographic studies focused on users in action - while they were using either the traditional library or the ADL web interface. From the MIL reference interviews and the videotaped interactions with the web interface to ADL, we learned that the experience and background knowledge users have affect the way they interact within either the physical workspace or the virtual one. In the physical workspace, users are able to situate themselves within any part of the environment that they find familiar, such as the reference desk and sign-in book they see upon entering the facility. From this point, at least in the physical workspace of the MIL, users are able to interact with a reference staff person who facilitates the searching and retrieval process. The data show that all the users, regardless of where they are on the task/knowledge proficiency continuum, work well with this facilitated model.

In the virtual workspace of the web interface to ADL, there is no person with whom the user may interact. At this point there is not even a virtual one, although there are options to select the comment button or to email the system administrator. The users' background and experience are even more important to the interaction with ADL, since it is up to the users to determine how to search the system within the given structure and to use that structure to select what they want to retrieve. Moreover, successful interaction with the web interface to ADL assumes that the users have knowledge of a broad spectrum of background information and skills: multiple computer platforms, web browsers, library search strategies, maps as representations of geographic information, and geospatially referenced data and their uses. We also found that knowledge of programming and interface design and previous experience with ADL supported successful interaction immensely, though it was not assumed initially that users would come to the system with such knowledge.

In both the physical and virtual workspaces users expected to learn from the interaction, to be educated about the process, and to learn additional strategies for future interactions. In the physical workspace of MIL, this process was implicit in the interaction, with the reference staff working with the user to find the best way to accomplish the task. In the virtual workspace of ADL, the learning process was not as completely supported and users expressed their frustration at various points throughout their interaction with the system as a result. Commonly users would wonder aloud, or write in their comments, what they were doing "wrong" or what they didn't understand, suggesting that they felt the difficulty resided more within themselves than it did within the system. More proficient users tended to identify problems with the system.

The language of the interface (i.e., the terminology used in the ADL interface) was not a problem for the ‘early adopters’ who filled out the beta tester survey. It was, however, identified as a problem through the ethnographic studies, both in the videotaped sessions and in the user feedback through the web interface. This difference can probably be explained by the computer/system skills of the beta testers and their knowledge of geospatial information and data versus the comparatively uninitiated subjects of the ethnographic studies.

The effect of personal characteristics on reactions to ADL could not be derived from the studies undertaken, except for the slight indication of a sex-related reaction cited earlier.

What have we learned about the evaluation and user study approaches?

We are interested at this point in our evaluations in the return on investment (ROI) of the various studies of users. Did the studies yield a commensurate quantity and quality of information that can be used for digital library system modeling and development? Can we use our limited evaluation resources in better, more effective ways? What basis of understanding do we have now to guide what we do next? A related concern is how to structure incentives in such a way as to initiate and sustain user participation. In other words, the ROI for users who participate in our studies must also be a concern.

Dr. Judy Weedman’s study on the process of involving clients in application-oriented research (Weedman, 1996) found evidence of a basic instability in the collaboration between users and designers that stemmed from the incentives for participation of the two groups. From the system designer perspective, the investment in user studies must be beneficial to the design process. In order to get the necessary feedback for requirements analysis:

Designers must convey what it is they need to know about users’ work; users must identify the relevant dimensions and communicate them in ways that give direction to technological innovation. Users are led by the constraints of their experiences to conceptualize improvements in terms of current work; designers are led by their knowledge of possibilities and the desire to contribute to advances in their field to seek more fundamental changes. Consideration of any significant change requires users to extrapolate from current work practice to an imaginary future work practice. The time required to converge on mutual understandings from these two different starting points is extensive, and therefore costly. These costs are particularly difficult for users because the investment does not lead directly to scientific gain in their own discipline.
Weedman adds that "[a] second stage when resource investment is heavy and asymmetrical is that of testing. Users have difficulty with the need to repeatedly stress the system when the reward is identifying problems…." This leads us to evaluate our study methods on the basis of the investments that were made by both the designers and the users and on the perceived benefits that were received. The following discussion of each of the study methodologies looks at the ROI perspective and changes that could be made to maximize the benefits on both sides.

When the beta tester registration process was conceived, it was not anticipated that so many beta testers would be accepted; the analysis of data from so many testers was not considered when the open-ended questions were formed. If this type of registration is implemented again, the analysis of the data will be considered when the questions are created, and only those questions that will yield useful information for ADL will be asked. Registration information will also be much more useful if individual use patterns on the ADL system can be linked to the registration information. Under these circumstances, the self-description of the user's country will give us a picture of the worldwide distribution of users, but we should investigate whether there are automatic ways of collecting this information based on analysis of web traffic patterns. It will be useful to ask about language proficiency to give us an indication of the strength of need for a multilingual interface. It will be useful also to see if user occupation has a strong correlation with patterns of use. This information will allow us to expand our understanding of the characteristics of different groups of users, which we are now getting only from our Target User Group activities. Learning from first-time users how they discovered ADL (the referral source) can be used to target outreach and to judge the effectiveness of various avenues for presenting ADL to attract users. It also satisfies a curiosity that ADL staff have about how word of their project is spreading within the potential user community.

To make the registration data useful, however, multiple-choice questions must be provided. The list of valid choices should be carefully chosen to provide detail of the most practical value to ADL design. For example, if we continue to collect geographic location by self-description, high-level geographic regions can be listed for selection, with a box to enter the country name. The categories developed during the course of the current analysis can be provided for referral source, organization, and occupation, with a box for a more specific entry. The provision of bounded lists for selection will enable a more accurate count for analysis and may also make the registration form easier to complete. For the user, one return-on-investment option for us to investigate is posting the cumulative statistics of registered users online so individuals can see how they fit into the whole.

The survey itself has been shown to be a useful and generally well-written instrument. It is undoubtedly a bit too long. Given the difficulty several respondents had in getting the system to produce anything in response to query attempts, such a detailed survey seems even less appropriate. Also, the reverse-sensed questions (i.e., approval or disapproval) may have been confusing to some respondents, as in disagreeing with statements of disapproval. An important conclusion with respect to the survey is that it will be used best for comparative analysis, such as tracking differences in reactions by different classes of ADL users or tracking the effects of changes to the interface. The absolute level of approval or disapproval is somewhat less informative, given uncertainty about respondents' baseline tendencies to approve or disapprove, to agree or disagree.

The Likert-scale questions provided quantitative data to complement the qualitative data generated by some of the other evaluation methodologies, such as several open-ended questions in the survey. Quantitative data are often easier to interpret, more easily generalizable, and more useful for analytic comparisons. We believe that the quantitative items paint a more accurate impression of the overall or average reactions to the ADL. On the other hand, open-ended questions allowed respondents to make comments on the ADL system in their own words and with greater detail and explanation, providing a richer picture of how some respondents reacted to the system and also more directed suggestions for change.

The results from the survey provide hints of characteristics that may be true of a portion of the ADL user population. The quantitative data from the Likert-scale survey questions provide a profile that can be used as a benchmark for future survey results; the qualitative data can be further mined for insights into the ways in which the interface or the system worked or didn't work from the users' perspectives. However, it is clear that the survey either needs to be administered to a targeted set of users or to a statistically random sample of the user population (or a subset of it) from which inferences can be drawn. To have any hope of getting cooperation from users for this survey structure, the incentives for participation will have to be carefully considered. The survey also will have to be focused on a carefully selected set of questions that will give both a snapshot of user reactions and longitudinal patterns through repeated use of the questionnaire (for example, on a quarterly basis or with each new release).
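One way to use such a benchmark, sketched here with placeholder numbers and SciPy's independent-samples t-test, is to compare overall-approval scores from two administrations of the same questionnaire, for example before and after an interface release:

```python
from scipy import stats

# Overall-approval scores (1-5) from two administrations of the survey;
# the values are placeholders, not real ADL data.
wave_1 = [3.2, 3.5, 4.0, 2.8, 3.6, 3.9, 3.1, 3.4]
wave_2 = [3.8, 4.1, 3.9, 3.5, 4.2, 3.7, 4.0, 3.6]

t_stat, p_value = stats.ttest_ind(wave_1, wave_2)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# A small p-value suggests the change in average approval between releases
# is unlikely to be explained by sampling noise alone.
```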

The ethnographic studies provide a view of everyday practices that may sometimes appear so commonplace as to be invisible (Evertson & Green, 1986), and triangulation across different sets of data, with different analytical lenses, makes visible what may go unnoticed (Sugar, 1995). The data were collected within circumstances that were practical and real to the users, circumstances that were situated within contexts that affect the quality of the interaction. The context of the interactions in the MIL studies includes a sense of purpose (the difference between asking a user to experiment versus observing a user who has his or her own purpose for using the system), and the human connection (not just wanting assistance, but knowing that one is being heard and one's actions make sense and have meaning). The context of the interactions with the ADL web interface captured actual use of the interface – a view on which designers depend to understand how their "product" performs when used by others. Given that these studies reveal situated uses of a system under development as well as user interaction in the library environment that the system is seeking to augment or emulate, the ROI is judged to be very high. The end result is insights into potential and real uses unobtainable in any other way. The participants in the audio and video taping sessions were glad to have the opportunity to play a part in the development of ADL.

The Target User Group participants appreciated the opportunity to be involved also. They indicated that they were learning something useful and enjoyed being incorporated into the design process. Despite this enthusiasm for participating, it was still difficult to get them all together for each session; we had a different group each time we gathered. The difficulties with keeping the same people coming to each session can be traced partly to the inevitable problems of scheduling. This was particularly difficult with teachers. The conclusion is that we need to make special arrangements to involve teachers, on their own turf and at times that they can meet outside of their classroom obligations.

The Target User Groups were also small and local; therefore, the information we gathered from them is interpreted as case studies and not as truly representative of the large user groups that they were selected to represent. They could only be expected to represent themselves in the final analysis. Still, the TUG activity was very productive in giving us a basic set of user scenarios and user requirement statements which the ADL staff used to focus subsequent discussions within the team and with the ADL Design Review Panel which met in February 1997. These are user-grounded requirement statements and scenarios and thus valuable information for system designers.

The development of a model for Target User Group meetings was another outcome of this activity. The model now needs to be tested in other locations to see if it is usable outside of the ADL environment and to observe whether complementary or contradictory patterns emerge.

What have we learned about the ADL interface?

The select group of users who returned beta tester surveys was very favorably impressed with the web interface, finding it "wonderful" and "stimulating." They liked the tutorial and the ease of exploring the system. The terminology used in the interface was not a problem for them. They judged the screen layouts to be well done. The Map Browser, the Catalog, the Footprints, and the exit procedures all got high marks. They agreed that the needs of experienced users were taken into account. The free-text comments, however, were more negative than the Likert-scale responses indicated.

In the TUG sessions, the interface ideas that surfaced (even though it was not a specific topic scheduled for discussion) covered ideas for a future interface as well as reactions to the current web interface. Reactions to the current interface led to requests for better tutorials, better ability to track where you are in the system, better quality control of the metadata, and simplified pages - especially the first page. Ideas for the future included user profiles, applet forms, context notes, and query by example.

Other good ideas for enhanced search capability and search help that came from the groups included more support in terms of customized screens, canned queries, intelligent assistance, and more explicit directions about what to do next. The idea that the system needs to be backed up by personal attention to help users work through their problems received general support. Simpler search methods (natural language searching and the use of icons for types of data) were requested. They also requested the ability to sort the result sets and to have support for iterative searching - that is, moving easily from a previous query to a new one.

It is apparent from the more experienced users' interactions and comments during videotaping sessions that the interface does provide functions and services that allow access to ADL's holdings. However, even for the more proficient users there are problems. For example, among the developers, there was confusion about the consequences for the ADL interface of hitting the "back" and "stop" buttons on the web browser. For the limited proficiency users, the web pages contained too much information, the options to customize were confusing, and it took too long to reload pages once a command was sent. For the less proficient users, just getting to the data was problematic. It was difficult to discern the purpose of ADL, what holdings were actually available, or what could be done with the data even if the user was able to search and retrieve a particular set. For all levels of use, there was the consistent request for the human connection, a way for the user to address another person. Users also suggested an 800-number helpline, an email address, FAQs, and improved context-sensitive help (the help pages in the currently available system sometimes do not match the pages to which they are supposed to be linked).

Many of the problems identified may be seen partly as resulting from ADL's failure to express its purpose and potential adequately in the interface. The intended audience of the prototype and the knowledge and skills needed to use ADL successfully are not obvious. It is clear that many of the users studied did not understand what ADL was trying to do and where it is in the development process.

 

What have we learned about the functionality and content of ADL?

For content, each target user group had a different selection of resources that they would like to see in ADL. Historical coverage was mentioned in some way in each group. Both the Earth Scientists and the Information Specialists mentioned aerial photographs. The Education Group, whose members didn't know as much about what is available, mentioned less technical sets of information. The content list of the Earth Scientist group confirmed the choice of sets to add to the library already made by the ADL staff.

Ideas for retrieval functions focused on: the delivery of an end product to users (printed copies and electronic copies); the ranking of retrieved sets by some parameter; being able to use the results of previous searches to build new queries or combine result sets; and on being able to visualize the result sets and browse items in the sets.

The TUG discussion on processing focused on: various customization capabilities, such as extracting portions of an image or combining information from various sources to build a map; the availability of GIS functionality and modeling software; format conversion; and on the architectural design problem of where this type of software should "live" - at the server, at the client, or at a third-party vendor.

To all but the most experienced users of ADL, the functionality is not readily apparent. Depending on the users' proficiency level, they will access more or less of the functionality, experiencing greater or lesser frustration. Those who are able to access a variety of the system's functions are sometimes even more frustrated than those who do not know the functions exist. This problem may be resolved as ADL is able to load more complete datasets and increase the overall contents of the system.

The lack of extensive content in ADL is a major problem for users trying to make the system work for them. For now, ADL is a research project which is building a testbed for the purpose of developing and testing new digital library functionality. Until the means for loading more content can be found for identifiable user communities, the interface must do a better job of informing users of this limitation.

The immediate implications from these findings are that the ADL project needs to put a high priority on increasing the contents of the system and on providing a redesigned web interface to access those contents and to inform users about the limitations of content. For digital library design in general, the findings suggest that, although digital libraries provide a different environment and support new uses, they are still perceived as social environments and therefore should maintain a sense of person-to-person connection.

 

APPLICATION TO DESIGN AND IMPLEMENTATION OF ADL

Usefulness of evaluation activities to design

A relatively small team of scientists and engineers designs and implements the successive releases of ADL. From this team's point of view, the most valuable contribution of the evaluation activities is to connect the design and implementation team with the ADL user community.

The Target User Groups help focus the system design on the needs of specific communities; in the absence of such focus, ADL would wind up trying (unsuccessfully) to be "all things to all users." This input is critical in establishing and refining the design for each successive iteration of ADL.

The beta tester demographics and survey results tell us whether the system is reaching its target communities, and they can potentially tell us how satisfied those communities are with ADL. They also provide invaluable feedback on specifics of the current ADL implementation (e.g., whether a particular dialog or sequence of operations is usable).

The ethnographic studies are especially valuable to the ADL designers and implementers for the "Aha!" experiences they provide. This effect is difficult to quantify but undeniably powerful -- it is one thing to read a survey result of "weak dissatisfaction" with a particular system component, and quite another to watch a videotape of a frustrated user wrestling with that component. It has also been extremely helpful to have educators evaluate and criticize the language (instructions, explanations, etc.) embedded in the ADL interface.

 

Evaluation cycle with changing prototypes

As a research system, ADL is necessarily a moving target. We have found that, everything else being equal, technology will drive the system much more aggressively than user needs, if for no other reason than that technology is both intelligible and innately attractive to the engineers building the system. User needs, on the other hand, require both preprocessing and intermediation to be expressed as engineering criteria.

This "technology push, requirements drag" phenomenon tends to drive ADL towards a more rapid development cycle than the evaluation methodologies are comfortable with. For example, it is extremely difficult to track how one learns to use ADL over time if the user interface, or the probability of finding what one is looking for, are constantly changing.

The flip side of this phenomenon can also lead to user frustration. As of this writing, we face a situation where the ADL user interface is being redesigned with substantial input from the evaluation activities. Meanwhile, users from whom these evaluations were collected grow increasingly frustrated with the current interface. We are reluctant to divert resources to "patching" an interface that we are about to discard.

The solution, only partially achievable in a research environment, is to strive to bring the development and evaluation cycles into rough synchronization. The implementers must resist the urge to constantly "fiddle" with the user-visible portions of the system, and the evaluators must structure their activities to provide timely, regular feedback to the implementers.

 

Designing for target user groups

As a research system, ADL exists more to demonstrate capabilities than to serve communities. However, we have taken the position from the beginning that the best way to demonstrate a digital library is to build one that is as close to operational as possible. This implies a target user community, which in turn has two major impacts on the system design.

First, a user interface for a particular community will necessarily be different from one that is designed to simply showcase the system's capabilities. The latter kind of interface is relatively easy for engineers to build by themselves, whereas the former requires input from the target user community, almost always mediated by a nontrivial process of translation between the user and engineering domains.

Second, the library's content will be quite different if it is selected to satisfy the needs of a particular community. Again, engineers can easily load the library with data that exploit specific system capabilities (e.g., image manipulation), but they will likely have no clue as to what data are important to particular external users.

The major advantage of designing for a particular user community is that it gives the designers some goals that are easier to define and engineer than "make the system look good," and it gives the system a high probability of success, at least with the target community, if those goals are met. The major disadvantage is that the system could wind up embedding idiosyncrasies of the target community (e.g., be exclusively oriented towards remotely-sensed imagery), and sacrifice easy generalization to the needs of other communities.

 

How-to: development processes that work for interface design

Designing good human/computer interfaces is still as much an art as a science. The primary issue we face is the tremendous semantic gulf between the evaluators (educators, psychologists, and information specialists) and the implementers (engineers and computer scientists). To help bridge this gulf, we have selected as our chief user interface designer an artist with extensive hypermedia experience. The designer collects ideas in one-on-one sessions with both evaluators and implementers, uses these ideas to draft interface storyboards, and then presents the storyboards to the interface design team (both in person and online) for critiques and revisions.

The value of having a designer who can render issues in a common visual framework cannot be overestimated. It is much easier for the disparate groups involved in our interface design to agree on a picture than on a textual specification rife with overloaded or ambiguous terminology.

 

CONCLUSION

The results of these evaluation studies are being factored into the development of a new web interface for ADL. When this is completed, a new round of evaluation activities will begin, based on the findings reported here and in the more detailed internal reports. User needs analysis continues, with an extra effort toward getting the participation of classroom teachers. In addition, members of the recently established ADL Design Review Panel will provide us with ongoing dialogue and testing during the remainder of the project.

 

ACKNOWLEDGEMENTS

Funding from NSF, DARPA, and NASA under NSF IRI94-11330 supports this work. We would also like to thank other members of the Alexandria Digital Library Project for their cooperation with our studies and the users who have made the studies possible.

REFERENCES

Alexandria Digital Library Prototype CD. (1995). Redlands, CA: Environmental Systems Research Institute, Inc.; Alexandria Digital Library, UCSB.

Andresen, D., Carver, L., Dolin, R., Fischer, C., Frew, J., Goodchild, M., Ibarra, O., Kemp, R.B., Kothuri, R., Larsgaard, M., Manjunath, B.S., Nebert, D., Simpson, J., Smith, T.R., Wells, A., & Zheng, Q. (1995). The WWW prototype of the Alexandria Digital Library. Proceedings of the International Symposium on Digital Libraries, August 22-25, Tsukuba, Japan (pp.17-27). Tsukuba, Japan: University of Library and Information Science.

Buttenfield, B.P. (babs@colorado.edu). Personal communication. Please contact her directly for more information about the ADL log analysis results.

Buttenfield, B.P. & Kumler, M.P. (1996). Tools for browsing environmental data: The Alexandria Digital Library Interface. Proceedings of the Third International Conference on Integrating Geographic Information Systems and Environmental Modeling. Santa Fe, New Mexico, January 21-26, 1996. Also available at http://www.ncgia.ucsb.edu/conf/SANTA_FE_CD-ROM/sf_papers/buttenfield_babs/babs_paper.html.

Evertson, C. & Green, J. (1986). Observation as inquiry and method. In M. C. Wittrock (Ed.), Handbook of Research on Teaching. 3rd ed. New York: Macmillan.

Fox, E. A., Akscyn, R. M., Furuta, R. K., & Leggett, J. J. (Eds.). (1995). Digital Libraries (Special section). Communications of the ACM, 38(4), pp. 23-96.

Marchionini, G. (1995). Information seeking in electronic environments. Cambridge: Cambridge University Press.

Mulaik, S.A. (1972). The foundations of factor analysis. New York: McGraw-Hill.

National Science Foundation. (1993, September). Research on Digital Libraries, Program Guideline NSF 93-141. Washington, D.C.: National Science Foundation. http://www.nsf.gov/pubs/stis1993/nsf93141/nsf93141.txt.

Norman, D. A. (1988). The psychology of everyday things. New York: Basic Books.

Spradley, J. P. (1980). Participant observation. New York: Harcourt, Brace, Jovanovich College Publishers.

Stake, R. E. (1995). The art of case study research. Thousand Oaks, CA: Sage Publications.

Sugar, W. (1995). User-centered perspective of information retrieval research and analysis methods. In M. Williams (Ed.), Annual Review of Information Science and Technology.

