User Criteria in Relevance Evaluation: Toward Development of a Measurement Scale

Linda Schamber and Judy Bateman
School of Library and Information Sciences
University of North Texas
P.O. Box 13796, Denton, TX 76203
Tel: (817)565-3568 Fax: (817)565-3101

E-mail: Schamber@lis.unt.edu
Bateman@lis.unt.edu

Abstract

Although many information scientists recognize the importance of the criteria underlying information users' relevance judgments, the field lacks a generally accepted technique for collecting data on users' judgments based on these criteria. This paper presents preliminary findings of a long-term project intended to develop a measurement instrument based on users' relevance criteria. The primary objective of the project in its exploratory stage was to identify simple terms and groups of terms that clearly and consistently describe users' concepts of criteria. As the list of criterion concepts was refined, a secondary objective was to explore how users employed the criteria in making evaluations. In the first validation test, users were asked to sort 119 criterion terms from previous studies into conceptually related groups, based on their own general perceptions of information seeking. Their responses provided direction for reducing the number of terms to 83. In the second validation test, users were asked to group the 83 criterion terms conceptually and to rank their relative importance as applied to their own information problem situations. In addition to presenting results of these initial efforts, the paper describes conceptual and methodological challenges in long-term development of the instrument.

Introduction

Information scientists have long acknowledged the fact that many factors contribute to human judgments of relevance in evaluating the effectiveness of information retrieval. Yet the field lacks a generally accepted technique for collecting data regarding the criteria or reasons that underlie relevance judgments, particularly judgments by end-users.

The goal of this long-term project is to develop a relatively simple measurement scale, based on user criteria, that will yield results applicable to the study of user evaluations in any type of information seeking and use environment. The project consists of a series of validation tests in which a set of user criterion terms is progressively clarified and consolidated into smaller sets of terms through conceptual analyses of responses from subsequent groups of different users. This paper describes the first two tests, which were conducted to determine how users interpret criterion terms drawn from previous user-based relevance studies. These exploratory tests revealed conceptual and methodological challenges involving the semantic ambiguity of terms and verified the importance of situational context in scale development and use. Recognizing and dealing with these challenges has moved the project forward considerably and will affect the final design of the scale.

Background

Relevance Criteria

Since the beginnings of information science, relevance has served as a fundamental criterion for evaluating the effectiveness of information retrieval (IR). Much debate has revolved around definitions of relevance, the range of factors contributing to relevance judgments, and techniques for employing relevance judgments in quantifiable measures. For our purposes, relevance is defined as end-users' perceptions of the potential of certain information to resolve their problems in the context of their information seeking and use situations. Relevance judgments are defined as users' decisions to accept or reject specific information items at a certain time. Relevance criteria are defined as the factors or reasons that contribute to users' relevance judgments.

In the past decade, a number of researchers collectively have elicited more than 100 criteria directly from end-users describing their perceptions of information sources in the context of their own information problem situations. Even casual examination of their findings reveals conceptual overlaps among the majority of user criteria, with concepts such as "aboutness" or subject similarity, accuracy, and availability occurring in every study. These overlaps confirm a longstanding belief in the existence, at some broad level, of a finite range of criteria--perhaps a dozen or fewer--that are applied by users regardless of type of situation (see Schamber, 1994). A measurement scale based on common user criteria potentially would yield far more useful data than the simple judgments of relevant/not relevant traditionally employed in most IR evaluation research. The data could be applied, for example, in pinpointing areas for improvement in content and presentation of IR output.

Criteria from three previous studies were chosen for initial validation testing. In all three studies, users' descriptions of their information problem situations and decisions were elicited through some form of open-ended questioning and their responses analyzed in order to identify criteria.

Schamber (1991) interviewed professional users of weather information employed in aviation, electric power utilities, and construction. She asked them to evaluate their information sources, which included weather information systems, mass media, themselves (personal experience), and other people. Analysis of their responses yielded 10 criteria at a broad summary level for evaluating information, source, and presentation, and 21 criteria at a narrower detail level (see also Barry & Schamber, 1995).

The other two studies involved faculty and students in several disciplines who had requested searches for text-based information in an academic library setting. Su (1993) conducted a post-search interview in which she asked users to explain their ratings of the overall success of the search. Her analysis of their explanations yielded 26 success dimensions or criteria. In addition to criteria for retrieval output, these included criteria for quality of IR interaction.

Barry (1994) conducted a post-search interview in which she asked users to evaluate retrieved items individually and explain their evaluations. Her analysis identified 23 criteria in seven categories: information content of documents, the user's previous knowledge, the user's beliefs and preferences, other information and sources within the information environment, sources of documents, document as physical entity, and the user's situation.

These three studies seemed to offer the most comparable and thoroughly defined criteria elicited directly from a wide variety of users in different types of information problem situations. The set of criteria selected for the first tests was not intended to be comprehensive, but rather to provide reasonable initial coverage of criteria across information settings. In every test, respondents will be encouraged to add criteria of their own.

Scale Development

In information science and related fields, many techniques exist for evaluating IR system performance, including a subset of techniques involving human (intermediary and end-user) relevance judgments and relevance-based ratio measures (e.g., recall, precision). In recent years, researchers have tended to focus more on end-users' evaluations of information content, products, and services. Unfortunately, no relevance-based evaluation technique to date is readily applicable across information settings, especially everyday settings that do not involve formal IR systems. Further, simple relevance judgments per se (or any general judgments of usefulness or satisfaction) are insufficient indicators of the many criteria users employ in accepting or rejecting IR system output.

Judgments based on user criteria, however, raise fascinating questions about the relative importance of various criteria, tradeoffs in criterion importance, interrelationships among criteria, relationships between criteria and tangible document and system features, and relationships between criteria and users' information problem situations and stages of information seeking. These questions strongly suggest the need for a valid and reliable criterion-based instrument for collecting evaluation data in any type of information setting. The potential for using relevance criteria in the development of such an instrument was demonstrated by Wang (1994), who developed a flexible model of the document selection process based in part on 11 user criteria drawn from the studies by Barry, Schamber, and others. In Wang's model, final document selection is based on decision rules and tradeoff principles such as elimination (finding reasons to reject) and multicriteria (acceptance based on more than one criterion).

Ideally, a truly user-based instrument will be derived from users' own criteria and validated throughout its development by subsequent groups of different users. This process will not be easy. Researchers in the original user criteria studies were considerably challenged by the semantic ambiguities of natural language when interpreting, categorizing, and coding users' responses for analysis. It is difficult to predict how subsequent users will interpret and organize the criteria, especially outside the context of the original problem situations, without many rounds of testing.

On the other hand, quantifying user judgments is not difficult. The benefits of ranking, category rating scales, and continuous scales for assessing relevance have been promoted since the earliest relevance research in the 1960s (see Schamber, 1994). A particularly useful continuous-scale measurement technique is magnitude estimation, which is said to be more reliable than category scaling and easier to administer and analyze (see Eisenberg, 1988; Rorvig, 1988). The potential of magnitude estimation for instrument development was demonstrated by Janes (1991) in his motion index, used to record rapid changes in relevance judgments over time. Janes employed a magnitude estimation technique in which respondents made a mark on a 100mm line indicating their judgments of the degree of relevance as progressively more elements of citations were shown to the respondent. It is reasonable to suppose that judgments based on relevance criteria can be measured in the same way.

This paper presents the results of the first two tests in the long-term project. The goal of the project is to develop a user-validated measurement scale based on user criteria for relevance evaluation. It is expected that the final scale will:

Research Questions

Development of the user criterion scale will require progressive refinement through many validation tests with actual users. The primary objective of the first two stages of the project was to identify simple terms and groups of terms that clearly and consistently describe users' concepts of criteria. A secondary objective was to explore how users employed the criteria in making their own evaluations. The research questions were:

  1. How consistent are users in interpreting the meanings of criterion terms?
  2. How consistent are users in grouping criterion concepts?
  3. What patterns are evident in actual uses of criteria in relevance evaluations?

Methodology

The exploratory portion of the project involved two validation tests in which users were asked to interpret and sort criterion concepts selected from studies by Barry (1994), Schamber (1991), and Su (1993). The concepts were presented as stimuli in the form of single terms, phrases, and short sentences, all of which are referred to below as "terms."

User Test 1

The stimulus set consisted of 119 criterion terms (Table 1), each printed on a separate card. The cards were randomly ordered the same way for each respondent. The respondents were 31 graduate students in information/library science. They were asked to interpret the conceptual meanings of the criterion terms with regard to their own general information-seeking process and not to a specific information-seeking situation.

The test consisted of three parts: (1) a sorting task in which respondents sorted the cards by placing cards that represented the same concept into the same pile, with no limit on the number of piles; (2) a naming task, in which they labeled each card pile with the term that best described the concept represented by the pile as a whole; and (3) an interview in which they were asked to explain how they had sorted the criteria, why they had put particular criteria together, and what concept each pile represented. Subsequent questions probed for responses regarding the criteria most important to the respondent, those easiest to sort and most difficult to sort, and any additional comments about the difficulty of the tasks. Interview responses were recorded on paper and on audiotape. It was emphasized that there were no time limits on the tasks, and no right or wrong ways to group and label the cards or answer the questions.

User Test 2

Based on responses in User Test 1, the stimulus set of 119 terms was reduced to 83, with many edited to shorter or clearer phrases or single terms (Table 2). The new cards were randomly ordered the same way for each respondent. The respondents were 28 graduate students in information/library science. They were asked to interpret and apply the criterion concepts in the context of their own information-seeking problem situations: specifically, their class assignment to write a term paper on a current trend or issue in the field. Several weeks before the paper was due, they were assigned to submit a topic proposal with a citation list of potential sources.

When students submitted their topic proposals, they were asked to perform five tasks: (1) relevance judgment, in which they chose the one citation from their topic proposal they expected to be the best source for their paper; (2) explanation, in which they stated why they selected this as the best source, where they found it, and whether they had actually read it yet; (3) evaluation, in which they selected cards with criterion terms that applied to this source; (4) sorting, in which they grouped the selected cards in piles with others that seemed to represent the same concept (putting the card with the most appropriate term on top); and (5) ranking, in which they numbered the piles in order of the importance of the criteria in determining choice of the citation. Data consisted of written responses, including open-ended explanations, and numbered card piles.

After the paper was completed, students in one class section were asked to select the best source from the citation list in their papers. This source may or may not have been the same they selected from their topic proposal. On a two-page printout of the 83 criteria, they selected, grouped, and ranked the criteria that applied to this source. Eleven respondents completed the post-assignment tasks; seven completed both pre- and post-assignment tasks. Data consisted of the marked-up criterion pages and written responses.

Results

The results of the exploratory tests involved user interpretations of criterion term meanings in the grouping tasks, their general approaches to grouping criteria, and, for Test 2, their selection of criterion concepts for making evaluations.

Criterion Term Meanings

Research question 1 concerned users' consistency in interpreting criterion term meanings. In order to develop an instrument based as completely as possible on user perceptions, we took most of the stimulus terms for Test 1 (Table 1) almost verbatim from the original studies, where they were elicited directly from users in real information-seeking situations. As expected, however, Test 1 respondents were highly inconsistent in interpreting the meanings of the terms out of the context of the original studies. For example, lightweight was interpreted as either a superficial presentation of information--the original meaning--or a physical feature of a book or computer. Similarly, I can track it was interpreted as either tracking information on a computer screen--the original meaning with regard to weather information--or as tracking information through the use of bibliographic references. Respondents often expressed confusion about terms and even suggested deleting certain terms from the list. We used responses such as these in editing the Test 2 term set (Tables 1 and 2).

In Test 2, respondents were generally consistent in grouping certain terms as synonyms. Table 3 shows the five major groups, including the frequency with which all terms appeared in a given group and the frequency with which some terms appeared in a given group. No one term in these groups was consistently selected as the most appropriate term. The meanings of the terms tended to represent characteristics such as currency that could be determined easily by users. Beyond these major groups, a few terms were handled in unexpected ways. For example, several respondents used both in-depth and overview to describe the same source instead of interpreting them as mutually exclusive concepts. Other responses, such as the grouping of interpretive and descriptive as synonyms, apparently reflected unfamiliarity with these distinct types of research approaches as mentioned by users in the original studies. One respondent selected applied and added "to my topic" to the card.

Criterion Term Groupings

Research question 2 concerned users' consistency in grouping criterion concepts. Test 1 respondents displayed considerable variation in their approaches to grouping terms conceptually. Several sorted on the basis of how well the criteria described media formats such as a computer or a book. One created a card pile he labeled "describes a camera." Several respondents sorted by subject-oriented use contexts, such as politics, history, life, books, education, and computers. One created only three piles of cards: criteria used to select recreational materials, criteria used to select research materials, and "unsortable." An entirely different approach was taken by some respondents who grouped criterion concepts as subjective or objective. These respondents often had an "unsortable" or "I don't know" pile of criterion cards that they could not fit into this scheme.

Despite instructions to place positive and negative forms of the same concept (e.g., familiar and unfamiliar) in the same concept piles, several Test 1 respondents placed them in separate piles. Some even searched for antonyms, as did the respondent who said, "I don't have a term to match lightweight, I don't have heavy." Also, because the criteria were presented as phrases (it is clear, he/she is well known, I like it), some respondents created "it," "he/she," and "I" piles and one had "1st party" and "3rd party" piles.

Test 2 respondents were far more consistent in their approaches to conceptual sorting. Table 3 shows the major groups, pertaining generally to concepts of aboutness, currency, availability, clarity, and credibility. A majority of the respondents selected at least some terms in the top two groups, with 71% choosing aboutness concepts and 64% choosing currency concepts.
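The "all terms" and "some terms" counts in Table 3 can be tallied mechanically from the card piles. The following is a minimal sketch of one way to do so, not the procedure used in this study. The piles in example_piles are hypothetical and invented for illustration, the names tally_groups and percent are illustrative, and the reading of "some" as two or more group terms in the same pile (a count that includes the "all" cases) is an assumption.

```python
# Two of the five concept groups, with terms as listed in Table 3.
CONCEPT_GROUPS = {
    "Currency": {"Current", "Recent", "Up-to-date"},
    "Clarity": {"Clear", "Readable", "Understandable"},
}

# HYPOTHETICAL card piles, invented for illustration (not study data).
# Each respondent is a list of piles; each pile is a set of terms.
example_piles = [
    [{"Current", "Recent", "Up-to-date"}, {"Clear", "Readable"}],
    [{"Current", "Recent"}, {"Clear", "Readable", "Understandable"}],
    [{"About my topic", "Relevant"}],
]


def tally_groups(groups, respondents):
    """Count respondents who piled all, or at least two, terms of each group."""
    counts = {name: {"all": 0, "some": 0} for name in groups}
    for piles in respondents:
        for name, terms in groups.items():
            # Largest number of this group's terms found together in one pile.
            best = max((len(terms & pile) for pile in piles), default=0)
            if best == len(terms):
                counts[name]["all"] += 1
            if best >= 2:  # ASSUMPTION: "some" = two or more, includes "all"
                counts[name]["some"] += 1
    return counts


def percent(count, n):
    """Share of n respondents, rounded to a whole percent."""
    return round(100 * count / n)


group_counts = tally_groups(CONCEPT_GROUPS, example_piles)
```

Applied to the counts reported in Table 3, percent(20, 28) gives 71 and percent(18, 28) gives 64, which agrees with the percentages cited above for aboutness and currency.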

Despite differences in conceptual approaches to the sorting task in Tests 1 and 2, the number of term card piles remained roughly the same. In Test 1, with 119 terms, the number of piles ranged from 3 to 17, with an average of 10 per respondent. In Test 2, with 83 terms, the number of piles ranged from 1 to 20, with an average of 7 per respondent.

Criterion Use Patterns

Research question 3 concerned patterns in uses of criteria in relevance evaluations. In Test 2, respondents were asked to select criteria they applied in evaluating the potential value of a specific document they cited in their class paper topic proposals. Because the criterion scale development project is still in an exploratory stage, results are reported here only to provide an idea of the kind of data the scale potentially can deliver.

The number of respondents who selected certain criteria (Table 2) ranged from 28 (100%) for about my topic to 0 for 10 criteria. A dozen criteria were selected by 14 (50%) or more respondents. The 10 criteria not selected at all either represented negative qualities, such as cluttered, or seemed unlikely to have pertained to the source being evaluated, such as interactive. In some cases respondents favored one of a pair of mutually exclusive criteria, as did the 5 who apparently understood and selected the research approach theoretical versus the 10 who selected applied. Two additional criteria generated by respondents concerned the inclusion in documents of interviews and of case studies.
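The summary figures in this paragraph can be recomputed directly from the frequencies in Table 2. The sketch below transcribes those frequencies and derives the percentages, the criteria selected by at least half of the respondents, and the criteria not selected at all. The variable and function names are illustrative only, and footnote markers are omitted from the term labels.

```python
# Selection frequencies transcribed from Table 2 (n = 28 respondents).
N_RESPONDENTS = 28

FREQUENCIES = {
    "About my topic": 28, "Appropriate": 24, "Current": 24, "Relevant": 24,
    "Pertinent": 21, "Recent": 21, "Usable": 21, "Descriptive": 20,
    "Available": 16, "Readable": 16, "Up-to-date": 16, "Interesting": 14,
    "Understandable": 13, "Accessible": 12, "Credible": 12, "Easy to get": 11,
    "Focused": 11, "Overview": 11, "Applied": 10, "Comprehensive": 10,
    "Detailed": 10, "I like it": 10, "In-depth": 10, "Provides proof": 10,
    "Important": 9, "Provides examples": 9, "User-friendly": 9, "Clear": 8,
    "Has bibliography": 8, "I know the publication": 8, "New to me": 8,
    "Convenient": 7, "Describes methodology": 7, "Expert": 7,
    "Interpretive": 7, "Reliable": 7, "Reputable": 7, "Saved time": 7,
    "Concise": 6, "Describes techniques": 6, "I know the source": 6,
    "Provides history": 6, "Well-written": 6, "Broad or narrow": 5,
    "General or specific": 5, "I already have this": 5, "Introductory": 5,
    "Precise": 5, "Theoretical": 5, "Accurate": 4, "Complete": 4,
    "Prominent": 4, "Thorough": 4, "Well-known": 4, "Consistent": 3,
    "Controversial": 3, "Enjoyable": 3, "Familiar": 3, "Free": 3,
    "Has tables": 3, "I don't have a background in this": 3,
    "Saved effort": 3, "Summary": 3, "Too long or short": 3, "Confusing": 2,
    "I agree with it": 2, "Unique": 2, "Validates my viewpoint": 2,
    "Cursory": 1, "I know the author": 1, "It is the only source": 1,
    "Popular": 1, "Prestigious": 1, "Ambiguous": 0, "Cluttered": 0,
    "Expensive": 0, "Has illustrations": 0, "Interactive": 0,
    "Misleading": 0, "Original": 0, "Saved money": 0, "Trivial": 0,
    "Vague": 0,
}


def percent(count, n=N_RESPONDENTS):
    """Share of respondents who selected a criterion, as a whole percent."""
    return round(100 * count / n)


# Criteria selected by at least half of the respondents.
majority = [t for t, c in FREQUENCIES.items() if c >= N_RESPONDENTS / 2]

# Criteria that no respondent selected.
unselected = [t for t, c in FREQUENCIES.items() if c == 0]
```

Here majority contains the dozen criteria selected by 14 or more respondents, and unselected contains the 10 criteria with a frequency of zero.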

One area that remained problematic in Test 2 was negative applications of criteria. Because many Test 1 respondents did not place positive and negative forms of the same concept (e.g., familiar and unfamiliar) in the same concept piles, we deleted most negative forms from the Test 2 term set. We then instructed Test 2 respondents to write "not" on a card (e.g., not familiar) if they had applied the criterion in a negative sense. Few chose to do so, possibly because the instruction was still confusing or because they simply had no negative evaluations of "best" documents. One respondent did add "not" to confusing and to theoretical.

Criterion rankings generally followed selection frequencies (Table 2) and criterion concept groups (Table 3). Concepts of aboutness were consistently ranked highest, with 23 respondents (82%) selecting about my topic or relevant or pertinent as a primary reason for selecting the source. This is not surprising in view of the fact that respondents were asked to evaluate the potential best source for their topical papers; it is also consistent with the findings of past research. The concept of currency, represented by the terms current, recent, and up-to-date, was also important: 8 respondents ranked this concept in their first or second group and 8 respondents ranked it in their third group. This was consistent with the class paper assignment, which required that sources be published within the last five years. Other criteria that appeared in highly ranked groups were available and/or easy to get and/or accessible; readable and/or well-written and/or understandable; reliable and/or credible; and expert and/or well-known.

Discussion

Generally, the results of the grouping tasks were helpful in clarifying users' perceptions of the meanings of individual criterion terms as well as conceptual similarities among terms they grouped together. These understandings were enhanced by users' generation of new criteria and their open-ended explanations. The average number of term groups per respondent (10 in Test 1, 7 in Test 2) tended to confirm the expectation that the final measurement scale will consist of a dozen or fewer broad criteria that can be applied in any type of information setting. The results of the ranking tasks confirmed findings of past research that concepts of aboutness are most important in users' relevance evaluations.

Perhaps the greatest challenge in any relevance research, from the researcher's point of view, is disentangling semantic and conceptual problems from methodological problems (see Schamber, 1994). Overall, the results of our exploratory user tests of relevance criterion concepts were successful in helping us perceive various matters through users' eyes, notably the challenge of interpreting the meanings of criterion terms and that of performing certain tasks. Further, our results, like those of previous studies, suggest that the same factors that affect relevance judgments per se also affect performance of relevance judgment tasks. This is most evident in the importance of situational context and user knowledge in relevance evaluation.

The difference between sorting relevance criterion concepts outside the context of a personal information problem situation in User Test 1 and dealing with them within such a context in User Test 2 clearly affected respondents' task performance. Test 1 respondents tended to "make sense" of the criteria by creating their own contexts for sorting the terms. Often these contexts were suggested neither by our task instructions nor by our description of the previous studies that served as sources of the criterion terms. On the other hand, most Test 2 respondents performed the sorting task with relative ease and generally appeared to have more confidence in their decisions. Interestingly, several Test 1 respondents seemed to be bothered by what they perceived as highly subjective criteria such as he/she is prominent and I like it. One stated, "The criteria make a lot of assumptions and value judgments of an author's work." Another said, "It validates my viewpoint--I don't like this because it is possible my viewpoint isn't right." A few dismissed these more affective criteria, saying they would not use them. One said, "My favorite doesn't have a place in my search" and "I like it makes no difference [to me]." Test 2 respondents, however, applied such concepts frequently and often ranked them as important in selecting their information source.

A less conspicuous observation concerned the importance of user knowledge and experience. This was most evident in Test 2 evaluations based on the criteria. Unlike many respondents in the original two studies conducted in academic libraries, respondents in this study were not expert scholars or researchers. Thus their recognition of terms for research approaches apparently varied, from those who grouped interpretive and descriptive as synonyms to those who selected either theoretical or applied.

Finally, with respect to methodology, although the first two tests suffered from ambiguities in the criterion term set and awkwardness in the task instructions, they did serve their exploratory purpose in allowing us to identify and begin to address specific problems in developing the user criterion scale. Subsequent tests will further refine the criterion term set and explore various techniques for collecting evaluation data based on the criteria. As of this writing, Test 3 has been administered and data analysis has begun. The stimulus set consists of 55 criterion terms. Respondents were 46 graduate students in information/library science who were asked, after completing their final class papers, to choose the best information source for their papers and apply the criteria in evaluating it. Preliminary results show that each criterion concept was selected at least once and several more were suggested. In Test 4, soon to be administered to students in a field outside information/library science, the stimulus set will consist of 40 criterion terms, evaluations will involve magnitude estimation, and the results will be statistically analyzed. Further tests will be administered to users in various non-academic settings, including everyday information problem situations.

Conclusion

Information scientists have always acknowledged that human relevance evaluation behavior is difficult to observe, understand, and measure. It is hoped that the outcome of this project, a user criterion measurement scale, will be useful to researchers who seek to fill the many gaps that remain in the field's understanding of relevance. The first two validation tests have addressed classic conceptual and methodological challenges in such a way as to provide guidance for further development of the scale. The results serve to demonstrate the ability of a user criterion scale to yield findings that both confirm and help explain findings of studies using relevance judgments alone. Subsequent tests will continue to refine the methodology, edit the set of criterion terms, and validate the criterion concepts with users in a variety of information environments.

References

Barry, C. (1994). User-Defined Relevance Criteria: An Exploratory Study. Journal of the American Society for Information Science, 45(3), 149-159.

Barry, C. & Schamber, L. (1995). User-defined relevance criteria: A comparison of two studies. Proceedings of the 58th Annual Meeting of the American Society for Information Science, 32, 103-111.

Eisenberg, M. (1988). Measuring Relevance Judgments. Information Processing & Management, 24(4), 373-389.

Janes, J. W. (1991). Relevance Judgments and the Incremental Presentation of Document Representations. Information Processing & Management, 27(6), 629-646.

Rorvig, M. E. (1988). Psychometric Measurement and Information Retrieval. In: Williams, Martha E., ed. Annual Review of Information Science and Technology, Vol. 23. Amsterdam, The Netherlands: Elsevier Science Publishers for the American Society for Information Science, 157-189.

Schamber, L. (1991). Users' Criteria for Evaluation in a Multimedia Environment. In: Griffiths, José-Marie, ed. ASIS '91: Proceedings of the American Society for Information Science (ASIS) 54th Annual Meeting, 28, 126-133.

Schamber, L. (1994). Relevance and information behavior. In Martha E. Williams (Ed.), Annual Review of Information Science and Technology, 29, 3-48.

Su, L. T. (1993). Is Relevance an Adequate Criterion for Retrieval System Evaluation: An Empirical Inquiry into the User's Evaluation. ASIS '93: Proceedings of the American Society for Information Science (ASIS) 56th Annual Meeting, 30, 93-103.

Wang, P. (1994). A Cognitive Model of Document Selection of Real Users of IR Systems. Unpublished doctoral dissertation, University of Maryland, 1994.

Table 1: User Test 1 Criterion Terms

About my topic              He/she has style [1]          Portable [1]
Accessible                  I hate it [1]                 Precise
Accurate                    I have heard of it [1]        Prestigious
Ambiguous                   I can edit it [1]             Prominent
Applied                     I have immediate access [1]   Proven
Appropriate                 I have control [1]            Quality work [1]
Available                   I have a background in it     Readable
Big/small picture [1]       I know it firsthand [1]       Recent
Boring [1]                  I am aware of this [1]        Relevant
Broad                       I have input [1]              Reputable
Clear                       I agree with it               Saved time
Cluttered                   I already have it             Saved effort
Color [1]                   I have power [1]              Specific
Complete                    I can zoom in/out [1]         Speculative [1]
Comprehensive               I know the author             Summary
Concise                     I know the publication        Theoretical
Confusing                   I know the source             Thorough
Consistent                  I can track it [1]            Too long
Controversial               I like it                     Trivial
Convenient                  I thought of it [1]           Trustworthy [1]
Credible                    I can request it [1]          Understandable
Current                     Important                     Unfamiliar [2]
Cursory                     In-depth                      Unique
Descriptive                 Incorrect [1]                 Unreliable [2]
Detailed                    Inexact [1]                   Usable
Difficult to get [1]        Interactive                   User-friendly
Easy to get                 Interesting                   Vague
Enjoyable                   Interpretive                  Validates my viewpoint
Expensive                   Introductory                  Well-known
Explanation [1]             It has methodology
First-rate [1]              It is my favorite [1]
Focused                     Lightweight [1]
Free                        Local [1]
Geographic area [1]         Misleading
Has procedures [1]          Narrow
Has examples                New to me
Has illustrations           Novel [1]
Has techniques              Only source
Has bibliography            Original
Has time periods [1]        Outdated [1]
Has tables                  Overview/closeup
Has graphics [1]            Overview
Has references              Pertinent
He/she is expert            Poorly written [1]
He/she has personality [1]  Popular
Total terms=119

1. Deleted for User Test 2
2. Changed to positive terms (familiar, reliable) for User Test 2

Table 2: User Test 2 Criterion Frequencies

Term                      Frequency   Term                                Frequency
About my topic            28          General or specific                 5
Appropriate               24          I already have this                 5
Current                   24          Introductory                        5
Relevant                  24          Precise                             5
Pertinent                 21          Theoretical                         5
Recent                    21          Accurate                            4
Usable                    21          Complete                            4
Descriptive               20          Prominent                           4
Available                 16          Thorough                            4
Readable                  16          Well-known                          4
Up-to-date                16          Consistent                          3
Interesting               14          Controversial                       3
Understandable            13          Enjoyable                           3
Accessible                12          Familiar                            3
Credible                  12          Free                                3
Easy to get               11          Has tables                          3
Focused                   11          I don't have a background in this   3
Overview                  11          Saved effort                        3
Applied                   10          Summary                             3
Comprehensive             10          Too long or short [1]               3
Detailed                  10          Confusing [1]                       2
I like it                 10          I agree with it                     2
In-depth                  10          Unique                              2
Provides proof            10          Validates my viewpoint              2
Important                 9           Cursory                             1
Provides examples         9           I know the author                   1
User-friendly             9           It is the only source               1
Clear                     8           Popular                             1
Has bibliography          8           Prestigious                         1
I know the publication    8           Ambiguous                           0
New to me                 8           Cluttered                           0
Convenient                7           Expensive                           0
Describes methodology     7           Has illustrations                   0
Expert                    7           Interactive                         0
Interpretive              7           Misleading                          0
Reliable                  7           Original                            0
Reputable                 7           Saved money                         0
Saved time                7           Trivial                             0
Concise                   6           Vague                               0
Describes techniques      6
I know the source         6
Provides history          6
Well-written              6
Broad or narrow           5
Total terms=83
n=28 respondents
Data represent number of respondents who selected each criterion in making evaluations.

1. "Not" added to criterion term by respondent

Table 3: User Test 2 Major Criterion Groups

Group Concept   Criteria in Group         All Terms in Group   Some Terms in Group
Aboutness       About my topic            3                    20
                Appropriate
                Pertinent
                Relevant
                Usable
Currency        Current                   16                   18
                Recent
                Up-to-date
Availability    Available                 4                    5
                Accessible
                Convenient
                Easy to get
Clarity         Clear                     3                    7
                Readable
                Understandable
Credibility     Credible                  2                    5
                Expert
                I know the publication
                I know the source
                Prominent
                Reliable
                Reputable
                Well-written

Total terms=83
n=28 respondents
Data represent number of respondents who placed terms in each conceptual group.