
EDITORIAL 

In This Issue Bert R. Boyce 
RESEARCH 

Assessment of the Effects of User Characteristics on Mental Models of Information Retrieval Systems Xiangmin Zhang and Mark Chignell
Published online 15 February 2001
In this issue we begin with Zhang and Chignell, who use the Repertory Grid Technique (RGT) to extract users' mental models of information retrieval systems in order
to study the effects on these models of four characteristics: educational and professional status, first language, academic discipline, and computer experience. Each of 64 subjects rated nine retrieval system concepts
on three attributes (form/process, targeted/not targeted, and specific to IR system/applicable to all information systems), yielding 27 variables for analysis. A factor analysis yielded nine factors with eigenvalues
greater than one, which accounted for 68% of the variation in the original ratings. The first factor appeared to be concerned with the purposefulness of querying; the second, applicability of data organization; the
third, the function of querying; the fourth, applicability of querying; the fifth, applicability of browsing; the sixth, function of data structure; the seventh, purposefulness of browsing; the eighth, function of the
document; and the ninth factor, the purposefulness of data structure. Analysis of variance and Tukey tests were applied to the subjects' factor scores. Educational and professional background, discipline, and computer
experience all had significant effects on the factor scores representing the mental models; language did not. Student and information professional scores differed widely on factors 1 and 3. Graduates differed from other
students on factors 2 and 6. The user's discipline showed significant differences on factors 1, 2, 3, and 7, and computer experience on factors 1, 2, and 7. Overall, information professionals and students have
strikingly different models. Science students see browsing as a targeted activity but humanities students do not. Language does not seem to affect mental models of information retrieval systems. 
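The eigenvalue-greater-than-one rule used to select the nine factors can be illustrated with a minimal sketch. The eigenvalues below are invented toy values for five variables, not figures from Zhang and Chignell's study, and the function name is hypothetical. The sketch assumes the eigenvalues come from a correlation matrix, so that their sum equals the total variance.

```python
def retain_factors(eigenvalues):
    """Keep factors whose eigenvalue exceeds 1 and report their share of variance.

    Assumption: eigenvalues come from a correlation matrix, so their sum
    is the total variance of the standardized variables.
    """
    retained = [ev for ev in eigenvalues if ev > 1.0]
    share = sum(retained) / sum(eigenvalues)
    return retained, share


# Toy eigenvalues for five hypothetical variables (not data from the study)
toy_eigenvalues = [2.0, 1.5, 0.8, 0.4, 0.3]
retained, share = retain_factors(toy_eigenvalues)
print(f"{len(retained)} factors retained, explaining {share:.0%} of the variance")
```

In the study itself the same rule, applied to 27 variables, left nine factors accounting for 68% of the variation.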
Modeling the Retrieval Process for an Information Retrieval System Using an Ordinal Fuzzy Linguistic Approach E. Herrera-Viedma
Published online 15 February 2001
Herrera-Viedma believes that quantitative weights computed from term occurrence are appropriate for the characterization of documents, but not for queries or the
estimated relevance levels for ranking of retrieved documents, where human understanding argues for qualitative expression. Terms for queries are ranked in seven symmetric ordinal classes by searchers, or by an
importance weight, or by a weight indicating how many documents should be returned for that term. A retrieval status value (RSV) is computed for each document for each ordered representation of the query. These are then aggregated by the search
system for final evaluation of documents. The aggregation is carried out by linguistic implication functions which provide varied definitions of disjunction and conjunction depending upon the relative importance of the
logical subexpressions of the query. Users will need to determine which, or how many, of the ordering schemes to use. 
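A minimal sketch of how a seven-class symmetric ordinal scale can support logical combination is given below. The label names are hypothetical, and the sketch deliberately swaps in plain minimum and maximum as the conjunction and disjunction; it does not reproduce the importance-sensitive linguistic implication functions that the paper uses for aggregation.

```python
# Hypothetical seven-label ordinal scale, ordered from lowest to highest.
LABELS = ["none", "very_low", "low", "medium", "high", "very_high", "total"]
TOP = len(LABELS) - 1


def neg(label):
    """Negation on a symmetric scale: mirror the label around the midpoint."""
    return LABELS[TOP - LABELS.index(label)]


def conj(*labels):
    """Plain ordinal conjunction (AND): the lowest label wins."""
    return min(labels, key=LABELS.index)


def disj(*labels):
    """Plain ordinal disjunction (OR): the highest label wins."""
    return max(labels, key=LABELS.index)


# A document judged "high" for one query term and "low" for another
print(conj("high", "low"))   # AND of the two judgments
print(disj("high", "low"))   # OR of the two judgments
print(neg("high"))           # NOT of the first judgment
```

Because the scale is symmetric, negation maps the middle label to itself, which is one reason an odd number of classes is convenient.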
Discovering Term Occurrence Structure in Text Abraham Bookstein and T. Raita
Published online 15 February 2001
Bookstein and Raita observe that
term occurrences tend to clump in texts. That is to say, if a term's occurrence is observed in adjacent text segments, the expected number of random clumps will be exceeded. Strongly clumped terms have retrieval
value, and if text is partitioned to minimize clumping strength, such stretches of text are likely to be content homogeneous. Linear clumping strength is measured by the ratio of the expected value of clumps formed to
the observed value. The standard deviation will express the degree of nonrandomness or clumping. Condensation clumping views the problem as a distribution of terms (balls) into text segments (urns) and takes the ratio of the
expected number of segments containing the term to the observed number as the clumping measure. The common retrieval measure, inverse document frequency, can be rewritten in these terms, with little difference
between the two when the probability that a segment contains the term is small. The standard deviation of the condensation clumping measure will allow an expression of the degree of nonrandomness, but is complex to
compute. The use of an approximate value at least as large as the standard deviation simplifies the process. The two measures diverge as segments are merged together, with linear clumping decreasing and condensation
clumping increasing. Using the same general model, a measure is constructed using the gaps between segments with term occurrence, where the text is considered to be wrapped in a circular fashion. More
generality is achieved, but it appears that performance is very similar to the previous measures. 
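The balls-into-urns view of condensation clumping can be sketched as follows. This is an illustration under stated assumptions, not the authors' exact formulation: segments are taken to be equally likely targets, each occurrence is placed independently, and the function names are hypothetical. Under those assumptions, the expected number of occupied segments for k occurrences in N segments is N(1 - (1 - 1/N)^k).

```python
def expected_occupied(n_segments, n_occurrences):
    """Expected number of segments holding at least one occurrence when
    occurrences fall independently and uniformly at random into segments."""
    return n_segments * (1.0 - (1.0 - 1.0 / n_segments) ** n_occurrences)


def condensation_clumping(segment_counts):
    """Ratio of expected to observed occupied segments for one term.

    segment_counts[i] is the number of occurrences of the term in segment i.
    Values above 1 mean the term sits in fewer segments than chance predicts.
    """
    n_segments = len(segment_counts)
    n_occurrences = sum(segment_counts)
    observed = sum(1 for count in segment_counts if count > 0)
    if observed == 0:
        raise ValueError("term does not occur in any segment")
    return expected_occupied(n_segments, n_occurrences) / observed


# Ten occurrences packed into two of ten segments versus spread one per segment
clumped = condensation_clumping([5, 5, 0, 0, 0, 0, 0, 0, 0, 0])
spread = condensation_clumping([1] * 10)
print(round(clumped, 3), round(spread, 3))
```

The packed term scores well above 1 and the evenly spread term scores below 1, which matches the intuition that clumped terms occupy fewer segments than random placement would produce.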
Optimal Query Expansion (QE) Processing Methods with Semantically Encoded Structured Thesauri Terminology Jane Greenberg
Published online 22 February 2001
Greenberg looks at the automatic expansion of queries using thesaurus terms in varying relationships with entry terms, based on a binary relevance evaluation of
initial return by end users, as opposed to interactive expansion where the system provides a list of possibilities based on the initial return and the user chooses expansion terms. Using ten queries collected from MBA
students, the ProQuest Controlled Vocabulary, and the ABI/Inform database on DIALOG, she mapped each query to the thesaurus terms as a base, and created four expansions: synonyms, narrower terms, related terms, and
broader terms. Relevance judgements were made on the basis of topical matching (aboutness) by the contributors of the queries reviewing the union set of the responses to the query forms, where each retrieved list was
limited to 15 or fewer citations. The automatic expansions separately took all synonyms, all narrower terms, all broader terms, and all related terms. For interactive expansion users chose from an alphabetized
union list of the terms in thesaurus records for query terms. These selections were then incorporated in the query expansion by the searcher. Users chose from all groups but took over half of the suggested synonyms and
broader terms, and over a quarter of the narrower and related terms. Synonyms and narrower terms augmented recall without a significant loss in precision in both automated and interactive searching, which argues for
their use in automated expansion since less effort is required. Broader and related terms improved recall the most but would not be useful in automatic expansion if high precision is a goal. However, they, and
particularly related terms, are seen as excellent candidates for use in interactive expansion. 
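Automatic expansion by relationship type can be sketched as below. The thesaurus record, its terms, and the function names are all invented for illustration; they are not taken from the ProQuest Controlled Vocabulary, and the OR string is a generic Boolean form, not DIALOG command syntax.

```python
# Hypothetical thesaurus record; every term here is invented for illustration.
THESAURUS = {
    "marketing": {
        "synonyms": ["promotion"],
        "narrower": ["direct marketing", "telemarketing"],
        "broader": ["business"],
        "related": ["advertising"],
    },
}


def expand_query(terms, relation):
    """Append every thesaurus term that stands in `relation` to a query term."""
    expanded = []
    for term in terms:
        expanded.append(term)
        expanded.extend(THESAURUS.get(term, {}).get(relation, []))
    # dict.fromkeys removes duplicates while keeping first-seen order
    return list(dict.fromkeys(expanded))


def to_boolean(terms):
    """Join terms into a generic OR expression."""
    return " OR ".join(f'"{term}"' for term in terms)


for relation in ("synonyms", "narrower", "broader", "related"):
    print(relation, "->", to_boolean(expand_query(["marketing"], relation)))
```

An interactive variant would show the user the union of all four lists and expand only with the terms the user selects.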
Evaluating Internet Resources: Identity, Affiliation, and Cognitive Authority in a Networked World John W. Fritch and Robert L. Cromwell
Published online 8 March 2001
The filters in print media that provide authority are not available on the Internet, so that authorship and thus accountability are uncertain. Determining true
authorship and affiliation is likely to be the most significant need in establishing cognitive authority of a site. Fritch and Cromwell suggest the assessment of documents, authors, institutions, and affiliations
separately, followed by integration of the results while indicating confidence in decisions on a separate scale. In their example, confirming the connection of the domain name to the assumed sponsor via a Whois search
is a first step. Looking for author statements and affiliations to other sites is the second. The identification of overt and covert links may disclose bias. 
