Journal of the Association for Information Science and Technology


 Bert R. Boyce




 Authors as Citers over Time
Howard D. White
 Published online 30 November 2000
We begin with White's look at recitations (citations to a person more than once in an author's career) in order to gather a profile of the author's interests, a citation identity in the form of the set of authors cited. This is in contrast to the set of all authors with whom a given author has been co-cited, the author's citation image. The image is determined by others' citing habits; the identity by one's own. By forming an author's oeuvre on DIALOG and then ranking her cited authors continuously, one forms the author's identity. By selecting all papers that contain the author as a cited author and then ranking the cited authors in this set, one produces the author's image.
   Citation identities were created for eight information scientists using only first- or sole-author papers. Documents with no references were eliminated from consideration. The limited set of ISI journals and their period of coverage limit accuracy, as does the lack of credit for other than first-author pieces. Homonymic names and multiple names for the same person are common problems. The eight author sets are Bradfordian and individualized, with 3% to 8% self-citation. Three citing styles are apparent: scientific, with heavy recitation; bibliographic essay, with little recitation; and literature review, with many authors and much recitation.
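   The two rankings described above can be illustrated with a minimal sketch. The paper records, field names, and function names are hypothetical, and counting each cited author once per paper is an assumption made here for illustration, not a detail confirmed by the summary.

```python
from collections import Counter

# Hypothetical toy records: a first author plus the authors that paper cites.
papers = [
    {"author": "A", "cited": ["B", "C", "A"]},
    {"author": "A", "cited": ["B", "D"]},
    {"author": "E", "cited": ["A", "B"]},
    {"author": "F", "cited": ["A", "C", "C"]},
    {"author": "G", "cited": ["D"]},
]

def citation_identity(papers, author):
    """Rank the authors cited in `author`'s own papers (the oeuvre)."""
    counts = Counter()
    for paper in papers:
        if paper["author"] == author:
            # Assumption: each cited author counts once per citing paper.
            counts.update(set(paper["cited"]))
    return counts.most_common()

def citation_image(papers, author):
    """Rank the authors co-cited with `author` in any paper that cites `author`."""
    counts = Counter()
    for paper in papers:
        cited = set(paper["cited"])
        if author in cited:
            counts.update(cited - {author})
    return counts.most_common()

identity = dict(citation_identity(papers, "A"))
image = dict(citation_image(papers, "A"))
# Recitations: authors cited in more than one of the author's papers.
recited = sorted(name for name, n in identity.items() if n > 1)
```

   In this toy data the identity of author A includes a self-citation, which is the kind of entry behind the 3% to 8% self-citation figure reported above.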













 The Moral Rights of Authors in the Age of Digital Information
J. Carlos Fernandez-Molina and Eduardo Peis
 Published online 29 November 2000





Children's Use of the Yahooligans! Web Search Engine: II. Cognitive and Physical Behaviors on Research Tasks
Dania Bilal
 Published online 6 October 2000
   Bilal studies the work of seventeen middle school students who, as part of their science class, searched Yahooligans! for an assigned non-factual question requiring discussion. The teacher provided ratings of the children's topic knowledge, general science knowledge, and reading ability. A questionnaire administered to the students indicated their knowledge of the Internet, and a quiz indicated their prior knowledge of Yahooligans! in particular. An exit interview collected data on the experience, and Lotus ScreenCam was used to record the student-system interactions. The transcribed moves of thirteen of the students were collected, analyzed, and coded. The proper page was found by 69%, but most went no further. Of the 69% who initially used keyword search, only one child used natural language phrasing; the rest used single terms or multiple-word phrases. The 31% who initially browsed the displayed subject categories comprised 23% who activated the right category and 8% who chose an inappropriate one. In follow-up searching, browsing exceeded keyword search as the method of choice, though all students used both. Use of the back command was much less frequent than in a previous fact-search study, and the average number of web moves was lower. Online help was not used, and 69% exhibited a style of shifting back and forth among links before deciding which page was relevant. Prior experience, domain knowledge, topic knowledge, and reading ability did not influence success.













 Usage Patterns of a Web-Based Library Catalog
Michael D. Cooper
 Published online 5 December 2000
Cooper reports descriptive statistics on the patterns of use of a web-based version of the University of California's systemwide library catalog, which was monitored and recorded for 479 days. A transaction log of time-stamped records of each user interaction was maintained. In real sessions, searches are conducted; in tourist sessions, a connection is maintained for over ten seconds and no searches are conducted; and in spider sessions, a connection of under ten seconds is maintained. True search sessions include pre-search actions, searches, displays, help, and error actions, with all remaining actions classed as other.
   The logs kept by the Melvyl HTTP daemon were edited and tabulated using SAS. In the 2.5 million sessions, there were 3.6 million pre-search activities, 7.4 million searches, 13 million displays, 11 million other activities, and 60,000 help requests. Spiders accounted for 27% of the sessions, tourists 11%, and the remaining 62% were real sessions. While tourist and spider sessions are distributed relatively uniformly, real sessions display date and time sensitivity, peaking on Tuesday between two and three PM and bottoming out on Saturday. The average session length is 10.3 minutes, with a standard deviation of nearly 18 minutes, and the length of real sessions gradually increases over time. Users spend about 25 seconds on pre-search and 36 seconds on each of the other classed activities, although each of these is performed with a different frequency within a session. The catalog and Medlars databases together account for 54% of use, and the Magazine database for another 10%; all others account for less than 10% each. The time spent viewing results is relatively constant across databases. Help actions are evenly distributed over time and, when normalized by number of uses, across databases.
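   The three session types defined above can be expressed as a small classification rule. This is only a sketch: the `Session` record and its field names are hypothetical, the log format is not taken from the study, and the treatment of a session lasting exactly ten seconds (counted here as a tourist) is an assumption, since the summary does not say.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Session:
    """Hypothetical per-session summary derived from a transaction log."""
    duration_seconds: float
    search_count: int

def classify_session(session, cutoff_seconds=10.0):
    """Label a session as 'real', 'tourist', or 'spider'."""
    if session.search_count > 0:
        return "real"      # at least one search was conducted
    if session.duration_seconds < cutoff_seconds:
        return "spider"    # short connection, no searches
    return "tourist"       # longer connection, still no searches (assumed to include exactly 10 s)

# Hypothetical sessions, tallied by type.
sessions = [Session(600, 3), Session(45, 0), Session(2, 0), Session(1, 0)]
tally = Counter(classify_session(s) for s in sessions)
shares = {kind: count / len(sessions) for kind, count in tally.items()}
```

   Applied to a full log, the same tally would yield shares comparable to the 62% real, 11% tourist, and 27% spider split reported above.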
















The Role of User Profiles for News Filtering
Michael Shepherd, John F. (Jack) Duffy, Carolyn Watters, and Nitin Gugle
 Published online 29 November 2000
   Shepherd et al., after reviewing the research in the area of personalized electronic news filters, attempt to determine whether readers will prefer a blend of personal and community filtering, and whether such filtering will exclude articles of interest. Sixty-nine subjects were asked to read a personalized edition of a local paper based upon a personal interest survey, and then a normal community edition. Items were selected if their cosine similarity with a community profile plus their cosine similarity with a user profile exceeded a threshold. Weights were assigned to each factor so that four treatments were tested: 100% user, 50/50, and 75/25 user and community in each direction.
   Words were extracted from 219 news items, and after the application of a stop list and stemming algorithm, 8508 stems with inverse document term frequency weights were produced. The centroid of the regular edition was the community profile. Each user marked a printed classification of terms on a three-point scale of 2 for interest, 1 for some interest, and 0 for no interest, and added five keywords for terms of some or more interest. The stems of these terms with weights of 1 or 2 were then processed for user profiles averaging 181 words in length. The threshold was set at 0.33 by trial. Participants were distributed randomly over the four treatments and completed a Likert-type questionnaire on their reading experience. Then the whole community edition was read and a new questionnaire administered. Seventy-eight percent of subjects prefer community-only filtering. Comments indicate that personal filtering leaves out articles users would like to see. An analysis of variance shows no difference in user preference among the four treatments, so the level of blend does not change the overall preference result for community filtering.
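   The selection rule can be sketched as a weighted sum of two cosine similarities compared against the threshold. The exact weighting formula is not given in this summary, so the convex combination below is an assumption, and the term vectors and function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def blended_score(item, user_profile, community_profile, user_weight):
    """Assumed form: user_weight * sim(user) + (1 - user_weight) * sim(community)."""
    return (user_weight * cosine(item, user_profile)
            + (1.0 - user_weight) * cosine(item, community_profile))

def is_selected(item, user_profile, community_profile, user_weight, threshold=0.33):
    """Select an item when its blended score exceeds the threshold."""
    return blended_score(item, user_profile, community_profile, user_weight) > threshold

# Hypothetical stems and weights, for illustration only.
item = {"hockey": 1.0}
user = {"hockey": 2.0}        # user marked this stem as of interest
community = {"council": 1.0}  # centroid of the regular edition, reduced to one stem

selected_50_50 = is_selected(item, user, community, user_weight=0.5)
selected_25_75 = is_selected(item, user, community, user_weight=0.25)
```

   With these toy vectors the item passes under the 50/50 blend but falls below 0.33 when the user profile carries only 25% of the weight, which shows how the blend level can change what a reader sees.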















Regions and Levels: Measuring and Mapping Users' Relevance Judgments
 Amanda Spink and Howard Greisdorf
 Published online 1 December 2000
   Looking at distributions of documents judged for relevance, Spink and Greisdorf examine the areas between clearly relevant and clearly non-relevant documents. Twenty-one users conducted 43 searches and made judgements on 1059 retrieved items. For an interval measure, each item was marked as a point on a 77 mm line ranging from low to high relevance. For a categorical measure, boxes were provided for relevant, partially relevant, partially not relevant, and not relevant. Judgements were also characterized in a binary fashion on systematic, topical, pertinence, utility, and motivational levels, and users additionally provided a brief written description of why they made the judgements they did. The previously apparent bimodal distribution of relevance judgements is confirmed. There is some evidence that topicality is more useful for de-selecting than for selecting items, and it appears that including retrieved partially relevant items with retrieved relevant items can skew precision measures in a positive direction.
   The median of the bimodal distribution of judgements is inversely correlated with the number of items judged, since the larger group of non-relevant items will pull it down. Since the median correlates with the distribution percentages of relevant items, if normalized by the number of points in the interval scale, the median becomes a possible measure of precision.
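   Both observations can be illustrated with a short sketch. It assumes that "number of points in the interval scale" means the 77 positions on the line, and the judgement data and function names are hypothetical.

```python
from statistics import median

SCALE_LENGTH_MM = 77  # length of the marked line described above

def normalized_median(marks_mm, scale_length=SCALE_LENGTH_MM):
    """Median mark divided by the scale length, giving a value between 0 and 1."""
    return median(marks_mm) / scale_length

def precision(categories, count_partial=False):
    """Share of retrieved items counted as relevant.

    With count_partial=True, 'partially relevant' items are also counted,
    which can only raise (never lower) the resulting figure.
    """
    counted = {"relevant"}
    if count_partial:
        counted = counted | {"partially relevant"}
    return sum(c in counted for c in categories) / len(categories)

# Hypothetical judgements for one search.
marks = [3, 5, 8, 70, 74]
judged = ["relevant", "partially relevant", "not relevant", "not relevant"]

median_measure = normalized_median(marks)
strict = precision(judged)
lenient = precision(judged, count_partial=True)
```

   In this toy case, counting the partially relevant item doubles the precision figure, which is the positive skew noted above; the low normalized median reflects the larger group of non-relevant marks pulling the median down.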












 Multidimensional Scaling of Video Surrogates
Abby A. Goodrum
 Published online 6 December 2000
   To study the representativeness of both image-based and text-based surrogates of video works, Goodrum collected 12 ten-second unedited clips containing images of water and spliced them randomly into a single tape with five seconds of blue screen between each pair. Five key frames per clip were selected as surrogates, and their images were analyzed for grey scale properties, color, line length, edge intensities, and angle declinations. Then each of the 78 possible unique pairs of clips was added in random order. Judges picked from the clips and the videos and were asked to choose the frame with the highest agreement. Text descriptions came from catalog records acquired along with the clips. The 150 participants were then asked to provide similarity judgements for all pairs of surrogates, and the similarity matrix was used for multidimensional scaling, creating maps for the videos and the various forms of surrogates so that comparisons of similarity could be made.
   The largest number of congruent points occurs with keyframe choices, followed by salient stills and keywords. Image-based surrogates are closer to videos than text-based surrogates overall, but the value of text increases with specific task constraints.
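   For readers unfamiliar with multidimensional scaling, the following is a minimal sketch of how a map can be derived from pairwise judgements. The summary does not say which scaling procedure the study used; classical (Torgerson) scaling in a single dimension is substituted here for brevity, the input is assumed to be already converted from similarities to dissimilarities, and the data are hypothetical.

```python
import math

def classical_mds_1d(dist, iterations=200):
    """Place items on a line from a symmetric dissimilarity matrix (classical MDS)."""
    n = len(dist)
    squared = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in squared]
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * (D^2 - row means - column means + grand mean).
    b = [[-0.5 * (squared[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    def multiply(vector):
        return [sum(b[i][j] * vector[j] for j in range(n)) for i in range(n)]

    # Power iteration for the leading eigenvector; a non-constant start
    # vector is used because constant vectors lie in the null space of B.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = multiply(v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n  # all items coincide
        v = [x / norm for x in w]
    eigenvalue = sum(x * y for x, y in zip(v, multiply(v)))
    scale = math.sqrt(max(eigenvalue, 0.0))
    return [scale * x for x in v]

# Hypothetical dissimilarities among three surrogates.
dissimilarity = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
]
coords = classical_mds_1d(dissimilarity)
```

   The recovered coordinates reproduce the input dissimilarities as distances along the line. Maps for videos and for each surrogate type, built this way in two dimensions, could then be compared for congruent points as described above.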













Digital Libraries, edited by William Y. Arms
Birger Hjørland








2000, Association for Information Science