Bert R. Boyce




Rating News Documents for Similarity
Carolyn Watters and Hong Wang

We begin with a look at the automatic ordering of news stories for users. Watters and Wang see the news presentation problem as identifying a small number of items of interest that are similar to an item whose current viewing draws a positive reaction from the viewer. News items are represented by a class of news objects consisting of a header with author, publisher, title, and date; a content section including location, date, name, and organization; and a behavior section including input, output, update, or match. Text in an XML-like markup language is processed against a stop list, and proper names are extracted using capitalization and punctuation and then categorized as location, date, person, or organization, all by dictionary lookup. The similarity of two news objects is the number of phrases they have in common, summed over the four categories (locations, persons, dates, and organizations), divided by the number of phrases in the smaller of the two phrase sets, likewise summed over the four categories. Users may select the phrases to be matched from an existing paper, or allow the system to use all available phrases. On a test file, algorithmic extraction of dates and locations was nearly 100% accurate, full-name extraction 93%, and term extraction, compared with a human sample, 90%. Most errors were due to incorrect punctuation and embedded capitals. For six users, precision stayed between 93% and 100%, while recall varied between 45% and 100%.
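The phrase-overlap similarity can be sketched in a few lines of Python. This is a minimal sketch under one reading of the summary, assuming that the shared-phrase counts and the smaller-set sizes are each summed over the four categories before dividing; the function name `news_similarity` and the sample stories are hypothetical, not taken from Watters and Wang.

```python
from typing import Dict, Set

# The four proper-name categories named in the summary.
CATEGORIES = ("location", "person", "date", "organization")

def news_similarity(a: Dict[str, Set[str]], b: Dict[str, Set[str]]) -> float:
    """Shared phrases divided by smaller-set sizes, both summed across categories.

    Assumption: this is one reading of the measure as summarized; a category
    missing from a story is treated as an empty phrase set.
    """
    shared = 0   # phrases common to both stories, summed over categories
    smaller = 0  # size of the smaller phrase set, summed over categories
    for category in CATEGORIES:
        pa = a.get(category, set())
        pb = b.get(category, set())
        shared += len(pa & pb)
        smaller += min(len(pa), len(pb))
    # If every category is empty in at least one story, nothing can match.
    return shared / smaller if smaller else 0.0

# Hypothetical phrase sets extracted from two stories.
story_a = {
    "location": {"Halifax", "Ottawa"},
    "person": {"Mary Smith"},
    "date": {"June 5"},
    "organization": set(),
}
story_b = {
    "location": {"Halifax", "Toronto"},
    "person": {"Mary Smith", "John Brown"},
    "date": set(),
    "organization": {"Dalhousie University"},
}

print(news_similarity(story_a, story_b))  # 2 shared / 3 -> 0.666...
```

Dividing by the smaller set rather than the union means a short story whose phrases are all contained in a longer one scores 1.0, which suits the task of finding items related to the one the viewer liked.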












The ``Conduit Metaphor'' and The Nature and Politics of Information
Ronald E. Day

By ``conduit metaphor'' Day means the classic Shannon and Weaver model of an information system, which he finds central to both Weaver's and Wiener's formulations of information theory. The implication is that information is measurable and that informational language will perform intentional and communicative functions. Day sees this as a totalitarian control over meaning and expression that rests on the rhetorical device of the ``conduit metaphor,'' and is thus more literary and humanistic than scientific. He believes a broader paradigm of information science could move the discipline beyond its current boundaries and toward a more critical stance on dominant political and social institutions.









What Is Wrong with Obsolescence?
 Pedro Alvarez, Isabel Escalona, and Antonio Pulgarin

Alvarez and his associates extract the citations received in the years 1985 to 1994 by papers published in 1985 in 45 physics journals. Since the declining citation distributions are not exponential, there is no time-independent aging factor. Selecting journals with smaller aging factors will not ensure that the journals with greater coverage are selected. Using the Rasch model's probabilities of citation for each year gives a ranking that reflects both aging and total contribution, and that is quite different from a ranking by aging factor.
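The link between exponential decline and a single aging factor can be made concrete: if citations fall off as c(t) = c(0) * a^t, the ratio c(t+1)/c(t) equals the same a every year. The sketch below, assuming hypothetical yearly citation counts and a hypothetical 5% tolerance, checks whether the year-to-year ratios are constant; it does not reproduce the authors' Rasch-model ranking.

```python
from typing import List

def aging_ratios(citations: List[int]) -> List[float]:
    """Year-to-year ratios c(t+1) / c(t); assumes every yearly count is positive."""
    return [later / earlier for earlier, later in zip(citations, citations[1:])]

def has_constant_aging_factor(citations: List[int], tolerance: float = 0.05) -> bool:
    """True if all ratios lie within `tolerance` (relative to their mean) of each other.

    Under exponential decline c(t) = c(0) * a**t every ratio equals a, the aging
    factor; the 5% tolerance is a hypothetical allowance for noisy counts.
    """
    ratios = aging_ratios(citations)
    mean_ratio = sum(ratios) / len(ratios)
    return max(ratios) - min(ratios) <= tolerance * mean_ratio

# Hypothetical yearly citation counts to one year's papers.
exponential = [1000, 800, 640, 512]       # every ratio is 0.8
rise_then_fall = [120, 300, 260, 150, 90]

print(aging_ratios(exponential))                  # [0.8, 0.8, 0.8]
print(has_constant_aging_factor(exponential))     # True
print(has_constant_aging_factor(rise_then_fall))  # False
```

For the second series no single ratio describes every year, so any one "aging factor" computed from it depends on which years are used.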







Probability Distributions in Library and Information Science: A Historical and Practitioner Viewpoint
Stephen J. Bensman

Bensman gives us a historical look at the development of statistical methods of prime use in library and information science research, emphasizing that the skewed distributions found there are common in other areas and have been studied particularly by British statisticians. In particular, the normal distribution is not a common phenomenon in social or biological data. In practice, one can generally assume that the negative binomial fits LIS data unless the closeness of the mean to the variance indicates a Poisson distribution. Transformation to log normal form will permit statistical analysis.
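The Poisson versus negative binomial rule of thumb rests on the fact that a Poisson variable has variance equal to its mean, while a negative binomial variable has variance greater than its mean. The sketch below compares the sample variance with the mean, fits a negative binomial by the method of moments when the data are overdispersed, and applies a log(1 + x) transform (a common way to keep zero counts defined). The 20% tolerance, the function names, and the sample counts are hypothetical; a formal analysis would use a dispersion test rather than a fixed cutoff.

```python
import math
from statistics import mean, variance
from typing import List, Tuple

def dispersion_ratio(counts: List[int]) -> float:
    """Sample variance divided by the mean; about 1 for Poisson data."""
    return float(variance(counts)) / float(mean(counts))

def suggest_model(counts: List[int], tolerance: float = 0.2) -> str:
    """Hypothetical rule of thumb: Poisson if the variance is close to the mean."""
    ratio = dispersion_ratio(counts)
    if abs(ratio - 1.0) <= tolerance:
        return "poisson"
    if ratio > 1.0:
        return "negative binomial"
    return "underdispersed"

def fit_negative_binomial(counts: List[int]) -> Tuple[float, float]:
    """Method-of-moments (r, p) with mean = r(1-p)/p and variance = r(1-p)/p**2."""
    m = float(mean(counts))
    v = float(variance(counts))
    if v <= m:
        raise ValueError("negative binomial needs variance greater than the mean")
    return m * m / (v - m), m / v

def log_transform(counts: List[int]) -> List[float]:
    """log(1 + x), so that zero counts stay defined."""
    return [math.log1p(c) for c in counts]

# Hypothetical skewed counts (e.g., uses per item): mean 2, variance about 12.4.
skewed = [0, 0, 1, 0, 2, 5, 0, 1, 12, 0, 3, 0]
print(suggest_model(skewed))  # negative binomial
print(fit_negative_binomial(skewed))
print(log_transform(skewed))
```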








When Information Retrieval Measures Agree About the Relative Quality of Document Rankings
Robert M. Losee

For Losee, two measures are equivalent if they have the same value for every ordering of a set of documents. Two measures are order equivalent if the order relation they establish between two orderings of a document set holds for all possible orderings. Since Dice and Jaccard are monotonic functions of each other, they are order equivalent though not equivalent. Precision measured at two different levels of recall is not order equivalent for all orderings, although it will be for some. Simple match and Jaccard are neither equivalent nor order equivalent. Formally, two measures are order equivalent if, for every pair of orderings, the difference between the first measure's values on the two orderings is non-negative exactly when the corresponding difference for the second measure is non-negative. Where this equivalence function is false, the region of difference may be determined, permitting the study of the conditions under which changes in different measures' values will correspond.
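The Dice and Jaccard result follows from an identity: with a shared items and u items in the union, Dice = 2a/(u + a) = 2J/(1 + J), a strictly increasing function of Jaccard J. The sketch below tests the order-equivalence condition by brute force, comparing every pair of cases and asking whether one measure's difference is non-negative exactly when the other's is. As a simplifying assumption it applies the test to set-overlap coefficients on all pairs of binary vectors of length 3, rather than to the document rankings Losee analyzes; exact fractions avoid floating-point ties.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, List, Tuple

Vector = Tuple[int, ...]
Measure = Callable[[Vector, Vector], Fraction]

def counts(x: Vector, y: Vector) -> Tuple[int, int, int, int]:
    """(a, b, c, d): in both, only in x, only in y, in neither."""
    a = sum(1 for i, j in zip(x, y) if i and j)
    b = sum(1 for i, j in zip(x, y) if i and not j)
    c = sum(1 for i, j in zip(x, y) if j and not i)
    return a, b, c, len(x) - a - b - c

def jaccard(x: Vector, y: Vector) -> Fraction:
    a, b, c, _ = counts(x, y)
    # Two empty sets are treated as identical (similarity 1).
    return Fraction(a, a + b + c) if a + b + c else Fraction(1)

def dice(x: Vector, y: Vector) -> Fraction:
    a, b, c, _ = counts(x, y)
    return Fraction(2 * a, 2 * a + b + c) if a + b + c else Fraction(1)

def simple_match(x: Vector, y: Vector) -> Fraction:
    a, _, _, d = counts(x, y)
    return Fraction(a + d, len(x))

# Every ordered pair of binary vectors of length 3 (64 cases).
VECTORS = list(product((0, 1), repeat=3))
CASES: List[Tuple[Vector, Vector]] = [(x, y) for x in VECTORS for y in VECTORS]

def equivalent(m1: Measure, m2: Measure) -> bool:
    """Both measures give the same value in every case."""
    return all(m1(x, y) == m2(x, y) for x, y in CASES)

def order_equivalent(m1: Measure, m2: Measure) -> bool:
    """For every two cases, m1's difference is non-negative exactly when m2's is."""
    v1 = [m1(x, y) for x, y in CASES]
    v2 = [m2(x, y) for x, y in CASES]
    n = len(CASES)
    return all((v1[i] - v1[j] >= 0) == (v2[i] - v2[j] >= 0)
               for i in range(n) for j in range(n))

print(equivalent(jaccard, dice), order_equivalent(jaccard, dice))                  # False True
print(equivalent(jaccard, simple_match), order_equivalent(jaccard, simple_match))  # False False
```

Simple match fails because it rewards shared absences: comparing an empty vector with one holding a single item gives Jaccard 0 but simple match 2/3, higher than the 1/3 that both measures give a one-item vector against a three-item vector containing it.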










Shifts of Interactive Intentions and Information-Seeking Strategies in Interactive Information Retrieval
Hong (Iris) Xie

Xie uses 40 cases randomly selected from a four-class stratification of the 150 library users available from the 1990 Belkin and Saracevic study. Using the collected open-ended interviews, unobtrusive observation logs, and transaction logs, Xie identifies information-seeking strategies, interactive intentions (changing sub-goals in a search process), and shifts in these during the search process. Long-term goals do not normally change during the search process, but interactive intentions do shift. Planned shifts, the most frequent type, occur when the previous intention was met without problems and the next planned intention comes into play. Opportunistic shifts occur when serendipity tempts the user to follow a different path, usually returning to the original path later. Assisted shifts normally occur when users need to learn how to use the systems better and so shift from, for example, searching to learning. Alternative shifts occur when previous attempts fail. Information-seeking strategies are modified only if shifts of interactive intentions fail.










The Knowledge-Behavior Gap in Use of Health Information
F.X. Sligo and Anna M. Jameson

Sligo and Jameson study the preferences of Pacific Island immigrants to New Zealand, in a community of 70,000 people, for sources of information on cervical cancer screening, in order to determine how the Pacific women's communication network handles sensitive health information. One-on-one semi-structured interviews were recorded, beginning with participants known to the observer and moving on to those the initial participants suggested, until a sample of 20 was complete. Respondents knew it was desirable to have a smear test but were culturally unable to act upon or discuss the matter freely. Education level did not affect responses. Participants felt that information coming directly from their peers was most effective, but would not want someone from their peer group to administer the test. The group had a high degree of community connectedness.









Discovering Knowledge from Noisy Databases Using Genetic Programming
Man Leung Wong, Kwong Sak Leung, and Jack C.Y. Cheng

LOGENPRO, a grammar-driven data mining program that combines an inductive logic programming approach with genetic programming, outperforms other inductive logic programming systems when tested on the chess endgame problem, even though it uses the same noise-handling mechanisms to overcome imperfect training sets. This suggests to Wong et al. that the Darwinian principle is a plausible noise-handling mechanism. Applied to a database of limb fractures containing age, sex, admission date, length of stay, diagnosis, operation, and surgeon, it finds that between ages 2 and 5 a fracture is likely to be of the humerus, while in boys between 11 and 13 a fracture of the radius is most common. Rules concerning length of stay and type of operation are also discovered.









