Automating Survey Coding by Multiclass Text Categorization Techniques
Daniela Giorgetti and Fabrizio Sebastiani
Published online 21 July 2003

In this issue, Giorgetti and Sebastiani suggest that answers to open-ended questions in survey instruments can be coded automatically by creating classifiers that learn from training sets of manually coded answers. The manual effort required is only that of classifying a representative set of documents, not creating a dictionary of words that trigger an assignment. They use a naive Bayesian probabilistic learner from McCallum's RAINBOW package and the multi-class support vector machine learner from Hsu and Lin's BSVM package, both examples of text categorization techniques. Data from the 1996 General Social Survey by the U.S. National Opinion Research Center provided a set of answers to three questions (previously tested by Viechnicki using a dictionary approach), their associated manually assigned category codes, and a complete set of predefined category codes. The learners were run on three random disjoint subsets of the answer sets to create the classifiers, and a remaining set was used as a test set. The dictionary approach is outperformed by 18% by RAINBOW and by 17% by BSVM, while the standard deviation of the results is reduced by 28% and 34%, respectively, relative to the dictionary approach.
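The idea of learning a coder from manually coded answers can be illustrated with a minimal sketch. It is not the RAINBOW or BSVM implementation: it is a generic multinomial naive Bayes learner with add-one smoothing, and the function names, answers, and codes ("WORK", "FAMILY") are hypothetical, invented only for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(coded_answers):
    """Fit a multinomial naive Bayes model from (answer_text, code) pairs."""
    code_counts = Counter()             # number of training answers per code
    word_counts = defaultdict(Counter)  # word frequencies per code
    vocabulary = set()
    for text, code in coded_answers:
        words = text.lower().split()
        code_counts[code] += 1
        word_counts[code].update(words)
        vocabulary.update(words)
    return code_counts, word_counts, vocabulary


def assign_code(model, text):
    """Return the code with the highest posterior log-probability."""
    code_counts, word_counts, vocabulary = model
    total_answers = sum(code_counts.values())
    best_code, best_score = None, -math.inf
    for code, n_answers in code_counts.items():
        score = math.log(n_answers / total_answers)  # log prior of the code
        total_words = sum(word_counts[code].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids zero probabilities.
            smoothed = (word_counts[code][word] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        if score > best_score:
            best_code, best_score = code, score
    return best_code


# Hypothetical toy training set: answers and codes are invented.
training = [
    ("long hours and low pay", "WORK"),
    ("my boss and the pay", "WORK"),
    ("time with my children", "FAMILY"),
    ("my children and my spouse", "FAMILY"),
]
model = train_naive_bayes(training)
print(assign_code(model, "the pay is low"))
```

The only manual input here is the list of coded answers; no trigger-word dictionary is written by hand, which is the contrast with the dictionary approach drawn above.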









Network Influences on Scholarly Communication in Developmental Dyslexia: A Longitudinal Follow-up
Claudia A. Perry
Published online 25 July 2003

Perry collects co-citation data for the years 1994 to 1998 on 74 developmental dyslexia researchers whose co-citation patterns and personally reported interactions she originally studied from 1976 to 1993. The original study indicated discrepancies between sociometric and bibliometric networks of interaction, delays in the emergence of new perspectives, and the possibility of a convergence of perspectives facilitated by central researchers. Mapping for the present study was done by multi-dimensional scaling rather than the principal components factor analysis used in the earlier study, but both clustering techniques and factor analysis were applied to the new data. Researchers with phonological and with neuroscience perspectives are associated with different co-citation patterns. Research groups grow more distinct over time, with the neuroscience-vision subgroup increasing in density but other subgroups showing some tendency toward integration. The differences between the personal networks and the co-citation network persist, and the assumption that one reflects the other is not supported.










Nodes of Topicality: Modeling User Notions of On Topic Documents
Howard Greisdorf and Brian O'Connor
Published online 25 July 2003

Greisdorf and O'Connor attempt to determine the aspects of a retrieved item that provide a questioner with evidence that the item is in fact on the topic searched, independent of its relevance. To this end they collect data from 32 participants, 11 from the business community and 21 doctoral students at the University of North Texas, each of whom was asked to state whether they considered material that approaches a topic in each of 14 specific manners to be "on topic" or "off topic." Chi-square indicates that the observed values are significantly different from expected values, and the chi-square residuals for on topic judgements exceed plus or minus two in eight cases and plus two in five cases. The positive values, which indicate a percentage of response greater than that expected by chance, suggest that documents considered topical are only related to the problem at hand, contain terms that were in the query, and describe, explain, or expand the topic of the query. The chi-square residuals for off topic judgements exceed plus or minus two in ten cases and plus two in four cases. The positive values suggest that documents considered not topical exhibit a contrasting, contrary, or confounding point of view, or merely spark curiosity. Such material might well be relevant, but is not judged topical. This suggests that topical appropriateness may best be achieved using the left compositional monotonicity approach of Bruza et al.
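The residual test described above can be sketched as follows. The summary does not say which form of residual the authors used, so this sketch assumes Pearson residuals, (observed - expected) / sqrt(expected), with expected counts taken from the table margins. The function name and the counts are hypothetical and are not the study's data.

```python
import math


def pearson_residuals(observed):
    """Return (chi_square, residuals) for a table of observed counts.

    observed is a list of rows, each a list of counts. Expected counts
    assume that rows and columns are independent.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi_square = 0.0
    residuals = []
    for i, row in enumerate(observed):
        residual_row = []
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            residual = (count - expected) / math.sqrt(expected)
            chi_square += residual ** 2  # chi-square is the sum of squared residuals
            residual_row.append(residual)
        residuals.append(residual_row)
    return chi_square, residuals


# Hypothetical counts for three manners of approaching a topic, judged by
# 32 participants each. Columns: [on topic, off topic].
observed = [
    [28, 4],
    [16, 16],
    [6, 26],
]
chi_square, residuals = pearson_residuals(observed)

# Flag the cells whose residual exceeds plus or minus two.
flagged = [
    (i, j, round(r, 2))
    for i, row in enumerate(residuals)
    for j, r in enumerate(row)
    if abs(r) > 2
]
print(flagged)
```

A positive flagged residual marks a cell chosen more often than the margins predict, and a negative one marks a cell chosen less often, which is how the positive values are read in the summary.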










Bibliographic Index Coverage of a Multidisciplinary Field
William H. Walters and Esther I. Wilder
Published online 28 July 2003

Walters and Wilder describe the literature of later-life migration, a multi-disciplinary topic, and evaluate its bibliographic coverage in seven disciplinary and five multi-disciplinary databases. Multiple database searches and reviews of the references of found items uncovered more than 500 papers published between January 1990 and December 2000. These were then read to determine whether later-life migration was their central focus, and to select those that presented noteworthy findings or innovative approaches, or covered topics unseen elsewhere, and that were also understandable to a broad readership and generally available. One hundred fifty-five journal articles met these criteria and are the focus of the study. The core journals of sociology, economics, and demography are not major contributors, but three gerontology journals are in the top five. The top two journals have broad coverage, but the others tend to concentrate on one of five themes. The top five journals account for 40% of papers and the top twelve for 70%. Of nine papers cited 30 or more times, seven appeared in the top 12 contributing journals. The median article in the study was indexed by six of the twelve databases, and 12% were indexed by more than seven databases. The correlation between citation counts and the number of databases indexing a paper is very low. Social Sciences Citation Index has 73% coverage. Typical overlap in the 12 databases is about 45%.
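The coverage figures above rest on simple set arithmetic, sketched below under stated assumptions. Coverage is taken here to be the share of the core articles that a database indexes. The summary does not define how overlap was measured, so overlap is left out. The database names, article identifiers, and function names are hypothetical.

```python
from statistics import median

# Hypothetical data: article ids and database names are invented.
core_articles = {"a1", "a2", "a3", "a4"}
indexed_by = {
    "DatabaseA": {"a1", "a2", "a3"},
    "DatabaseB": {"a2", "a3"},
    "DatabaseC": {"a3", "a4", "x9"},  # x9 lies outside the core set
}


def coverage(database_articles, core):
    """Share of the core articles that one database indexes."""
    return len(database_articles & core) / len(core)


def databases_per_article(indexed, core):
    """Number of databases that index each core article."""
    return {
        article: sum(article in articles for articles in indexed.values())
        for article in core
    }


coverage_by_db = {
    name: coverage(articles, core_articles)
    for name, articles in indexed_by.items()
}
counts = databases_per_article(indexed_by, core_articles)
median_count = median(counts.values())

print(coverage_by_db)
print(median_count)
```

With the real data, `coverage_by_db` would hold one figure per database and `median_count` would correspond to the median number of databases indexing an article.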












Bibliographic and Web Citations: What Is the Difference?
Liwen Vaughan and Debora Shaw
Published online 21 July 2003

Vaughan and Shaw look at the relationship between traditional citation and Web citation (not hyperlinks but rather textual mentions of published papers). Using English language research journals in the "Information and Library Science" category of ISI's 2000 Journal Citation Reports, 1,209 full length papers published in 1997 in 46 journals were identified. Each was searched in Social Sciences Citation Index and on the Web using a Google phrase search, entering the title in quotation marks and, where necessary for distinction, adding sub-titles, authors' names, and journal title words. After removing obvious false drops, the number of web sites was recorded for comparison with the SSCI counts. A second sample from 1992 was also collected for examination. There were a total of 16,371 web citations to the selected papers. The four top ranked and four bottom ranked journals were then examined, and every third citation to every third paper was selected and classified as to source type, domain, and country of origin.

Web counts are much higher than ISI citation counts. Of the 46 journals from 1997, 26 demonstrated a significant correlation between Web and traditional citation counts, and 11 of the 15 in the 1992 sample also showed significant correlation. Journal impact factor in 1998 and 1999 correlated significantly with average Web citations per journal in the 1997 data, but at a low level. Thirty percent of web citations come from other papers posted on the web, 30 percent from listings of web based bibliographic services, and 12 percent from class reading lists. High web citation journals often have web accessible tables of contents.


