Metadata-Based Modeling of Information Resources on the Web
S. Ayse Ozel, I. Sengor Altingovde, Ozgur Ulusoy, Gultekin Ozsoyoglu, and Z. Meral Ozsoyoglu
Published online 29 October 2003

Ozel et al. suggest that small subsets of the Web may be accessed through so-called "expert advice repositories," which appear to be XML indexes built by domain experts who create metadata records in an area where topics and meta-link types (roles) have been predetermined. They envision automatic extraction of concepts and the matching of different experts' terminology through existing Web-accessible ontologies. User profiles are also maintained; these include favored repositories, desired detail levels, and other information presumably collected by monitoring system use. They have established a limited implementation on the 200,000 documents in the DBLP bibliography, which appears to allow searches using a small set of established relations.









Does Citation Reflect Social Structure? Longitudinal Evidence From the "Globenet" Interdisciplinary Research Group
Howard D. White, Barry Wellman, and Nancy Nazer
Published online 13 November 2003

White, Wellman, and Nazer investigate the inter-citation patterns of the 16 international, interdisciplinary members of a research group established in 1993 to study human development, in the hope of determining whether citation is based on whom citers know or on what they know, i.e., whether the patterns are social or intellectual in structure. The members of the group are acquainted, and the study of the 240 possible pairs indicates that half collaborate and read each other's work, and 74% consider themselves friends or colleagues. Inter-citation patterns were studied for four periods: prior to 1989, 1989 to 1992, 1993 to 1996, and 1997 to 2000. Co-citation is shown to predict inter-citation; one cites those with whom one is co-cited. As members became better acquainted, citation of one another increased. Inter-citation was not randomly distributed; a core group of 12 pairs predominated. Friends cited friends more than acquaintances, and inter-citers communicated more than non-inter-citers. However, intellectual affinity, as shown by co-citation, rather than social ties, leads to inter-citation.
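The pair counting behind such a study can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the 240 pairs are ordered (citer, cited) pairs, since 16 × 15 = 240, and the member labels and citation records are hypothetical.

```python
from collections import Counter
from itertools import permutations


def intercitation_counts(members, citations):
    """Count citations for every ordered (citer, cited) pair of group members.

    `citations` is an iterable of (citer, cited) tuples. Records that involve
    a non-member, or a member citing themselves, are ignored.
    """
    # Start every ordered pair at zero so pairs that never cite still appear.
    counts = Counter({pair: 0 for pair in permutations(members, 2)})
    for citer, cited in citations:
        if (citer, cited) in counts:
            counts[(citer, cited)] += 1
    return counts


# Hypothetical data: 16 anonymous members and a handful of citation records.
members = [f"m{i}" for i in range(16)]
citations = [("m0", "m1"), ("m0", "m1"), ("m1", "m0"),
             ("m0", "outsider"), ("m2", "m2")]
counts = intercitation_counts(members, citations)
citing_pairs = [pair for pair, n in counts.items() if n > 0]
```

Tabulating one such matrix per period (for example, 1993 to 1996) would allow inter-citation to be compared over time.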











The Real Stakes of Virtual Publishing: The Transformation of E-Biomed Into PubMed Central
Rob Kling, Lisa B. Spector, and Joanna Fortuna
Published online 6 November 2003

Kling, Spector, and Fortuna review the process by which E-biomed, the National Institutes of Health's (NIH) proposed central Web-based electronic archive for all biomedical research papers, supported by scientists by a two-to-one margin, became PubMed Central, a facility without a preprint server and one in which content is determined by commercial and scientific society publishers. Their method is essentially historical; that is to say, it is based upon analysis of the documents produced by the process, including the proposals themselves, stories concerning the proposals in the scientific press, and postings to two electronic forums: 269 items on NIH's "archive of comments on E-biomed" and 492 on the forum hosted by Sigma Xi, publisher of _American Scientist_. The NIH archive was subjected to quantitative analysis. Official statements of professional societies had greater impact than individual comments. Scientific society officers and publication committee members did not express the opinions of readers and authors as observed. The societies' economic interests seem to override authors' wishes for rapid dissemination, and their communication channels with NIH are not the public forums and do not reflect public forum opinion, although such public comment was clearly a very limited response. The Internet was not here a powerful transforming force; rather, its use was shaped by various groups to support their interests.









Do the Web Sites of Higher Rated Scholars Have Significantly More Online Impact?
Mike Thelwall and Gareth Harries
Published online 28 October 2003

Thelwall and Harries measure Web site impact in terms of the number of external links pointing to a site, in order to determine whether higher-rated scholars produce higher-rated Web sites. They cluster together all pages with the same domain name and count only unique links to each domain found by a crawler applied to Web sites of universities in the United Kingdom. High-rated scholars are determined by their institution's place in the United Kingdom's 2001 Research Assessment Exercise (RAE), a peer-review-based study, normalized for staff size. There were 7,493 domains identified, with 82,672 links. Spearman's rho between RAE rankings and inlink counts was .082, significant at the .1% level. Total domains for each university correlate with research productivity at .762, significant at .1%, indicating that high-productivity universities produce more domains. When normalized by staff size, this falls to .509, still significant at .1%, suggesting that higher quality means more domains per staff member. It appears that higher-rated scholars produce more Web content, but of only slightly higher quality (impact), and thus online impact is suspect as a quality assessment measure.
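For readers unfamiliar with the statistic, Spearman's rho is the Pearson correlation computed on ranks rather than raw values. The sketch below is a generic, dependency-free illustration with made-up numbers. It is not the authors' code, and the function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation of the average ranks of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Made-up example: research ratings and inlink counts for five domains.
ratings = [5, 4, 4, 3, 2]
inlinks = [120, 15, 300, 40, 8]
rho = spearman_rho(ratings, inlinks)
```

A value near zero, such as the .082 reported above, indicates that the two rankings agree only weakly even when the result is statistically significant.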









Visible, Less Visible, and Invisible Work: Patterns of Collaboration in 20th Century Chemistry
Blaise Cronin, Debora Shaw, and Kathryn La Barre
Published online 3 November 2003

Cronin, Shaw, and La Barre continue their investigation of the place of acknowledgment and co-authorship in learned communication with a study of the literature of chemistry as found in the _Journal of the American Chemical Society_, from which a 2.6% random sample of research papers was drawn from volumes 22 to 121. Extracted acknowledgments were classified as conceptual 18%, editorial 1%, financial 46%, instrumental 34%, moral 0%, and unknown 1%, with 90% inter-coder reliability. Three quarters of the 2,866 papers contained an acknowledgment of some kind: 29% from 1930 to 1939, and 96% from 1990 to 1999. Co-authored papers constituted 88% of the sample, rising from 44% in the first decade to 99% in the last. Fourteen chemists received five or more acknowledgments, six of whom were in the ISI's 10,858 most-cited chemists list. Acknowledgment increases over time and is more intense in chemistry than in psychology or philosophy, and co-authorship is more prevalent. Individual agency appears to be a fading phenomenon in chemistry.











Non-Word Identification or Spell Checking Without a Dictionary
Donald C. Comeau and W. John Wilbur
Published online 28 October 2003

Comeau and Wilbur use a measure of the strength of context of a word, that is, how strongly it associates with other words in a document, to detect misspellings. Misspellings are less frequent and appear to occur randomly, while associated context words appear more frequently with the correct version. The measure and frequency counts are computed for each word, and alternative word lists are generated for candidates by choosing those words that differ by an instance of deletion, insertion, substitution, or transposition. A classifier is then trained to use these data to predict misspellings. From MEDLINE, 40,000 words with low context measures were selected, and 2,000 of these, chosen at random, were evaluated by judges to determine whether or not they were misspellings. Half were used to train classifiers and half were used as a test set. Data indicate that the more frequently an alternative appears, the more likely the candidate is a misspelling, and the log of the number of times alternatives appear is the most important feature. The number of alternatives was the second most important feature. The log of the frequency of the candidate itself has little impact. The log of the probability of a word appearing in MEDLINE versus in _The Wall Street Journal_ had some effect as an indicator of misspelling. The use of trigrams was not useful alone, but was helpful in combination with frequency of alternatives. The more common trigrams in a word, the more likely it is misspelled. The context measure of alternative words is not useful. Of the four categorization methods utilized, the CMLS wide-margin classifier, with an eleven-point average precision of .881, outperformed the Mahalanobis distance method, a log linear model, and linear boosting.
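The alternative-generation step described above can be illustrated with a short sketch. It assumes lowercase ASCII words and a hypothetical dictionary of corpus word counts. It is a generic single-edit generator, not the authors' implementation, and the function names are invented for illustration.

```python
import string


def single_edit_alternatives(word, alphabet=string.ascii_lowercase):
    """Return every string one deletion, insertion, substitution,
    or adjacent transposition away from `word` (excluding `word` itself)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletions = {left + right[1:] for left, right in splits if right}
    transpositions = {left + right[1] + right[0] + right[2:]
                      for left, right in splits if len(right) > 1}
    substitutions = {left + ch + right[1:]
                     for left, right in splits if right for ch in alphabet}
    insertions = {left + ch + right for left, right in splits for ch in alphabet}
    alternatives = deletions | transpositions | substitutions | insertions
    alternatives.discard(word)  # identical-letter swaps reproduce the word
    return alternatives


def observed_alternatives(word, corpus_counts):
    """Keep only alternatives that occur in the corpus, with their counts."""
    return {alt: corpus_counts[alt]
            for alt in single_edit_alternatives(word)
            if alt in corpus_counts}


# Hypothetical corpus counts, for illustration only.
corpus_counts = {"protein": 1200, "protien": 3, "proteins": 800}
alternatives = observed_alternatives("protien", corpus_counts)
```

Features of the kind the summary mentions, such as the number of observed alternatives and the log of their total frequency, could then be derived from the returned dictionary and passed to a classifier.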


