In This Issue
Bert R. Boyce





A New Method for Analyzing Scientific Productivity
    John C. Huber
    Published online 14 September 2001

In this issue's examination of author productivity, Huber treats publication date as a continuous variable. He estimates a paper's position within its year from its page number relative to the journal's maximum page number for that publication year. His data include nine-year samples from the Journal of Applied Physics and The Journal of Experimental Biology. These are supplemented by authors whose names begin with ``Ba'' from ten years of the PsycInfo database (where a random number is used to supplement the publication year), a combined set of previously collected samples of 19th-century physics papers, 8,641 patents issued to inventors employed by General Electric in New York State, composers with new works performed by professional orchestras over a 44-year period, and 16 years of papers on mathematical logic. The combined distribution of author productivity fits the exponential distribution, and the samples generally have a mean productivity of about .05 papers per year, although inventors produce at nearly twice that rate and composers at about half. Career length is also exponentially distributed: most authors have short publishing careers and very few have long ones. Empirically, the samples conform to random production, with Poisson distributions over time and exponential distributions for productivity and longevity. They show no evidence of cumulative advantage, and a very large number of authors produce at a constant rate.
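The random-production model summarized above can be made concrete with a small simulation. Each simulated author publishes as a Poisson process at a constant personal rate, and both rates and career lengths are drawn from exponential distributions. The function names and parameter values are hypothetical illustrations, not Huber's data or estimation procedure. The fit uses the standard maximum-likelihood estimate for an exponential rate, which is one over the sample mean.

```python
import random
import statistics


def fit_exponential_rate(samples: list[float]) -> float:
    """Maximum-likelihood estimate of an exponential rate: 1 / sample mean."""
    return 1.0 / statistics.fmean(samples)


def poisson_count(rate: float, duration: float, rng: random.Random) -> int:
    """Count events of a constant-rate Poisson process over [0, duration]
    by summing exponentially distributed inter-event gaps."""
    if rate <= 0:
        return 0
    t, n = 0.0, 0
    while True:
        t += rng.expovariate(rate)
        if t > duration:
            return n
        n += 1


def simulate_authors(n_authors: int, mean_rate: float, mean_career: float, seed: int = 0):
    """Hypothetical model: exponential productivity rates and career lengths,
    with each author publishing at a constant rate for the whole career."""
    rng = random.Random(seed)
    rates, careers, counts = [], [], []
    for _ in range(n_authors):
        rate = rng.expovariate(1.0 / mean_rate)      # papers per year
        career = rng.expovariate(1.0 / mean_career)  # years
        rates.append(rate)
        careers.append(career)
        counts.append(poisson_count(rate, career, rng))
    return rates, careers, counts


rates, careers, counts = simulate_authors(20000, mean_rate=0.5, mean_career=5.0)
print(round(fit_exponential_rate(rates), 2), round(fit_exponential_rate(careers), 3))
```

With true means of 0.5 papers per year and 5 years, the fitted rates should land near 2.0 and 0.2. The value 0.5 is an arbitrary illustration, not one of the study's estimates.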













The Non-Gaussian Nature of Bibliometric and Scientometric Distributions: A New Approach to Interpretation
    Ludmila E. Ivancheva
    Published online 14 September 2001

In a second bibliometric paper, Ivancheva draws on the work of Stankov to explain the skewed nature of bibliometric distributions. Stankov claims to have discovered that all natural phenomena follow one general regularity, the Universal Law, which declares energy exchange to be directly proportional to absolute time and inversely proportional to space. On this view, the hyperbolic distribution is a wave process of energy-information exchange. For example, author productivity would be interpreted as the energy of a spherical wave whose amplitude corresponds to the number of papers an author produces in a year. The ``nucleus'' is small because it emits significant energy, while the low-productivity space is large because its energy output is low.










Ask-an-Expert Services Analysis
    Joseph Janes, Chrystie Hill, and Alex Rolfe
    Published online 10 September 2001

Janes, Hill, and Rolfe develop and apply a methodology for evaluating Web-based services that permit users to put questions to experts. Ten commercial and ten non-commercial sites were asked 240 questions in ten subject areas believed to be typical of such inquiries. The commercial sample excluded sites that would not accept questions without a charge, chat-based services, and services with limited subject coverage. Each commercial site was asked ten ``fact'' questions, ten ``source'' questions, and one ``out of scope'' question. The non-commercial sites were subject specific and smaller, and each was asked one question of each type. Sites were characterized, questions were submitted by researchers using pseudonymous identities, and response times and requests for clarification were recorded.

The average time to submit a question was 4.75 minutes across three entry methods: web form, e-mail, and bulletin board. The overall response rate was 70%, and commercial sites were significantly more responsive. Fact questions had significantly shorter response times. The average response time was two days, seven hours, and 45 minutes. Three sites answered the question actually asked 90% of the time, two around 70% of the time, and the rest between 40% and 60% of the time.












Information Technology and Interests in Scholarly Communication: A Discourse Analysis
    Neil Jacobs
    Published online 14 September 2001

Jacobs views both information technology and scholarly communication from the perspective of the Social Construction of Technology (SCOT), which stresses the instability and constructed nature of both. He therefore questions both technological determinism in scholarly communication and the study of such communication in terms of its artifacts. Instead, the proper focus of research is taken to be the interests of the social groups involved. Scholarly communication includes not only informal networks, journals, and citations, but also the meta-communication that takes place as research on the topic. This can be addressed by way of discourse analysis.

Three analyses are presented from a series of semi-structured interviews held with academic researchers, librarians, and document suppliers as part of the FIDDO project's investigation of U.K. document delivery options. Category membership was likely to be relevant to responses to questions on technology and scholarly communication. The librarian and the document supplier used the category ``researcher'' as an explanatory resource, in that supporting researchers' interests was a constituent of their own categories. All participants claimed membership in a category, spoke so as to maintain the integrity of that category, and offered accounts that would be accepted as answers.














MetaSpider: Meta-Searching and Categorization on the Web
    Hsinchun Chen, Haiyan Fan, Michael Chau, and Daniel Zeng
    Published online 19 September 2001

MetaSpider performs post-retrieval document clustering and display after carrying out the traditional meta-search functions of collating the high-ranking subsets returned by multiple other search engines and eliminating duplicate and non-functional pages. Chen, Fan, Chau, and Zeng report that MetaSpider results are displayed in merged engine rank order, and only pages containing the exact search phrases are actually retrieved for processing. Noun phrases are extracted and displayed with their frequencies of occurrence. Documents associated with these phrases may be selected, and any phrase not deselected is used to form a self-organizing map of clustered pages, in which block size indicates term depth and proximity indicates conceptual relatedness.

An evaluation used six topics from TREC-6 to compare MetaSpider with MetaCrawler and Northern Light. Thirty students each did three searches, one on each system. Searchers described the returned pages by composing themes, short phrases describing the topics of the pages. These themes were then compared with themes produced earlier by judges to create recall and precision measures. Session time, number of documents browsed, and number of switches between lists and documents were also recorded. There was no significant difference in switching, documents browsed, session time, or recall. MetaSpider performed significantly better than Northern Light on precision.
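Theme-based precision and recall can be sketched as set overlap between a searcher's themes and the judges' themes. This assumes precision is the share of the searcher's themes that match a judge's theme and recall is the share of the judges' themes the searcher found. It also assumes matching by normalized exact string, a simplification: a real evaluation would need a looser, human-judged notion of a match. The function names are hypothetical.

```python
def normalize(theme: str) -> str:
    """Lower-case and collapse whitespace so trivially different strings match."""
    return " ".join(theme.lower().split())


def theme_precision_recall(searcher_themes: list[str], judge_themes: list[str]) -> tuple[float, float]:
    """Precision = matched / searcher's themes; recall = matched / judges' themes."""
    found = {normalize(t) for t in searcher_themes}
    expected = {normalize(t) for t in judge_themes}
    correct = found & expected
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(expected) if expected else 0.0
    return precision, recall


print(theme_precision_recall(
    ["Ozone depletion", "CFC  regulation", "volcanoes"],
    ["ozone depletion", "cfc regulation", "UV exposure", "Montreal Protocol"],
))
```

Here two of the searcher's three themes match, and two of the judges' four are found, so precision is 2/3 and recall is 1/2.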













Citation Mining: Integrating Text Mining and Bibliometrics for Research User Profiling
    Ronald N. Kostoff, J. Antonio del Rio, James A. Humenik, Esther Ofilia Garcia, and Ana Maria Ramirez
    Published online 19 September 2001

To determine the impact of specific published research on varied disciplines and to characterize research users, Kostoff et al. test the viability of analyzing the free-text fields of about 300 Science Citation Index records of papers that cited a fundamental paper on sand pile vibration. Abstracts were collected, and a taxonomy of phrases and terms was created by manually analyzing the single words, word pairs, and word triples extracted from the records. The same phrases were also clustered automatically. High-frequency phrases were clustered by the Mutual Information Index (co-occurrence frequency over the squared product of frequencies). Each low-frequency phrase was attached to any high-frequency phrase with which its co-occurrence, divided by its total occurrence, exceeded .5. Basic research papers from outside the discipline make up 15% to 25% of citing papers each year, with no evident latency period, while applications papers show a four-year latency period.
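The attachment rule for low-frequency phrases is simple enough to state in code. This sketch assumes that each citing record has already been reduced to a set of extracted phrases and that occurrence is counted once per record. It also takes the split into high- and low-frequency phrases as given. The function name and data layout are hypothetical, and the Mutual Information Index step for high-frequency phrases is not reproduced.

```python
from collections import Counter


def attach_low_frequency_phrases(
    records: list[set[str]],
    high_phrases: set[str],
    low_phrases: set[str],
    threshold: float = 0.5,
) -> dict[str, list[str]]:
    """Attach a low-frequency phrase to a high-frequency phrase when
    co-occurrence / total occurrence of the low phrase exceeds the threshold."""
    occurrences: Counter = Counter()     # records containing each low phrase
    co_occurrences: Counter = Counter()  # records containing both phrases
    for phrases in records:
        present_high = high_phrases & phrases
        for low in low_phrases & phrases:
            occurrences[low] += 1
            for high in present_high:
                co_occurrences[(low, high)] += 1
    clusters: dict[str, list[str]] = {high: [] for high in high_phrases}
    for (low, high), together in co_occurrences.items():
        if together / occurrences[low] > threshold:
            clusters[high].append(low)
    # Sort for a deterministic result regardless of set iteration order.
    return {high: sorted(lows) for high, lows in clusters.items()}


records = [
    {"sand pile", "granular flow", "avalanche"},
    {"sand pile", "granular flow"},
    {"sand pile", "avalanche"},
    {"vibration", "granular flow"},
]
print(attach_low_frequency_phrases(records, {"sand pile", "vibration"}, {"granular flow", "avalanche"}))
```

In this toy example "granular flow" appears in three records, two of them with "sand pile" (2/3 > .5). It therefore joins that cluster, but not "vibration" (1/3).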












Extracting Macroscopic Information from Web Links
    Mike Thelwall
    Published online 19 September 2001

Thelwall investigates whether any of four web link calculations correlate with university research productivity as measured by a government research assessment exercise. U.K. university web sites were indexed by a crawler designed for comprehensive coverage of their pages and subdomains. From the crawl results, lists were extracted of all pages linked to by at least one page at another U.K. university, with counts of the pages linking to them. For a sample of 25 universities, target pages were classified by information type. For each university, a summary was created of the number of external links from other U.K. universities to each of the classified types. The results were used to calculate web impact factors. The first used all back links normalized by full-time equivalent (FTE) faculty; the second used only the back links classified as research related, with the same denominator. These were compared with the 1996 official rating exercise. The all-link measure attained a significant Pearson correlation coefficient of .8, and the research-only numerator yielded a significant .9. A search was also made using AltaVista's advanced query syntax to acquire the number of pages to which back links exist, and these counts were used to create web impact factors for the sample universities. Their correlation with the external ranking was a significant .78. Using AltaVista page counts as denominators, web impact factors still correlated significantly with the external rankings, although with a lower coefficient.
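The two calculations at the core of this comparison can be sketched as follows. The first is a web impact factor: a link count divided by a size denominator such as FTE faculty. The second is the Pearson correlation of those factors against research ratings. The university figures are invented for illustration and are not Thelwall's data, and significance testing is left out.

```python
import math


def web_impact_factor(backlinks: int, denominator: float) -> float:
    """Inlink count normalized by a size measure (e.g. FTE faculty or a page count)."""
    return backlinks / denominator


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical figures for three universities (not from the study).
backlinks = [1200, 300, 800]
fte_faculty = [600, 400, 500]
ratings = [5.2, 3.1, 4.4]
wifs = [web_impact_factor(b, f) for b, f in zip(backlinks, fte_faculty)]
print(wifs, round(pearson(wifs, ratings), 3))
```

Swapping the denominator (for example, a page count instead of FTE faculty) changes only the second argument to `web_impact_factor`, which is how the alternative measures in the summary differ.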













Seeking Explanation in Theory: Reflections on the Social Practices of Organizations that Distribute Public Use Microdata Files for Research Purposes
    Alice Robbin and Heather Koball
    Published online 19 September 2001

Finally, Robbin and Koball use a survey, website analysis, and follow-up to examine the methods that 20 survey research organizations use to limit disclosure of confidential material. Their results indicate that, despite extensive research on statistical disclosure limitation methods for minimizing such risks, few such precautions are taken. Risk can be reduced by data conditioning methods, which restrict data by eliminating sensitive variables and by using grouping and interval techniques. It can also be reduced by restricting access. Only one of the 20 organizations had instituted any form of restricted access to longitudinal data. Linked administrative data were sometimes suppressed, summarized, or injected with error. The work cultures of the various organizations and their changing staffs meant that strict rules were not normally applied to restrictions on longitudinal data. The language and practice of statistical disclosure limitation are not universally known among survey organization staff.
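The data conditioning described above can be illustrated on a single record. Sensitive variables are dropped, exact values are recoded into ranges, and extreme values are top-coded so that rare individuals cannot be singled out. The field names, interval width, and top code are hypothetical choices, not the practice of any organization in the study.

```python
def condition_record(
    record: dict,
    drop: tuple[str, ...] = ("ssn", "exact_income"),
    age_width: int = 10,
    age_top: int = 80,
) -> dict:
    """Return a copy with sensitive fields removed and age grouped into intervals."""
    # Eliminate sensitive variables outright.
    out = {k: v for k, v in record.items() if k not in drop}
    age = out.get("age")
    if age is not None:
        if age >= age_top:
            # Top-code: everyone at or above the cutoff shares one category.
            out["age"] = f"{age_top}+"
        else:
            # Interval recode, e.g. 47 -> "40-49" for width 10.
            low = (age // age_width) * age_width
            out["age"] = f"{low}-{low + age_width - 1}"
    return out


print(condition_record({"age": 47, "ssn": "000-00-0000", "region": "NE"}))
```

Wider intervals and lower top codes reduce disclosure risk further at the cost of analytic detail.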












Knowledge Management: Classic and Contemporary Works, edited by Daryl Morey, Mark Maybury, and Bhavani Thuraisingham
    John Cullen
    Published online 11 September 2001





Peer-to-Peer: Harnessing the Benefits of a Disruptive Technology, edited by Andy Oram
    Lisa A. Ennis
    Published online 19 September 2001






